
Khiops Interpretation

Authors: Vincent Lemaire

Abstract

This is a talk about some technical aspects of Khiops Interpretation, given for the Inria team 'Lacodam' in March 2021. You may find other details about this tool at http://vincentlemaire-labs.fr/iki.html
Khiops Interpretation
For LACODAM
03/29/2021
Vincent Lemaire
(draft slides just for discussion)
2
Outline
Interpretation ?
Khiops ?
Which measure ?
The used one and its variants
The special case of the Naive Bayes
Counterfactual ?
Link with the invited talk at EGC MJ Lesot
On the interest of an « actionable » output (file)
Thank you
3
Interpretation
Global, by Groups, Local ?
Variable importance for the “model”
Variable importance for a given population (to define)
Variable importance for a specific instance (this talk)
but…
4
Interpretation
WHY and HOW
(two different notions)
Why does the model output this value?
How could “one” change the model’s outputs?
(counterfactual?)
Inside Khiops Interpretation
(parts of this talk)
There are several ways to do local interpretation, with two main trends: methods based on the variables important for the decision, and methods based on examples. Here we are interested in variable-based methods; more details later.
5
Outline
Interpretation ?
Khiops ?
Which measure ?
The used one and its variants
The special case of the Naive Bayes
Counterfactual ?
Link with the invited talk at EGC MJ Lesot
On the interest of an « actionable » output (file)
Thank you
6
Khiops
Two main steps
1. Preparation step:
1. Discretization of numerical attributes
2. Value grouping for categorical attributes
(for both a preparation cost is computed)
So after this step the Naïve Bayes classifier sees intervals and value groups
[Figure: supervised discretization of “Sepal width” on the Iris dataset — number of instances per interval, by class (Versicolor, Virginica, Setosa)]
[Figure: value grouping of a categorical attribute — values RED, YELLOW, BUFF, PINK, BROWN, GRAY, GREEN, PURPLE, WHITE, CINNAMON grouped into G_RED, G_BROWN, G_GRAY, G_GREEN, G_WHITE]
Discretization :
M. Boullé. MODL: A Bayes optimal discretization method for continuous attributes. Machine Learning, 65(1):131-165, 2006.
Value grouping :
M. Boullé. A Grouping Method for Categorical Attributes Having Very Large Number of Values. In Proceedings of the Fourth International Conference on Machine Learning and Data Mining in Pattern Recognition, P. Perner, A. Imiya (eds.), LNAI, Volume 3587, Pages 228-242, 2005.
7
Khiops
Two main steps
1. Preparation step:
1. Discretization of numerical attributes
2. Value grouping for categorical attributes
(for both a preparation cost is computed)
So after this step the Naïve Bayes classifier sees intervals and value groups
2. Weighting of the attributes
in a “Selective Naive Bayes”
[Figures: same discretization and value grouping illustrations as on the previous slide]
Weights computed with:
M. Boullé. Compression-Based Averaging of Selective Naive Bayes Classifiers. Journal of Machine Learning Research, 8:1659-1685, 2007.
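To make the two steps concrete, here is a minimal sketch of how a Selective Naive Bayes could score an instance once the preprocessing has mapped each attribute to an interval or value group and the model averaging has produced one weight per attribute. The names, probabilities and weights below are invented for illustration; this is not the Khiops implementation.

```python
import math

# Hypothetical preprocessed model (invented numbers): for each attribute we assume
# we are given P(interval or group | class) from step 1, plus the attribute weight
# produced by the compression-based model averaging (Boullé, JMLR 2007).
priors = {"Setosa": 1/3, "Versicolor": 1/3, "Virginica": 1/3}
cond_probs = {
    # attribute -> {part (interval/group) -> {class -> P(part | class)}}
    "SepalWidth": {"]-inf;2.95]": {"Setosa": 0.05, "Versicolor": 0.55, "Virginica": 0.40},
                   "]2.95;+inf[": {"Setosa": 0.95, "Versicolor": 0.45, "Virginica": 0.60}},
}
weights = {"SepalWidth": 0.8}  # attribute weights of the "Selective Naive Bayes"

def snb_posterior(instance_parts):
    """instance_parts: attribute -> the interval/group its raw value falls into."""
    log_scores = {}
    for c, prior in priors.items():
        s = math.log(prior)
        for att, part in instance_parts.items():
            # weighted log-likelihood of the attribute's part for this class
            s += weights[att] * math.log(cond_probs[att][part][c])
        log_scores[c] = s
    z = sum(math.exp(v) for v in log_scores.values())
    return {c: math.exp(v) / z for c, v in log_scores.items()}

print(snb_posterior({"SepalWidth": "]2.95;+inf["}))
```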
8
WHY ?
9
Outline
Interpretation ?
Khiops ?
Which measure ?
The used one and its variants
The special case of the Naive Bayes
Counterfactual ?
Link with the invited talk at EGC MJ Lesot
On the interest of an « actionable » output (file)
Thank you
10
Saliency (what if simulation)
Principle of the sensitivity analysis :
We observe the difference between the output of the model for an instance and the output of the model when varying the values of the variable of interest.
Sensitivity analysis methods are applicable to all models.
Main methods for measuring sensitivity:
Replacement by the mean of the variable
Calculation or approximation of the partial derivative of
the variable
Difference between the Min and Max of the variable
Integral of the variations of the variable
Integral of the variations of the distribution of the variable
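Purely as an illustration (model-agnostic, with invented function names and a toy model), a few of these sensitivity measures could be sketched as follows around a black-box predict function:

```python
import numpy as np

def what_if(predict, x, j, new_value):
    """Model output when variable j of instance x is replaced by new_value."""
    x_mod = np.array(x, dtype=float)
    x_mod[j] = new_value
    return predict(x_mod)

def sensitivity_mean(predict, x, j, data):
    """Replacement by the mean of the variable."""
    return predict(x) - what_if(predict, x, j, data[:, j].mean())

def sensitivity_min_max(predict, x, j, data):
    """Difference between the outputs at the Min and at the Max of the variable."""
    return what_if(predict, x, j, data[:, j].max()) - what_if(predict, x, j, data[:, j].min())

def sensitivity_integral(predict, x, j, data):
    """Average of the output variations over the observed values of the variable
    (a discrete stand-in for the 'integral of the variations')."""
    return predict(x) - float(np.mean([what_if(predict, x, j, v) for v in data[:, j]]))

# toy usage with a linear 'model' passed through a sigmoid
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))
predict = lambda x: 1.0 / (1.0 + np.exp(-(2 * x[0] - x[1] + 0.5 * x[2])))
x = data[0]
print(sensitivity_mean(predict, x, 0, data),
      sensitivity_min_max(predict, x, 0, data),
      sensitivity_integral(predict, x, 0, data))
```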
11
Saliency (what if simulation)
Main methods for measuring sensitivity:
Replacement ()
Replacement by the mean of the variable () (…), or by 0 () (no importance)
Calculation or approximation of the partial derivative of the variable () (positive importance)
Difference between the Min and Max of the variable () (no importance)
Integral of the variations of the variable
Integral of the variations of the distribution of the variable
12
Saliency (what if simulation)
Main methods for measuring sensitivity:
Replacement ()
Replacement by the mean of the variable () (…), or by 0 () (no importance)
Calculation or approximation of the partial derivative of the variable () (positive importance)
Difference between the Min and Max of the variable () (no importance)
Integral of the variations of the variable
Integral of the variations of the distribution of the variable
The integral of the variations of the variable makes it possible to identify the importance of the variable, if the distribution of the variations is uniform
14
Outline
Interpretation ?
Khiops ?
Which measure ?
The used one and its variants
The special case of the Naive Bayes
Counterfactual ?
Link with the invited talk at EGC MJ Lesot
On the interest of an « actionable » output (file)
Thank you
15
The used ‘saliency’
The integral of the variations of the variable makes it possible to identify the importance of the variable, if the distribution of the variations is uniform
Something like the difference of the model output with and ‘without’ the variable of interest
Complexity! To compute the output ‘without’ the variable we need its possible values…
hopefully, thanks to the preprocessing, we have them (the intervals and value groups)
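A hedged sketch of this idea, assuming the output ‘without’ the variable of interest is approximated by averaging the model output over the intervals/groups produced by the preprocessing, weighted by their observed frequency (all names and numbers below are illustrative):

```python
def importance_without(predict_part, parts, part_freq, x_parts, j):
    """Difference between the model output for the instance and its output
    'without' variable j, approximated here by averaging the output over the
    possible parts (intervals/groups) of j, weighted by their frequency.
    predict_part : maps a dict {variable -> part} to a probability
    parts[j]     : possible parts of variable j (from the preprocessing)
    part_freq[j] : frequency of each part (assumed available from training data)"""
    with_value = predict_part(x_parts)
    without = 0.0
    for part in parts[j]:
        x_mod = dict(x_parts)
        x_mod[j] = part
        without += part_freq[j][part] * predict_part(x_mod)
    return with_value - without

# toy usage: one variable 'V1' with two parts and a dummy scorer
parts = {"V1": ["part_a", "part_b"]}
part_freq = {"V1": {"part_a": 0.7, "part_b": 0.3}}
toy_predict = lambda xp: 0.9 if xp["V1"] == "part_a" else 0.2
print(importance_without(toy_predict, parts, part_freq, {"V1": "part_a"}, "V1"))
```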
16
The used ‘saliency’: variable importance in
Khiops Interpretation
How to use this quantity ? SOTA … a lot of possibilities
How to use the ‘Output’ without Vj ?
Which one is the best ?
Depends on YOUR sensitivity ?
17
The used ‘saliency’: variable importance in
Khiops Interpretation
By default in KI :
R = A/B
Normalization of the odds ratio (R). Two main possibilities:
P = A/(A+B) = R/(R+1), which is always between 0 and 1
Q = (A-B)/(A+B) = (R-1)/(R+1), which is always between -1 and +1
Q in KI: because if you take the “Weight of Evidence” of Moore (range of values in ]-∞,+∞[) and you pass it through a sigmoid function (to obtain a range in ]-1,+1[), then you obtain the Q indicator. In this way you keep the symmetry of the indicator (negative value = negative influence, positive value = positive influence), which seems more ‘interpretable’.
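In code, with A and B standing for the model outputs with and ‘without’ the variable of interest as above, the normalization is straightforward (a minimal sketch):

```python
def indicators(a, b):
    """Odds ratio R = A/B and its two normalizations (assumes a, b > 0)."""
    r = a / b
    p = r / (r + 1.0)          # = A/(A+B), always in ]0,1[
    q = (r - 1.0) / (r + 1.0)  # = (A-B)/(A+B), in ]-1,+1[: the sign gives the direction of influence
    return r, p, q

print(indicators(0.8, 0.2))  # strong positive influence -> Q close to +1
print(indicators(0.1, 0.9))  # strong negative influence -> Q close to -1
```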
18
The used ‘saliency’: variable importance in
Khiops Interpretation
By default in KI :
But feel free to change …
(11 indicators are in KI)
All of them are described in the documentation of the tool
19
Outline
Interpretation ?
Khiops ?
Which measure ?
The used one and its variants
The special case of the Naive Bayes
Counterfactual ?
Link with the invited talk at EGC MJ Lesot
On the interest of an « actionable » output (file)
Thank you
20
The special case of the Naive Bayes (exact computation)
$P(C_k \mid X) = \dfrac{P(C_k)\,\prod_j P(x_j \mid C_k)}{\sum_{k'} P(C_{k'})\,\prod_j P(x_j \mid C_{k'})}$
21
The special case of the Naive Bayes
$P(C_k \mid X) = \dfrac{P(C_k)\,\prod_j P(x_j \mid C_k)}{\sum_{k'} P(C_{k'})\,\prod_j P(x_j \mid C_{k'})}$

With the attribute weights $w_j$ of the Selective Naive Bayes:

$P(C_k \mid X) = \dfrac{P(C_k)\,\prod_j P(x_j \mid C_k)^{w_j}}{\sum_{k'} P(C_{k'})\,\prod_j P(x_j \mid C_{k'})^{w_j}}$
22
The special case of the Naive Bayes
$P(C_k \mid X) = \dfrac{P(C_k)\,\prod_j P(x_j \mid C_k)^{w_j}}{\sum_{k'} P(C_{k'})\,\prod_j P(x_j \mid C_{k'})^{w_j}}$

$A = P(C_k \mid X)$, the output with the observed value of the variable of interest $V_j$

$B =$ the output ‘without’ $V_j$, obtained from the same expression with the term of $V_j$ replaced using its possible values (the intervals or groups from the preprocessing)
23
The special case of the Naive Bayes
$A = P(C_k \mid X)$, the output with the observed value of the variable of interest $V_j$

$B =$ the output ‘without’ $V_j$ (term of $V_j$ replaced using its possible intervals or groups)

$R = \dfrac{A}{B}, \qquad Q = \dfrac{A-B}{A+B} = \dfrac{R-1}{R+1}$
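A minimal sketch of why the computation is exact and cheap for a (Selective) Naive Bayes: the posterior factorizes per variable, so the output ‘without’ the variable of interest can be obtained by leaving out (here, simply dropping) its contribution. The model below is invented for illustration; the exact way Khiops Interpretation removes or replaces the term may differ.

```python
import math

def posterior(priors, cond, weights, x_parts, skip=None):
    """Weighted Naive Bayes posterior; if 'skip' is given, the contribution of
    that variable is left out (one possible way to get the 'without V_j' output)."""
    log_scores = {}
    for c, prior in priors.items():
        s = math.log(prior)
        for att, part in x_parts.items():
            if att == skip:
                continue
            s += weights[att] * math.log(cond[att][part][c])
        log_scores[c] = s
    z = sum(math.exp(v) for v in log_scores.values())
    return {c: math.exp(v) / z for c, v in log_scores.items()}

# hypothetical preprocessed model: two variables, two classes
priors = {"A": 0.5, "B": 0.5}
cond = {"V1": {"i1": {"A": 0.8, "B": 0.3}, "i2": {"A": 0.2, "B": 0.7}},
        "V2": {"g1": {"A": 0.6, "B": 0.4}, "g2": {"A": 0.4, "B": 0.6}}}
weights = {"V1": 1.0, "V2": 0.5}
x = {"V1": "i1", "V2": "g1"}

a = posterior(priors, cond, weights, x)["A"]             # output A: with V1
b = posterior(priors, cond, weights, x, skip="V1")["A"]  # output B: 'without' V1
print(a, b, (a - b) / (a + b))                           # the Q indicator for V1
```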
24
Why ? Khiops Interpretation output
25
Outline
Interpretation ?
Khiops ?
Which measure ?
The used one and its variants
The special case of the Naive Bayes
Counterfactual ?
Link with the invited talk at EGC MJ Lesot
On the interest of an « actionable » output (file)
Thank you
26
HOW ?
27
Counterfactual ?
[2009] "Correlation Explorations in a Classification Model", Vincent Lemaire, Carine Hue, Olivier Bernier in the
workshop Data Mining Case Studies and Practice Prize, SIGKDD 2009
[2010] "Correlation Analysis in Classifiers", Vincent Lemaire, Carine Hue, Olivier Bernier, in "Handbook of Research
on Data Mining in Public and Private Sectors: Organizational and Government Applications"
We failed to use the right term (‘counterfactual’), too bad!
28
Counterfactual ?
Here the objective is to obtain a prediction which will be different
Def.: “A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output”
We will relax this objective a little later in this talk
We do not give a state of the art here; see the publications on that topic …
29
Counterfactual ?
Khiops Interpretation :
uniform cost
actionable attributes have to be defined in the interface
Correlations
exact (?) optimization thanks to the NB
30
Counterfactual ?
For a specific instance: the third instance in the example.
For the targeted class: class A in the example.
[Table: four instances described by the explanatory variables V1, V2, V3, with classifier outputs A=0.99, B=0.87, A=0.53, A=0.67]
Example for the third instance and the explanatory variable 'V2'
31
Counterfactual ?
For a specific instance: the third instance in the example.
For the targeted class: class A in the example.
[Table: the same four instances, classifier outputs A=0.99, B=0.87, A=0.53, A=0.67]
Example for the third instance and the explanatory variable 'V2'
The score is computed:
for all explanatory variables
for all possible values of the selected explanatory variable.
In the example: for V1, for V2, for V3.
[Tables: the third instance with 'V2' set to other possible values; the classifier output becomes A=0.51 for one value and A=0.67 for another]
32
Counterfactual ?
[Tables: the third instance (initial output A=0.53) with 'V2' set to its other possible values, giving outputs A=0.51 and A=0.67]
Example for the third instance and the explanatory variable 'V2'
The set of values leading to a score increase is retained.
In the example for the variable V2: A=0.53 → A=0.67
33
Counterfactual ?
Thanks to the preprocessing:
a limited number of values to test
number of intervals (numerical variables), number of groups (categorical variables)
Thanks to the “Selective Naïve Bayes”:
a ‘controlled’ complexity (as for the importance computation above)
$\log P(C_k \mid X) \propto \log P(C_k) + \sum_j s_{j,k}$ where $s_{j,k} = w_j \log P(x_j \mid C_k)$
So changing the value of one variable to another results in one subtraction and one addition (of the corresponding term $s_{j,k}$)
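As a sketch of why this search is cheap (illustrative names and values, not the Khiops code): the per-class score is a sum of per-variable terms, so testing another interval or group of a lever variable only swaps one term per class.

```python
import math

def class_log_scores(priors, cond, weights, x_parts):
    """log P(C) + sum_j w_j * log P(part_j | C), one entry per class."""
    return {c: math.log(p) + sum(weights[a] * math.log(cond[a][x_parts[a]][c])
                                 for a in x_parts)
            for c, p in priors.items()}

def best_single_change(priors, cond, weights, x_parts, target, lever_vars):
    """Try every alternative interval/group of every lever variable and keep the
    single change that most increases P(target | X).  Each candidate is evaluated
    with one subtraction and one addition per class."""
    base = class_log_scores(priors, cond, weights, x_parts)

    def target_proba(scores):
        z = sum(math.exp(s) for s in scores.values())
        return math.exp(scores[target]) / z

    best = (target_proba(base), None, None)
    for att in lever_vars:
        for part in cond[att]:
            if part == x_parts[att]:
                continue
            scores = {c: base[c]
                         - weights[att] * math.log(cond[att][x_parts[att]][c])
                         + weights[att] * math.log(cond[att][part][c])
                      for c in base}
            p = target_proba(scores)
            if p > best[0]:
                best = (p, att, part)
    return best  # (best probability of the target class, variable to change, new part)

# hypothetical model (same invented numbers as the previous sketch)
priors = {"A": 0.5, "B": 0.5}
cond = {"V1": {"i1": {"A": 0.8, "B": 0.3}, "i2": {"A": 0.2, "B": 0.7}},
        "V2": {"g1": {"A": 0.6, "B": 0.4}, "g2": {"A": 0.4, "B": 0.6}}}
weights = {"V1": 1.0, "V2": 0.5}
print(best_single_change(priors, cond, weights, {"V1": "i2", "V2": "g2"}, "A", ["V2"]))
```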
34
Outline
Interpretation ?
Khiops ?
Which measure ?
The used one and its variants
The special case of the Naive Bayes
Counterfactual ?
Link with the invited talk at EGC MJ Lesot
On the interest of an « actionable » output (file)
Thank you
35
Link with the invited talk at EGC MJ Lesot (2d part of her talk)
(https://www.youtube.com/watch?v=hOHrt80HKeM&list=UUvNbyyty3s9iVXHYic3AKGw&index=19)
[Equations (1): the importance and counterfactual quantities of the previous slides, both built from the conditional probabilities $P(x_j \mid C_k)$ of the preprocessed parts]
Everything is based on the value of a variable ($X_j = x_j$) or on the change of such a value ($X_j = x_j \rightarrow X_j = x_j'$)
Only […] (intervals or value groups) coming from the preprocessing step are tested
The preprocessing step provides intervals (or groups) that are piecewise constant given the target ‘class’.
From the ‘MODL’ framework; see for example the supervised discretization of numerical attributes:
M. Boullé. MODL: A Bayes optimal discretization method for continuous attributes. Machine Learning, 65(1):131-165, 2006
We do not test ‘empty parts of the distribution’…
A good characteristic, thanks to the preprocessing
36
Link with the invited talk at EGC MJ Lesot (2d part of her talk)
(https://www.youtube.com/watch?v=hOHrt80HKeM&list=UUvNbyyty3s9iVXHYic3AKGw&index=19)
Khiops (Naïve Bayes)
Khiops Interpretation
37
Link with the invited talk at EGC MJ Lesot (2d part of her talk)
(https://www.youtube.com/watch?v=hOHrt80HKeM&list=UUvNbyyty3s9iVXHYic3AKGw&index=19)
Khiops
Khiops Interpretation
we need the classifier
38
Link with the invited talk at EGC MJ Lesot
“Lever variable” ?
[Table: the example instances with the explanatory variables V2, V3 and classifier outputs A=0.99, B=0.87, A=0.53, A=0.67]
The correlation exploration algorithm allows the discovery of the important variables for the target class.
But in most cases, changing the values of some explanatory variables (such as sex or age) is simply impossible.
The user of the algorithm therefore has to define the ‘lever variables’: the important variables which can actually be changed.
39
Potential “textual” interpretation
A potential “textual” interpretation
This example belongs to the class ‘A’ because its most important variables are
….
….
But if its variable ‘….’, which has the value ‘…’, changed to
….
Then …
Details in the next slides…
40
Outline
Interpretation ?
Khiops ?
Which measure ?
The used one and its variants
The special case of the Naive Bayes
Counterfactual ?
Link with the invited talk at EGC MJ Lesot
On the interest of an « actionable » output (file)
Thank you
41
Do your best to have actionable information
Analysis of the results : the output file structure
42
Do your best to have actionable information
Analysis of the results : the output file structure
For each variable, 4 columns :
the name of the variable
the value of this variable which increases the probability of the class to reinforce
the new value of the probability of the class to reinforce
a “binary value” which indicates if the line changes its “predicted class”
43
Do your best to have actionable information
Analysis of the results : the output file structure
line 2: the initial probability is 0.65, but if “Capital Gain” took a value in the interval ]5119;5316.5] then the probability would be 1
however this line does not change its “predicted class” (the line was already predicted as “more”)
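For instance, one could post-process such a file as follows; the tab-separated column names and the second data row here are assumptions for illustration only (the first row mimics the 'Capital Gain' example above), not the actual Khiops Interpretation headers:

```python
import csv, io

# Made-up excerpt with assumed column names following the 4-column-per-variable
# structure described above (NOT the exact Khiops Interpretation headers).
excerpt = io.StringIO(
    "InitialProb\tVariable_1\tValue_1\tProbability_1\tClassChange_1\n"
    "0.65\tCapital Gain\t]5119;5316.5]\t1.0\t0\n"
    "0.40\tHoursPerWeek\t]43.5;55.5]\t0.62\t1\n"
)
for row in csv.DictReader(excerpt, delimiter="\t"):
    # keep the actionable lines: the suggested value change flips the predicted class
    if row["ClassChange_1"] == "1":
        print(f'{row["Variable_1"]} -> {row["Value_1"]}: '
              f'probability {row["InitialProb"]} -> {row["Probability_1"]}')
```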
44
Do your best to have actionable information
Reactive Action (counterfactual) :
An example is detected as belonging to class A; the methodology indicates:
which explanatory variables to modify,
the desired values for these variables,
the expected gain in terms of probability of
occurrence of the target class B.
Preventive Action (relaxed counterfactual) :
An example is detected as belonging to class B but it is near the boundary; the methodology indicates:
which explanatory variables to modify,
the desired values for these variables,
the expected gain in terms of probability of
occurrence of the target class B.
[Figures: two decision-boundary diagrams (Class A vs. Class B) illustrating the reactive and the preventive action]
45
Khiops Interpretation outputs
46
Interpretation
Global, by Groups, Local ?
Variable importance for the “model”
Variable importance for a given population (to define)
Variable importance for a specific instance (this talk)
Idea: clustering of examples using their ‘importances’ or ‘counterfactuals’…
47
Interpretation
Idea: clustering of examples using their ‘importances’ or ‘counterfactuals’…
[Figures: variable importances to explain the churn, for Cluster 5 (3.64% of the population) vs. the global population]
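A minimal sketch of this clustering idea, using scikit-learn's KMeans as one possible choice and random vectors standing in for the real per-instance importance (or counterfactual) vectors produced by KI:

```python
import numpy as np
from sklearn.cluster import KMeans

# each row: the per-variable importance (or counterfactual gain) vector of one instance;
# random toy data here stands in for the vectors exported by Khiops Interpretation
rng = np.random.default_rng(0)
importances = rng.random((500, 8))

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(importances)
for k in range(6):
    members = importances[kmeans.labels_ == k]
    share = 100.0 * len(members) / len(importances)
    # the mean importance profile of each cluster can then be compared to the global one
    print(f"cluster {k}: {share:.2f}% of the population, profile {members.mean(axis=0).round(2)}")
```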
48
Time for a demo ?
49
Outline
Interpretation ?
Khiops ?
Which measure ?
The used one and its variants
The special case of the Naive Bayes
Counterfactual ?
Link with the invited talk at EGC MJ Lesot
On the interest of an « actionable » output (file)
Thank you
50
Thank you
Our ‘old’ publications (a new interest with the GDPR?)
[2004] "An Input Variable Importance Definition based on Empirical Data Probability and Its Use in Variable Selection", Vincent Lemaire and Fabrice Clérot, in International Joint Conference on Neural Networks (IJCNN), 2004
[2008] "Contact Personalization Using a Score Understanding Method", Vincent Lemaire, Raphaël Féraud
and Nicolas Voisine, in International Joint Conference on Neural Networks (IJCNN), 2008
[2009] "Correlation Explorations in a Classification Model", Vincent Lemaire, Carine Hue, Olivier Bernier in
the workshop Data Mining Case Studies and Practice Prize, SIGKDD 2009
[2010] "Correlation Analysis in Classifiers", Vincent Lemaire, Carine Hue, Olivier Bernier, in "Handbook of
Research on Data Mining in Public and Private Sectors: Organizational and Government Applications"
[2010] Exhibition : "KAWAB : Un outil pour explorer les corrélations existantes dans un classifieur naïf de
Bayes", Vincent Lemaire, in RFIA (Reconnaissance des Formes et Intelligence Artificielle), Caen, Fevrier
2010
[2010] "A method to build a representation using a classifier and its use in a K Nearest Neighbors-based
deployment", Vincent Lemaire, Marc Boullé, Fabrice Clérot and Pascal Gouzien, in International Joint
Conference on Neural Networks (IJCNN), 2010
[2012] "A Complete Data Mining process to Manage the QoS of ADSL Services", Françoise Fessant &
Vincent Lemaire in the workshop WAITS 2012 (Workshop on Artificial Intelligence for Telecommunications &
Sensor Networks held at ECAI 2012, Montpellier)