
Khiops Interpretation

For LACODAM

03/29/2021

Vincent Lemaire

(draft slides just for discussion)

2

Outline

•Interpretation?

•Khiops?

•Which measure?

•The one used and its variants

•The special case of the Naive Bayes

•Counterfactual?

•Link with the invited talk of M.-J. Lesot at EGC

•On the interest of an « actionable » output (file)

•Thank you

3

Interpretation

Global, by groups, local?

Variable importance for the “model”

Variable importance for a given population (to define)

Variable importance for a specific instance (this talk)

but…

4

Interpretation

WHY and HOW

(two different notions)

Why does the model output this value?

How could “one” change the model’s outputs?

(counterfactual?)

Inside Khiops Interpretation

(parts of this talk)

There are several ways to do local interpretation, with two main trends: methods based on the variables that are important for the decision, and methods based on examples. Here we focus on variable-based methods; more details later.

5

Outline

•Interpretation?

•Khiops?

•Which measure?

•The one used and its variants

•The special case of the Naive Bayes

•Counterfactual?

•Link with the invited talk of M.-J. Lesot at EGC

•On the interest of an « actionable » output (file)

•Thank you

6

Khiops

•Two main steps

1. Preparation step:

1. Discretization of numerical attributes

2. Value grouping for categorical attributes

(for both, a preparation cost is computed)

So after this step the Naïve Bayes classifier sees intervals and value groups

[Figure: Iris dataset, supervised discretization of “Sepal width” into intervals, with instance counts per class (Versicolor, Virginica, Setosa)]

[Figure: supervised grouping of categorical values (RED, YELLOW, BUFF, PINK, BROWN, GRAY, GREEN, PURPLE, WHITE, CINNAMON) into groups G_RED, G_BROWN, G_GRAY, G_GREEN, G_WHITE]

Discretization:
M. Boullé. MODL: a Bayes optimal discretization method for continuous attributes. Machine Learning, 65(1):131-165, 2006.

Value grouping:
M. Boullé. A Grouping Method for Categorical Attributes Having Very Large Number of Values. In Proceedings of the Fourth International Conference on Machine Learning and Data Mining in Pattern Recognition, P. Perner, A. Imiya (eds.), LNAI, Volume 3587, Pages 228-242, 2005.

7

Khiops

•Two main steps

1. Preparation step:

1. Discretization of numerical attributes

2. Value grouping for categorical attributes

(for both, a preparation cost is computed)

So after this step the Naïve Bayes classifier sees intervals and value groups

2. Weighting of attributes in a “Selective Naive Bayes”

[Figures: same as on the previous slide, Iris discretization and categorical value grouping]

Weights computed with:

M. Boullé. Compression-Based Averaging of Selective Naive Bayes Classifiers. Journal of Machine Learning Research, 8:1659-1685, 2007.

8

WHY ?

9

Outline

•Interpretation ?

•Khiops ?

•Which measure ?

•The used one and it’s variants

•The special case of the Naive Bayes

•Counterfactual ?

•Link with the invited talk at EGC MJ Lesot

•On the interest of a « actionable » output (file)

•Thank you

10

Saliency (what-if simulation)

Principle of the sensitivity analysis: we observe the difference between the output of the model for an instance and the output of the model when varying the values of the variable of interest.

Sensitivity analysis methods are applicable to all models.

Main methods for measuring sensitivity:

•Replacement by the mean of the variable

•Calculation or approximation of the partial derivative of

the variable

•Difference between the Min and Max of the variable

•Integral of the variations of the variable

•Integral of the variations of the distribution of the variable
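The replacement-based measures above are straightforward to sketch for any model. A minimal illustration (names and API are ours, not Khiops’), using mean replacement as the sensitivity measure:

```python
# Sketch of "what if" sensitivity analysis by mean replacement.
# All names here are illustrative, not Khiops' actual API.
import numpy as np

def mean_replacement_importance(predict, X):
    """For each variable j, replace column j by its mean and
    measure the change in the model output, averaged over instances."""
    base = predict(X)
    importances = []
    for j in range(X.shape[1]):
        X_mod = X.copy()
        X_mod[:, j] = X[:, j].mean()        # the "what if" value
        importances.append(np.abs(predict(X_mod) - base).mean())
    return np.array(importances)

# Toy linear model: output depends strongly on x0, not at all on x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
predict = lambda X: 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.0 * X[:, 2]
imp = mean_replacement_importance(predict, X)
```

With this toy model, the variable with the largest coefficient gets the largest importance and the ignored variable gets exactly zero.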

11

Saliency (what-if simulation)

Main methods for measuring sensitivity:

•Replacement: by the mean of the variable (…), or by 0 (no importance)

•Calculation or approximation of the partial derivative of the variable (positive importance)

•Difference between the Min and Max of the variable (no importance)

•Integral of the variations of the variable

•Integral of the variations of the distribution of the variable

12

Saliency (what-if simulation)

Main methods for measuring sensitivity:

•Replacement: by the mean of the variable (…), or by 0 (no importance)

•Calculation or approximation of the partial derivative of the variable (positive importance)

•Difference between the Min and Max of the variable (no importance)

•Integral of the variations of the variable

•Integral of the variations of the distribution of the variable

The integral of the variations of the variable identifies the importance of the variable, provided the distribution of the variations is uniform.
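For concreteness, the mean-replacement and integral measures listed above can be written as follows; the notation ($f$ the model, $x$ the instance, $x_{V_j \leftarrow v}$ the instance with variable $V_j$ set to $v$) is ours, not the slide’s.

```latex
% Mean replacement: compare the output with V_j set to its mean value.
\[
  I_{\mathrm{mean}}(j, x) = \bigl| f(x) - f(x_{V_j \leftarrow \bar{v}_j}) \bigr|
\]
% Integral of the variations: average the output change over the
% distribution p(v) of the variable (Monte Carlo approximation on the right).
\[
  I_{\mathrm{int}}(j, x) = \int \bigl| f(x) - f(x_{V_j \leftarrow v}) \bigr|\, p(v)\, \mathrm{d}v
  \approx \frac{1}{K} \sum_{k=1}^{K} \bigl| f(x) - f(x_{V_j \leftarrow v_k}) \bigr|
\]
```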


14

Outline

•Interpretation?

•Khiops?

•Which measure?

•The one used and its variants

•The special case of the Naive Bayes

•Counterfactual?

•Link with the invited talk of M.-J. Lesot at EGC

•On the interest of an « actionable » output (file)

•Thank you

15

The ‘saliency’ used

The integral of the variations of the variable identifies the importance of the variable, provided the distribution of the variations is uniform.

Something like the difference of the model output with and ‘without’ the variable of interest.

Complexity! But we need […]; fortunately, thanks to the preprocessing, we have them.

16

The ‘saliency’ used: variable importance in Khiops interpretation

How to use this quantity? SOTA… a lot of possibilities.

How to use the ‘output’ without Vj?

Which one is the best? It depends on YOUR sensitivity.

17

The ‘saliency’ used: variable importance in Khiops interpretation

By default in KI:

R = A/B

Normalization of the odds ratio (R), two main possibilities:

P = A/(A+B) = R/(R+1), which is always between 0 and 1

Q = (A-B)/(A+B) = (R-1)/(R+1), which is always between -1 and +1

Q in KI: if you take the "Weight of Evidence" of Moore (range of values in ]-∞,+∞[) and pass it through a sigmoid-like function (to obtain a range of ]-1,+1[), you obtain the Q indicator. This way you keep the symmetry of the indicator (negative value = negative influence, positive value = positive influence), which seems more ‘interpretable’.
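As a quick numeric check of the two normalizations (a minimal sketch following the slide’s notation A, B, R, P, Q):

```python
# Minimal check of the two normalizations of the odds ratio R = A/B
# described above (names A, B, R, P, Q follow the slide).
def normalize(A, B):
    R = A / B
    P = R / (R + 1)            # = A/(A+B), always in ]0, 1[
    Q = (R - 1) / (R + 1)      # = (A-B)/(A+B), always in ]-1, +1[
    return P, Q

P1, Q1 = normalize(A=3.0, B=1.0)   # positive influence: R > 1
P2, Q2 = normalize(A=1.0, B=3.0)   # negative influence: R < 1
```

Swapping A and B flips the sign of Q but not its magnitude, which is exactly the symmetry argued for above.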

18

The ‘saliency’ used: variable importance in Khiops interpretation

By default in KI: Q.

But feel free to change… (11 indicators are available in KI.)

All of them are in the documentation of the tool.

19

Outline

•Interpretation?

•Khiops?

•Which measure?

•The one used and its variants

•The special case of the Naive Bayes

•Counterfactual?

•Link with the invited talk of M.-J. Lesot at EGC

•On the interest of an « actionable » output (file)

•Thank you

20

The special case of the Naive Bayes (exact computation)

21

The special case of the Naive Bayes

22

The special case of the Naive Bayes


23

The special case of the Naive Bayes


24
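The equations on the “special case of the Naive Bayes” slides did not survive the extraction. As a hedged reconstruction based on the Selective Naive Bayes reference cited earlier (Boullé, 2007), the exact computation presumably rests on the following form, where the $w_j$ are the attribute weights:

```latex
% Weighted (Selective) Naive Bayes posterior:
\[
  P(C \mid X) \propto P(C) \prod_{j} P(X_j \mid C)^{w_j}
\]
% "Output without variable V_j": its factor simply drops out of the
% product, so the importance of V_j is available in closed form:
\[
  P(C \mid X_{\setminus j}) \propto P(C) \prod_{k \neq j} P(X_k \mid C)^{w_k}
\]
```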

WHY? Khiops Interpretation output

25

Outline

•Interpretation?

•Khiops?

•Which measure?

•The one used and its variants

•The special case of the Naive Bayes

•Counterfactual?

•Link with the invited talk of M.-J. Lesot at EGC

•On the interest of an « actionable » output (file)

•Thank you

26

HOW?

27

Counterfactual?

[2009] "Correlation Explorations in a Classification Model", Vincent Lemaire, Carine Hue, Olivier Bernier in the

workshop Data Mining Case Studies and Practice Prize, SIGKDD 2009

[2010] "Correlation Analysis in Classifiers", Vincent Lemaire, Carine Hue, Olivier Bernier, in "Handbook of Research

on Data Mining in Public and Private Sectors: Organizational and Government Applications"

We failed to use the right term (‘counterfactual’) at the time, too bad!

28

Counterfactual?

Here the objective is to obtain a prediction which will be different.

Def.: “A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output.”

We will relax this objective a little later in this talk.

We do not give a SOTA here; see the publications on that topic…
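The quoted definition can be written as an optimization problem (our notation, not the slide’s): find the closest point $x'$ to the instance $x$, under some distance $d$, whose prediction is the predefined output $y^{*}$:

```latex
\[
  x' = \operatorname*{arg\,min}_{z}\; d(x, z)
  \quad \text{subject to} \quad f(z) = y^{*}
\]
```

The “relaxed” version mentioned above replaces the hard constraint with a mere score increase toward the target class.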

29

Counterfactual?

Khiops Interpretation:

•uniform cost

•actionable attributes have to be defined in the interface

•correlations

•exact (?) optimization thanks to the NB

30

Counterfactual?

For a specific instance: the third instance in the example.

For the targeted class: class A in the example.

Explanatory variables and classifier output:

V1   V2   V3   Class
…    …    …    A=0.99
…    …    …    B=0.87
…    …    …    A=0.53   (third instance)
…    …    …    A=0.67

Example for the third instance and the explanatory variable 'V2'.

31

Counterfactual?

For a specific instance: the third instance in the example.

For the targeted class: class A in the example.

Explanatory variables and classifier output:

V1   V2   V3   Class
…    …    …    A=0.99
…    …    …    B=0.87
…    …    …    A=0.53   (third instance)
…    …    …    A=0.67

Example for the third instance and the explanatory variable 'V2'.

The score is computed:

for all explanatory variables,

for all possible values of the selected explanatory variable.

In the example: for V1: …, for V2: …, for V3: …

With the first alternative value of V2, the third instance’s score becomes:

V1   V2   V3   Class
…    …    …    A=0.99
…    …    …    B=0.87
…    …    …    A=0.51
…    …    …    A=0.67

With the second alternative value of V2:

V1   V2   V3   Class
…    …    …    A=0.99
…    …    …    B=0.87
…    …    …    A=0.67
…    …    …    A=0.67

32

Counterfactual?

Explanatory variables and classifier output (third instance, variable 'V2'):

V1   V2   V3   Class
…    …    …    A=0.99
…    …    …    B=0.87
…    …    …    A=0.53   (third instance)
…    …    …    A=0.67

With the two alternative values of V2, the third instance’s score becomes A=0.51 and A=0.67 respectively.

The set of values leading to a score increase is retained.

In the example, for the variable V2: A=0.53 → A=0.67.
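The retention rule illustrated here can be sketched as a simple enumeration over the preprocessed values; the API below is illustrative, not Khiops Interpretation’s:

```python
# Sketch of the search illustrated above: for each explanatory variable,
# try every alternative (preprocessed) value and retain those that
# strictly increase the score of the targeted class.
def retained_values(score, x, candidate_values, target):
    """score(x, target) -> P(target | x); candidate_values[j] lists the
    intervals/groups variable j can take after preprocessing."""
    base = score(x, target)
    retained = {}
    for j, values in candidate_values.items():
        better = [(v, score({**x, j: v}, target))
                  for v in values
                  if score({**x, j: v}, target) > base]
        if better:
            retained[j] = better
    return retained

# Toy score mirroring the slide's example: for class 'A', the output
# depends only on which group V2 falls into.
def score(x, target):
    return {'g1': 0.53, 'g2': 0.67, 'g3': 0.40}[x['V2']] if target == 'A' else 0.0

x = {'V1': 'a1', 'V2': 'g1', 'V3': 'c1'}
out = retained_values(score, x, {'V2': ['g1', 'g2', 'g3']}, target='A')
```

Only the value raising the score from 0.53 to 0.67 is retained, matching the slide.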

33

Counterfactual?

Thanks to the preprocessing: a limited number of values to test

•the number of intervals (numerical variables) and the number of groups (categorical variables)

Thanks to the ‘Selective Naïve Bayes’: a ‘controlled’ complexity (as for the importance computation above).

So changing the value of one variable to another results in one subtraction and one addition.
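The “one subtraction and one addition” follows from the log-space form of the (weighted) naive Bayes score, which is a sum of per-variable terms. A minimal sketch (our notation, not Khiops’):

```python
# In log space the naive Bayes score is a sum of per-variable terms,
# so changing one variable's value only swaps that variable's term:
# one subtraction (old term) and one addition (new term).
import math

def log_score(terms):
    return sum(terms.values())

def update_log_score(current, terms, j, new_term):
    return current - terms[j] + new_term

terms = {'V1': math.log(0.8), 'V2': math.log(0.2), 'V3': math.log(0.5)}
s = log_score(terms)
s2 = update_log_score(s, terms, 'V2', math.log(0.6))
```

This is why enumerating all candidate values stays cheap: the full product never has to be recomputed.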

34

Outline

•Interpretation?

•Khiops?

•Which measure?

•The one used and its variants

•The special case of the Naive Bayes

•Counterfactual?

•Link with the invited talk of M.-J. Lesot at EGC

•On the interest of an « actionable » output (file)

•Thank you

35

Link with the invited talk of M.-J. Lesot at EGC (2nd part of her talk)

(https://www.youtube.com/watch?v=hOHrt80HKeM&list=UUvNbyyty3s9iVXHYic3AKGw&index=19)

Everything is based on the value of […] or on the change of a value of […].

Only […] (intervals or value groups) coming from the preprocessing step are tested.

The preprocessing step provides intervals (or groups) that are piecewise constant given the target ‘class’. From the ‘MODL’ framework; see for example the supervised discretization of numerical attributes:

“M. Boullé. MODL: a Bayes optimal discretization method for continuous attributes. Machine Learning, 65(1):131-165, 2006”

… we do not test ‘empty parts of the distribution’ …

A good characteristic, thanks to the preprocessing.

36

Link with the invited talk of M.-J. Lesot at EGC (2nd part of her talk)

(https://www.youtube.com/watch?v=hOHrt80HKeM&list=UUvNbyyty3s9iVXHYic3AKGw&index=19)

Khiops (Naïve Bayes)

Khiops Interpretation

37

Link with the invited talk of M.-J. Lesot at EGC (2nd part of her talk)

(https://www.youtube.com/watch?v=hOHrt80HKeM&list=UUvNbyyty3s9iVXHYic3AKGw&index=19)

Khiops

Khiops Interpretation

we need the classifier

38

Link with the invited talk of M.-J. Lesot at EGC

•“Lever variable”?

[Table: explanatory variables and classifier output, as in the counterfactual example above (A=0.99, B=0.87, A=0.53, A=0.67)]

The correlations-exploration algorithm allows the discovery of the variables important for the target class.

But… in most cases, changing the values of some explanatory variables (such as sex or age) is simply impossible.

The user of the algorithm therefore has to define the ‘lever variables’: the important variables which can actually be changed.

39

Potential “textual” interpretation

•A potential “textual” interpretation:

•This example belongs to the class ‘A’

•because its most important variables are

•….

•….

•But if its variable ‘….’, which has the value ‘….’, changed to

•….

•then …

•Details in the next slides…

40

Outline

•Interpretation?

•Khiops?

•Which measure?

•The one used and its variants

•The special case of the Naive Bayes

•Counterfactual?

•Link with the invited talk of M.-J. Lesot at EGC

•On the interest of an « actionable » output (file)

•Thank you

41

Do your best to have actionable information

Analysis of the results: the output file structure

42

Do your best to have actionable information

Analysis of the results: the output file structure

For each variable, 4 columns:

•the name of the variable

•the value of this variable which increases the probability of the class to reinforce

•the new value of the probability of the class to reinforce

•a “binary value” which indicates whether the line changes its “predicted class”

43

Do your best to have actionable information

Analysis of the results: the output file structure

•line 2: the initial probability is 0.65, but if “Capital Gain” took a value in the interval ]5119;5316.5], then the probability would be 1; however, this line does not change its “predicted class” (it was already predicted as “more”)
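Using the “Capital Gain” line described above, one variable’s 4 columns could be sketched as follows; the column names and the tab-separated layout are our assumption, the real layout is in the tool’s documentation:

```python
# Hypothetical sketch of one variable's 4 columns in the output file
# (column names are ours; see the KI documentation for the real layout).
import csv, io

row = {
    "variable": "Capital Gain",
    "reinforcing_value": "]5119;5316.5]",  # value increasing P(class)
    "new_probability": 1.0,
    "class_change": 0,   # 1 if the predicted class flips, else 0
}
buf = io.StringIO()
w = csv.DictWriter(buf, fieldnames=row.keys(), delimiter="\t")
w.writeheader()
w.writerow(row)
output = buf.getvalue()
```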

44

Do your best to have actionable information

Reactive action (counterfactual):

An example is detected as belonging to the class A; the methodology indicates:

which explanatory variables to modify,

the desired values for these variables,

the expected gain in terms of probability of occurrence of the target class B.

Preventive action (relaxed counterfactual):

An example is detected as belonging to the class B but it is near the boundary; the methodology indicates:

which explanatory variables to modify,

the desired values for these variables,

the expected gain in terms of probability of occurrence of the target class B.

[Figures: two decision-boundary diagrams between Class A and Class B, illustrating the reactive and preventive moves]

45

Khiops Interpretation outputs

46

Interpretation

Global, by groups, local?

Variable importance for the “model”

Variable importance for a given population (to define)

Variable importance for a specific instance (this talk)

Idea: clustering of examples using their ‘importances’ or their ‘counterfactuals’…

47

Interpretation

Idea: clustering of examples using their ‘importances’ or their ‘counterfactuals’…

[Figure: variable importances defining the churn, global population vs. Cluster 5 (3.64% of the examples)]

48

Time for a demo?

49

Outline

•Interpretation?

•Khiops?

•Which measure?

•The one used and its variants

•The special case of the Naive Bayes

•Counterfactual?

•Link with the invited talk of M.-J. Lesot at EGC

•On the interest of an « actionable » output (file)

•Thank you

50

Thank you – Our ‘old’ publications (a new interest with the GDPR?)

[2004] "An Input Variable Importance Definition based on Empirical Data Probability and Its Use in Variable Selection", Vincent Lemaire and Fabrice Clérot, in International Joint Conference on Neural Networks (IJCNN), 2004

[2008] "Contact Personalization Using a Score Understanding Method", Vincent Lemaire, Raphaël Féraud and Nicolas Voisine, in International Joint Conference on Neural Networks (IJCNN), 2008

[2009] "Correlation Explorations in a Classification Model", Vincent Lemaire, Carine Hue, Olivier Bernier, in the workshop Data Mining Case Studies and Practice Prize, SIGKDD 2009

[2010] "Correlation Analysis in Classifiers", Vincent Lemaire, Carine Hue, Olivier Bernier, in "Handbook of Research on Data Mining in Public and Private Sectors: Organizational and Government Applications"

[2010] Exhibition: "KAWAB : Un outil pour explorer les corrélations existantes dans un classifieur naïf de Bayes", Vincent Lemaire, in RFIA (Reconnaissance des Formes et Intelligence Artificielle), Caen, February 2010

[2010] "A method to build a representation using a classifier and its use in a K Nearest Neighbors-based deployment", Vincent Lemaire, Marc Boullé, Fabrice Clérot and Pascal Gouzien, in International Joint Conference on Neural Networks (IJCNN), 2010

[2012] "A Complete Data Mining process to Manage the QoS of ADSL Services", Françoise Fessant & Vincent Lemaire, in the workshop WAITS 2012 (Workshop on Artificial Intelligence for Telecommunications & Sensor Networks, held at ECAI 2012, Montpellier)