Contextual Importance and Utility in R: the ‘ciu’ Package
Kary Främling¹,²
¹Department of Computing Science, Umeå University, MIT-huset, 901 87 Umeå, Sweden
²Department of Computer Science, Aalto University, Konemiehentie 1, 02150 Espoo, Finland
kary.framling@umu.se, kary.framling@aalto.fi
Abstract
Contextual Importance and Utility (CIU) are concepts for outcome explanation of any regression or classification model. CIU is model-agnostic, produces explanations without building any intermediate interpretable model, and supports explanations on any level of abstraction. CIU is also theoretically simple and computationally efficient. However, the lack of CIU software that is easy to install and use appears to be a main obstacle to interest in and adoption of CIU by the Explainable AI community. The ciu package in R that is described in this paper is intended to overcome that obstacle, at least for problems involving tabular data.
Contextual Importance and Utility (CIU) were proposed by Kary Främling in 1995 as concepts that would allow explaining recommendations or outcomes of decision support systems (DSS) to domain specialists as well as to non-specialists (Främling and Graillot 1995; Främling 1996). The real-world use case considered was to select a waste disposal site for ultimate industrial waste in the Rhône-Alpes region of France, among thousands of possible candidates. The first challenge was how to build a DSS that represents a good compromise between the preferences of all stakeholders. The next challenge was how the results of the DSS could be justified and explained to decision makers as well as to inhabitants living close to the proposed site. Several DSS models with different underlying principles were developed and tested, notably weighted sum, Analytic Hierarchy Process (AHP), Electre I and a classical rule-based system. An approach using Neural Networks (NN) was also used, where the goal was to build the DSS model by learning from data about existing sites, which then became a challenge of explaining black-box behaviour (Främling 2020a). The experience gained from that project made it possible to identify the following requirements for the Explainable AI (XAI) method to be developed:
1. It has to be model-agnostic.
2. It has to support different levels of abstraction, because not all detailed inputs used by the DSS make sense to every target explainee.
3. The vocabulary used for producing explanations has to be independent of the internal operation of the black-box,
because different users have different background knowledge; the vocabulary, visualisation or other means of explanation should be adapted accordingly.
4. Inputs are typically not independent of each other in real-world cases, i.e. the value of one input can modify the importance of other inputs, as well as the utility of the values of other inputs. Therefore, the method must be situation- or context-aware.
The Collins dictionary defines importance as follows:
‘The importance of something is its quality of being sig-
nificant, valued, or necessary in a particular situation’. The
Context in CIU corresponds to the ‘in a particular situa-
tion’ part of this definition. So, by definition, ‘importance’ is
context-dependent. When dealing with black-box systems,
the word ‘something’ signifies an input feature or combina-
tion of input features. Therefore, CI could also have been
called ‘Contextual Feature Importance’.
However, ‘importance’ does NOT express positive or neg-
ative judgements. Something like ‘good importance’, ‘bad
importance’, ‘typical importance’, etc., does not exist. Ad-
jectives such as ‘good’, ‘bad’, ‘typical’, ‘favorable’, etc., are
used for expressing judgments about feature values, which
leads us to the notion of value utility and utility function.
The Collins dictionary defines a utility function as ‘a func-
tion relating specific goods and services in an economy to
individual preferences’. In CIU, CU expresses to what ex-
tent the current feature value(s) contribute to a high output
value, i.e. what is the utility of the input value for achiev-
ing a high output value. Therefore, CU could also have been
called ‘Contextual Value Utility’.
The next section provides the definition of CIU and im-
plementation details. The following sections show results
on classification and regression tasks with continuous- and
discrete-valued inputs. Section 5 shows how to construct ex-
planations using vocabularies and different levels of abstrac-
tion, followed by the Conclusion.
1 Contextual Importance and Utility (CIU)
This presentation of CIU is based on the one in (Främling 2020b) and reuses the definitions found there, together with extensions and implementation details for the ‘ciu’ package. The ‘ciu’ package is available at https://github.com/KaryFramling/ciu¹. The most recent development version can be installed directly from there with the usual devtools commands, as explained in the repository. Version 0.1.0, which is described here, has been available from CRAN since November 20th, 2020, at https://cran.r-project.org/web/packages/ciu and can be installed as an ordinary R package with the install.packages() function.
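As a quick-start sketch (the CRAN and GitHub locations are the ones given above; the devtools call is the standard way of installing development versions and is given here as an assumption about the repository's instructions):

# Released version from CRAN
install.packages("ciu")

# Or the development version from GitHub
# install.packages("devtools")
devtools::install_github("KaryFramling/ciu")

library(ciu)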
We begin with basic definitions for describing CIU.
Definition 1 (Black-box model). A black-box model is a mathematical transformation $f$ that maps inputs $\vec{x}$ to outputs $\vec{y}$ according to $\vec{y} = f(\vec{x})$.
Definition 2 (Context). A Context $\vec{C}$ defines the input values $\vec{x}$ that describe the current situation or instance to be explained.
Definition 3 (Pre-defined output range). The value range $[absmin_j, absmax_j]$ that an output $y_j$ can take by definition.
In classification tasks, the Pre-defined output range is typically $[0, 1]$. In regression tasks, the minimum and maximum output values in a training set often provide a good estimate of $[absmin_j, absmax_j]$.
Definition 4 (Set of studied inputs for CIU). The index set $\{i\}$ defines the indices of the inputs $\vec{x}$ for which CIU is calculated.
Definition 5 (Estimated output range). $[Cmin_j(\vec{C}, \{i\}), Cmax_j(\vec{C}, \{i\})]$ is the range of values that an output $y_j$ can take in the Context $\vec{C}$ when modifying the values of the inputs $x_{\{i\}}$.
Definition 6 (Contextual Importance). Contextual Importance $CI_j(\vec{C}, \{i\})$ expresses to what extent variations in one or several inputs $\{i\}$ affect the value of an output $j$ of a black-box model $f$, according to

$$ CI_j(\vec{C}, \{i\}) = \frac{Cmax_j(\vec{C}, \{i\}) - Cmin_j(\vec{C}, \{i\})}{absmax_j - absmin_j} \qquad (1) $$
Definition 7 (Contextual Utility). Contextual Utility $CU_j(\vec{C}, \{i\})$ expresses to what extent the current input values $\vec{C}$ are favorable for the output $y_j(\vec{C})$ of a black-box model, according to

$$ CU_j(\vec{C}, \{i\}) = \frac{y_j(\vec{C}) - Cmin_j(\vec{C}, \{i\})}{Cmax_j(\vec{C}, \{i\}) - Cmin_j(\vec{C}, \{i\})} \qquad (2) $$
CU values are in the range $[0, 1]$ by definition. $CU = 0$ signifies that the current $x_{\{i\}}$ value(s) is the least favorable for the studied output and $CU = 1$ signifies that the $x_{\{i\}}$ value(s) is the most favorable for the studied output. In the CIU bar plots shown in the following sections, a ‘neutral’ CU of 0.5 corresponds to a yellow colour.
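To illustrate Equations 1 and 2 with made-up numbers (not taken from any of the data sets below): assume $absmin_j = 0$, $absmax_j = 1$, and that varying $x_{\{i\}}$ in the current Context gives $Cmin_j = 0.2$ and $Cmax_j = 0.8$, while the current output is $y_j(\vec{C}) = 0.65$. Then

$$ CI_j = \frac{0.8 - 0.2}{1 - 0} = 0.6, \qquad CU_j = \frac{0.65 - 0.2}{0.8 - 0.2} = 0.75, $$

i.e. the studied input(s) can move the output over 60% of its possible range, and the current value is clearly on the favorable side.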
¹A Python CIU package is available at https://github.com/TimKam/py-ciu and described in (Anjomshoae, Kampik, and Främling 2020). The R and Python implementations are developed in the same team but independently of each other. The underlying CIU method is identical but the packages provide different kinds of visualisation and explanation mechanisms. Implementation details might also differ, such as how the ‘Set of representative input vectors’ is generated.
Algorithm 1: Set of representative input vectors

Result: $N \times M$ matrix $S(\vec{C}, \{i\})$

begin
    forall discrete inputs do
        $D \leftarrow$ all possible value combinations for the discrete inputs in $\{i\}$
        Randomize the row order in $D$
        if $D$ has more rows than $N$ then
            Set $N$ to the number of rows in $D$
        end
    end
    forall continuous-valued inputs do
        Initialize $N \times M$ matrix $R$ with the current input values $\vec{C}$
        $R \leftarrow$ two rows per continuous-valued input in $\{i\}$, where the current value is replaced by the values $min_{\{i\}}$ and $max_{\{i\}}$ respectively
        $R \leftarrow$ fill the remaining rows up to $N$ with random values from the intervals $[min_{\{i\}}, max_{\{i\}}]$
    end
    $S(\vec{C}, \{i\}) \leftarrow$ concatenation of $\vec{C}$, $D$ and $R$, where $D$ is repeated if needed to obtain $N$ rows
end
Definition 8 (Intermediate Concept). An Intermediate Concept names a given set of inputs $\{i\}$.

CIU can be estimated for any set of inputs $\{i\}$. Intermediate Concepts make it possible to specify vocabularies that can be used for producing explanations on any level of abstraction. In addition to using Intermediate Concepts for explaining $y_j(\vec{C})$ values, Intermediate Concept values can be further explained using more specific Intermediate Concepts or input features. The following defines Generalized Contextual Importance for explaining Intermediate Concepts.
Definition 9 (Generalized Contextual Importance).

$$ CI_j(\vec{C}, \{i\}, \{I\}) = \frac{Cmax_j(\vec{C}, \{i\}) - Cmin_j(\vec{C}, \{i\})}{Cmax_j(\vec{C}, \{I\}) - Cmin_j(\vec{C}, \{I\})} \qquad (3) $$

where $\{I\}$ is the set of input indices that correspond to the Intermediate Concept that we want to explain and $\{i\} \subseteq \{I\}$.
Equation 3 is similar to Equation 1 when $\{I\}$ is the set of all inputs, i.e. the range $[absmin_j, absmax_j]$ has been replaced by the range $[Cmin_j(\vec{C}, \{I\}), Cmax_j(\vec{C}, \{I\})]$. Equation 2 for CU does not change with the introduction of Intermediate Concepts. In other words, Equation 3 allows the explanation of the outputs $y_j(\vec{C})$ as well as the explanation of any Intermediate Concept that leads to $y_j(\vec{C})$.
The range $[Cmin_j(\vec{C}, \{i\}), Cmax_j(\vec{C}, \{i\})]$ is the only part of CIU that cannot be calculated directly. A model-agnostic approach is to generate a Set of representative input vectors.
Figure 1: CIU explanation for instance #100 of Iris data set,
including input/output plots for ‘Petal Length’.
Definition 10 (Set of representative input vectors). $S(\vec{C}, \{i\})$ is an $N \times M$ matrix, where $M$ is the length of $\vec{x}$ and $N$ is a parameter that gives the number of input vectors to generate for obtaining an adequate estimate of the range $[Cmin_j(\vec{C}, \{i\}), Cmax_j(\vec{C}, \{i\})]$.
Algorithm 1 shows how the Set of representative input vectors is created in the current implementation. $N$ is the only adjustable parameter, with a default value of $N = 100$, which should be sufficient for most cases with only one input. The choice of $N$ is a compromise between calculation speed and desired accuracy.
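To make the continuous-valued branch of Algorithm 1 concrete, the following is a minimal R sketch of how such a matrix could be generated for a single continuous input. It is an illustration only, not the package's internal code; the function name make_samples and its arguments are hypothetical.

make_samples <- function(instance, i, in_min, in_max, N = 100) {
  # Repeat the current instance (the Context C) N times, one row per generated vector
  S <- instance[rep(1, N), , drop = FALSE]
  # Two rows take the extreme values of input i ...
  S[1, i] <- in_min
  S[2, i] <- in_max
  # ... and the remaining rows take random values from [in_min, in_max]
  S[3:N, i] <- runif(N - 2, min = in_min, max = in_max)
  S
}

# Example: vary 'Petal.Length' (input 3) of Iris instance #100;
# Cmin and Cmax are then estimated from the model outputs on these rows.
samples <- make_samples(iris[100, 1:4], i = 3,
                        in_min = min(iris$Petal.Length),
                        in_max = max(iris$Petal.Length))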
If the input value intervals $[min_{\{i\}}, max_{\{i\}}]$ are not provided as parameters, then they are retrieved from the training data set by column-wise min and max operations. There is an obvious risk that this approach generates input value combinations that are impossible in reality. Whether that is a problem or not depends on how the black-box model behaves in such cases. In practice, this has not been a problem with the data sets studied so far, some of which are shown in the next sections. If such problems occur, they can be dealt with in many different ways, but that remains outside the scope of this paper.
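As an illustration of that retrieval step (generic base R, not a call into the package), the column-wise ranges of the Iris features used below would be obtained as:

# One [min, max] interval per input feature, taken column-wise from the training data
in_min_max <- t(sapply(iris[, 1:4], range))
colnames(in_min_max) <- c("min", "max")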
2 Classification with continuous-valued
inputs
The Iris data set is used for this category mainly because the limits between the different Iris classes require highly non-linear models for correctly estimating the probabilities of the three classes for each studied instance. Figure 1 shows a CIU explanation generated for the ‘lda’ model and instance number 100 of the Iris data set. The model was trained and the figures were generated by the following code:
# No training/test set needed here
library(MASS)  # for lda()
library(ciu)
iris_train <- iris[, 1:4]
iris_lab <- iris$Species
model <- lda(iris_train, iris_lab)
ciu <- ciu.new(model, Species ~ ., iris)
ciu$ggplot.col.ciu(iris[100, 1:4])
The output values as a function of one input were gen-
erated by calling ciu$plot.ciu(). All other presented
results have been generated in the same way.
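For reference, an input/output plot such as the ‘Petal Length’ one in Figure 1 can presumably be obtained with a call along the following lines; the parameter names ind.input and ind.output are assumptions based on the package's conventions and may differ in version 0.1.0.

# Probability of the third class ('virginica') as a function of
# 'Petal.Length' (input 3) for instance #100
ciu$plot.ciu(iris[100, 1:4], ind.input = 3, ind.output = 3)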
The CIU explanation clearly shows that Petal Length and Petal Width are the most important features separating ‘versicolor’ and ‘virginica’, and that changing the value of either one would also change the classification. The same applies to ‘setosa’: a significantly smaller Petal Length value would change the result to ‘setosa’.
3 Regression with continuous-valued inputs
Figure 2: CIU explanation for instance #370 of Boston
Housing data set, including input/output plots for features
‘lstat’, ‘rm’ and ‘crim’.
The Boston Housing data set provides the task of estimating the median value of owner-occupied homes in $1000’s, based on continuous-valued inputs. It is a non-linear regression task that is frequently used for demonstrating results of XAI methods. Figure 2 shows CIU results using a Gradient Boosting Machine (gbm) model. The studied instance #370 is a very expensive one ($50k), so the bar plot visualisation should be dominantly green, i.e. have favorable values at least for the most important features. This is indeed the case. However, there are also some exceptions; notably, the number of rooms (‘rm’) is only average for this instance (rm = 6.683), even though such a value would be very favorable for most cheaper homes.
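The code for this example is not listed in the paper. A sketch following the same pattern as the Iris example might look as below; the use of caret with method "gbm" is an assumption, and any regression model accepted by ciu.new() would do.

library(MASS)   # Boston Housing data
library(caret)  # wrapper used here to train a gbm model
library(ciu)

# Train a Gradient Boosting Machine for the median home value 'medv'
gbm_model <- train(medv ~ ., data = Boston, method = "gbm", verbose = FALSE)

# CIU explanation for instance #370 (medv = 50, i.e. $50k)
ciu_boston <- ciu.new(gbm_model, medv ~ ., Boston)
ciu_boston$ggplot.col.ciu(Boston[370, 1:13])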
4 Discrete inputs

Figure 3: CIU explanation for Car instance #1098.
The UCI Car Evaluation data set (https://archive.ics.uci.edu/ml/datasets/car+evaluation) evaluates how good different cars are based on six discrete-valued input features. There are four output classes: ‘unacc’, ‘acc’, ‘good’ and ‘vgood’, so both the inputs and the output are discrete-valued. Figure 3 shows the basic results for a ‘vgood’ car (instance #1098); the model is a Random Forest. CIU indicates that this car is ‘vgood’ because it has very good values for all important criteria. Having only two doors is less good, but it is also a less important feature. In general, the CIU visualisation is well in line with the output values for all classes.
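Again, no code is listed for this example; the setup might be sketched as follows. The download URL, the column names and the randomForest call are assumptions about how the results behind Figure 3 were produced.

library(randomForest)
library(ciu)

# Read the UCI Car Evaluation data; all six inputs and the class are categorical
cars <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
                 header = FALSE,
                 col.names = c("buying", "maint", "doors", "persons",
                               "lug_boot", "safety", "class"))
cars[] <- lapply(cars, factor)

# Train a Random Forest classifier and explain instance #1098
rf_model <- randomForest(class ~ ., data = cars)
ciu_cars <- ciu.new(rf_model, class ~ ., cars)
ciu_cars$ggplot.col.ciu(cars[1098, 1:6])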
5 Abstraction Levels and Vocabularies
Using Intermediate Concepts, explanations can have differ-
ent levels of detail and use different vocabularies. The initial
results on the Cars data set were produced by the data set authors using a rule set built on the intermediate concepts ‘PRICE’, ‘COMFORT’ and ‘TECH’, as reported in (Bohanec and Rajkovič 1988). The corresponding vocabulary is defined as follows in ‘ciu’:
price <- c(1, 2)
comfort <- c(3, 4, 5)
tech <- c(comfort, 6)
car <- c(price, tech)
voc <- list("PRICE" = price, "COMFORT" = comfort,
            "TECH" = tech, "CAR" = car)
The vocabulary is provided as a parameter to the ciu.new() function. Explaining the output value using the intermediate concepts PRICE and TECH is done by giving the parameter value concepts.to.explain=c("PRICE","TECH") to the ggplot.col.ciu() method. If the explanation is for an intermediate concept rather than for the final result, then the ‘target.concept’ parameter is used, as in
ciu$ggplot.col.ciu(cars.inst,
                   ind.inputs = voc$PRICE,
                   target.concept = "PRICE")
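For completeness, the top-level call behind Figure 4 can presumably be reproduced along these lines; the vocabulary parameter name of ciu.new() is an assumption based on the text above, and rf_model and cars refer to the sketch in the previous section.

# CIU object that knows about the intermediate concept vocabulary
ciu <- ciu.new(rf_model, class ~ ., cars, vocabulary = voc)
cars.inst <- cars[1098, 1:6]

# Explanation of the output in terms of the concepts PRICE and TECH
ciu$ggplot.col.ciu(cars.inst, concepts.to.explain = c("PRICE", "TECH"))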
Figure 4: Car explanations using intermediate concepts.
Figure 5: Car explanations after modifications.
The corresponding CIU explanations are shown in Figure 4. Figure 5 shows that the result changes from ‘vgood’ to ‘acc’ when the value of ‘safety’ is changed to ‘med’, together with the corresponding TECH explanation. It is clear that if ‘safety’ is only ‘med’, then the car cannot be ‘vgood’, no matter how good the other features are.
6 Conclusion
Contextual Importance and Utility make it possible to ex-
plain results of ‘any’ AI system without constructing an
intermediate, interpretable model for explanation. Further-
more, CIU can provide explanations with any level of ab-
straction and using semantics that are independent of (or
at least loosely-coupled with) the internal mechanisms of
the AI system. The ‘ciu’ package is intended to allow re-
searchers to apply CIU to all kinds of data sets and problems
and to assess how useful the explanations are.
Work is ongoing on the use of CIU for image recognition and saliency maps, and the results are promising. Those func-
tionalities will be published in a new version of this package,
or as separate packages.
7 Acknowledgments
The work is partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation.
References
Anjomshoae, S.; Kampik, T.; and Främling, K. 2020. Py-CIU: A Python Library for Explaining Machine Learning Predictions Using Contextual Importance and Utility. In Proceedings. URL https://sites.google.com/view/xai2020/home (conference postponed from July 2020 to January 2021).

Bohanec, M.; and Rajkovič, V. 1988. Knowledge Acquisition and Explanation for Multi-Attribute Decision. In 8th International Workshop on Expert Systems and Their Applications, Avignon, France, 59–78.

Främling, K. 1996. Modélisation et apprentissage des préférences par réseaux de neurones pour l'aide à la décision multicritère. PhD thesis, INSA de Lyon. URL https://tel.archives-ouvertes.fr/tel-00825854.

Främling, K. 2020a. Decision Theory Meets Explainable AI. In Calvaresi, D.; Najjar, A.; Winikoff, M.; and Främling, K., eds., Explainable, Transparent Autonomous Agents and Multi-Agent Systems, 57–74. Cham: Springer International Publishing. ISBN 978-3-030-51924-7.

Främling, K. 2020b. Explainable AI without Interpretable Model. URL https://arxiv.org/abs/2009.13996.

Främling, K.; and Graillot, D. 1995. Extracting Explanations from Neural Networks. In ICANN'95 Conference, Paris, France. URL https://hal-emse.ccsd.cnrs.fr/emse-00857790.