Contextual Importance and Utility in R: the ‘ciu’ Package
Kary Främling 1,2
1 Department of Computing Science, Umeå University, MIT-huset, 901 87 Umeå, Sweden
2Department of Computer Science, Aalto University, Konemiehentie 1, 02150 Espoo, Finland
kary.framling@umu.se, kary.framling@aalto.fi
Abstract
Contextual Importance and Utility (CIU) are concepts for outcome explanation of any regression or classification model. CIU is model-agnostic, produces explanations without building any intermediate interpretable model, and supports explanations on any level of abstraction. CIU is also theoretically simple and computationally efficient. However, the lack of CIU software that is easy to install and use appears to be a main obstacle to interest in and adoption of CIU by the Explainable AI community. The ciu package in R described in this paper is intended to overcome that obstacle, at least for problems involving tabular data.
Contextual Importance and Utility (CIU) were proposed by Kary Främling in 1995 as concepts for explaining recommendations or outcomes of decision support systems (DSS) to domain specialists as well as to non-specialists (Främling and Graillot 1995; Främling 1996). The real-world use case was the selection of a disposal site for ultimate industrial waste in the Rhône-Alpes region of France, among the thousands of possible candidates. The first challenge was to build a DSS that represents a good compromise between the preferences of all stakeholders. The next challenge was how the results of the DSS could be justified and explained to decision makers as well as to inhabitants living close to the proposed site. Several DSS models with different underlying principles were developed and tested, notably weighted sum, the Analytic Hierarchy Process (AHP), Electre I and a classical rule-based system. A Neural Network (NN) approach was also used, where the goal was to learn the DSS model from data about existing sites, which then became a challenge of explaining black-box behaviour (Främling 2020a). The experience gained from that project led to the following requirements for the Explainable AI (XAI) method to be developed:
1. It has to be model-agnostic.
2. It has to support different levels of abstraction, because not all of the detailed inputs used by the DSS make sense to all target explainees.
3. The vocabulary used for producing explanations has to be independent of the internal operation of the black box,
because different users have different background knowledge; the vocabulary, visualisation or other means of explanation should be adapted accordingly.
4. Inputs are typically not independent of each other in real-world cases, i.e. the value of one input can modify the importance of other inputs, as well as the utility of the values of other inputs. Therefore, the method must be situation- or context-aware.
The Collins dictionary defines importance as follows: 'The importance of something is its quality of being significant, valued, or necessary in a particular situation'. The Context in CIU corresponds to the 'in a particular situation' part of this definition. So, by definition, 'importance' is context-dependent. When dealing with black-box systems, the word 'something' signifies an input feature or a combination of input features. Therefore, CI could also have been called 'Contextual Feature Importance'.

However, 'importance' does NOT express positive or negative judgements. Something like 'good importance', 'bad importance', 'typical importance', etc., does not exist. Adjectives such as 'good', 'bad', 'typical', 'favorable', etc., are used for expressing judgements about feature values, which leads us to the notions of value utility and utility function. The Collins dictionary defines a utility function as 'a function relating specific goods and services in an economy to individual preferences'. In CIU, CU expresses to what extent the current feature value(s) contribute to a high output value, i.e. what the utility of the input value is for achieving a high output value. Therefore, CU could also have been called 'Contextual Value Utility'.
The next section provides the definition of CIU and implementation details. The following sections show results on classification and regression tasks with continuous- and discrete-valued inputs. Section 5 shows how to construct explanations using vocabularies and different levels of abstraction, followed by the Conclusion.
1 Contextual Importance and Utility (CIU)
This presentation of CIU is based on the one in (Främling 2020b) and reuses the definitions found there, together with extensions and implementation details for the 'ciu' package. The 'ciu' package is available at https://github.com/KaryFramling/ciu.¹ The most recent development version can be installed directly from there with the usual devtools commands, as explained in the repository. Version 0.1.0, which is described here, has been available from CRAN since November 20th, 2020, at https://cran.r-project.org/web/packages/ciu and can be installed as an ordinary R package using the install.packages() function. We begin with basic definitions for describing CIU.
Definition 1 (Black-box model). A black-box model is a mathematical transformation $f$ that maps inputs $\vec{x}$ to outputs $\vec{y}$ according to $\vec{y} = f(\vec{x})$.
Definition 2 (Context). A Context $\vec{C}$ defines the input values $\vec{x}$ that describe the current situation or instance to be explained.
Definition 3 (Pre-defined output range). The value range $[absmin_j, absmax_j]$ that an output $y_j$ can take by definition.
In classification tasks, the Pre-defined output range is typically $[0, 1]$. In regression tasks, the minimum and maximum output values in a training set often provide a good estimate of $[absmin_j, absmax_j]$.
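As a minimal sketch of such an estimate (the Boston Housing target 'medv' is used purely as an example here, not as part of the package):

library(MASS)                     # Boston data set
abs.range <- range(Boston$medv)   # estimate of c(absmin_j, absmax_j)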
Definition 4 (Set of studied inputs for CIU). The index set $\{i\}$ defines the indices of the inputs $\vec{x}$ for which CIU is calculated.
Definition 5 (Estimated output range). $[Cmin_j(\vec{C}, \{i\}), Cmax_j(\vec{C}, \{i\})]$ is the range of values that an output $y_j$ can take in the Context $\vec{C}$ when modifying the values of the inputs $x_{\{i\}}$.
Definition 6 (Contextual Importance). Contextual Importance $CI_j(\vec{C}, \{i\})$ expresses to what extent variations in one or several inputs $\{i\}$ affect the value of an output $j$ of a black-box model $f$, according to

$$CI_j(\vec{C}, \{i\}) = \frac{Cmax_j(\vec{C}, \{i\}) - Cmin_j(\vec{C}, \{i\})}{absmax_j - absmin_j} \qquad (1)$$
Definition 7 (Contextual Utility). Contextual Utility $CU_j(\vec{C}, \{i\})$ expresses to what extent the current input values $\vec{C}$ are favorable for the output $y_j(\vec{C})$ of a black-box model, according to

$$CU_j(\vec{C}, \{i\}) = \frac{y_j(\vec{C}) - Cmin_j(\vec{C}, \{i\})}{Cmax_j(\vec{C}, \{i\}) - Cmin_j(\vec{C}, \{i\})} \qquad (2)$$
CU values are in the range $[0, 1]$ by definition. CU = 0 signifies that the current $x_{\{i\}}$ value(s) is the least favorable for the studied output, and CU = 1 signifies that the $x_{\{i\}}$ value(s) is the most favorable for the studied output. In the CIU bar plots shown in the following sections, a 'neutral' CU of 0.5 corresponds to a yellow colour.
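As a small numeric illustration of Equations 1 and 2, with invented values that are not taken from any model or data set used later:

# Hypothetical values for one studied input of a classification output.
absmin <- 0; absmax <- 1   # pre-defined output range
cmin <- 0.2; cmax <- 0.9   # estimated output range when varying x_{i}
y <- 0.75                  # current output y_j(C)

CI <- (cmax - cmin) / (absmax - absmin)   # 0.7: the input matters a lot
CU <- (y - cmin) / (cmax - cmin)          # ~0.79: the current value is favorable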
¹ A Python CIU package is available at https://github.com/TimKam/py-ciu and is described in (Anjomshoae, Kampik, and Främling 2020). The R and Python implementations are developed in the same team but independently of each other. The underlying CIU method is identical, but the packages provide different kinds of visualisation and explanation mechanisms. Implementation details might also differ, such as how the 'Set of representative input vectors' is generated.
Algorithm 1: Set of representative input vectors
Result: N × M matrix S(C, {i})
begin
  forall discrete inputs do
    D ← all possible value combinations for the discrete inputs in {i};
    Randomize the row order in D;
    if D has more rows than N then
      Set N to the number of rows in D;
    end
  end
  forall continuous-valued inputs do
    Initialize the N × M matrix R with the current input values C;
    R ← two rows per continuous-valued input in {i}, where the current value is replaced by the values min{i} and max{i}, respectively;
    R ← fill the remaining rows up to N with random values from the intervals [min{i}, max{i}];
  end
  S(C, {i}) ← concatenation of C, D and R, where D is repeated if needed to obtain N rows;
end
Definition 8 (Intermediate Concept). An Intermediate Concept names a given set of inputs $\{i\}$.

CIU can be estimated for any set of inputs $\{i\}$. Intermediate Concepts make it possible to specify vocabularies that can be used for producing explanations on any level of abstraction. In addition to using Intermediate Concepts for explaining $y_j(\vec{C})$ values, Intermediate Concept values can be further explained using more specific Intermediate Concepts or input features. The following defines Generalized Contextual Importance for explaining Intermediate Concepts.
Definition 9 (Generalized Contextual Importance).

$$CI_j(\vec{C}, \{i\}, \{I\}) = \frac{Cmax_j(\vec{C}, \{i\}) - Cmin_j(\vec{C}, \{i\})}{Cmax_j(\vec{C}, \{I\}) - Cmin_j(\vec{C}, \{I\})} \qquad (3)$$

where $\{I\}$ is the set of input indices that correspond to the Intermediate Concept that we want to explain and $\{i\} \subseteq \{I\}$.
Equation 3 is similar to Equation 1 when $\{I\}$ is the set of all inputs, i.e. the range $[absmin_j, absmax_j]$ has been replaced by the range $[Cmin_j(\vec{C}, \{I\}), Cmax_j(\vec{C}, \{I\})]$. Equation 2 for CU does not change with the introduction of Intermediate Concepts. In other words, Equation 3 allows the explanation of the outputs $y_j(\vec{C})$ as well as the explanation of any Intermediate Concept that leads to $y_j(\vec{C})$.
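The arithmetic of Equation 3 can be written as a one-line helper in plain R; this is only an illustration, not a function of the 'ciu' package:

# Generalized CI of inputs {i} relative to an intermediate concept {I}.
ci.general <- function(cmin.i, cmax.i, cmin.I, cmax.I) {
  (cmax.i - cmin.i) / (cmax.I - cmin.I)
}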
The range $[Cmin_j(\vec{C}, \{i\}), Cmax_j(\vec{C}, \{i\})]$ is the only part of CIU that cannot be calculated directly. A model-agnostic approach is to generate a Set of representative input vectors.
Figure 1: CIU explanation for instance #100 of Iris data set,
including input/output plots for ‘Petal Length’.
Definition 10 (Set of representative input vectors). $S(\vec{C}, \{i\})$ is an $N \times M$ matrix, where $M$ is the length of $\vec{x}$ and $N$ is a parameter that gives the number of input vectors to generate for obtaining an adequate estimate of the range $[Cmin_j(\vec{C}, \{i\}), Cmax_j(\vec{C}, \{i\})]$.
Algorithm 1 shows how the Set of representative input vectors is created in the current implementation. $N$ is the only adjustable parameter, with a default value of $N = 100$, which should be sufficient for most cases with only one input. The choice of $N$ is a compromise between calculation speed and desired accuracy.

If the input value intervals $[min_{\{i\}}, max_{\{i\}}]$ are not provided as parameters, then they are retrieved from the training data set by column-wise min and max operations. There is an obvious risk that this approach generates input value combinations that are impossible in reality. Whether that is a problem or not depends on how the black-box model behaves in such cases. In practice, this has not been a problem with the data sets studied so far, some of which are shown in the next sections. If such problems occur, they can be dealt with in many different ways; however, that remains out of the scope of this paper.
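For continuous-valued inputs only, the core of Algorithm 1 can be sketched in plain R as follows. This is an illustration under simplifying assumptions (no discrete inputs, the Context C given as a one-row data frame, in.mins/in.maxs as per-column value limits), not the package's internal implementation:

make.samples <- function(C, ind.inputs, in.mins, in.maxs, N = 100) {
  S <- C[rep(1, N), , drop = FALSE]       # N copies of the Context
  # Two rows per studied input: the current value replaced by min and max.
  for (k in seq_along(ind.inputs)) {
    i <- ind.inputs[k]
    S[2 * k - 1, i] <- in.mins[i]
    S[2 * k, i]     <- in.maxs[i]
  }
  # Fill the remaining rows with random values from [min, max].
  n.fixed <- 2 * length(ind.inputs)
  for (i in ind.inputs) {
    S[(n.fixed + 1):N, i] <- runif(N - n.fixed, in.mins[i], in.maxs[i])
  }
  rbind(C, S)                             # include the Context itself
}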
2 Classification with continuous-valued inputs
The Iris data set is used for this category mainly because the boundaries between the different Iris classes require highly non-linear models for correctly estimating the probabilities of the three classes for each studied instance. Figure 1 shows a CIU explanation generated for an 'lda' model and instance number 100 of the Iris data set. The model was trained and the figures were generated by the following code:
library(MASS)  # lda()
library(ciu)

# No training/test set needed here
iris_train <- iris[, 1:4]
iris_lab <- iris$Species
model <- lda(iris_train, iris_lab)
ciu <- ciu.new(model, Species ~ ., iris)
ciu$ggplot.col.ciu(iris[100, 1:4])
The output values as a function of one input were generated by calling ciu$plot.ciu(). All other presented results have been generated in the same way.
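For example, the 'Petal Length' plot in Figure 1 could be obtained with a call of roughly the following form; the ind.input and ind.output argument names are assumptions about the plot.ciu() signature, not documented here:

# Input/output plot for 'Petal Length' (input 3) and the 'versicolor' output.
ciu$plot.ciu(iris[100, 1:4], ind.input = 3, ind.output = 2)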
The CIU explanation clearly shows that Petal Length and Petal Width are the most important features separating 'versicolor' and 'virginica', and changing the value of either one would also change the classification. This is even the case for 'setosa': a significantly smaller Petal Length value would change the result to 'setosa'.
3 Regression with continuous-valued inputs
Figure 2: CIU explanation for instance #370 of Boston
Housing data set, including input/output plots for features
‘lstat’, ‘rm’ and ‘crim’.
The Boston Housing data provides the task of estimating the median value of owner-occupied homes in $1000's, based on continuous-valued inputs. It is a non-linear regression task that is frequently used for demonstrating the results of XAI methods. Figure 2 shows CIU results using a Gradient Boosting Machine (gbm) model. The studied instance #370 is a very expensive one ($50k), so the bar plot visualisation should be dominantly green, i.e. have favorable values at least for the most important features. This is indeed the case. However, there are also some exceptions; notably, the number of rooms ('rm') is only average for this instance (rm = 6.683), even though such a value would be very favorable for most cheaper homes.
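A minimal sketch of how a comparable explanation could be produced is shown below; the model training call and its settings are illustrative assumptions, not necessarily those behind Figure 2:

library(ciu)
library(caret)
library(MASS)   # Boston data set

# Train a Gradient Boosting Machine through caret (illustrative settings).
model <- caret::train(medv ~ ., data = Boston, method = "gbm", verbose = FALSE)

# Explain instance #370; columns 1:13 are the input features.
ciu.boston <- ciu.new(model, medv ~ ., Boston)
ciu.boston$ggplot.col.ciu(Boston[370, 1:13])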
Figure 3: CIU explanation for Car instance #1098.
4 Discrete inputs
The UCI Car Evaluation data set (https://archive.ics.uci.edu/ml/datasets/car+evaluation) evaluates how good different cars are based on six discrete-valued input features. There are four output classes: 'unacc', 'acc', 'good' and 'vgood'. This means that both the inputs and the output are discrete-valued. Figure 3 shows the basic results for a 'vgood' car (instance #1098). The model is a Random Forest. CIU indicates that this car is 'vgood' because it has very good values for all important criteria. Having only two doors is less good, but it is also a less important feature. In general, the CIU visualisation is well in line with the output values for all classes.
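A sketch of the corresponding setup could look as follows; it assumes the Car Evaluation data has been read into a data frame car.data whose last column, here called 'result', is the class (the column names are assumptions, since the raw UCI file has no header):

library(ciu)
library(randomForest)

# car.data: six factor-valued inputs plus the class column 'result'.
model <- randomForest(result ~ ., data = car.data)
ciu.car <- ciu.new(model, result ~ ., car.data)
ciu.car$ggplot.col.ciu(car.data[1098, 1:6])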
5 Abstraction Levels and Vocabularies
Using Intermediate Concepts, explanations can have different levels of detail and use different vocabularies. The original results on the Cars data set were produced by the data set authors using a rule set based on the intermediate concepts 'PRICE', 'COMFORT' and 'TECH', as reported in (Bohanec and Rajkovič 1988). The corresponding vocabulary is defined as follows in 'ciu':
price <- c(1, 2)
comfort <- c(3, 4, 5)
tech <- c(comfort, 6)
car <- c(price, tech)
voc <- list("PRICE" = price, "COMFORT" = comfort,
            "TECH" = tech, "CAR" = car)
The vocabulary is provided as a parameter to the ciu.new() function. Explaining the output value using the intermediate concepts PRICE and TECH is done by giving the parameter value concepts.to.explain=c("PRICE","TECH") to the ggplot.col.ciu() method. If the explanation is for an intermediate concept rather than for the final result, then the 'target.concept' parameter is used, as in
ciu$ggplot.col.ciu(cars.inst,
                   ind.inputs = voc$PRICE,
                   target.concept = "PRICE")
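Continuing the Cars sketch from the previous section, the top-level explanation of Figure 4 in terms of PRICE and TECH could then be requested roughly as follows; the 'vocabulary' argument name follows the description above and, like model, car.data and cars.inst, is an assumption for illustration:

# Create the CIU object with the vocabulary, then explain the output in
# terms of the intermediate concepts (cars.inst is the studied instance).
ciu <- ciu.new(model, result ~ ., car.data, vocabulary = voc)
ciu$ggplot.col.ciu(cars.inst, concepts.to.explain = c("PRICE", "TECH"))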
Figure 4: Car explanations using intermediate concepts.
Figure 5: Car explanations after modifications.
The corresponding CIU explanations are shown in Figure 4. Figure 5 shows that the result changes from 'vgood' to 'acc' when the value of 'safety' is changed to 'med', together with the corresponding TECH explanation. It is clear that if 'safety' is only 'med', then the car can't be 'vgood', no matter how good the other features are.
6 Conclusion
Contextual Importance and Utility make it possible to explain the results of 'any' AI system without constructing an intermediate, interpretable model for explanation. Furthermore, CIU can provide explanations at any level of abstraction, using semantics that are independent of (or at least loosely coupled with) the internal mechanisms of the AI system. The 'ciu' package is intended to allow researchers to apply CIU to all kinds of data sets and problems and to assess how useful the explanations are.
Work is ongoing on the use of CIU for image recognition and saliency maps, and the results are promising. Those functionalities will be published in a new version of this package or as separate packages.
7 Acknowledgments
The work is partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.
References
Anjomshoae, S.; Kampik, T.; and Främling, K. 2020. Py-CIU: A Python Library for Explaining Machine Learning Predictions Using Contextual Importance and Utility. In Proceedings of the XAI 2020 Workshop. URL https://sites.google.com/view/xai2020/home.
Bohanec, M.; and Rajkovič, V. 1988. Knowledge Acquisition and Explanation for Multi-Attribute Decision Making. In 8th International Workshop on Expert Systems and Their Applications, Avignon, France, 59–78.
Främling, K. 1996. Modélisation et apprentissage des préférences par réseaux de neurones pour l'aide à la décision multicritère. PhD thesis, INSA de Lyon. URL https://tel.archives-ouvertes.fr/tel-00825854.
Främling, K. 2020a. Decision Theory Meets Explainable AI. In Calvaresi, D.; Najjar, A.; Winikoff, M.; and Främling, K., eds., Explainable, Transparent Autonomous Agents and Multi-Agent Systems, 57–74. Cham: Springer International Publishing. ISBN 978-3-030-51924-7.
Främling, K. 2020b. Explainable AI without Interpretable Model. URL https://arxiv.org/abs/2009.13996.
Främling, K.; and Graillot, D. 1995. Extracting Explanations from Neural Networks. In ICANN'95 Conference, Paris, France. URL https://hal-emse.ccsd.cnrs.fr/emse-00857790.