Contrastive Explanations with Local Foil Trees
Jasper van der Waa *12 Marcel Robeer *13 Jurriaan van Diggelen 1Matthieu Brinkhuis 3Mark Neerincx 1 2
Abstract

Recent advances in interpretable Machine Learning (iML) and eXplainable AI (XAI) construct explanations based on the importance of features in classification tasks. However, in a high-dimensional feature space this approach may become unfeasible without restraining the set of important features. We propose to utilize the human tendency to ask questions like “Why this output (the fact) instead of that output (the foil)?” to reduce the number of features to those that play a main role in the asked contrast. Our proposed method utilizes locally trained one-versus-all decision trees to identify the disjoint set of rules that causes the tree to classify data points as the foil and not as the fact. In this study we illustrate this approach on three benchmark classification tasks.
1. Introduction
The research field of making Machine Learning (ML) models more interpretable is receiving much attention. One of the main reasons for this is the advance in such ML models and their applications to high-risk domains. Interpretability in ML can be applied for the following purposes: (i) transparency in the model to facilitate understanding by users (Herman, 2017); (ii) the detection of biased views in a model (Crawford, 2016; Caliskan et al., 2017); (iii) the identification of situations in which the model works adequately and safely (Barocas & Selbst, 2016; Coglianese & Lehr, 2016; Friedler et al., 2018); (iv) the construction of accurate explanations that explain the underlying causal
phenomena (Lipton, 2016); and (v) the construction of tools that allow model engineers to build better models and debug existing models (Kulesza et al., 2011; 2015).

*Equal contribution 1Perceptual and Cognitive Systems, Dutch Research Organization for Applied Research (TNO), Soesterberg, The Netherlands 2Interactive Intelligence group, Technical University of Delft, Delft, The Netherlands 3Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands. Correspondence to: Jasper van der Waa.

2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden. Copyright by the author(s).
The existing methods in iML focus on different approaches to how the information for an explanation can be obtained and how the explanation itself can be constructed; see for an overview the review papers of Guidotti et al. (2018) and Chakraborty et al. (2017). A number of common methods are: ordering the features’ contributions to an output (Datta et al., 2016; Lei et al., 2016; Ribeiro et al., 2016), attention maps and saliency of the features (Selvaraju et al., 2016; Montavon et al., 2017; Sundararajan et al., 2017; Zhang et al., 2017), prototype selection, construction and presentation (Nguyen et al., 2016), word annotations (Hendricks et al., 2016; Ehsan et al., 2017), and summaries with decision trees (Krishnan et al., 1999; Thiagarajan et al., 2016; Zhou & Hooker, 2016) or decision rules (Hein et al., 2017; Malioutov et al., 2017; Puri et al., 2017; Wang et al., 2017). In this study we focus on feature-based explanations. Such explanations tend to be long when based on all features, or rely on an arbitrary cutoff point. We propose a model-agnostic method to limit the explanation length with the help of contrastive explanations. The method also adds information on how each feature contributes to the output, in the form of decision rules.
Throughout this paper, the main reason for explanations is to offer transparency in the model’s given output: which features play a role, and what that role is. A few methods that offer similar explanations are LIME (Ribeiro et al., 2016), QII (Datta et al., 2016), STREAK (Elenberg et al., 2017) and SHAP (Lundberg & Lee, 2016). Each of these approaches answers the question “Why this output?” in some way by providing a subset of features or an ordered list of all features, either visualized or structured in a text template. However, when humans answer such questions to each other they tend to limit their explanations to a few vital points (Pacer & Lombrozo, 2017). This human tendency for simplicity also shows in iML: when multiple explanations hold, we should pick the simplest explanation that is consistent with the data (Huysmans et al., 2011). The mentioned approaches do this by either thresholding the contribution at a fixed value, presenting the entire ordered list, or applying the method only to low-dimensional data.
arXiv:1806.07470v1 [stat.ML] 19 Jun 2018
This study offers a more human-like way of limiting the list of contributing features by setting a contrast between two outputs. The proposed contrastive explanations present only the information that causes some data point to be classified as one class instead of another (Miller et al., 2017). Recently, Dhurandhar et al. (2018) proposed constructing explanations by finding contrastive perturbations: minimal changes required to change the current classification to any arbitrary other class. Instead, our approach creates contrastive targeted explanations by first defining the output of interest. In other words, our contrastive explanations answer the question “Why this output instead of that output?”. The contrast is made between the fact, the given output, and the foil, the output of interest.
A relatively straightforward way to construct a contrastive explanation given a foil, based on feature contributions, is to compare the two ordered feature lists and see how much each feature differs in rank. However, a feature may have the same rank in both ordered lists but be used in entirely different ways for the fact and foil classes. To mitigate this problem we propose a more meaningful comparison based on how a feature is used to distinguish the foil from the fact. We train a more accessible model to distinguish between fact and foil. From that model we distill two sets of rules: one used to identify data points as the fact and the other to identify data points as the foil. Given these two sets, we subtract the factual rule set from the foil rule set. This relative complement of the fact rules in the foil rules is used to construct our contrastive explanation. See Figure 1 for an illustration.
Figure 1. The general idea of our approach to contrastive explanations. Given a set of rules that defines data points as either the fact or the foil, we take the relative complement of the fact rules in the foil rules to obtain a description of how the foil differs from the fact in terms of features.
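The relative complement in Figure 1 can be sketched in a few lines. The rule representation and helper names below are our own illustration, not the authors’ implementation; rules are assumed to be (feature, operator, threshold) literals.

```python
# Hypothetical sketch of the rule-set complement in Figure 1.
def complement(fact_rules, foil_rules):
    """Return the foil rules that are not shared with the fact rules."""
    return [r for r in foil_rules if r not in fact_rules]

def merge(rules):
    """Combine rules on the same feature into a single interval literal."""
    bounds = {}  # feature -> [lower, upper]
    for feature, op, threshold in rules:
        lo, hi = bounds.setdefault(feature, [float("-inf"), float("inf")])
        if op == "<=":
            bounds[feature][1] = min(hi, threshold)
        else:  # ">"
            bounds[feature][0] = max(lo, threshold)
    return bounds

fact = [("petal width", ">", 0.8)]
foil = [("petal width", "<=", 0.8), ("sepal width", ">", 3.3)]
print(merge(complement(fact, foil)))
# → {'petal width': [-inf, 0.8], 'sepal width': [3.3, inf]}
```

The merged intervals correspond to the combined literals mentioned in step 7 of Section 2.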
The method we propose in this study obtains this complement by training a one-versus-all decision tree to recognize the foil class. We refer to this decision tree as the Foil Tree. Next, we identify the fact-leaf, the leaf in which the currently questioned data point resides, followed by the foil-leaf, which is obtained by searching the tree with some strategy. Currently our strategy is simply to choose the leaf closest to the fact-leaf that classifies data points as the foil class. The complement is then the set of decision nodes (representing rules) that are parents of the foil-leaf but not of the fact-leaf. Rules that overlap are merged to obtain a minimum-coverage rule set. The rules are then used to construct our explanation. The method is discussed in more detail in Section 2; an example of its usage on three benchmark classification tasks is discussed in Section 3.
The validation on these three tasks shows that the proposed method constructs shorter explanations than the full feature list, provides more information about how these features contribute, and that this contribution closely matches the underlying model.
2. Foil Trees: a way for obtaining contrastive explanations
The method we propose learns a decision tree centred around any questioned data point. The decision tree is trained to locally distinguish the foil class from any other class, including the fact class. It is trained on data points that can either be generated or sampled from an existing data set, each labeled with predictions from the model it aims to explain. As such, our method is model-agnostic. Similar to LIME (Ribeiro et al., 2016), the sample weight of each generated or sampled data point depends on its similarity to the data point in question: samples in the vicinity of the questioned data point receive higher weights in training the tree, ensuring its local faithfulness.
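A minimal sketch of such locality weighting, assuming a LIME-style exponential kernel on Euclidean distance (the paper does not specify the kernel, so the function and its `kernel_width` parameter are our own illustration):

```python
import numpy as np

def sample_weights(X, x, kernel_width=1.0):
    """Exponential kernel on the distance to the questioned point x."""
    d = np.linalg.norm(X - x, axis=1)
    return np.exp(-(d ** 2) / kernel_width ** 2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
w = sample_weights(X, np.array([0.0, 0.0]))
# the questioned point itself gets weight 1.0; weights shrink with distance
```

These weights would then be passed as per-sample weights when fitting the tree.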
Given this tree, the ‘foil-tree’, we search for the leaf in which the data point in question resides, the so-called ‘fact-leaf’. This gives us the set of rules that defines that data point as the not-foil class according to the foil-tree. These rules respect the decision boundary of the underlying ML model, as the tree is trained to mirror the foil-class outputs. Next, we use an arbitrary strategy to locate the ‘foil-leaf’, for example the leaf that classifies data points as the foil class with the lowest number of nodes between itself and the fact-leaf. This results in two rule sets, whose relative complement defines how the data point in question differs from the foil data points as classified by the foil-leaf. This explanation of the difference is given in terms of the input features themselves.
In summary, the proposed method goes through the following steps to obtain a contrastive explanation for an arbitrary ML model, the questioned data point and its output according to that ML model:

1. Retrieve the fact; the output class.

2. Identify the foil; explicitly given in the question or derived (e.g. the second most likely class).

3. Generate or sample a local data set; either randomly sampled from an existing data set, generated according to a normal distribution, generated based on marginal distributions of feature values, or obtained with more complex methods.

4. Train a decision tree; with sample weights depending on each training point’s proximity or similarity to the data point in question.

5. Locate the ‘fact-leaf’; the leaf in which the data point in question resides.

6. Locate a ‘foil-leaf’; we select the leaf that classifies data points as part of the foil class with the lowest number of decision nodes between it and the fact-leaf.

7. Compute differences; to obtain the set of rules that defines the difference between the fact-leaf and foil-leaf, all common parent decision nodes are removed from each rule set. Of the decision nodes that remain, those that regard the same feature are combined to form a single literal.

8. Construct the explanation; the actual presentation of the differences between the fact-leaf and foil-leaf.
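The steps above can be sketched end to end. The following is our own condensed illustration under stated assumptions: scikit-learn’s DecisionTreeClassifier plays the role of the foil-tree, the local sample is drawn from a normal distribution around the questioned point, and the nearest foil-leaf is found by comparing root-to-leaf paths; none of these names or defaults come from the paper itself.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)  # model to explain

x = X[50]                                   # the questioned data point
fact = model.predict([x])[0]                # step 1: the fact
foil = np.argsort(model.predict_proba([x])[0])[-2]  # step 2: second most likely

# Steps 3-4: label a local sample with the model, fit a one-vs-all tree
# weighted by proximity to x.
rng = np.random.default_rng(0)
samples = x + rng.normal(scale=X.std(axis=0), size=(500, X.shape[1]))
labels = (model.predict(samples) == foil).astype(int)
weights = np.exp(-np.linalg.norm(samples - x, axis=1))
foil_tree = DecisionTreeClassifier(max_depth=4, random_state=0)
foil_tree.fit(samples, labels, sample_weight=weights)

# Step 5: fact-leaf; step 6: nearest leaf that predicts the foil class.
t = foil_tree.tree_
parent = {0: None}
for n in range(t.node_count):
    for c in (t.children_left[n], t.children_right[n]):
        if c != -1:
            parent[c] = n

def path(node):                              # root-to-node list of nodes
    out = []
    while node is not None:
        out.append(node)
        node = parent[node]
    return out[::-1]

fact_leaf = foil_tree.apply([x])[0]
foil_leaves = [n for n in range(t.node_count)
               if t.children_left[n] == -1 and np.argmax(t.value[n]) == 1]
foil_leaf = min(foil_leaves,
                key=lambda n: len(set(path(n)) ^ set(path(fact_leaf))))

# Step 7: decision nodes on the foil path below the lowest common ancestor.
fact_path, foil_path = path(fact_leaf), path(foil_leaf)
shared = [a for a, b in zip(fact_path, foil_path) if a == b]
rules = [(int(t.feature[n]),
          "<=" if t.children_left[n] in foil_path else ">",
          round(float(t.threshold[n]), 2))
         for n in foil_path[len(shared):-1]]
print(rules)   # (feature index, operator, threshold) literals for step 8
```

An empty rule list corresponds to the zero-length explanations reported in Section 3, where the foil-leaf is a direct sibling of the fact-leaf.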
Figure 2 illustrates the aforementioned steps. The search for the appropriate foil-leaf in step 6 can vary; in Section 2.1 we discuss this in more detail. Finally, note that the method is not symmetrical: the answer to the question “Why class A and not B?” differs from the answer to “Why class B and not A?”, as the foil-tree is trained in the first case to identify class B and in the second case to identify class A. This is because we treat the foil as the expected class, or the class of interest, to which we compare everything else. In addition, even if the trees are similar, the relative complements of their rule sets are reversed.
2.1. Foil-leaf strategies
Up to now we have mentioned one strategy to find a foil-leaf, but multiple strategies are possible, although not all of them may result in an explanation the user finds satisfactory. The strategy used in this study simply selects the leaf closest to the fact-leaf in terms of the number of decision nodes, resulting in a minimal-length explanation. A disadvantage of this strategy is that it ignores the value of the foil-leaf compared to the rest of the tree: the nearest foil-leaf may classify only relatively few data points, or classify them with a relatively high error rate. To mitigate such issues, the foil-leaf selection mechanism can be generalized to a graph search from a specific (fact) vertex to a different (foil) vertex while minimizing edge weights. The foil-tree is treated as a graph whose decision-node and leaf properties influence some weight function. This generalization allows for a number of strategies, and each may result in a different foil-leaf. The strategy used in this preliminary study reduces to each edge having a weight of one, resulting in the nearest foil-leaf when the total weight is minimized.
As an example, an improved strategy may base the edge weights on the relative accuracy of a node (aggregated over its leaves) or of a leaf, where a higher accuracy results in a lower weight. This allows the strategy to find more distant, but more accurate, foil-leaves. It may result in relatively longer and more complex explanations, which nonetheless hold in more general cases: the nearest foil-leaf may accurately classify only a few data points, whereas a slightly more distant leaf classifies significantly more data points accurately. Given that an explanation should be both accurate and fairly general, this proposed strategy may be more beneficial (Craven & Shavlik, 1999).
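As a toy illustration of such an accuracy-aware weight function (our own formulation; the trade-off parameter `alpha` is an assumption, not a parameter from the paper):

```python
# Toy cost for the generalized foil-leaf search: path length plus a
# penalty for leaf inaccuracy. 'alpha' trades distance against accuracy.
def leaf_cost(depth_distance, leaf_accuracy, alpha=5.0):
    """Lower is better: the nearest sufficiently accurate leaf wins."""
    return depth_distance + alpha * (1.0 - leaf_accuracy)

# A slightly more distant but far more accurate leaf can now win:
near_but_noisy = leaf_cost(depth_distance=1, leaf_accuracy=0.55)    # 3.25
far_but_accurate = leaf_cost(depth_distance=2, leaf_accuracy=0.98)  # 2.10
```

With `alpha = 0` this reduces to the nearest-foil-leaf strategy used in the study.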
Note that the proposed method assumes that the foil is known. In all cases we take the second most likely class as our foil. Although this may be an interesting foil, it may not be the contrast the user actually wants to make. Either the user makes the foil explicit, or we introduce a feedback loop in the interaction that allows our approach to learn which foil is asked for in which situations. We leave this for future work.
3. Validation
The proposed method is validated on three benchmark classification tasks from the UCI Machine Learning Repository (Dua & Karra Taniskidou, 2017): the Iris data set, the PIMA Indians Diabetes data set and the Cleveland Heart Disease data set. The first is a well-known task of classifying plants based on four flower-leaf characteristics, with 150 data points and three classes. The second is a binary classification task of correctly diagnosing diabetes, with 769 data points and nine features. The third aims at classifying the risk of heart disease from no presence (0) to presence (1–4), and consists of 297 instances with 13 features.
To show the model-agnostic nature of our proposed method, we applied four distinct classification models to each data set: a random forest, a logistic regression, a support vector machine (SVM) and a neural network. Table 1 shows for each data set and classifier the F1 score of the trained model. We validated our approach on four measures: explanation length, accuracy, fidelity and time. These measures for evaluating iML decision rules are adapted from Craven & Shavlik (1999), where the mean length serves as a proxy measure for the relative explanation comprehensibility (Doshi-Velez & Kim, 2017). The fidelity allows
us to state how well the tree explains the underlying model, and the accuracy tells us how well its explanations generalize to unseen data points.

Figure 2. The steps needed to define and train a Foil Tree and to use it to construct a contrastive explanation. Each step corresponds to the listed steps in Section 2.

Below we describe each measure in detail:
1. Mean length; the average length of the explanation in terms of decision nodes. The ideal value lies in the range [1.0, nr. features), since a length of 0 means that no explanation is found, and a length near the number of features offers little gain compared to showing the entire ordered feature-contribution list as in other iML methods.

2. Accuracy; the F1 score of the foil-tree for its binary classification task on the test set, compared to the true labels. This measure indicates how well the explanations generated from the Foil Tree generalize to an unseen test set.

3. Fidelity; the F1 score of the foil-tree on the test set, compared to the model output. This measure quantifies how well the Foil Tree agrees with the underlying classification model it tries to explain.

4. Time; the number of seconds needed on average to explain a test data point.
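Accuracy and fidelity differ only in the target the foil-tree is compared against. A minimal sketch (the function and variable names, and the stub predictors, are ours):

```python
import numpy as np
from sklearn.metrics import f1_score

def tree_accuracy(predict_tree, X_test, y_true):
    """F1 of the foil-tree against the true labels (generalization)."""
    return f1_score(y_true, predict_tree(X_test))

def tree_fidelity(predict_tree, predict_model, X_test):
    """F1 of the foil-tree against the underlying model's output."""
    return f1_score(predict_model(X_test), predict_tree(X_test))

# Stub predictors standing in for a fitted foil-tree and model:
X = np.zeros((4, 2))
tree_out = lambda X: np.array([1, 0, 1, 0])   # foil-tree predictions
model_out = lambda X: np.array([1, 0, 1, 1])  # model's one-vs-all labels
y_true = np.array([1, 0, 0, 0])

fidelity = tree_fidelity(tree_out, model_out, X)
accuracy = tree_accuracy(tree_out, X, y_true)
```

High fidelity with lower accuracy would indicate a faithful explanation of a model that is itself wrong, which is exactly the distinction the two measures are meant to capture.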
Each measure is cross-validated three times to account for randomness in foil-tree construction. The results are shown in their respective columns in Table 1. They show that on average the Foil Tree is able to provide concise explanations, with a mean length of 1.33, while accurately mimicking the decision boundaries of the model, with a mean fidelity of 0.93, and generalizing well to unseen data, with a mean accuracy of 0.92. The foil-tree performs similarly to the underlying ML model in terms of accuracy. Note that for the random forest, logistic regression and SVM models on the diabetes data set, rules of length zero were found (i.e. in a number of cases no explanatory differences were found between fact and foil), resulting in a mean length of less than one. For all other models our method was able to find a difference for every questioned data point.
To further illustrate the proposed method, below we present a single explanation involving two classes of the Iris data set in a dialogue setting:

System: The flower type is ‘Setosa’.
User: Why ‘Setosa’ and not ‘Versicolor’?
System: Because for it to be ‘Versicolor’ the ‘petal width (cm)’ should be smaller and the ‘sepal width (cm)’ should be larger.
User: How much smaller and larger?
System: The ‘petal width (cm)’ should be smaller than or equal to 0.8 and the ‘sepal width (cm)’ should be larger than 3.3.

The fact is the ‘Setosa’ class, the foil is the ‘Versicolor’ class, and the explanation contains two decision nodes or literals. This small dialogue is generated with text templates and fixed interactions for the user.
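Such templated answers can be rendered directly from the extracted literals. The sketch below is our own minimal rendering of the two system answers, not the authors’ implementation:

```python
# Render contrastive literals as the two templated answers above.
def why_not(foil, literals):
    conditions = " and ".join(f"the '{feat}' should be {direction}"
                              for feat, direction, _ in literals)
    return f"Because for it to be '{foil}' {conditions}."

def how_much(literals):
    parts = []
    for feat, direction, threshold in literals:
        bound = ("smaller than or equal to" if direction == "smaller"
                 else "larger than")
        parts.append(f"the '{feat}' should be {bound} {threshold}")
    return " and ".join(parts).capitalize() + "."

literals = [("petal width (cm)", "smaller", 0.8),
            ("sepal width (cm)", "larger", 3.3)]
print(why_not("Versicolor", literals))
print(how_much(literals))
```

The first template drops the thresholds for a short initial answer; the second reveals them only when the user asks for detail.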
Table 1. Performance of foil-tree explanations on the Iris, PIMA Indians Diabetes and Heart Disease classification tasks. The column ‘Mean length’ also contains the total number of features for that data set as the upper bound of the explanation length.

DATA SET   MODEL                MODEL F1  MEAN LENGTH  ACCURACY  FIDELITY  TIME (S)
IRIS       RANDOM FOREST        0.93      1.94 (4)     0.96      0.97      0.014
           LOGISTIC REGRESSION  0.93      1.50 (4)     0.89      0.96      0.007
           SVM                  0.93      1.37 (4)     0.89      0.92      0.010
           NEURAL NETWORK       0.97      1.32 (4)     0.87      0.87      0.005
DIABETES   RANDOM FOREST        1.00      0.98 (9)     0.94      0.94      0.041
           LOGISTIC REGRESSION  1.00      0.98 (9)     0.94      0.94      0.032
           SVM                  1.00      0.98 (9)     0.94      0.94      0.034
           NEURAL NETWORK       1.00      1.66 (9)     0.99      0.99      0.009
HEART      RANDOM FOREST        0.94      1.32 (13)    0.88      0.90      0.106
DISEASE    LOGISTIC REGRESSION  1.00      1.21 (13)    0.99      0.99      0.006
           SVM                  1.00      1.19 (13)    0.86      0.86      0.012
           NEURAL NETWORK       1.00      1.56 (13)    0.92      0.92      0.009
4. Conclusion
Current developments in Interpretable Machine Learning (iML) have created new methods to answer “Why output A?” for Machine Learning (ML) models. A large set of such methods use the contribution of each feature to the classification of A and then provide either a subset of features whose contribution is above a threshold, the entire ordered feature list, or apply the method only to low-dimensional data.
This study proposes a novel method to reduce the number of contributing features for a class by answering a contrastive question of the form “Why output A (fact) instead of output B (foil)?” for an arbitrary data point. This allows us to construct an explanation in which only those features play a role that distinguish A from B. Our approach finds the contrastive explanation by taking the complement of the set of decision rules that cause the classification of A in the rule set of B. In this study we implemented this idea by training a decision tree to distinguish between B and not-B (a one-versus-all approach). A fact-leaf is found in which the data point in question resides, and a foil-leaf, whose data points are classified as the foil (output B), is selected according to some strategy. We then form the contrasting rules by extracting the decision nodes in the sub-tree rooted at the lowest common ancestor of the fact-leaf and foil-leaf that hold for the foil-leaf but not for the fact-leaf. Overlapping rules are merged and eventually used to construct an explanation.
We introduced a simple and naive strategy for finding an appropriate foil-leaf, and provided an idea for extending the method with more complex and accurate strategies, which is part of our future work. We plan a user validation of our explanations with non-experts in Machine Learning to test how satisfying the explanations are. In this study we tested whether the proposed method is viable on three different benchmark tasks, and tested its fidelity on different underlying ML models to show its model-agnostic capacity. The results showed that for different classifiers our method is able to offer concise explanations that accurately describe the decision boundaries of the model it explains.
As mentioned, our future work will consist of extending this preliminary method with more foil-leaf search strategies, applying the method to more complex tasks, and validating its explanations with users. Furthermore, we plan to extend the method with an adaptive foil-leaf search to adapt explanations towards a specific user based on user feedback.

References
Barocas, S. and Selbst, A. D. Big Data’s Disparate Impact.
Cal. L. Rev., 104:671, 2016.
Caliskan, A., Bryson, J. J., and Narayanan, A. Semantics
Derived Automatically from Language Corpora Contain Human-Like Biases. Science, 356(6334):183–186, 2017.
Chakraborty, S., Tomsett, R., Raghavendra, R., Harborne,
D., Alzantot, M., Cerutti, F., Srivastava, M., Preece, A.,
Julier, S., Rao, R. M., Kelley, Troy D., Braines, D., Sen-
soy, M., Willis, C. J., and Gurram, P. Interpretability of
Deep Learning Models: A Survey of Results. In IEEE
Smart World Congr. DAIS - Work. Distrib. Anal. Infras-
truct. Algorithms Multi-Organization Fed. IEEE, 2017.
Coglianese, C. and Lehr, D. Regulating by Robot: Admin-
istrative Decision Making in the Machine-Learning Era.
Geo. LJ, 105:1147, 2016.
Craven, M. W. and Shavlik, J. W. Rule Extraction: Where
Do We Go from Here? Technical report, University of
Wisconsin Machine Learning Research Group, 1999.
Crawford, K. Artificial Intelligence’s White Guy Problem.
The New York Times, 2016.
Datta, A., Sen, S., and Zick, Y. Algorithmic Transparency
via Quantitative Input Influence: Theory and Experi-
ments with Learning Systems. In Proc. 2016 IEEE Symp.
Secur. Priv. (SP 2016), pp. 598–617. IEEE, 2016. ISBN
9781509008247. doi: 10.1109/SP.2016.42.
Dhurandhar, A., Chen, P.-Y., Luss, R., Tu, C.-C., Ting, P., Shanmugam, K., and Das, P. Explanations Based on the Missing: Towards Contrastive Explanations with Pertinent Negatives. arXiv preprint arXiv:1802.07623, 2018.
Doshi-Velez, F. and Kim, B. Towards A Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608, 2017.
Dua, D. and Karra Taniskidou, E. UCI Machine Learn-
ing Repository, 2017. URL http://archive.ics.
Ehsan, U., Harrison, B., Chan, L., and Riedl, M. O. Ra-
tionalization: A Neural Machine Translation Approach
to Generating Natural Language Explanations. arXiv
preprint arXiv:1702.07826, 2017.
Elenberg, E. R., Dimakis, A. G., Feldman, M., and Karbasi, A. Streaming Weak Submodularity: Interpreting Neural Networks on the Fly. arXiv preprint arXiv:1703.02647, 2017.
Friedler, S. A., Scheidegger, C., Venkatasubramanian, S.,
Choudhary, S., Hamilton, E. P., and Roth, D. A Compar-
ative Study of Fairness-Enhancing Interventions in Ma-
chine Learning. arXiv preprint arXiv:1802.04422, 2018.
Guidotti, R., Monreale, A., Turini, F., Pedreschi, D., and Giannotti, F. A Survey Of Methods For Explaining Black Box Models. arXiv preprint arXiv:1802.01933, 2018.
Hein, D., Udluft, S., and Runkler, T. A. Interpretable Policies for Reinforcement Learning by Genetic Programming. arXiv preprint arXiv:1712.04170, 2017.
Hendricks, L. A., Akata, Z., Rohrbach, M., Donahue, J.,
Schiele, B., and Darrell, T. Generating Visual Explana-
tions. In Eur. Conf. Comput. Vis., pp. 3–19, 2016. ISBN
9783319464923. doi: 10.1007/978-3-319-46493-0_1.
Herman, B. The Promise and Peril of Human Evaluation
for Model Interpretability. In Conf. Neural Inf. Process.
Syst., 2017.
Huysmans, J., Dejaeger, K., Mues, C., Vanthienen, J., and
Baesens, B. An Empirical Evaluation of the Comprehen-
sibility of Decision Table, Tree and Rule Based Predic-
tive Models. Decis. Support Syst., 51(1):141–154, 2011.
ISSN 01679236. doi: 10.1016/j.dss.2010.12.003.
Krishnan, R., Sivakumar, G., and Bhattacharya, P. Ex-
tracting Decision Trees From Trained Neural Networks.
Pattern Recognit., 32:1999–2009, 1999.
Kulesza, T., Stumpf, S., Wong, W.-K., Burnett, M. M., Per-
ona, S., Ko, A., and Oberst, I. Why-Oriented End-User
Debugging of Naive Bayes Text Classification. ACM
Trans. Interact. Intell. Syst. (TiiS), 1(1):2, 2011.
Kulesza, T., Burnett, M., Wong, W.-K., and Stumpf, S.
Principles of Explanatory Debugging to Personalize In-
teractive Machine Learning. In Proc. 20th Intl. Conf. on
Intell. User Interfaces, pp. 126–137. ACM, 2015.
Lei, T., Barzilay, R., and Jaakkola, T. Rationalizing Neural Predictions. arXiv preprint arXiv:1606.04155, 2016.
Lipton, Z. C. The Mythos of Model Interpretability. In
2016 ICML Work. Hum. Interpret. Mach. Learn., 2016.
Lundberg, S. and Lee, S.-I. An Unexpected Unity Among
Methods for Interpreting Model Predictions. In 29th
Conf. Neural Inf. Process. Syst. (NIPS 2016), 2016.
Malioutov, D. M., Varshney, K. R., Emad, A., and Dash,
S. Learning Interpretable Classification Rules with
Boolean Compressed Sensing. Transparent Data Min.
Big Small Data. Stud. Big Data, 32, 2017.
Miller, T., Howe, P., and Sonenberg, L. Explainable AI:
Beware of Inmates Running the Asylum. In Proc. Int. Jt.
Conf. Artif. Intell. (IJCAI), pp. 36–41, 2017.
Montavon, G., Lapuschkin, S., Binder, A., Samek, W., and Müller, K. R. Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition. Pattern Recognit., 65(C):211–222, 2017. ISSN 00313203.
Nguyen, A., Dosovitskiy, A., Yosinski, J., Brox, T., and
Clune, J. Synthesizing the Preferred Inputs for Neurons
in Neural Networks via Deep Generator Networks. Adv.
Neural Inf. Process. Syst., 29, 2016.
Pacer, M. and Lombrozo, T. Ockham’s Razor Cuts to the Root: Simplicity in Causal Explanation. J. Exp. Psychol. Gen., 146(12):1761–1780, 2017. ISSN 1556-5068.
Puri, N., Gupta, P., Agarwal, P., Verma, S., and Kr-
ishnamurthy, B. MAGIX: Model Agnostic Glob-
ally Interpretable Explanations. arXiv preprint
arXiv:1702.07160, 2017.
Ribeiro, M. T., Singh, S., and Guestrin, C. “Why Should I
Trust You?”: Explaining the Predictions of Any Classi-
fier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Dis-
cov. Data Min. (KDD’16), pp. 1135–1144, 2016. ISBN
9781450321389. doi: 10.1145/2939672.2939778.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. In NIPS 2016 Work. Interpret. Mach. Learn. Complex Syst., 2016. ISBN 9781538610329.
Sundararajan, M., Taly, A., and Yan, Q. Axiomatic Attribu-
tion for Deep Networks. In Proc. 34th Int. Conf. Mach.
Learn. (ICML), 2017.
Thiagarajan, J. J., Kailkhura, B., Sattigeri, P., and Rama-
murthy, K. N. TreeView: Peeking into Deep Neural
Networks Via Feature-Space Partitioning. In NIPS 2016
Work. Interpret. Mach. Learn. Complex Syst., 2016.
Wang, T., Rudin, C., Velez-Doshi, F., Liu, Y., Klampfl,
E., and Macneille, P. Bayesian Rule Sets for Inter-
pretable Classification. In Proc. IEEE Int. Conf. Data
Min. (ICDM), pp. 1269–1274. IEEE, 2017. ISBN
9781509054725. doi: 10.1109/ICDM.2016.130.
Zhang, J., Bargal, S. A., Lin, Z., Brandt, J., Shen, X., and
Sclaroff, S. Top-Down Neural Attention by Excitation
Backprop. Int. J. Comput. Vis., pp. 1–19, 2017. ISSN
15731405. doi: 10.1007/s11263-017-1059-x.
Zhou, Y. and Hooker, G. Interpreting Models via Single Tree Approximation. arXiv preprint arXiv:1610.09036, 2016.
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state–action trajectory samples. GPRL is compared to a straightforward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart–pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data.
Full-text available
In the last years many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness sometimes at the cost of scarifying accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, delineating explicitly or implicitly its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective.
Full-text available
Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicate a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the Web. Our results indicate that text corpora contain re-coverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.
Understanding why a model made a certain prediction is crucial in many data science fields. Interpretable predictions engender appropriate trust and provide insight into how the model may be improved. However, with large modern datasets the best accuracy is often achieved by complex models that even experts struggle to interpret, which creates a tension between accuracy and interpretability. Recently, several methods have been proposed for interpreting predictions from complex models by estimating the importance of input features. Here, we show how a model-agnostic additive representation of the importance of input features unifies current methods. This representation is optimal, in the sense that it is the only set of additive values that satisfies important properties. We show how we can leverage these properties to create novel visual explanations of model predictions. The thread of unity that this representation weaves through the literature indicates that there are common principles to be learned about the interpretation of model predictions that apply in many scenarios.
When evaluating causal explanations, simpler explanations are widely regarded as better explanations. However, little is known about how people assess simplicity in causal explanations or what the consequences of such a preference are. We contrast 2 candidate metrics for simplicity in causal explanations: node simplicity (the number of causes invoked in an explanation) and root simplicity (the number of unexplained causes invoked in an explanation). Across 4 experiments, we find that explanatory preferences track root simplicity, not node simplicity; that a preference for root simplicity is tempered (but not eliminated) by probabilistic evidence favoring a more complex explanation; that committing to a less likely but simpler explanation distorts memory for past observations; and that a preference for root simplicity is greater when the root cause is strongly linked to its effects. We suggest that a preference for root-simpler explanations follows from the role of explanations in highlighting and efficiently representing and communicating information that supports future predictions and interventions.
Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real-world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threatens transparency. Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability.
Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust in a model. Trust is fundamental if one plans to take action based on a prediction, or when choosing whether or not to deploy a new model. Such understanding further provides insights into the model, which can be used to turn an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We further propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). The usefulness of explanations is shown via novel experiments, both simulated and with human subjects. Our explanations empower users in various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and detecting why a classifier should not be trusted.
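The core idea of the abstract above, learning an interpretable model locally around one prediction, can be sketched in a few lines. The function name `local_surrogate`, the Gaussian perturbations, and the exponential proximity kernel width are illustrative assumptions, not the exact LIME implementation.

```python
import numpy as np

def local_surrogate(predict, x, n_samples=500, sigma=0.5, seed=0):
    """LIME-style sketch: sample perturbations around x, weight them by an
    exponential proximity kernel, and fit a weighted linear model whose
    coefficients act as local feature importances."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=sigma, size=(n_samples, len(x)))
    y = np.array([predict(z) for z in Z])
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (2 * sigma ** 2))        # closer samples count more
    A = np.hstack([Z, np.ones((n_samples, 1))])        # linear model + intercept
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1]                                   # per-feature importances

# Black box that depends only on feature 0 near x; the surrogate should
# assign feature 0 far more importance than feature 1.
black_box = lambda z: float(z[0] > 0.5)
importances = local_surrogate(black_box, np.array([0.5, 0.0]))
```

The surrogate is only faithful near `x`: the same black box probed around a different instance can yield very different importances, which is exactly the local character of the explanation.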
Machine-learning algorithms are transforming large segments of the economy as they fuel innovation in search engines, self-driving cars, product marketing, and medical imaging, among many other technologies. As machine learning's use expands across all facets of society, anxiety has emerged about the intrusion of algorithmic machines into facets of life previously dependent on human judgment. Alarm bells sounding over the diffusion of artificial intelligence throughout the private sector only portend greater anxiety about digital robots replacing humans in the governmental sphere. A few administrative agencies have already begun to adopt this technology, while others have clear potential in the near term to use algorithms to shape official decisions over both rulemaking and adjudication. It is no longer fanciful to envision a future in which government agencies could effectively make law by robot, a prospect that understandably conjures up dystopian images of individuals surrendering their liberty to the control of computerized overlords. Should society be alarmed by governmental use of machine-learning applications? We examine this question by considering whether the use of robotic decision tools by government agencies can pass muster under core, time-honored doctrines of administrative and constitutional law. At first glance, the idea of algorithmic regulation might appear to offend one or more traditional doctrines, such as the nondelegation doctrine, procedural due process, equal protection, or principles of reason-giving and transparency. We conclude, however, that when machine-learning technology is properly understood, its use by government agencies can comfortably fit within these conventional legal parameters. We recognize, of course, that the legality of regulation by robot is only one criterion by which its use should be assessed. 
Agencies should not apply algorithms cavalierly, even if doing so might not run afoul of the law; in some cases, safeguards may be needed for machine learning to satisfy broader, good-governance aspirations. Yet, in contrast with the emerging alarmism, we resist any categorical dismissal of a future administrative state in which algorithmic automation guides, and even at times makes, key decisions. Instead, we urge that governmental reliance on machine learning should be approached with measured optimism about the potential benefits such technology can offer society by making government smarter and its decisions more efficient and just.
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a deep network, and to enable users to engage with models better.
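Since the method only needs calls to the gradient operator, it can be sketched compactly: average the gradient along the straight-line path from a baseline to the input, then scale by the input difference. The midpoint Riemann sum and the analytic toy gradient below are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def integrated_gradients(f, grad_f, x, baseline, steps=100):
    """Riemann-sum approximation of Integrated Gradients: average grad_f
    along the line from `baseline` to `x`, then scale by (x - baseline)."""
    x, baseline = np.asarray(x, float), np.asarray(baseline, float)
    alphas = (np.arange(steps) + 0.5) / steps          # midpoint rule
    avg_grad = np.mean(
        [grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

# Toy differentiable "network": f(z) = z0^2 + 3*z1, gradient (2*z0, 3).
f = lambda z: z[0] ** 2 + 3 * z[1]
grad_f = lambda z: np.array([2 * z[0], 3.0])
attr = integrated_gradients(f, grad_f, x=[1.0, 1.0], baseline=[0.0, 0.0])
```

A useful sanity check is completeness: the attributions sum to `f(x) - f(baseline)`, which here gives attributions of 1.0 and 3.0 summing to 4.0.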