Contrastive Explanations with Local Foil Trees
Jasper van der Waa *12 Marcel Robeer *13 Jurriaan van Diggelen 1Matthieu Brinkhuis 3Mark Neerincx 1 2
Abstract
Recent advances in interpretable Machine Learning (iML) and eXplainable AI (XAI) construct explanations based on the importance of features in classification tasks. However, in a high-dimensional feature space this approach may become infeasible without restricting the set of important features. We propose to utilize the human tendency to ask questions like "Why this output (the fact) instead of that output (the foil)?" to reduce the number of features to those that play a main role in the requested contrast. Our proposed method utilizes locally trained one-versus-all decision trees to identify the disjoint set of rules that causes the tree to classify data points as the foil and not as the fact. In this study we illustrate this approach on three benchmark classification tasks.
1. Introduction
The research field of making Machine Learning (ML) mod-
els more interpretable is receiving much attention. One of
the main reasons for this is the advance in such ML models
and their applications to high-risk domains. Interpretabil-
ity in ML can be applied for the following purposes: (i)
transparency in the model to facilitate understanding by
users (Herman, 2017); (ii) the detection of biased views in a model (Crawford, 2016; Caliskan et al., 2017); (iii) the identification of situations in which the model works adequately and safely (Barocas & Selbst, 2016; Coglianese & Lehr, 2016; Friedler et al., 2018); (iv) the construction of accurate explanations that capture the underlying causal phenomena (Lipton, 2016); and (v) the construction of tools that allow model engineers to build better models and debug existing ones (Kulesza et al., 2011; 2015).

*Equal contribution. 1Perceptual and Cognitive Systems, Dutch Research Organization for Applied Research (TNO), Soesterberg, The Netherlands. 2Interactive Intelligence group, Technical University of Delft, Delft, The Netherlands. 3Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands. Correspondence to: Jasper van der Waa <jasper.vanderwaa@tno.nl>.

2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden. Copyright by the author(s).
Existing iML methods differ in how the information for an explanation is obtained and how the explanation itself is constructed; see the review papers of Guidotti et al. (2018) and Chakraborty et al. (2017) for an overview. Common methods include: ordering the features' contributions to an output (Datta et al., 2016; Lei et al., 2016; Ribeiro et al., 2016), attention maps and feature saliency (Selvaraju et al., 2016; Montavon et al., 2017; Sundararajan et al., 2017; Zhang et al., 2017), prototype selection, construction and presentation (Nguyen et al., 2016), word annotations (Hendricks et al., 2016; Ehsan et al., 2017), and summaries with decision trees (Krishnan et al., 1999; Thiagarajan et al., 2016; Zhou & Hooker, 2016) or decision rules (Hein et al., 2017; Malioutov et al., 2017; Puri et al., 2017; Wang et al., 2017). In this study we focus on feature-based explanations. Such explanations tend to be long when based on all features, or they rely on an arbitrary cutoff point. We propose a model-agnostic method that limits the explanation length with the help of contrastive explanations, and that also adds information on how each feature contributes to the output in the form of decision rules.

Throughout this paper, the main purpose of an explanation is to offer transparency about the model's output: which features play a role and what that role is. Methods that offer similar explanations include LIME (Ribeiro et al., 2016), QII (Datta et al., 2016), STREAK (Elenberg et al., 2017) and SHAP (Lundberg & Lee, 2016). Each of these approaches answers the question "Why this output?" in some way by providing a subset of features or an ordered list of all features, either visualized or structured in a text template. However, when humans answer such questions for each other they tend to limit their explanations to a few vital points (Pacer & Lombrozo, 2017). This human tendency for simplicity also shows in iML: when multiple explanations hold, we should pick the simplest explanation that is consistent with the data (Huysmans et al., 2011). The approaches mentioned above achieve this by either thresholding the feature contributions at a fixed value, presenting the entire ordered list, or applying the method only to low-dimensional data.
This study offers a more human-like way of limiting the list of contributing features by setting a contrast between two outputs. The proposed contrastive explanations present only the information that causes some data point to be classified as one class instead of another (Miller et al., 2017). Recently, Dhurandhar et al. (2018) proposed constructing explanations by finding contrastive perturbations: the minimal changes required to change the current classification to any arbitrary other class. Instead, our approach creates contrastive targeted explanations by first defining the output of interest. In other words, our contrastive explanations answer the question "Why this output instead of that output?". The contrast is made between the fact, the given output, and the foil, the output of interest.

A relatively straightforward way to construct contrastive explanations from feature contributions, given a foil, is to compare the two ordered feature lists and see how much each feature differs in rank. However, a feature may have the same rank in both lists while being used in entirely different ways for the fact and foil classes. To mitigate this problem we propose a more meaningful comparison based on how a feature is used to distinguish the foil from the fact. We train a more accessible model to distinguish between fact and foil. From that model we distill two sets of rules: one used to identify data points as the fact and the other to identify data points as the foil. Given these two sets, we subtract the factual rule set from the foil rule set. This relative complement of the fact rules in the foil rules is used to construct our contrastive explanation. See Figure 1 for an illustration.
Figure 1. The general idea of our approach to contrastive explanations. Given a set of rules that defines data points as either the fact or the foil, we take the relative complement of the fact rules in the foil rules to obtain a description of how the foil differs from the fact in terms of features.
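As a minimal sketch of the set operation in Figure 1, consider the following toy example in Python; the features, thresholds and rule encoding are made up purely for illustration and are not part of the method's specification.

# Toy illustration of the relative complement of the fact rules in the foil rules.
# Rules are encoded as (feature, operator, threshold) tuples; the values are hypothetical.
fact_rules = {("sepal length (cm)", ">", 5.0), ("petal width (cm)", ">", 0.8)}
foil_rules = {("sepal length (cm)", ">", 5.0), ("petal width (cm)", "<=", 0.8),
              ("sepal width (cm)", ">", 3.3)}

# Rules shared by both carry no contrastive information; what remains of the
# foil rules describes how the foil differs from the fact in terms of features.
contrastive_explanation = foil_rules - fact_rules
print(contrastive_explanation)
# e.g. {('petal width (cm)', '<=', 0.8), ('sepal width (cm)', '>', 3.3)}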
The method we propose in this study obtains this complement by training a one-versus-all decision tree to recognize the foil class. We refer to this decision tree as the Foil Tree. Next, we identify the fact-leaf: the leaf in which the currently questioned data point resides. We then identify the foil-leaf, which is obtained by searching the tree with some strategy. Currently our strategy is simply to choose the leaf closest to the fact-leaf that classifies data points as the foil class. The complement is then the set of decision nodes (representing rules) that are a parent of the foil-leaf but not of the fact-leaf. Rules that overlap are merged to obtain a minimum-coverage rule set. The rules are then used to construct our explanation. The method is discussed in more detail in Section 2. Its usage is illustrated in Section 3 on three benchmark classification tasks. The validation on these three tasks shows that the proposed method constructs shorter explanations than the full feature list, provides more information on how these features contribute, and that this contribution closely matches the underlying model.
2. Foil Trees: a way of obtaining contrastive explanations
The method we propose learns a decision tree centered around any questioned data point. The decision tree is trained to locally distinguish the foil class from any other class, including the fact class. It is trained on data points that are either generated or sampled from an existing data set, each labeled with predictions from the model it aims to explain. As such, our method is model-agnostic. Similar to LIME (Ribeiro et al., 2016), the sample weight of each generated or sampled data point depends on its similarity to the data point in question. Samples in the vicinity of the questioned data point receive higher weights in training the tree, ensuring its local faithfulness.
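A minimal sketch of this training step is given below; it assumes a scikit-learn style classifier whose outputs are to be explained, and a Gaussian proximity kernel for the sample weights (the kernel, its width and the depth limit are our own illustrative choices, not prescribed by the method).

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_foil_tree(model, x, X_local, foil_class, kernel_width=1.0):
    # model      : fitted classifier with a predict() method (the model to explain)
    # x          : the questioned data point, shape (n_features,)
    # X_local    : locally generated or sampled data points, shape (n_samples, n_features)
    # foil_class : the class of interest (the foil)

    # Label the local data set with the underlying model's predictions and
    # binarize the labels: foil class versus everything else (one-versus-all).
    y_local = (model.predict(X_local) == foil_class).astype(int)

    # Proximity-based sample weights: points near x weigh more heavily,
    # which keeps the tree locally faithful to the model's decision boundary.
    distances = np.linalg.norm(X_local - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # A depth limit keeps the extracted rules short; this is a choice, not a requirement.
    foil_tree = DecisionTreeClassifier(max_depth=5)
    foil_tree.fit(X_local, y_local, sample_weight=weights)
    return foil_tree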
Given this tree, the 'foil-tree', we search for the leaf in which the data point in question resides, the so-called 'fact-leaf'. This gives us the set of rules that defines that data point as not belonging to the foil class according to the foil-tree. These rules respect the decision boundary of the underlying ML model, as the tree is trained to mirror the model's foil-class outputs. Next, we use an arbitrary strategy to locate the 'foil-leaf', for example the leaf that classifies data points as the foil class with the lowest number of nodes between itself and the fact-leaf. This results in two rule sets, whose relative complement defines how the data point in question differs from the foil data points as classified by the foil-leaf. This difference is expressed in terms of the input features themselves.
In summary, given an arbitrary ML model, a questioned data point and its output according to that model, the proposed method goes through the following steps to obtain a contrastive explanation (a code sketch of steps 5 to 7 follows the list):
1. Retrieve the fact; the output class.
2. Identify the foil; explicitly given in the question or
derived (e.g. second most likely class).
3. Generate or sample a local data set; either randomly sampled from an existing data set, generated according to a normal distribution, generated based on marginal distributions of feature values, or generated with more complex methods.
4. Train a decision tree; with sample weights depending
on the training point’s proximity or similarity to the
data point in question.
5. Locate the ‘fact-leaf’; the leaf in which the data point
in question resides.
6. Locate a ‘foil-leaf’; we select the leaf that classifies
data points as part of the foil class with the lowest
number of decision nodes between it and the fact-leaf.
7. Compute differences; to obtain the two sets of rules that define the difference between the fact-leaf and the foil-leaf, all common parent decision nodes are removed from both rule sets. From the decision nodes that remain, those that regard the same feature are combined to form a single literal.
8. Construct explanation; the actual presentation of the
differences between the fact-leaf and foil-leaf.
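The following sketch shows one possible implementation of steps 5 to 7 for a scikit-learn decision tree trained as described above; the helper names, the binary encoding (class 1 = foil) and the distance measure (the number of decision nodes not shared with the fact-leaf path, i.e. the explanation length) are our own illustrative choices rather than the only valid ones.

import numpy as np

def leaf_paths(sk_tree):
    # Return {leaf_id: [(node_id, went_left), ...]}, the root-to-leaf decision path of
    # every leaf; went_left == True means the path satisfied feature <= threshold.
    t = sk_tree.tree_
    paths = {}

    def walk(node, path):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:                       # node is a leaf
            paths[node] = path
            return
        walk(left, path + [(node, True)])
        walk(right, path + [(node, False)])

    walk(0, [])
    return paths

def contrastive_rules(sk_tree, x, feature_names):
    # Rules that hold for the nearest foil-leaf but not for the fact-leaf (steps 5-7).
    t = sk_tree.tree_
    paths = leaf_paths(sk_tree)
    fact_leaf = sk_tree.apply(x.reshape(1, -1))[0]     # step 5: locate the fact-leaf
    fact_nodes = {node for node, _ in paths[fact_leaf]}

    # Step 6: candidate foil-leaves are leaves whose majority class is the foil (class 1);
    # pick the one with the fewest decision nodes not shared with the fact-leaf path.
    foil_leaves = [leaf for leaf in paths
                   if leaf != fact_leaf and np.argmax(t.value[leaf]) == 1]
    if not foil_leaves:
        return []
    foil_leaf = min(foil_leaves,
                    key=lambda leaf: sum(1 for n, _ in paths[leaf] if n not in fact_nodes))

    # Step 7: keep only the decision nodes the fact-leaf does not share; nodes on the
    # same feature could additionally be merged into a single literal.
    rules = []
    for node, went_left in paths[foil_leaf]:
        if node not in fact_nodes:
            op = "<=" if went_left else ">"
            rules.append((feature_names[t.feature[node]], op, float(t.threshold[node])))
    return rules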
Figure 2 illustrates the aforementioned steps. The search for the appropriate foil-leaf in step 6 can vary; we discuss this in more detail in Section 2.1. Finally, note that the method is not symmetrical: the question "Why class A and not B?" gives a different answer than "Why class B and not A?", as the foil-tree is trained in the first case to identify class B and in the second case to identify class A. This is because we treat the foil as the expected class, or the class of interest, to which we compare everything else. In addition, even if the trees are similar, the relative complements of their rule sets are reversed.
2.1. Foil-leaf strategies
So far we have mentioned one strategy to find a foil-leaf; however, multiple strategies are possible, although not all strategies may result in an explanation that satisfies the user. The strategy used in this study simply selects the leaf closest to the fact-leaf in terms of the number of decision nodes, resulting in a minimal-length explanation. A disadvantage of this strategy is that it ignores the value of the foil-leaf compared to the rest of the tree: the nearest foil-leaf may be a leaf that classifies only relatively few data points, or classifies them with a relatively high error rate. To mitigate such issues, the foil-leaf selection mechanism can be generalized to a graph search from a specific (fact) vertex to a different (foil) vertex while minimizing edge weights. The foil-tree is treated as a graph whose decision-node and leaf properties influence some weight function. This generalization allows for a number of strategies, and each may result in a different foil-leaf. The strategy used in this preliminary study simply assigns each edge a weight of one, which yields the nearest foil-leaf when the total weight is minimized.
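One way to phrase this generalization, assuming the foil-tree has been converted to an undirected adjacency structure over its node and leaf identifiers, is a standard best-first (Dijkstra-style) search with a pluggable weight function; the function and parameter names below are illustrative only.

import heapq

def nearest_foil_leaf(edges, edge_weight, fact_leaf, is_foil_leaf):
    # edges        : {node: [neighbour, ...]} adjacency of the foil-tree graph
    # edge_weight  : callable (node, neighbour) -> float; a constant 1.0 reproduces the
    #                nearest-leaf strategy, accuracy-based weights favour better leaves
    # fact_leaf    : the leaf containing the questioned data point
    # is_foil_leaf : callable node -> bool, True for leaves classifying the foil
    queue, visited = [(0.0, fact_leaf)], set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node != fact_leaf and is_foil_leaf(node):
            return node, cost                       # cheapest reachable foil-leaf
        for neighbour in edges.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + edge_weight(node, neighbour), neighbour))
    return None, float("inf")                       # no foil-leaf present in the tree

The strategy used in this study corresponds to a weight function that always returns one; an accuracy-aware strategy would return lower weights for edges leading into more accurate sub-trees.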
As an example, an improved strategy may base the edge weights on the relative accuracy of a node (based on its leaves) or leaf, where a higher accuracy results in a lower weight. This allows the strategy to find more distant, but more accurate, foil-leaves. It may result in relatively more complex and longer explanations, which nonetheless hold in more general cases. For example, the nearest foil-leaf may classify only a few data points accurately, whereas a slightly more distant leaf classifies significantly more data points accurately. Given that an explanation should be both accurate and fairly general, this proposed strategy may be more beneficial (Craven & Shavlik, 1999).
Note that the proposed method assumes that the foil is known. In all cases we take the second most likely class as our foil. Although this may be an interesting foil, it may not be the contrast the user actually wants to make. Either the user makes the foil explicit, or we introduce a feedback loop in the interaction that allows our approach to learn which foil is asked for in which situations. We leave the latter for future work.
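When no explicit foil is given, the default of the second most likely class can be derived directly from the model's class probabilities; the sketch below assumes a scikit-learn style classifier with a predict_proba method, and the function name is ours.

import numpy as np

def derive_foil(model, x, fact_class):
    # Default foil: the most likely class that is not the fact.
    proba = model.predict_proba(x.reshape(1, -1))[0]
    ranked = np.argsort(proba)[::-1]           # class indices, most to least likely
    for idx in ranked:
        if model.classes_[idx] != fact_class:  # skip the fact itself
            return model.classes_[idx]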
3. Validation
The proposed method is validated on three benchmark classification tasks from the UCI Machine Learning Repository (Dua & Karra Taniskidou, 2017): the Iris data set, the PIMA Indians Diabetes data set and the Cleveland Heart Disease data set. The first data set is a well-known task of classifying plants based on four flower-leaf characteristics, with 150 data points and three classes. The second data set is a binary classification task for correctly diagnosing diabetes, containing 769 data points with nine features. The third data set aims at classifying the risk of heart disease from no presence (0) to presence (1–4), and consists of 297 instances with 13 features.
To show the model-agnostic nature of our proposed method we applied four distinct classification models to each data set: a random forest, logistic regression, a support vector machine (SVM) and a neural network. Table 1 shows the F1 score of the trained model for each data set and classifier.
Figure 2. The steps needed to define and train a Foil Tree and to use it to construct a contrastive explanation. Each step corresponds with the listed steps in Section 2.

We validated our approach on four measures: explanation length, accuracy, fidelity and time. These measures for evaluating iML decision rules are adapted from Craven & Shavlik (1999), where the mean length serves as a proxy measure of the relative explanation comprehensibility (Doshi-Velez & Kim, 2017). The fidelity allows us to state how well the tree explains the underlying model, and the accuracy tells us how well its explanations generalize to unseen data points. Below we describe each measure in detail:
1. Mean length; the average length of the explanation in terms of decision nodes. The ideal value lies in the range [1.0, number of features), since a length of 0 means that no explanation is found and a length near the number of features offers little gain compared to showing the entire ordered feature-contribution list as in other iML methods.
2. Accuracy; the F1 score of the foil-tree for its binary classification task on the test set, compared to the true labels. This measure indicates how well the explanations generated from the Foil Tree generalize to an unseen test set.
3. Fidelity; the F1 score of the foil-tree on the test set, compared to the model output. This measure quantifies how well the Foil Tree agrees with the underlying classification model it tries to explain (a sketch of how both measures are computed follows this list).
4. Time; the number of seconds needed on average to explain a test data point.
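For a single foil-tree, the accuracy and fidelity measures can be computed as sketched below, assuming the same binary encoding (foil versus not-foil) used to train the tree; the function name is ours and the F1 averaging follows scikit-learn's default for binary labels.

from sklearn.metrics import f1_score

def accuracy_and_fidelity(foil_tree, model, X_test, y_test, foil_class):
    # Accuracy: foil-tree vs. true labels; fidelity: foil-tree vs. model output.
    tree_pred = foil_tree.predict(X_test)                             # 1 = foil, 0 = not-foil
    true_binary = (y_test == foil_class).astype(int)                  # ground-truth labels
    model_binary = (model.predict(X_test) == foil_class).astype(int)  # underlying model labels
    accuracy = f1_score(true_binary, tree_pred)
    fidelity = f1_score(model_binary, tree_pred)
    return accuracy, fidelity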
Each measure is cross-validated three times to account for randomness in foil-tree construction. The results are shown in their respective columns in Table 1. They show that on average the Foil Tree is able to provide concise explanations, with a mean length of 1.33, while accurately mimicking the decision boundaries used by the model, with a mean fidelity of 0.93, and generalizing well to unseen data, with a mean accuracy of 0.92. The foil-tree performs similarly to the underlying ML model in terms of accuracy. Note that for the random forest, logistic regression and SVM models on the diabetes data set, rules of length zero were found (i.e., no explanatory differences were found between facts and foils in a number of cases), resulting in a mean length of less than one. For all other models our method was able to find a difference for every questioned data point.
To further illustrate the proposed method, below we present a single explanation contrasting two classes of the Iris data set in a dialogue setting:
• System: The flower type is 'Setosa'.
• User: Why 'Setosa' and not 'Versicolor'?
• System: Because for it to be 'Versicolor' the 'petal width (cm)' should be smaller and the 'sepal width (cm)' should be larger.
• User: How much smaller and larger?
• System: The 'petal width (cm)' should be smaller than or equal to 0.8 and the 'sepal width (cm)' should be larger than 3.3.
The fact is the 'Setosa' class, the foil is the 'Versicolor' class, and the explanation consists of two decision nodes, or literals. The generation of this small dialogue is based on text templates and fixed interactions for the user.
Table 1. Performance of foil-tree explanations on the Iris, PIMA Indians Diabetes and Heart Disease classification tasks. The column 'Mean length' also contains, in parentheses, the total number of features for that data set as the upper bound of the explanation length.
DATA SET        MODEL                 F1 SCORE   MEAN LENGTH   ACCURACY   FIDELITY   TIME
IRIS            RANDOM FOREST         0.93       1.94 (4)      0.96       0.97       0.014
IRIS            LOGISTIC REGRESSION   0.93       1.50 (4)      0.89       0.96       0.007
IRIS            SVM                   0.93       1.37 (4)      0.89       0.92       0.010
IRIS            NEURAL NETWORK        0.97       1.32 (4)      0.87       0.87       0.005
DIABETES        RANDOM FOREST         1.00       0.98 (9)      0.94       0.94       0.041
DIABETES        LOGISTIC REGRESSION   1.00       0.98 (9)      0.94       0.94       0.032
DIABETES        SVM                   1.00       0.98 (9)      0.94       0.94       0.034
DIABETES        NEURAL NETWORK        1.00       1.66 (9)      0.99       0.99       0.009
HEART DISEASE   RANDOM FOREST         0.94       1.32 (13)     0.88       0.90       0.106
HEART DISEASE   LOGISTIC REGRESSION   1.00       1.21 (13)     0.99       0.99       0.006
HEART DISEASE   SVM                   1.00       1.19 (13)     0.86       0.86       0.012
HEART DISEASE   NEURAL NETWORK        1.00       1.56 (13)     0.92       0.92       0.009
4. Conclusion
Current developments in Interpretable Machine Learning (iML) have created new methods to answer "Why output A?" for Machine Learning (ML) models. A large set of such methods use the contribution of each feature to the classification of A and then either provide a subset of features whose contribution is above a threshold, present the entire ordered feature list, or apply the method only to low-dimensional data.
This study proposes a novel method to reduce the number of contributing features for a class by answering a contrastive question of the form "Why output A (fact) instead of output B (foil)?" for an arbitrary data point. This allows us to construct an explanation in which only those features play a role that distinguish A from B. Our approach finds the contrastive explanation by taking the relative complement of the decision rules that cause the classification of A in the rule set of B. In this study we implemented this idea by training a decision tree to distinguish between B and not-B (a one-versus-all approach). A fact-leaf is found in which the data point in question resides, and a foil-leaf, whose data points are classified as the foil (output B), is selected according to some strategy. We then form the contrasting rules by extracting the decision nodes in the sub-tree below the lowest common ancestor of the fact-leaf and foil-leaf that hold for the foil-leaf but not for the fact-leaf. Overlapping rules are merged and eventually used to construct an explanation.
We introduced a simple and naive strategy for finding an appropriate foil-leaf, and outlined how the method can be extended with more complex and accurate strategies, which is part of our future work. We plan a user validation of our explanations with non-experts in Machine Learning to test how satisfactory our explanations are. In this study we tested whether the proposed method is viable on three different benchmark tasks, and we tested its fidelity on different underlying ML models to show its model-agnostic capacity.
The results showed that for different classifiers our method
is able to offer concise explanations that accurately de-
scribe the decision boundaries of the model it explains.
As mentioned, our future work will consist of extending this preliminary method with more foil-leaf search strategies, applying the method to more complex tasks, and validating its explanations with users. Furthermore, we plan to extend the method with an adaptive foil-leaf search to tailor explanations to a specific user based on user feedback.
References
Barocas, S. and Selbst, A. D. Big Data’s Disparate Impact.
Cal. L. Rev., 104:671, 2016.
Caliskan, A., Bryson, J. J., and Narayanan, A. Semantics
Derived Automatically from Language Corpora Con-
tain Human-Like Biases. Science, 356(6334):183–186,
2017.
Chakraborty, S., Tomsett, R., Raghavendra, R., Harborne,
D., Alzantot, M., Cerutti, F., Srivastava, M., Preece, A.,
Julier, S., Rao, R. M., Kelley, T. D., Braines, D., Sensoy, M., Willis, C. J., and Gurram, P. Interpretability of
Deep Learning Models: A Survey of Results. In IEEE
Smart World Congr. DAIS - Work. Distrib. Anal. Infras-
truct. Algorithms Multi-Organization Fed. IEEE, 2017.
Coglianese, C. and Lehr, D. Regulating by Robot: Admin-
istrative Decision Making in the Machine-Learning Era.
Geo. LJ, 105:1147, 2016.
Craven, M. W. and Shavlik, J. W. Rule Extraction: Where
Do We Go from Here? Technical report, University of
Wisconsin Machine Learning Research Group, 1999.
Crawford, K. Artificial Intelligence’s White Guy Problem.
The New York Times, 2016.
Datta, A., Sen, S., and Zick, Y. Algorithmic Transparency
via Quantitative Input Influence: Theory and Experi-
ments with Learning Systems. In Proc. 2016 IEEE Symp.
Secur. Priv. (SP 2016), pp. 598–617. IEEE, 2016. ISBN
9781509008247. doi: 10.1109/SP.2016.42.
Dhurandhar, A., Chen, P.-Y., Luss, R., Tu, C.-C., Ting, P., Shanmugam, K., and Das, P. Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives. arXiv preprint arXiv:1802.07623, 2018.
Doshi-Velez, F. and Kim, B. Towards A Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608, 2017.
Dua, D. and Karra Taniskidou, E. UCI Machine Learning Repository, 2017. URL http://archive.ics.uci.edu/ml.
Ehsan, U., Harrison, B., Chan, L., and Riedl, M. O. Ra-
tionalization: A Neural Machine Translation Approach
to Generating Natural Language Explanations. arXiv
preprint arXiv:1702.07826, 2017.
Elenberg, E. R., Dimakis, A. G., Feldman, M., and Karbasi,
A. Streaming Weak Submodularity: Interpreting Neural
Networks on the Fly. arXiv preprint arXiv:1703.02647,
2017.
Friedler, S. A., Scheidegger, C., Venkatasubramanian, S.,
Choudhary, S., Hamilton, E. P., and Roth, D. A Compar-
ative Study of Fairness-Enhancing Interventions in Ma-
chine Learning. arXiv preprint arXiv:1802.04422, 2018.
Guidotti, R., Monreale, A., Turini, F., Pedreschi, D., and
Giannotti, F. A Survey Of Methods For Explaining
Black Box Models. arXiv preprint arXiv:1802.01933,
2018.
Hein, D., Udluft, S., and Runkler, T. A. Interpretable Policies for Reinforcement Learning by Genetic Programming. arXiv preprint arXiv:1712.04170, 2017.
Hendricks, L. A., Akata, Z., Rohrbach, M., Donahue, J.,
Schiele, B., and Darrell, T. Generating Visual Explana-
tions. In Eur. Conf. Comput. Vis., pp. 3–19, 2016. ISBN 9783319464923. doi: 10.1007/978-3-319-46493-0_1.
Herman, B. The Promise and Peril of Human Evaluation
for Model Interpretability. In Conf. Neural Inf. Process.
Syst., 2017.
Huysmans, J., Dejaeger, K., Mues, C., Vanthienen, J., and
Baesens, B. An Empirical Evaluation of the Comprehen-
sibility of Decision Table, Tree and Rule Based Predic-
tive Models. Decis. Support Syst., 51(1):141–154, 2011.
ISSN 01679236. doi: 10.1016/j.dss.2010.12.003.
Krishnan, R., Sivakumar, G., and Bhattacharya, P. Ex-
tracting Decision Trees From Trained Neural Networks.
Pattern Recognit., 32:1999–2009, 1999. doi: 10.1145/
775047.775113.
Kulesza, T., Stumpf, S., Wong, W.-K., Burnett, M. M., Per-
ona, S., Ko, A., and Oberst, I. Why-Oriented End-User
Debugging of Naive Bayes Text Classification. ACM
Trans. Interact. Intell. Syst. (TiiS), 1(1):2, 2011.
Kulesza, T., Burnett, M., Wong, W.-K., and Stumpf, S.
Principles of Explanatory Debugging to Personalize In-
teractive Machine Learning. In Proc. 20th Intl. Conf. on
Intell. User Interfaces, pp. 126–137. ACM, 2015.
Lei, T., Barzilay, R., and Jaakkola, T. Rationalizing Neu-
ral Predictions. arXiv preprint arXiv:1606.04155, 2016.
ISSN 9781450321389. doi: 10.1145/2939672.2939778.
Lipton, Z. C. The Mythos of Model Interpretability. In
2016 ICML Work. Hum. Interpret. Mach. Learn., 2016.
Lundberg, S. and Lee, S.-I. An Unexpected Unity Among
Methods for Interpreting Model Predictions. In 29th
Conf. Neural Inf. Process. Syst. (NIPS 2016), 2016.
Malioutov, D. M., Varshney, K. R., Emad, A., and Dash,
S. Learning Interpretable Classification Rules with
Boolean Compressed Sensing. Transparent Data Min.
Big Small Data. Stud. Big Data, 32, 2017. doi: 10.1007/
978-3-319-54024-5.
Miller, T., Howe, P., and Sonenberg, L. Explainable AI:
Beware of Inmates Running the Asylum. In Proc. Int. Jt.
Conf. Artif. Intell. (IJCAI), pp. 36–41, 2017.
Montavon, G., Lapuschkin, S., Binder, A., Samek, W., and Müller, K. R. Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition. Pattern Recognit., 65(C):211–222, 2017. ISSN 00313203. doi: 10.1016/j.patcog.2016.11.008.
Nguyen, A., Dosovitskiy, A., Yosinski, J., Brox, T., and
Clune, J. Synthesizing the Preferred Inputs for Neurons
in Neural Networks via Deep Generator Networks. Adv.
Neural Inf. Process. Syst., 29, 2016.
Pacer, M. and Lombrozo, T. Ockham’s Razor Cuts to the
Root: Simplicity in Causal Explanation. J. Exp. Psychol.
Gen., 146(12):1761–1780, 2017. ISSN 1556-5068. doi:
10.1037/xge0000318.
Puri, N., Gupta, P., Agarwal, P., Verma, S., and Kr-
ishnamurthy, B. MAGIX: Model Agnostic Glob-
ally Interpretable Explanations. arXiv preprint
arXiv:1702.07160, 2017.
Ribeiro, M. T., Singh, S., and Guestrin, C. “Why Should I
Trust You?”: Explaining the Predictions of Any Classi-
fier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Dis-
cov. Data Min. (KDD’16), pp. 1135–1144, 2016. ISBN
9781450321389. doi: 10.1145/2939672.2939778.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. Grad-CAM: Visual Expla-
nations from Deep Networks via Gradient-based Local-
ization. In NIPS 2016 Work. Interpret. Mach. Learn.
Complex Syst., 2016. ISBN 9781538610329. doi:
10.1109/ICCV.2017.74.
Sundararajan, M., Taly, A., and Yan, Q. Axiomatic Attribu-
tion for Deep Networks. In Proc. 34th Int. Conf. Mach.
Learn. (ICML), 2017.
Thiagarajan, J. J., Kailkhura, B., Sattigeri, P., and Rama-
murthy, K. N. TreeView: Peeking into Deep Neural
Networks Via Feature-Space Partitioning. In NIPS 2016
Work. Interpret. Mach. Learn. Complex Syst., 2016.
Wang, T., Rudin, C., Velez-Doshi, F., Liu, Y., Klampfl,
E., and Macneille, P. Bayesian Rule Sets for Inter-
pretable Classification. In Proc. IEEE Int. Conf. Data
Min. (ICDM), pp. 1269–1274. IEEE, 2017. ISBN
9781509054725. doi: 10.1109/ICDM.2016.130.
Zhang, J., Bargal, S. A., Lin, Z., Brandt, J., Shen, X., and
Sclaroff, S. Top-Down Neural Attention by Excitation
Backprop. Int. J. Comput. Vis., pp. 1–19, 2017. ISSN
15731405. doi: 10.1007/s11263-017-1059-x.
Zhou, Y. and Hooker, G. Interpreting Models via Single
Tree Approximation. arXiv preprint arXiv:1610.09036,
2016.