XAI Method Properties: A (Meta-)study
Gesina Schwalbe1,2[0000−0003−2690−2478] and Bettina Finzel2[0000−0002−9415−6254]
1Continental AG, Regensburg, Germany
2Cognitive Systems Group, University of Bamberg, Germany
Abstract. By now, a wide variety of terminologies, motivations, approaches, and evaluation criteria have been developed within the scope of research on explainable artificial intelligence (XAI). Many taxonomies can be found in the literature, each with a different focus, but also with many points of overlap. In this paper, we summarize the most cited and most current taxonomies in a meta-analysis in order to highlight the essential aspects of the state of the art in XAI. We also present and add terminologies as well as concepts from a large number of survey articles on the topic. Last but not least, we illustrate concepts from the higher-level taxonomy with more than 50 example methods, which we categorize accordingly, thus providing a wide-ranging overview of aspects of XAI and paving the way for use-case-appropriate as well as context-specific subsequent research.
Keywords: Explainable Artificial Intelligence · Taxonomy · Meta-Analysis
Machine learning models offer the great benefit that they can deal with hard-to-specify problems, as long as these can be exemplified by data samples. This has opened up many opportunities for promising automation and assistance systems, such as highly automated driving, medical assistance systems, text summarization, and question-answering systems, to name just a few. However, many types of models that are automatically learned from data will not only exhibit high performance, but also be black boxes: they hide information on the learning progress, the internal representation, and the final processing in a format that is not, or only hardly, interpretable by humans.
There are now diverse use-case-specific motivations for allowing humans to understand a given software component, i.e. to build up a mental model approximating the algorithm in a certain way. This starts with legal reasons, like
⋆ The research leading to these results is funded by the BMBF ML-3 project Transparent Medical Expert Companion (TraMeExCo), FKZ 01IS18056 B, 2018–2021,
and by the German Federal Ministry for Economic Aﬀairs and Energy within the
project “KI Wissen – Automotive AI powered by Knowledge”. The authors would
like to thank the consortium for the successful cooperation.
arXiv:2105.07190v1 [cs.LG] 15 May 2021
the General Data Protection Regulation adopted by the European Union in recent years. Another example are domain-specific standards, like the functional safety standard ISO 26262, which requires assessability of software components in safety-critical systems; the ISO/TR 4804 draft standard refines this into a requirement for explainability of machine-learning-based components.
Many further reasons of public interest like fairness or security, and business
interests like ease of debugging, knowledge retrieval, or appropriate user trust
have been identiﬁed [4,61]. This need to translate behavioral or internal aspects
of black-box algorithms into a human interpretable form gives rise to the broad
topic of explainable artiﬁcial intelligence (XAI).
In recent years, the topic of XAI methods has received an exponential boost in research interest [4,64,1,121]. For the practical application of XAI in human-AI interaction systems, it is important to ensure a choice of XAI method(s) appropriate for the corresponding use case. While a thorough use-case analysis, including the main goal and derived requirements, is one essential ingredient here, we argue that a necessary foundation for choosing the correct requirements is complete knowledge of the different aspects (traits, properties) of XAI methods that may influence their applicability. Well-known aspects are, e.g., portability, i.e. whether the method requires access to the model internals or not, and locality, i.e. whether single predictions or some global properties of the model are explained. As will become clear from our literature analysis in section 2, this only just scratches the surface of application-relevant aspects of XAI methods.
This paper aims to help, for one, practitioners seeking a categorization scheme for choosing an appropriate XAI method for their use case, and, secondly, researchers in identifying desired combinations of aspects that have received little or no attention so far. For this, we provide a complete collection and a structured overview, in the form of a taxonomy, of XAI method aspects, together with example methods for each aspect. The method aspects are obtained from an extensive literature survey on categorization schemes for explainability and interpretability methods, resulting in, to our knowledge, the first meta-study on XAI surveys. Unlike similar work, we do not aim to provide a survey on XAI methods, but rather gather the valuable work done so far into a good starting point for an in-depth understanding of sub-topics of XAI research, and for research on XAI methods themselves.
Our main contributions are:
– A detailed and complete taxonomy containing and structuring application-relevant XAI method aspects considered in the literature so far (see Figure 2).
– A large collection of more than 50 surveys on XAI methods as starting material for research on the topic (see Tables 1, 2, 3, 4). To our knowledge, this represents the first meta-study on XAI methods.
– A large and diverse collection of more than 50 XAI methods, presented as examples for the method aspects, with a final detailed categorization by main method aspects (see Table 5).
The rest of the paper is structured as follows: In the following subsections we first introduce in more depth some basic notions of explainable AI for readers less
familiar with the topic (subsection 1.1), and then give some details on our review
approach (subsection 1.2). The remainder of this work in section 2 then details
XAI method aspects and the suggested taxonomy thereof, following a procedural
approach. The aspects are each accompanied by illustrating examples, which are
ﬁnally summarized and sorted into the main aspects of the taxonomy in Table 5.
1.1 What is XAI?
In order to overcome the opaqueness and black-box character of end-to-end machine learning approaches, various methods for explainable artificial intelligence (XAI) have been developed and applied in recent years. With the development of new techniques, different terms and concepts emerged to distinguish XAI methods.
In the following we briefly introduce important terms and concepts in order to give readers who may not be familiar with XAI a general overview of this field. We do not aim to present a comprehensive summary and refer to state-of-the-art surveys instead.
The concept of XAI has existed for many decades, but it was only against the background of an increasing demand for trustworthy and transparent machine learning that the term XAI was introduced to the research community in 2017 by the Defense Advanced Research Projects Agency (DARPA). According to DARPA, XAI efforts aim for two main goals. The first one is to create machine learning techniques that produce models that can be explained (their decision-making process as well as their output), while maintaining a high level of learning performance. XAI further should convey a user-centric approach, to enable humans to understand their artificial counterparts. As a consequence, XAI aims at increasing the trust in learned models and at allowing for an efficient partnership between human and artificial agents.
In order to reach the ﬁrst goal, DARPA proposes three strategies: deep ex-
planation, interpretable models and model induction.
Deep explanation refers to combining deep learning with other methods in order to create hybrid systems that produce richer representations of what a deep neural network has learned and that enable the extraction of underlying semantics.
Interpretable models are defined as techniques that learn more structured representations or that allow for tracing causal relationships.
The strategy of model induction summarizes techniques which are used to infer an approximate explainable model by observing the input-output behaviour of the model that is to be explained.
The second, more user-centric goal requires a highly interdisciplinary perspective, based on fields such as computer science, the social sciences, and psychology, in order to produce more explainable models and suitable explanation interfaces, and to communicate explanations effectively under consideration of psychological aspects.
According to , the literature on user-centric perspectives describes aspects that add up to a process involving recognition, understanding, explicability/explainability, and interpretability. The output of such a process is local interpretability for explainable artificial intelligence methods and global interpretability for interpretable machine learning. Systems that follow both tracks of interpretability are called comprehensible artificial intelligence (see ). An overview is given in Fig. 1.
[Figure 1 depicts a process of understanding, explaining, and interpreting, applied to the explanandum "ML model" (intrinsic understanding, global explanations; the focus of iML) and the explanandum "ML results" (ex post understanding, local explanations; the focus of xAI), each leading to actions/decisions.]
Fig. 1. A framework for comprehensible artificial intelligence
According to the authors, understanding is described as the ability to recognize correlations and the context, and is a necessary precondition for explanations. Explaining can take place for two reasons: explicability or explainability. Explicability refers to making properties of a model transparent. For example, in  a visual explanation method is applied to make explicit on which regions in an image a deep neural net focused to recognize distinct facial expressions. The analysis shows that, for certain facial expressions and participants, the neural net looked at the background rather than the face. Through explicability, domain experts can extract information and verify results. Explainability goes one step further and aims for comprehensibility. This means that
the reasoning, the model or the evidence for a result can be explained such that
the context can be understood by a human. The ultimate goal of such systems would be to reach ultra-strong machine learning, where machine learning would help humans to improve in their tasks. For example,  examined the comprehensibility of programs learned with Inductive Logic Programming, and  showed that the comprehensibility of such programs could help laymen to understand how and why a certain prediction was made. Both understanding and
explainability can be seen as necessary preconditions to fulﬁll interpretability.
Interpretability can be reached on two different levels: globally and locally. While the global perspective explains the model and its logic as a whole ("How was the conclusion derived?"), local approaches aim at explaining individual decisions or predictions ("Why was this example classified as a car?"), independently of the model's internal structure. According to the literature, transparent and comprehensible artificial intelligence relies on interpretability on the one hand, and interactivity on the other hand. Especially correctability is an enabler of understanding the internal workings of a model and provides methods for model adaptation (see Interactivity in section 2.2).
By combining different algorithmic approaches as well as by providing multi-modal explanations, various types of users, input data, and diverse use cases can be served. Thus, different perspectives need to be taken into account. We therefore aim to provide an extensive overview of the topic of XAI in this paper and want to provide a starting point for further research on finding methods appropriate to specific use cases and contexts. Our approach to searching for relevant work is described in the next subsection, followed by the section where we present our proposed taxonomy.
1.2 Review Approach

One goal of this paper is to provide a complete overview of relevant aspects, or properties, of XAI methods. In order to achieve this, a systematic and broad literature analysis was conducted on papers from the time range of 2010 to 2021.
Search: Work on XAI taxonomies First, we identified common terms associated directly with XAI taxonomies (for abbreviations, both the abbreviation and the full expression must be considered):
– machine learning terms: AI, DNN, Deep Learning, ML
– explainability terms: XAI, explain, interpret
– terms associated with taxonomies: taxonomy, framework, toolbox, guide
We then collected Google Scholar search results for combinations of these terms. Most notably, we considered the search phrases "explain AI taxonomy" (more than 30 pages of results), "XAI taxonomy toolbox guide" (8 pages of results), "explainable AI taxonomy toolbox guide" (more than 30 pages of results), and the combination of all search terms "explain interpret AI artificial intelligence DNN Deep Learning ML machine learning taxonomy framework toolbox guide XAI" (2 pages of results). For each search, up to the first 30 pages of results were scanned by title; promising results were then scanned by abstract.
Search: General XAI Surveys In another iteration we collected search results for
XAI surveys not necessarily proposing, but possibly implicitly using, a taxonomy.
For this, we again conducted a search, now for the more general terms “XAI”
and “XAI survey”, which again were scanned ﬁrst by title, then by abstract.
This resulted in a similar number of papers finally chosen and assessed in depth as for the taxonomy search (not counting duplicate search results).
Results The search resulted in over 50 surveys on XAI, most of them from the years 2018 to 2021, which were analysed for XAI method aspects, taxonomy structuring proposals, and suitable example methods for each aspect. A selection of surveys is shown in Tables 1, 2, 3, and 4. For the selection, first the citation count of the surveys was collected from the popular citation databases Google Scholar3, Semantic Scholar4, OpenCitations5, and NASA ADS6. The highest result (mostly Google Scholar) was chosen for comparison. Finally, we used as selection criteria the citation score per year, recency, and specificity, i.e. whether a survey focuses on a concrete sub-topic of explainability.
To exemplify the aspects of our proposed taxonomy, we again selected more than 50 concrete XAI methods, which are reviewed in example sections for the corresponding XAI aspects. The selection focused on high diversity and recency of the methods, in order to establish a broad view of the XAI topic. Finally, each of the methods was analysed with respect to the main taxonomy aspects, which is summarized in the overview in Table 5. Some examples of larger toolboxes are also collected in Table 4.
Table 1. Selected surveys on XAI methods with a broad focus.
General XAI method collections
 Linardatos et al. 2021 Extensive survey on XAI methods with code and toolbox
 Islam et al. 2021 Shallow taxonomy with some example methods explained for each
aspect, a short meta-study of XAI surveys, and a collection of
future perspectives for XAI
 Mueller et al. 2021 Design principles and survey on metrics for XAI systems
 Zhou et al. 2021 Detailed review on XAI metrics with a shallow taxonomy both for
methods and metrics
 Molnar 2020 Book on interpretability methods including details on many
transparent and many model-agnostic methods
 Arrieta et al. 2020 Extensive and diverse XAI method collection for responsible AI
 Xie et al. 2020 Introduction to XAI with wide variety of examples of standard
 Das & Rad 2020 Presents a review and taxonomy for local and global explanations
based on backpropagation and perturbation-based methods
(model-speciﬁc versus model-agnostic)
 Vilone & Longo 2020 Extensive survey on methods, with overview tables mapping
methods to (few) properties
 Benchekroun et al. 2020 Presents a preliminary taxonomy that includes pre-modelling
explainability as an approach to link knowledge about data with
knowledge about the used model and its results; motivates
 Carvalho et al. 2019 Extensive collection of different XAI aspects, especially metrics,
with some examples
 Gunning et al. 2019 Very short low-level introduction to XAI and open research
 Adadi & Berrada 2018 Quite extensive literature survey of 381 papers related to XAI
 Gilpin et al. 2018 Extensive survey on diﬀerent kinds of XAI methods including rule
extraction and references to further more specialized surveys
Table 2. Selected meta-studies on the topic of explainable AI.
 Langer et al. 2021 Reviews XAI with respect to stakeholder understanding and
satisfaction of stakeholder desiderata; discusses the context of
 Chromik & Schüßler 2020 Rigorous taxonomy development for XAI methods from an HCI perspective
 Ferreira et al. 2020 Survey that refers to further XAI surveys and presents a
taxonomy of XAI among computer science and
 Mueller et al. 2019 Detailed DARPA report respectively meta-study on
state-of-the-literature on XAI including a detailed list of XAI
method aspects and metrics (Chap. 7, 8)
 Lipton 2018 Small generic XAI taxonomy with discussion on desiderata for
 Doshi-Velez & Kim 2017 Collection of latent dimensions of interpretability with
recommendations on how to choose and evaluate a method
2 Taxonomy of XAI Methods

In this section, a taxonomy of XAI methods is established by selecting key dimensions for their classification. The categorization is done in a procedural manner:
One usually should start with the problem deﬁnition (subsection 2.1), before
detailing the explanator properties (subsection 2.2). Lastly, we discuss diﬀerent
metrics (subsection 2.3) that can be applied to explanation systems. For an
overview of the taxonomy, see Figure 2. The presented aspects are illustrated
by selected example methods (marked in gray). The selection is by no means
complete but rather should give an impression about the wide range of XAI
methods and how to apply our taxonomy to both some well-known and less
known but interesting methods. An overview of valuable further sources is given in Tables 1, 2, 3, and 4.
In the following, we will use this nomenclature:
Explanandum (what is to be explained) The complete oracle to be explained.
This usually encompasses a model (e.g. a deep neural network), which may
or may not encompass the actual object of explanation.
Explanator (the one who explains) This is the system component providing the explanations.
Explainee (the one to whom is explained) This is the receiver of the expla-
nations. Note that this often but not necessarily is a human. Explanations
may also be used, e.g., in multi-agent systems for communication between the agents, without a human in the loop in most of the information exchange.
Human-AI system A system containing both algorithmic components and a
human actor that have to cooperate for achieving a goal. Here, we specifically consider explanation systems, i.e. human-AI systems in which the
Table 3. Selected domain speciﬁc surveys on XAI methods.
Domain speciﬁc XAI surveys
 Heuillet et al. 2021 Survey on XAI methods for reinforcement learning
 Zhang & Chen 2020 Survey on recommendation systems with good overview on models
 Tjoa & Guan 2020 Survey with focus on medical XAI, sorting methods into a shallow
 Cropper et al. 2020 Survey on inductive logic programming methods for constructing
rule-based transparent models
 Singh et al. 2020 Survey and taxonomy of XAI methods for image classiﬁcation with
focus on medical applications
 Danilevsky et al. 2020 Survey on XAI methods for natural language processing
 Calegari et al. 2020 Overview of the main symbolic/sub-symbolic integration techniques
 Baniecki & Biecek 2020 Presents challenges in explanation, traits to overcome these as well
as a taxonomy for interactive explanatory model analysis
 Li et al. 2020 Review of state-of-the art metrics to evaluate explanation methods
and experimental assessment of performance of recent explanation
 Puiutta & Veith 2020 Review with short taxonomy on XAI methods for reinforcement
 Samek et al. 2019 Short introductory survey on visual explainable AI
 Guidotti et al. 2018 Review of model-agnostic XAI methods with a focus on XAI for
 Zhang & Zhu 2018 Survey on visual XAI methods for convolutional networks
 Nunes & Jannach 2017 Very detailed taxonomy of XAI for recommendation systems (see
 Hailesilassie 2016 Review on rule extraction methods
Table 4. Examples of XAI toolboxes. For a more detailed collection we refer the reader
to  and .
Examples of XAI toolboxes
 Spinner et al. 2020 explAIner toolbox
 Alber et al. 2019 Interface and reference implementation for some standard saliency map
 Arya et al. 2019 IBM AI explainability 360 toolbox with 8 diverse XAI methods
 Nori et al. 2019 Microsoft toolbox InterpretML with 5 model-agnostic and 4 transparent
cooperation involves explanations about an algorithmic part of the system
(the explanandum) by an explanator to the human interaction partner (the
explainee) resulting in an action of the human.
2.1 Problem Deﬁnition
The following aspects concern the concretization of the explainability problem. The first step in general should be to determine the use-case-specific requirements for aspects of the explanator (see subsection 2.2), and possibly targeted metric values (see subsection 2.3). This should be motivated by the actual goal or desiderata of the explanation, which can be, e.g., verifiability of properties like fairness, safety, and security, knowledge discovery, promotion of user adoption and trust, or many more. An extensive list of desiderata can be found in . Next, when the requirements are defined, the task that is to be explained must be clear, as well as the solution used for the task, i.e. the type of explanandum. For explainability purposes, the level of model transparency is the relevant property here.
Task Out of the box, XAI methods usually apply only to a specific set of task types of the to-be-explained model, and to specific input data types. For white-box methods that access model internals, additional constraints may hold for the architecture of the model (cf. the portability aspect in ).
Task type Typical task categories are unsupervised clustering (clu), regression, classification (cls), detection (det), semantic segmentation (seg), i.e. pixel-wise classification, and instance segmentation. Many XAI methods targeting a classification question, e.g. "Why this class?", can be extended to det, seg, and temporal resolution via snippeting of the new dimensions: "Why this class in this spatial/temporal snippet?". It must be noted that XAI methods working on classifiers often require access to a continuous classification score prediction instead of the final discrete classification. Such methods can also be used on regression tasks to answer questions about local trends, i.e. "Why does the prediction tend in this direction?". Examples of regression predictions are the bounding box dimensions in object detection.
Examples RISE  (Randomized Input Sampling for Explanation) is a model-agnostic attribution analysis method specialized for image data. It produces heatmaps for visualization by randomly dimming super-pixels of an input image to find those which have the greatest influence on the local class confidence when deleted. The extension D-RISE  to object detection considers not a one-dimensional score but the total prediction vector for change measurement. Unlike these local image-bound explanation methods, surrogate models produced using inductive logic programming (ILP)  require the binary classification output of a model. ILP frameworks require background knowledge (a logical theory) as input, together with positive and negative examples. From this, a logic program
[Figure 2 depicts the taxonomy as a tree with three main branches: (1) Problem definition: specifying the explanation (object of explanation, e.g. layers, units, vectors, decision boundary, or features; task type; input data type, e.g. tabular (numerical, categorical), text (natural or formal), images, or point clouds; and model transparency, e.g. simulatable, general linear model, general additive model, finite state automata); (2) Explanator: generating the explanation (input, level of abstraction, locality (local "why?" on single or groups of samples vs. partly to fully global), and explanator output types such as by example (e.g. closest other samples, word cloud), contrastive/counterfactual/near miss (e.g. adversarial examples), prototype (e.g. generated, concept), or rule-based (e.g. if-then, boolean logic)); (3) Metrics: evaluating the explanation (functionally grounded criteria like fidelity/soundness, completeness/coverage, algorithmic complexity, and indicativeness for certainty, bias, or feature importance, as well as human-grounded criteria like effectiveness/quality of the mental model, degree of understanding, improvement of human or human-AI system performance, and number of iterations).]
Fig. 2. Overview of the taxonomy presented in section 2
in the form of first-order rules is learned, covering as many of the samples as possible. An example of an ILP surrogate model method is CA-ILP  (Concept Analysis for ILP): In order to explain parts of a convolutional image classifier with logical rules, it first learns global extractors for symbolic features, which are then used for training an ILP surrogate model. Clustering tasks can often be explained by providing examples or prototypes of the final clusters, which will be discussed in subsection 2.2.
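The random-masking idea behind RISE can be sketched in a few lines. The following toy example is a strong simplification (a one-dimensional "image" of eight super-pixels and a hypothetical linear black-box score, both invented for illustration), not the published algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def model_score(x):
    # Hypothetical black-box class confidence: only super-pixels 2 and 5 matter.
    return 2.0 * x[2] + 1.0 * x[5]

x = np.ones(8)                               # input with all 8 super-pixels "on"
masks = rng.integers(0, 2, size=(2000, 8))   # random keep (1) / dim (0) masks
scores = np.array([model_score(x * m) for m in masks])

# RISE-style importance: expected score over the masks that keep a super-pixel.
importance = (masks.T @ scores) / masks.sum(axis=0)
```

Here `importance` peaks at super-pixel 2, matching its influence on the score; in the real method the mask-weighted scores are upsampled into a smooth heatmap over the image.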
Input data type Not every XAI method supports every input and output signal type, also called data type . One input type is tabular (symbolic) data, which encompasses numerical, categorical, binary, and ordinal (ordered) data. Other symbolic input types are natural language and graphs; non-symbolic types are images and point clouds (with or without temporal resolution), and audio.
Examples Typical examples of image explanations are methods producing heatmaps that highlight parts of the image that were relevant for (a part of) the decision. This highlighting of input snippets can also be applied to textual inputs, where single words or sentence parts can serve as snippets. A prominent example of heatmapping applicable to both image and text inputs is the model-agnostic LIME  method (Local Interpretable Model-agnostic Explanations): It learns a linear model on feature snippets of the input as a local approximation. For training, randomly selected snippets are removed; for textual inputs the snippets are words, for images they are super-pixels, which are blackened for removal. While LIME is suitable for image or textual input data,  provides a broad overview of model-agnostic XAI methods for tabular data.
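LIME's core step, fitting a local linear surrogate on perturbed inputs, can be sketched under toy assumptions (a hand-made linear "black box" over five words, and a plain least-squares fit instead of LIME's proximity-weighted, regularized regression):

```python
import numpy as np

rng = np.random.default_rng(1)
words = ["great", "movie", "boring", "plot", "though"]

def black_box(presence):
    # Hypothetical sentiment score over word-presence features.
    return 1.0 * presence[0] - 1.5 * presence[2] + 0.1 * presence[1]

# Perturbed samples: randomly drop words (0 = removed, 1 = kept).
Z = rng.integers(0, 2, size=(500, 5)).astype(float)
y = np.array([black_box(z) for z in Z])

# Fit the local linear surrogate (with an intercept column appended).
coef, *_ = np.linalg.lstsq(np.c_[Z, np.ones(len(Z))], y, rcond=None)
# coef[:5] are the per-word contributions the explanation reports.
```

The learned coefficients identify "great" as the strongest positive and "boring" as the strongest negative word, which is exactly the kind of snippet-level highlighting described above.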
Model transparency A model is considered to be transparent if its function is understandable without need for further explanation . To obtain an explainable model, one can either post hoc find a transparent surrogate model from which to derive the explanation (without changing the trained model), design the model to include self-explanations as an additional output, or start with an intrinsically transparent, or blended, i.e. partly transparent, model from the beginning. Many examples of post-hoc methods are given later on; details on the other transparency types can be found below.
Intrinsic transparency As introduced in , one can further differentiate between different levels of transparency: The model can directly be adopted as a mental model by a human (simulatable ), or it can be split up into parts, each of which is simulatable (decomposable ). Simulatability can be measured either based on the size of the model or on the needed length of computation. As a third category, algorithmic transparency is considered, which means the model can be mathematically investigated, e.g. the shape of the error surface is known.
Examples The following models are considered inherently transparent in the literature (cf. [70, Chap. 4], [38, Sec. 5]):
– Diagrams (cf. )
–Decision rules: This encompasses boolean rules as can be extracted from
decision trees, or fuzzy or ﬁrst-order logic rules. For further insights in in-
ductive logic programming approaches to ﬁnd the latter kind of rules see e.g.
the recent survey .
–Linear and logistic models
–Support vector machines
– General linear models (GLM): Here it is assumed that there is a linear relationship between the input features and the expected output value after the latter is transformed by a given transformation. For example, in logistic regression, the transformation is the logit. See e.g. [70, Sec. 4.3] for a basic introduction and further references.
– General additive models (GAM): It is assumed that the expected output value is the sum of transformed features. See the survey  for more details and further references. One concrete example of general additive models is the Additive Model Explainer . It trains predictors for a given set of features, and another small DNN predicting the additive weights for the feature predictors. This setup is used to learn a GAM surrogate model for a DNN, and also provides a prior on the weights: They should correspond to the sensitivity of the DNN with respect to the features.
–Graphs (cf. )
–Finite state automata
– Simple clustering approaches, e.g. k-means clustering: The standard k-means clustering method  works with an intuitive model, simply consisting of k prototypes and a proximity measure; inference associates new samples to the closest prototype, which represents a cluster. As long as the proximity measure is not too complex, this method can be regarded as an unsupervised inherently interpretable model.
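The transparency of a GAM can be made concrete in code. In this sketch the per-feature shape functions are hand-picked for illustration; a fitted GAM would learn them, e.g. as splines:

```python
import numpy as np

# One shape function per feature; each can be plotted and inspected in isolation.
shape_fns = [np.square, np.sin]

def gam_predict(x):
    # The prediction is just the sum of per-feature contributions,
    # so each feature's effect on the output can be read off directly.
    return sum(float(f(xi)) for f, xi in zip(shape_fns, x))

contributions = [float(f(xi)) for f, xi in zip(shape_fns, [2.0, 0.0])]
# contributions == [4.0, 0.0]; the prediction is their sum, 4.0.
```

Because the output decomposes additively, the "explanation" of a prediction is simply the list of contributions, which is exactly what makes GAMs simulatable at the per-feature level.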
Blended models Blended models consist partly of intrinsically transparent, symbolic models that are integrated into sub-symbolic, non-transparent ones. These kinds of hybrid models are especially interesting for neuro-symbolic computing and similar fields combining symbolic with sub-symbolic models .
Examples An example of a blended model is Logic Tensor Networks . Their idea is to use fuzzy logic to encode logical constraints on DNN outputs, with a DNN acting as a fuzzy logic predicate. The framework in  additionally allows learning semantic relations subject to symbolic fuzzy logic constraints. The relations are represented by simple linear models. Unsupervised deep learning can be made interpretable by approaches such as combining autoencoders with visualization approaches, or by explaining choices of "neuralized" clustering methods  (i.e. clustering models translated to a DNN) with saliency maps. Enhancing an autoencoder was applied, for example, in the FoldingNet  architecture on point clouds. There, a folding-based decoder allows one to view the reconstruction of point clouds, namely the warping from a 2D grid into the point cloud surface. A saliency-based solution can be produced by algorithms such as layer-wise relevance propagation, which will be discussed in later examples.
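The core trick of Logic Tensor Networks, treating model outputs as fuzzy truth values and logical rules as differentiable penalties, can be sketched as follows (here with the Łukasiewicz implication; the actual framework supports several t-norms and trains end to end):

```python
def implies(a, b):
    # Łukasiewicz fuzzy implication: truth value of "a -> b" for a, b in [0, 1].
    return min(1.0, 1.0 - a + b)

# Model outputs act as fuzzy predicates, e.g. cat(x) = 0.9, animal(x) = 0.7
# (illustrative values, not from any real model).
is_cat, is_animal = 0.9, 0.7

# The rule "cat(x) -> animal(x)" should be fully true; its violation
# (1 minus its truth value) can be added to the training loss.
penalty = 1.0 - implies(is_cat, is_animal)
```

The penalty is zero whenever the rule is fully satisfied and grows with the degree of violation, which is what lets the symbolic constraint shape the sub-symbolic training.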
Self-explaining models Self-explaining models provide additional outputs that
explain the output of a single prediction. According to , there are three
standard types of outputs of explanation generating models: attention maps,
disentangled representations, and textual or multi-modal explanations.
Attention maps These are heatmaps that highlight relevant parts of a given
single input for the respective output.
Examples The work in  adds an attention module to a DNN that is
processed in parallel to, and later multiplied with, convolutional outputs.
Furthermore, they suggest a clustering-based post-processing of the attention maps to highlight the most meaningful parts.
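The core mechanic of such an attention module can be illustrated with a toy sketch (not the cited architecture; the softmax re-weighting over spatial positions is a common, simplified formulation):

```python
import math

def spatial_attention(features, scores):
    """Soft attention over spatial positions: normalize the attention scores
    with a softmax and re-weight the feature values. The weight vector then
    doubles as a heatmap of where the model 'looks'."""
    m = max(scores)                      # shift for numerical stability
    exp = [math.exp(s - m) for s in scores]
    total = sum(exp)
    weights = [e / total for e in exp]
    attended = [w * f for w, f in zip(weights, features)]
    return weights, attended

# Four spatial positions; position 2 receives a much higher score.
weights, attended = spatial_attention([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 4.0, 0.0])
```

Here the normalized `weights` are exactly what an attention-map explanation would visualize.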
Disentangled representations mean that single dimensions or groups of dimensions in the intermediate output of the explanandum directly represent symbolic (also called semantic) concepts.
Examples One can by design force one layer of a DNN to exhibit a disentangled representation. One example are capsule networks, which structure the network not by neurons but into groups of neurons, the capsules, each of which characterizes one entity, e.g. an object or object part. The length of a capsule vector is interpreted as the probability that the corresponding object is present, while the orientation encodes properties of the object (e.g. rotation or color). Later capsules get as input the weighted sum of transformed previous capsule outputs, with the transformations learned and the weights obtained in an iterative routing process. A simpler disentanglement than alignment of semantic concepts with groups of neurons is alignment of single dimensions. This is done e.g. in the ReNN architecture. They explicitly modularize their DNN to ensure semantically meaningful intermediate outputs. Other
methods rather follow a post-hoc approach that fine-tunes a trained DNN towards more disentangled representations, as is suggested for Semantic Bottleneck Networks . These consist of the pretrained backbone of a DNN, followed by a layer in which each dimension corresponds to a semantic concept, called the semantic bottleneck, and finalized by a newly trained front DNN part. During fine-tuning, first the connections from the backbone to the semantic bottleneck are trained, then the parameters of the front DNN. Another interesting fine-tuning approach is that of concept whitening , which supplements batch-normalization layers with a linear transformation that learns to align semantic concepts with unit vectors of the activation space.
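The capsule interpretation of vector length as presence probability can be sketched with the squashing nonlinearity from the capsule networks literature (a minimal illustration, not the routing algorithm itself):

```python
import math

def squash(v):
    """Capsule squashing nonlinearity: shrinks the vector's length into [0, 1)
    while preserving its orientation, so length can act as a presence probability."""
    norm_sq = sum(x * x for x in v)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm if norm > 0 else 1.0)
    return [scale * x for x in v]

def presence_probability(capsule_output):
    """Length of the squashed capsule vector, read as P(entity present)."""
    return math.sqrt(sum(x * x for x in capsule_output))

weak = squash([0.1, 0.05])    # weakly activated capsule -> length near 0
strong = squash([5.0, 3.0])   # strongly activated capsule -> length near 1
```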
Textual or multi-modal explanations provide the explainee with a direct verbal or combined explanation as part of the model output.
Examples An example is provided by  for the applica-
tion of end-to-end steering control in autonomous driving. Their approach is
two-fold: They add a custom layer that produces attention heatmaps similar
to , and these are used by a second custom part to generate textual expla-
nations of the decision which are (weakly) aligned with the model processing.
ProtoPNet  for image classification provides visual examples rather than text. The network architecture is based on first selecting prototypical im-
age patches, and then inserting a prototype layer that predicts similarity
scores for patches of an instance with prototypes. These can then be used
for explanation of the ﬁnal result in the manner of “This is a sparrow as
its beak looks like that of other sparrow examples”. A truly multi-modal
example is , which trains, alongside a classifier, a long short-term memory DNN (LSTM) to generate natural language justifications of the classifica-
tion. The LSTM uses both the intermediate features and predictions of the
image classiﬁer, and is trained towards high class discriminativeness of the
justiﬁcations. The explanations can optionally encompass bounding boxes
for features that were important for the classification decision, making it a multi-modal explanation.
The aspects of the explanator encompass mathematical properties, like linearity
and monotonicity , requirements on the input, and properties of the output
and the explanation generation, more precisely the interactivity of that process.
Finally, we collect some mathematical constraints that can be desirable and
veriﬁed on an explanator.
Required Input The necessary inputs to the explanator may differ amongst methods . While the explanandum, the model to explain, must usually be provided to the explanator, many methods also require valid data samples, or even
user feedback (cf. section 2.2) or further situational context (cf.  for a more
detailed deﬁnition of context).
Portability An important practical aspect for post-hoc explanations is whether
or in how far the explanation method is dependent on access to internals of
the explanandum model. This level of dependency is called portability, translu-
cency, or transferability. In the following, we will not further diﬀerentiate be-
tween the strictness of requirements of model-speciﬁc methods. Transparent and
self-explaining models are always model-speciﬁc, as the interpretability requires
a special model type or model architecture (modification). The levels of dependency are:
model-agnostic also called pedagogical  or black-box means that only ac-
cess to model input and output is required.
Examples A prominent example of model-agnostic methods is the previously
discussed LIME  method for local approximation via a linear model.
Another method to find feature importance weights without any access to model internals is SHAP  (SHapley Additive exPlanations). Its idea is to axiomatically ensure: local fidelity; features missing from the original input
have no eﬀect; an increase of a weight also means an increased attribution of
the feature to the ﬁnal output; and uniqueness of the weights. Just as LIME,
SHAP just requires a deﬁnition of “feature” or snippet on the input in order
to be applicable.
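The Shapley values underlying SHAP can be computed exactly for tiny models by enumerating all feature coalitions (a toy sketch; actual SHAP implementations approximate this and average "missing" features over a background distribution, while here a single baseline vector stands in for missingness):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for model f at point x. Features 'missing' from a
    coalition are replaced by the baseline value."""
    n = len(x)
    idx = list(range(n))
    def eval_subset(S):
        z = [x[i] if i in S else baseline[i] for i in idx]
        return f(z)
    phi = [0.0] * n
    for i in idx:
        others = [j for j in idx if j != i]
        for r in range(n):
            for S in combinations(others, r):
                S = set(S)
                # Shapley weight |S|! (n-|S|-1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (eval_subset(S | {i}) - eval_subset(S))
    return phi

# For a linear model, Shapley values reduce to w_i * (x_i - baseline_i).
w = [2.0, -1.0, 0.5]
f = lambda z: sum(wi * zi for wi, zi in zip(w, z))
phi = shapley_values(f, x=[1.0, 3.0, 2.0], baseline=[0.0, 0.0, 0.0])
```

Note that the attributions sum to `f(x) - f(baseline)`, the efficiency axiom that gives SHAP its "additive" character.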
model-specific also called decompositional  or white-box, means that access is needed to the internal processing or architecture of the explanandum model, or that even further constraints apply.
Examples Methods relying on gradient or relevance information for generation of visual attention maps are strictly model-specific. A gradient-based method is Sensitivity Analysis . They pick the vector representing the steepest ascent in the gradient tangential plane of a sample point. This method is independent of the type of input features, but can only analyse one one-dimensional output at once. Output-type-agnostic, but dependent on a convolutional architecture and image inputs, are Deconvnet  and its successors Backpropagation  and Guided Backpropagation . They approximate a reconstruction of an input by defining inverses of pooling and convolution operations, which allows backpropagating the activation of single filters back to input image pixels (see  for a good overview). The idea of Backpropagation is generalized axiomatically by LRP  (Layer-wise Relevance Propagation): They require that the sum of linear relevance weights for each neuron in a layer be constant throughout the layers (relevance is neither created nor extinguished from layer to layer). Methods that achieve this are e.g. Taylor decomposition or the back-propagation of relevance weighted by the forward-pass weights. The advancement PatternAttribution  fulfills the additional constraint to be sound on linear models.
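The conservation property of LRP can be demonstrated on a tiny two-layer network (a simplified sketch of the basic LRP-0/epsilon rule for bias-free linear layers, not a full implementation):

```python
def lrp_linear(a, W, R_out, eps=1e-9):
    """Basic LRP rule for one linear layer (no bias): redistribute the
    relevance R_out of each output neuron onto the inputs proportionally to
    each input's contribution a[j] * W[j][k] to that neuron's pre-activation."""
    n_in, n_out = len(a), len(R_out)
    z = [sum(a[j] * W[j][k] for j in range(n_in)) for k in range(n_out)]
    R_in = [0.0] * n_in
    for k in range(n_out):
        denom = z[k] + (eps if z[k] >= 0 else -eps)  # stabilizer
        for j in range(n_in):
            R_in[j] += a[j] * W[j][k] / denom * R_out[k]
    return R_in

# Two-layer toy network: input -> hidden (ReLU) -> single output.
x = [1.0, 2.0]
W1 = [[0.5, -1.0], [1.0, 0.5]]           # shape (2 inputs, 2 hidden)
h = [max(0.0, sum(x[j] * W1[j][k] for j in range(2))) for k in range(2)]
W2 = [[1.0], [2.0]]                       # hidden -> 1 output
y = [sum(h[j] * W2[j][k] for j in range(2)) for k in range(1)]

R_hidden = lrp_linear(h, W2, R_out=y)     # start with R = model output
R_input = lrp_linear(x, W1, R_hidden)     # ReLU passes relevance through
```

The sum of relevances per layer stays equal to the model output, which is exactly the conservation constraint described above.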
hybrid also called eclectic  or gray-box, means that the explanator only depends on access to parts of the model's intermediate output, but not on the complete model internals.
Examples The rule extraction technique DeepRED  (Deep Rule Extraction with Decision tree induction) is an example of an eclectic method,
so neither fully model-agnostic nor totally reliant on access to model inter-
nals. The approach conducts a backwards induction over the layer outputs
of a DNN, applying a decision tree extraction between each two layers. While this enables rule extraction for arbitrarily deep DNNs, only small networks will
result in rules of decent length for explanations.
Explanation locality The literature differentiates between different ranges of validity of an explanation or surrogate model, respectively. A surrogate model is valid in the ranges where high fidelity can be expected (see subsection 2.3). The range of input required by the explanator depends on the targeted validity range, i.e. whether the input must represent the local or the global behavior of the explanandum. The general locality types are:
Local means the explanator is valid in a neighborhood of one or a group of
given (valid) input samples. Local explanations tackle the question of why a
given decision for one or a group of examples was made.
Examples Heatmapping methods are typical examples for local-only explanators, such as the discussed perturbation-based model-agnostic methods RISE , D-RISE , LIME , SHAP , as well as the model-specific sensitivity- and backpropagation-based methods LRP , PatternAttribution , Sensitivity Analysis , and Deconvnet and its successors.
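The perturbation principle shared by these local methods can be reduced to single-feature occlusion (a much-simplified cousin of RISE's random-mask averaging; the model and baseline here are toy stand-ins):

```python
def occlusion_importance(f, x, baseline=0.0):
    """Model-agnostic local importance: drop in the model output when each
    feature is individually replaced by a baseline value."""
    y = f(x)
    scores = []
    for i in range(len(x)):
        z = list(x)
        z[i] = baseline       # occlude feature i
        scores.append(y - f(z))
    return scores

# Toy black-box: only features 0 and 2 matter.
f = lambda z: 3.0 * z[0] + 0.0 * z[1] + 1.0 * z[2]
scores = occlusion_importance(f, [1.0, 5.0, 2.0])
```

On image input, the same loop would occlude super-pixels or random masks instead of single scalar features.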
Global means the explanator is valid in the complete (valid) input space. Other
than the why of local explanations, global interpretability can also be de-
scribed as answering how a decision is made.
Examples A graph-based global explanator is generated by . Their idea is that semantic concepts in an image usually consist of sub-objects to which
they have a constant relative spatial relation (e.g. a face has a nose in the
middle and two eyes next to each other), and that the localization of concepts
should not only rely on high ﬁlter activation patterns, but also on their sub-
part arrangement. To achieve this, they translate the convolutional layers of
a DNN into a tree of nodes (concepts), the explanatory graph. Each node
belongs to one ﬁlter, is anchored at a ﬁxed spatial position in the image,
and represents a spatial arrangement of its child nodes. The graph can also
be used for local explanations via heatmaps: To localize a node in one input
image, it is assigned the position closest to its anchor for which its ﬁlter ac-
tivation is highest and for which the expected spatial relation to its children
is best fulfilled. While most visualization-based methods provide only local visualizations, a global, prototype-based, visual explanation is provided by Feature Visualization . The goal here is to visualize the function of a
part of a DNN by ﬁnding prototypical input examples that strongly activate
that part. These can be found via picking, search, or optimization. Other than visualizations, rule extraction methods usually only provide global ap-
proximations. An example is the well-known model-agnostic rule extractor
VIA  (Validity Interval Analysis), which iteratively reﬁnes or generalizes
pairs of input and output intervals. An example for getting from local to global explanations is SpRAy  (Spectral Relevance Analysis). They suggest applying spectral clustering  to the local feature attribution heatmaps of data samples in order to find distinct, possibly spurious, global behavioral patterns. The heatmaps were generated via LRP .
Output The output is characterized by several aspects: what is explained (the
object of explanation), how it is explained (the actual output type), and how it is presented.
Object of explanation The object (or scope ) of an explanation describes
which item of the development process should be explained. Items we identified are:
processing The objective is to understand the (symbolic) processing pipeline of
the model, i.e. to answer parts of the question “How does the model work?”.
This is the usual case for model-agnostic analysis methods. Types of pro-
cessing to describe are e.g. the decision boundary, and feature attribution (or
feature importance). Note that these are closely related, as highly important
features usually locally point out the direction to the decision boundary. In
case a symbolic explanator is targeted, one may need to ﬁrst ﬁnd a symbolic
representation of input, output, or the model internal representation. Note
that model-agnostic methods that do not investigate the input data usually
target explanations of the model processing.
Examples Feature attribution methods encompass all the discussed attribu-
tion heatmapping methods (e.g. RISE , LIME , LRP ). LIME can
be considered a corner case, as it both explains feature importance but also
tries to approximate the decision boundary using a linear model on super-
pixels, which can itself serve directly as an explanation. A typical way to
describe decision boundaries are decision trees or sets of rules, like extracted
by the discussed VIA  and DeepRED . Standard candidates for model-agnostic decision tree extraction are TREPAN  for M-of-N rules at the split points, and the C4.5  decision tree generator for shallower but wider trees with interval-based splitting points. Concept Tree  is a recent
extension of TREPAN that adds automatic grouping of correlated features
into the candidate concepts to use for the tree nodes.
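The distillation idea behind such tree extractors can be reduced to its simplest form, a depth-one tree (decision stump) fitted to the black-box's own labels (a drastic simplification; TREPAN and C4.5 build deeper trees and query the model on generated samples):

```python
def fit_stump(X, model):
    """Distill a black-box classifier into a depth-1 decision tree: label the
    data with the model, then pick the (feature, threshold) split that best
    reproduces those labels."""
    y = [model(x) for x in X]
    best = None
    for i in range(len(X[0])):
        for t in sorted({x[i] for x in X}):
            for flip in (False, True):        # also try the inverted rule
                pred = [(x[i] > t) != flip for x in X]
                acc = sum(p == bool(lbl) for p, lbl in zip(pred, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, i, t, flip)
    return best  # (fidelity on X, feature index, threshold, inverted?)

# Toy black-box whose decision only depends on feature 1.
model = lambda x: 1 if x[1] > 0.5 else 0
X = [[0.1, 0.2], [0.9, 0.3], [0.2, 0.8], [0.7, 0.9], [0.4, 0.5]]
acc, feat, thr, flip = fit_stump(X, model)
```

The returned accuracy against the model's labels is precisely the fidelity metric discussed in subsection 2.3.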
inner representation Machine learning models learn new representations of
the input space, like the latent space representations found by DNNs. Explaining these inner representations answers "How does the model see the world?". A more fine-grained differentiation considers whether layers, units, or vectors in the feature space are explained.
–units: One example of unit analysis is the discussed Feature Visualization . In contrast to this unsupervised assignment of convolutional filters to prototypes, NetDissect  (Network Dissection) assigns filters to pre-defined semantic concepts in a supervised manner: For a filter, that semantic concept (color, texture, material, object, or object part) is selected for which the ground truth segmentation masks have the highest overlap with the upsampled filter's activations. The authors also suggest that concepts that are less entangled, so less distributed over filters, are more interpretable, which is measurable with their filter-to-concept overlap.
–vectors: Other than NetDissect, Net2Vec  also wants to assign con-
cepts to their possibly entangled representations in the latent space. For
a concept, they learn a linear 1 ×1-convolution on the output of a layer,
which segments the concept in an image. The weight vector of the linear
model for a concept can be understood as a prototypical representation
(embedding) for that concept in the DNN intermediate output. They
found that such embeddings behave like vectors in a word vector space:
Concepts that are semantically similar feature embeddings with high
cosine similarity. Similar to Net2Vec, TCAV  (Testing with Concept Activation Vectors) also aims to find embeddings of NetDissect concepts.
They are interested in embeddings that are represented as a linear com-
bination of convolutional ﬁlters, but in embedding vectors lying in the
space of the complete layer output. In other words, they do not segment
concepts but make an image-level classiﬁcation whether the concept is
present. These are found by using an SVM model instead of the 1 ×1-
convolution. Additionally, they suggest to use partial derivatives along
those concept vectors to ﬁnd the local attribution of a semantic con-
cept to a certain output. Other than the previous supervised methods,
ACE  (Automatic Concept-based Explanations) does not learn a linear classifier but does an unsupervised clustering of concept candidates
in the latent space. The cluster center is then selected as the embedding vector. A super-pixeling approach together with outlier removal is used to obtain concept candidates.
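The concept-vector idea can be sketched with the simple class-mean-difference variant of a concept activation vector (a stand-in for TCAV's linear classifier normal; TCAV proper trains an SVM and uses the model's actual gradient):

```python
def mean_difference_cav(acts_concept, acts_random):
    """Concept activation vector as the difference of mean activations of
    concept examples vs. random counterexamples in some layer's output space."""
    n = len(acts_concept[0])
    mean_c = [sum(a[i] for a in acts_concept) / len(acts_concept) for i in range(n)]
    mean_r = [sum(a[i] for a in acts_random) / len(acts_random) for i in range(n)]
    return [c - r for c, r in zip(mean_c, mean_r)]

def concept_sensitivity(grad, cav):
    """Directional derivative of the output along the CAV: positive means the
    concept locally pushes the prediction up."""
    return sum(g * v for g, v in zip(grad, cav))

# Toy activations: concept examples activate dimension 0, random ones do not.
cav = mean_difference_cav([[1.0, 0.1], [0.8, 0.0]], [[0.0, 0.1], [0.1, 0.0]])
s = concept_sensitivity([2.0, 5.0], cav)   # grad is a stand-in for dy/d(activation)
```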
–layers: The works of  and IIN  (invertible interpretation net-
works) extend on the previous approaches and analyse a complete layer
output space at once. For this, they ﬁnd a subspace with a basis of
concept embeddings, which allows an invertible transformation to a disentangled representation space. While IIN uses invertible DNNs for the bijection of concept to latent space,  uses linear maps in their experiments. These approaches can be seen as a post-hoc version of the Se-
mantic Bottleneck  architecture, only not replacing the complete later
part of the model, but just learning connections from the bottleneck to
the succeeding trained layer.  additionally introduces the notion of
completeness of a set of concepts as the maximum performance of the
model intercepted by the semantic bottleneck.
development (during training) Some methods focus on assessing eﬀects dur-
ing training [70, Sec. 2.3]: “How does the model evolve during the training?
What eﬀects do new samples have?”
Examples One example is the work of , who inspect the model during
 training to investigate the role of depth for neural networks. Their ﬁndings
indicate that depth actually is of computational benefit. An example which can be used to provide e.g. prototypical explanations are Influence Functions . They gather the influence of training samples during the training
to later assess the total impact of samples on the training. They also suggest using this information as a proxy to estimate the influence of the samples on model decisions.
uncertainty  Capture and explain (e.g. visualize) the uncertainty of a pre-
diction of the model. This encompasses the broad ﬁeld of Bayesian deep
learning  and uncertainty estimation . It is argued e.g. in  for medical applications and in  for autonomous driving why it is important to make the uncertainty of model decisions accessible to users.
data Pre-model interpretability  is the point where explainability touches
the large research area of data analysis and feature mining.
Examples Typical examples for projecting high-dimensional data into easy-
to-visualize 2D space are component analysis methods like PCA (Principal Component Analysis) . A slightly more sophisticated approach is
t-SNE  (t-Distributed Stochastic Neighbor Embedding). In order to vi-
sualize a set of high-dimensional data points, they try to ﬁnd a map from
these points into a 2D or 3D space that is faithful on pairwise similarities.
Clustering methods can also be used to generate prototype- or example-based explanations of typical features in the data. Examples here are k-means clustering  and the graph-based spectral clustering .
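The first PCA direction can be found with plain power iteration on the covariance matrix (a minimal sketch of the projection step; library implementations use full eigendecompositions or SVD):

```python
def first_principal_component(X, iters=200):
    """First PCA direction via power iteration on the covariance matrix:
    the direction of largest variance, onto which data is projected for
    low-dimensional visualization."""
    n, d = len(X), len(X[0])
    mean = [sum(x[i] for x in X) / n for i in range(d)]
    Xc = [[x[i] - mean[i] for i in range(d)] for x in X]       # center data
    C = [[sum(r[i] * r[j] for r in Xc) / n for j in range(d)]  # covariance
         for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]          # renormalize each iteration
    return v

# Points spread along the x-axis: the first component should align with it.
X = [[-3.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [3.0, -0.1]]
v = first_principal_component(X)
```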
Output type The output type, also considered the actual explanator , de-
scribes the type of information presented to the explainee. Note that this (“what”
is shown) is mostly independent of the presentation form (“how” it is shown).
Typical types are:
by example instance e.g. closest other samples, word cloud
Examples The discussed ProtoPNet  is based on selecting and comparing
relevant example snippets from the input image data.
contrastive / counterfactual / near miss including adversarial examples
Examples The perturbation-based feature importance heatmapping approach of RISE is extended in CEM  (Contrastive Explanations Method). They not only find positively contributing features, but also the features that must minimally be absent to not change the output.
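The counterfactual idea can be reduced to a greedy search over candidate edits (a drastically simplified take; CEM instead solves a regularized optimization over perturbations):

```python
def greedy_counterfactual(predict, x, candidates):
    """Minimal contrastive explanation by greedy search: among candidate
    single-feature edits (feature index, new value), return the smallest
    change that flips the model's decision."""
    original = predict(x)
    best = None
    for i, v in candidates:
        z = list(x)
        z[i] = v
        if predict(z) != original:          # the edit flips the decision
            cost = abs(v - x[i])
            if best is None or cost < best[0]:
                best = (cost, i, v)
    return best  # (change magnitude, feature, new value) or None

# Toy threshold classifier and hand-picked candidate edits.
predict = lambda z: int(z[0] + z[1] > 1.0)
x = [0.8, 0.5]                              # currently predicted as class 1
cands = [(0, 0.0), (0, 0.4), (1, 0.0)]
cost, feat, val = greedy_counterfactual(predict, x, cands)
```

The returned edit is a "near miss": the least that must change for a different outcome.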
prototype e.g. generated, concept vector
Examples A typical prototype generator is used in the discussed Feature Vi-
sualization  method: images are generated, e.g. via gradient descent, that
represent the prototypical pattern for activating a ﬁlter. While this consid-
ers prototypical inputs, concept embeddings as collected in TCAV  and
Net2Vec  describe prototypical activation patterns for a given seman-
tic concept. The concept mining approach ACE  combines prototypes
with examples: They search a concept embedding as prototype for an automatically collected set of example patches, which can be used to explain the model output.
feature importance e.g. heatmaps
Examples A lot of feature importance methods producing heatmaps have
been discussed before (e.g. RISE , D-RISE , CEM , LIME ,
SHAP , LRP , PatternAttribution , Sensitivity Analysis , De-
convnet and successors [115,95,99]). One further example is the work in , 
which follows a perturbation-based approach. Similar to RISE, their idea is
to ﬁnd a minimal occlusion mask that if used to perturb the image (e.g.
blur, noise, or blacken) maximally changes the outcome. To ﬁnd the mask,
backpropagation is used, making it a model-specific method. Some older but popular and simpler example methods are Grad-CAM  and its predecessor CAM  (Class Activation Mapping). While Deconvnet and its
successors can only consider the feature importance with respect to interme-
diate outputs, (Grad-)CAM produces class-speciﬁc heatmaps, which are the
weighted sum of the ﬁlter activation maps for one (usually the last) convolu-
tional layer. For CAM, it is assumed the convolutional backend is ﬁnalized by
a global average pooling layer that densely connects to the ﬁnal classiﬁcation
output. Here, the weights in the sum are the weights connecting the neurons
of the global average pooling layer to the class outputs. For Grad-CAM, the weights in the sum are the averaged derivatives of the class output with respect to each activation map pixel. This is also used in the more recent , who do not
apply Grad-CAM directly to the output but to each of a minimal set of
projections from a convolutional intermediate output of a DNN that predict
semantic concepts. Similar to Grad-CAM, SIDU  (Similarity Distance and Uniqueness) also adds up the filter-wise weighted activations of the last
convolutional layer. The weights encompass a combination of a similarity
score and a uniqueness score for the prediction output under each filter activation mask. The scores aim for high similarity of a masked prediction with the original one and low similarity to the other masked predictions, leading
to masks capturing more complete and interesting object regions.
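The weighted sum at the heart of CAM is small enough to write out directly (a sketch on toy activation maps; in practice the maps come from the last convolutional layer and the weights from the trained dense layer):

```python
def class_activation_map(activations, class_weights):
    """CAM: class-specific heatmap as the weighted sum of the last
    convolutional layer's activation maps, with one weight per filter
    (the dense connection from global average pooling to the class)."""
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, weight in zip(activations, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * fmap[i][j]
    return cam

# Two 2x2 filter maps; the class only "uses" the first filter.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 1.0]]]
cam = class_activation_map(maps, class_weights=[2.0, 0.0])
```

Grad-CAM keeps this exact sum but replaces `class_weights` by spatially averaged gradients of the class score.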
rule based e.g. decision tree; or if-then, binary, m-of-n, or hyperplane rules
Examples The mentioned exemplary rule-extraction methods DeepRED 
and VIA , as well as decision tree extractors TREPAN , Concept
Tree  and C4.5  all provide global, rule-based output. For further
rule extraction examples we refer the reader to the comprehensive surveys
[41,108,6] on the topic, and the survey  for recurrent DNNs. An example of a local rule-extractor is the recent LIME-Aleph  approach, which
generates a local explanation in the form of ﬁrst-order logic rules. This is
learned using inductive logic programming (ILP)  trained on the sym-
bolic knowledge about a set of semantically similar examples. Due to the
use of ILP, the approach is limited to tabular input data and classiﬁcation
outputs, but just as LIME it is model-agnostic. A similar approach is followed by NBDT  (Neural-Backed Decision Trees). They assume that the
concept embeddings of super-categories are represented by the mean of their
sub-category vectors (e.g. the mean of “cat” and “dog” should be “animal
with four legs”). This is used to infer from bottom-to-top a decision tree
where the nodes are super-categories and the leaves are the classiﬁcation
classes. At each node it is decided which of the sub-nodes best applies to the
image. As embedding for a leaf concept (an output class), they suggest taking the weights connecting the penultimate layer to a class output, and as similarity measure for the categories they use the dot product (cf. Net2Vec and TCAV).
dimension reduction i.e. sample points are projected to a sub-space
Examples Typical dimensionality reduction methods mentioned previously
are PCA  and t-SNE .
dependence plots plot the effect of an input feature on the final output of a model.
Examples PDP  (Partial Dependence Plots, cf. [70, sec. 5.1]) calculate for one input feature and each value of this feature the expected model outcome averaged over the dataset. This results in a plot (for each output) that indicates the global influence of the respective feature on the model.
The local equivalent, ICE  (Individual Conditional Expectation, cf. [70, sec. 5.2]) plots, obtain the PDP for generated data samples locally around a given sample.
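The PDP averaging step is simple enough to state in full (a minimal sketch; library versions add grid construction and confidence handling):

```python
def partial_dependence(f, X, feature, grid):
    """PDP for one feature: for each grid value, overwrite that feature in
    every dataset row and average the model output over the dataset."""
    curve = []
    for v in grid:
        outs = []
        for x in X:
            z = list(x)
            z[feature] = v            # force the feature to the grid value
            outs.append(f(z))
        curve.append(sum(outs) / len(outs))
    return curve

# Additive toy model: the PDP of feature 0 is 2*v plus the mean effect of x1.
f = lambda z: 2.0 * z[0] + z[1]
X = [[0.0, 1.0], [0.0, 3.0]]          # mean of x1 is 2.0
curve = partial_dependence(f, X, feature=0, grid=[0.0, 1.0, 2.0])
```

An ICE plot keeps the inner per-row outputs instead of averaging them.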
graphs e.g. explanatory graphs
Examples The previously discussed Explanatory Graph  method pro-
vides amongst others a graph-based explanation output.
Presentation The presentation of information can be characterized by two cat-
egories of properties: the used presentation form, and the level of abstraction
used to present available information. The presentation form simply summarizes
the human sensory input channels utilized by the explanation, which can be: vi-
sual (the most common one including diagrams, graphs, and heatmaps), textual
in either natural language or formal form, auditive, and combinations thereof.
In the following the aspects inﬂuencing the level of abstraction are elaborated.
These can be split up into (1) aspects of the smallest building blocks of the ex-
planation, the information units, and (2) the accessibility or level of complexity
of their combinations, i.e. the explanation as a whole. Lastly, further filtering may be
applied before ﬁnally presenting the explanation, including privacy ﬁlters.
Information units The basic units of the explanation, cognitive chunks  or information units, may differ in the level of processing applied to them.
The simplest form are unprocessed raw features, as used in explanations
by example. Derived features capture some indirect information contained
in the raw inputs, like super-pixels or attention heatmaps. These need not
necessarily have a semantic meaning to the explainee, other than explicitly semantic features, e.g. concept activation vector attributions. The last type
of information units are abstract semantic features not directly grounded in
any input, e.g. prototypes. Feature interactions may occur as information
units or be left unconsidered for the explanation.
Examples Some further notable examples of heatmapping methods for feature attribution are SmoothGrad  and Integrated Gradients . One
drawback of the methods described so far is that the considered loss surfaces
that are linearly approximated tend to be “rough”, i.e. exhibit signiﬁcant
variation in the point-wise values, gradients, and thus feature importance
. SmoothGrad  aims to mitigate this by averaging the gradient from
random samples within a ball around the sample to investigate. Integrated
gradients  do the averaging (to be precise: integration) along a path
between two points in the input space. A technically similar approach but Integrated
with a diﬀerent goal is Integrated Hessians . They intend not to grasp
and visualize the sensitivity of the model for one feature (as derived feature),
but their information units are interactions of features, i.e. how much the
change of one feature changes the inﬂuence of the other on the output. This
is done by having a look at the Hessian matrix, which is obtained by two
subsequent integrated gradient calculations.
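Integrated Gradients can be sketched with finite-difference gradients standing in for autodiff (a toy implementation; real uses compute exact gradients of a network and batch the path points):

```python
def integrated_gradients(f, x, baseline, steps=100, h=1e-5):
    """Integrated Gradients: average the gradient along the straight path from
    baseline to x (midpoint rule), then scale by the input difference.
    Gradients are approximated by central finite differences."""
    n = len(x)
    total = [0.0] * n
    for s in range(steps):
        alpha = (s + 0.5) / steps
        z = [baseline[i] + alpha * (x[i] - baseline[i]) for i in range(n)]
        for i in range(n):
            zp = list(z); zp[i] += h
            zm = list(z); zm[i] -= h
            total[i] += (f(zp) - f(zm)) / (2 * h)
    return [(x[i] - baseline[i]) * total[i] / steps for i in range(n)]

f = lambda z: z[0] * z[0] + 3.0 * z[1]   # simple differentiable toy model
x, b = [2.0, 1.0], [0.0, 0.0]
attr = integrated_gradients(f, x, b)
```

The attributions satisfy the completeness axiom: they sum to `f(x) - f(baseline)`, which averaging raw gradients (as in SmoothGrad) does not guarantee.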
Accessibility The accessibility, level of detail, or level of complexity describes
how much intellectual effort the explainee has to invest in order to understand the simulatable parts of the explanation. Thus, the perception of
complexity heavily depends on the end-user, which is mirrored in the com-
plexity metric discussed later in subsection 2.3. In general, one can diﬀeren-
tiate between representations that are considered simpler, and such that are
more expressive but complex. Because accessibility refers to the simulatable
parts, this diﬀers from the decomposable transparency level: For example,
very large decision trees or very high-dimensional (general) linear models
may be perceived as globally complex by the end-user. However, when look-
ing at the simulatable parts of the explanator, like small groups of features
or nodes, they are easy to grasp.
Examples Accessibility can indirectly be assessed by the complexity and
expressivity of the explanation (see subsection 2.3). To give some examples:
Simple presentations are e.g. linear models, general additive models, decision
trees and Boolean decision rules, Bayesian models, or clusters of examples (cf. subsection 2.1); more complex are e.g. first-order or fuzzy logic decision rules.
Privacy awareness Sensitive information like names may be contained in parts of the explanation, even though it is not necessary for understanding the actual decision. In such cases, an important point is privacy awareness : Is sensitive information removed if unnecessary, or properly anonymized if required?
Interactivity The interaction of the user with the explanator may either be
static, so the explainee is once presented with an explanation, or interactive,
meaning an iterative process accepting user feedback as explanation input. In-
teractivity is characterized by the interaction task and the explanation process.
Interaction task The user can either inspect explanations or correct them. In-
specting takes place through exploration of diﬀerent parts of one explanation
or through consideration of various alternatives and complementing explana-
tions, such as implemented in the iNNvestigate toolbox . Besides, the user
can be empowered within the human-AI partnership to provide corrective
feedback to the system via an explanation interface, in order to adapt the
explanator and thus the explanandum.
Examples State-of-the-art systems
–enable the user to perform corrections on labels and to act upon wrong explanations through interactive machine learning (intML), such as implemented in the CAIPI approach ,
–allow for re-weighting of features for explanatory debugging, like the EluciDebug system ,
–allow for adaption of features, as provided by Crayons , and
–allow correcting generated verbal explanations through user-defined constraints, such as implemented in the medical decision-support system LearnWithME .
Explanation process As mentioned above, explanation usually takes place in an iterative fashion. Sequential analysis allows the user to query further
information in an iterative manner and to understand the model and its
decisions over time, in accordance with the user's capabilities and the given context.
Examples This includes combining different methods to create multi-modal explanations and involving the user into a dialogue, such as realized through
a phrase-critic model as presented in .
Mathematical Constraints Mathematical constraints encode some formal
properties of the explanator that were found to be helpful for explanation reception. Constraints mentioned in the literature are:
Linearity Considering a concrete proxy model as explanator output, linearity
is often considered as a desirable form of simplicity [55,14,70].
Monotonicity Similar to linearity, one here again considers a concrete proxy
model as output of the explanator. It is then considered a desirable level of
simplicity if the dependency of that model's output on one input feature is monotonic.
Satisfiability This is the case if the explanator outputs readily allow the application
of formal methods like solvers.
Number of iterations While some XAI methods require a one-shot inference
of the explanandum model (e.g. gradient-based methods), others require
several iterations of queries to the explanandum. Since these might be costly or even restricted in some use cases, a limited number of iterations needed by the explanator may be desirable. Such restrictions may arise from non-gameability  constraints on the explanandum model, i.e. the number of queries is restricted in order to guard against systematic optimization of outputs by users (e.g. searching for adversaries).
By now, there is a considerable number of metrics suggested to assess the quality of XAI methods with respect to different goals. This section details the types
of metrics considered in literature. Following the original suggestion in , we
categorize metrics by their level of human involvement required to measure them.
For approaches to measure the below described metrics we refer the reader to
. They provide a good starting point with an in-depth analysis of metrics
measurement for visual feature attribution methods.
Functionally Grounded Metrics Metrics are considered functionally grounded if they do not require any human feedback but instead measure formal properties of the explanator. This applies to the following metrics:
Fidelity, also called soundness , causality , or faithfulness , measures how
accurately the behavior of the surrogate model used for the explanations
conforms with that of the actual object of explanation. More simplification
usually comes along with less fidelity, since corner cases are no longer
captured; this is also called the fidelity-interpretability trade-off.
Completeness or coverage measures how large the validity range of an expla-
nation is, so in which subset of the input space high ﬁdelity can be expected.
It can be seen as a generalization of ﬁdelity to the distribution of ﬁdelity.
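To make fidelity and coverage concrete, the following toy sketch (our own construction, not one of the surveyed methods) fits a global affine surrogate to a hypothetical black-box regressor and measures both quantities empirically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box explanandum: a mildly nonlinear regressor
black_box = lambda X: np.sin(X[:, 0]) + 0.1 * X[:, 1]

# Explanator output: a global affine surrogate fitted on queried samples
X_fit = rng.uniform(-1, 1, size=(2000, 2))
A = np.hstack([X_fit, np.ones((len(X_fit), 1))])
coef, *_ = np.linalg.lstsq(A, black_box(X_fit), rcond=None)
surrogate = lambda X: np.hstack([X, np.ones((len(X), 1))]) @ coef

# Fidelity: deviation of the surrogate from the black box on fresh samples
X_test = rng.uniform(-1, 1, size=(2000, 2))
err = np.abs(surrogate(X_test) - black_box(X_test))
fidelity_mae = float(np.mean(err))        # smaller = more faithful

# Coverage: share of the input space on which the explanation stays valid,
# estimated here as the fraction of samples with acceptable local error
coverage = float(np.mean(err < 0.05))
print(f"mean abs. error: {fidelity_mae:.3f}, coverage: {coverage:.2f}")
```

The trade-off is visible here: the affine surrogate is faithful on average but loses fidelity near the boundary of the input range, so its coverage stays below one.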
Accuracy ignores the prediction quality of the original model and only consid-
ers the prediction quality of the surrogate model for the original task. This
only applies to post-hoc explanations.
Algorithmic complexity and scalability measure the information theoretic
complexity of the algorithm used to derive the explanator. This includes the
time to convergence (to an acceptable solution), and is especially interesting
for complex approximation schemes like rule extraction.
Stability or robustness  measures the change of the explanator (output) given
a change of the input samples. This corresponds to the (adversarial) robustness
of deep neural networks, and a stable algorithm is usually also more
comprehensible and thus desirable. Stability makes most sense for local methods.
Consistency measures the change of the explanator (output) given a change
of the model to explain. The idea behind consistency is that functionally
equivalent models should produce the same explanation. This assumption
is important for model-agnostic approaches, while for model-specific ones
a dependency on the model architecture may even be desirable.
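A minimal empirical consistency check can be sketched as follows (a toy construction of ours: two differently parametrized but functionally equivalent models, explained by a crude occlusion-style attribution):

```python
import numpy as np

# Two functionally equivalent "models": same function, rewritten internally
f1 = lambda x: 3.0 * x[0] + 1.0 * x[1]
f2 = lambda x: 0.5 * (6.0 * x[0] + 2.0 * x[1])

def attribution(f, x):
    # Model-agnostic occlusion-style explanation: output change when a
    # single feature is zeroed out
    base = f(x)
    return np.array([base - f(np.where(np.arange(len(x)) == i, 0.0, x))
                     for i in range(len(x))])

x = np.array([0.2, -1.0])
consistency_gap = float(np.linalg.norm(attribution(f1, x) - attribution(f2, x)))
print(consistency_gap < 1e-6)  # True: identical explanations
```

A consistent explanator yields a (near-)zero gap for all such model pairs; a model-specific explanator that inspects internal weights would not, which, as noted above, may even be intended.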
Sensitivity measures whether local explanations change if the model output
changes strongly. The intuition behind this is that a strong change in the
model output usually comes along with a change in the discrimination
strategy of the model between the differing samples . Such changes should be
reflected in the explanations. Note that this may conflict with stability
goals in regions where the explanandum model behaves chaotically.
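Stability can be estimated empirically by perturbing an input and measuring the change of the local explanation. The sketch below is our own toy construction, using a finite-difference input gradient as the local explanation:

```python
import numpy as np

# Hypothetical explanandum and a simple local explanator:
# the finite-difference input gradient as a feature importance vector
f = lambda x: np.tanh(3.0 * x[0]) + x[1] ** 2

def explain(x, eps=1e-5):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        grad[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return grad

def max_instability(x, radius=1e-2, trials=50, seed=0):
    # Largest change of the explanation under small random input
    # perturbations; small values indicate a locally stable explanator
    rng = np.random.default_rng(seed)
    base = explain(x)
    return max(
        float(np.linalg.norm(explain(x + rng.uniform(-radius, radius, x.shape)) - base))
        for _ in range(trials)
    )

x = np.array([0.1, 0.5])
print(max_instability(x))  # small value: explanation is locally stable here
```

A complementary sensitivity check would assert the converse: between inputs whose model outputs differ strongly, the explanations should differ as well.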
Indicativeness (or localization in the case of visual feature importance maps )
describes how well an explanation points out certain points of interest, e.g.
by being sensitive to them and explicitly highlighting them for the explainee.
Points of interest considered in the literature are certainty, bias, feature
importance, and outliers (cf. ).
Expressiveness, or the level of detail, concerns the expected information
density perceived by the user. It is closely related to the level of abstraction
of the presentation. Several functionally grounded proxies have been suggested
to obtain comparable measures for expressiveness:
– the depth or amount of added information, also measured as the mean
number of information units used per explanation
– the number of relations that can be expressed
– the expressiveness category of the rules used, namely mere conjunctions,
Boolean logic, first-order logic, or fuzzy rules (cf. )
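The first of these proxies, the mean number of information units per explanation, is trivial to compute once explanations are available in symbolic form; the rule literals below are made-up examples of ours:

```python
# Toy rule explanations as lists of literals (feature conditions);
# the feature names are illustrative, not from any surveyed method
rule_explanations = [
    ["petal_length>2.4", "petal_width>0.8"],
    ["sepal_length<5.0"],
    ["petal_length>2.4", "sepal_width<3.0", "not setosa"],
]

# Mean number of information units (literals) per explanation
mean_units = sum(len(rule) for rule in rule_explanations) / len(rule_explanations)
print(mean_units)  # 2.0
```

Comparing such counts across methods only makes sense if the rule language, and thus the expressiveness category from the last bullet, is the same.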
Human Grounded Metrics In contrast to functionally grounded metrics, human
grounded metrics require human feedback on proxy tasks for their measurement.
Proxy tasks are often considered instead of the final application to avoid
the need for expensive experts or application runtime (think of medical domains).
The goal of an explanation is always that the receiver of the explanation can
build a mental model of (aspects of) the object of explanation. Human grounded
metrics aim to measure fundamental psychological properties of the XAI
methods, namely the quality of this mental model. The following are counted as such:
Interpretability, or comprehensibility or complexity, measures how accurately
the mental model approximates the explanator model. This measure mostly
relies on subjective user feedback on whether they “could make sense” of the
presented information. It depends on the background knowledge, biases, and
cognition of the subject and can reveal the use of vocabulary inappropriate
for the user .
Effectiveness measures how accurately the mental model approximates the object
of explanation. In other words, one is interested in how well a human can
simulate the (aspects of interest of the) object after being presented with
the explanations. Proxies for effectiveness can be fidelity and accessibility
[70, Sec. 2.4]. This may serve as a proxy for interpretability.
(Time) eﬃciency measures how time eﬃcient an explanation is, i.e. how long
it takes a user to build up a viable mental model. This is especially of interest
in applications with a limited time frame for user reaction, like product
recommendation systems  or automated driving applications .
Degree of understanding measures, in interactive contexts, the current status
of understanding. It helps to estimate the remaining time or measures needed
to reach the desired extent of the explainee's mental model.
Information amount measures the total subjective amount of information
conveyed by one explanation. Even though this may be measured on an
information-theoretic basis, it usually is subjective and thus requires human
feedback. Related functionally grounded metrics are the complexity of
the object of explanation, together with fidelity and coverage. For example,
more complex models tend to contain more information and
thus require more complex explanations if they are to be approximated widely
and accurately.
Application Grounded Metrics In contrast to human grounded metrics,
application grounded ones rely on human feedback from the final application. The
following metrics are considered application grounded:
Satisfaction measures the direct contentment of the explainee with the system,
and thus implicitly measures the benefit of explanations for the explanation
system.
Persuasiveness assesses the capability of the explanations to nudge an
explainee into a certain direction. This is foremost considered in
recommendation systems , but is also highly important for analysis tasks,
where false positives and false negatives of the human-AI system are
undesirable. In this context, a high persuasiveness may indicate a miscalibration
of the explainee's trust.
Improvement of human judgement measures whether the explanation system
user develops an appropriate level of trust in the decisions of the explained
model. Correct decisions should be trusted more than wrong decisions,
e.g. because explanations of wrong decisions are illogical.
Improvement of human-AI system performance considers the end-to-end
task to be achieved by all of explanandum, explainee, and explanator. This
can, e.g., be the diagnosis quality of doctors assisted by a recommendation
system.
Automation capability gives an estimate of how much of the manual work
conducted by the human in the human-AI system can be automated.
Especially for local explanation techniques, automation may be an important
factor for feasibility if the number of samples a human needs to scan can be
reduced.
Novelty estimates the subjective degree of novelty of the information provided
to the explainee . This is closely related to efficiency and satisfaction:
especially in exploratory use cases, high novelty can drastically increase
efficiency (no repetitive work for the explainee) and keep satisfaction high
(decreasing the possibility of boredom for the explainee).
Table 5: Review of an exemplary selection of XAI techniques according to the de-
fined taxonomy aspects (without fully transparent models). Abbreviations by col-
umn: image data=img, point cloud data=pcl; Trans.=transparency, post-hoc=p,
transparent=t, self-explaining=s, blended=b; processing=p, representation=r, de-
velopment during training=t, data=d; visual=vis, symbolic=sym, plot=plt; feature
importance=fi, contrastive=con, prototypical=proto, decision tree=tree, distribution=dist, reduction=red.
Self-explaining and blended models
-  cls s p sym/vis rules/ﬁ
-  any s p sym/vis rules/ﬁ
ProtoPNet  cls,img s p/r vis proto/ﬁ
Capsule Nets  cls s r sym ﬁ
Semantic Bottlenecks, ReNN, Concept
[66,107,18] any s r sym ﬁ
Logic Tensor Nets  any b Xp/r sym rule
FoldingNet  any,pcl b p vis ﬁ/red
Neuralized clustering  any b p vis ﬁ
LIME, SHAP [86,67] cls Xp p vis ﬁ/con
RISE  cls,img Xp p vis ﬁ
D-RISE  det, img Xp p vis ﬁ
CEM  cls,img Xp p vis ﬁ/con
Sensitivity analysis  cls p p vis ﬁ
Deconvnet, (Guided) Backprop. [115,95,99] img p p vis ﬁ
CAM, Grad-CAM [119,93] cls,img p p vis ﬁ
SIDU  cls,img p p vis ﬁ
Concept-wise Grad-CAM  cls,img p p/r vis ﬁ
LRP  cls p p vis ﬁ
Pattern Attribution  cls p p vis ﬁ
-  cls p p vis ﬁ
SmoothGrad, Integrated Gradients [97,100] cls p p vis ﬁ
Integrated Hessians  cls p p vis ﬁ
Global representation analysis
Feature Visualization  img p Xr vis proto
NetDissect  img p Xr vis proto/ﬁ
Net2Vec  img p (X) r vis ﬁ
TCAV  any p Xr vis ﬁ
ACE  any p Xr vis ﬁ
-  any p Xr vis proto
IIN  any p (X) r vis/sym ﬁ
Explanatory Graph  img p (X) p/r vis graph
PDP  any Xp p vis plt
ICE  any XpXp vis plt
TREPAN, C4.5, Concept Tree [20,82,85] cls XpXp sym tree
VIA  cls XpXp sym rules
DeepRED  cls p Xp sym rules
LIME-Aleph  cls Xp p sym rules
CA-ILP  cls p Xp sym rules
NBDT  cls p Xp sym tree
CAIPI  cls,img Xp r vis ﬁ/con
EluciDebug  cls Xp r vis ﬁ,plt
Crayons  cls,img Xt p vis plt
LearnWithME  cls XtXp, r sym rules
Multi-modal phrase-critic model  cls,img p Xp vis,sym plt,rules
Inspection of the training
-  any p Xt vis dist
Inﬂuence functions  cls p Xt vis ﬁ/dist
Data analysis methods
t-SNE, PCA [68,52] any XpXd vis red
k-means, spectral clustering [42,105] any XpXd vis proto
Conclusion
In this paper, we combined existing taxonomies and surveys on the topic of XAI
into an overarching taxonomy and added other highly relevant concepts from the
literature. Starting from the deﬁnition of the problem of XAI, we developed our
taxonomy based on three main parts: the task, the explainer and metrics. We de-
ﬁned each of these parts and explained them using numerous example concepts
and example methods from the most relevant as well as the most recent research
literature. To provide a guide on the methods, we classiﬁed the presented meth-
ods according to seven criteria that are signiﬁcant in the literature. We asked
about the task, the form of transparency, whether the method is model-agnostic
or model-speciﬁc, whether it generates global or local explanations, what the
object of explanation is, in what form explanations are presented and the type
of explanation. In our taxonomy, we highlighted that beyond the presented parts
(task, explainer and metric), there are also other, use-case-specific aspects to
consider when developing, applying, and evaluating XAI in order to account for
different stakeholders and their context. To date, there is no other article in the
current research literature that unifies taxonomies, illustrates them through a
variety of methods, and also serves as a starting point for use case driven research.
References
1. Adadi, A., Berrada, M.: Peeking inside the black-box: A survey on explainable
artiﬁcial intelligence (xai). IEEE Access 6, 52138–52160 (2018)
2. Adadi, A., Berrada, M.: Peeking Inside the Black-Box: A Survey on Explain-
able Artiﬁcial Intelligence (XAI). In: IEEE Access. vol. 6, pp. 52138–52160
(2018). https://doi.org/10.1109/ACCESS.2018.2870052, https://ieeexplore.ieee.
3. Alber, M., Lapuschkin, S., Seegerer, P., Hägele, M., Schütt, K.T., Montavon,
G., Samek, W., Müller, K.R., Dähne, S., Kindermans, P.J.: iNNvestigate Neural
Networks! Journal of Machine Learning Research 20(93), 1–8 (2019), http://
4. Arrieta, A.B., Rodríguez, N.D., Ser, J.D., Bennetot, A., Tabik, S., Barbado,
A., García, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., Herrera, F.:
Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities
and challenges toward responsible AI. Information Fusion 58, 82–115
(2020). https://doi.org/10.1016/j.inffus.2019.12.012, http://www.sciencedirect.
5. Arya, V., Bellamy, R.K.E., Chen, P.Y., Dhurandhar, A., Hind, M., Hoﬀman,
S.C., Houde, S., Liao, Q.V., Luss, R., Mojsilovic, A., Mourad, S., Pedemonte,
P., Raghavendra, R., Richards, J.T., Sattigeri, P., Shanmugam, K., Singh, M.,
Varshney, K.R., Wei, D., Zhang, Y.: One explanation does not ﬁt all: A toolkit
and taxonomy of AI explainability techniques. CoRR abs/1909.03012 (2019),
6. Augasta, M.G., Kathirvalavakumar, T.: Rule extraction from neural net-
works — A comparative study. In: Proc. 2012 Int. Conf. Pattern
Recognition, Informatics and Medical Engineering. pp. 404–408 (Mar
2012). https://doi.org/10.1109/ICPRIME.2012.6208380, https://ieeexplore.ieee.
7. Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek,
W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise
relevance propagation. PLOS ONE 10(7), e0130140 (Jul 2015).
8. Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., Müller,
K.R.: How to explain individual classification decisions. Journal of Machine
Learning Research 11, 1803–1831 (Aug 2010), http://portal.acm.org/citation.cfm?id=
9. Baniecki, H., Biecek, P.: The Grammar of Interactive Explanatory Model Anal-
ysis. arXiv:2005.00497 [cs, stat] (Sep 2020), http://arxiv.org/abs/2005.00497,
10. Bau, D., Zhou, B., Khosla, A., Oliva, A., Torralba, A.: Network dissection: Quan-
tifying interpretability of deep visual representations. In: Proc. 2017 IEEE Conf.
Comput. Vision and Pattern Recognition. pp. 3319–3327. IEEE Computer Soci-
ety (2017). https://doi.org/10.1109/CVPR.2017.354, http://arxiv.org/abs/1704.
11. Benchekroun, O., Rahimi, A., Zhang, Q., Kodliuk, T.: The Need for Standard-
ized Explainability. arXiv:2010.11273 [cs] (Oct 2020), http://arxiv.org/abs/2010.
11273, arXiv: 2010.11273
12. Bruckert, S., Finzel, B., Schmid, U.: The next generation of medical decision
support: A roadmap toward transparent expert companions. Frontiers in Artiﬁcial
Intelligence 3, 75 (2020)
13. Calegari, R., Ciatto, G., Omicini, A.: On the integration of symbolic and sub-
symbolic techniques for XAI: A survey. Intelligenza Artiﬁciale 14(1), 7–32 (Jan
2020). https://doi.org/10.3233/IA-190036, https://content.iospress.com/articles/
14. Carvalho, D.V., Pereira, E.M., Cardoso, J.S.: Machine learning interpretabil-
ity: A survey on methods and metrics. Electronics 8(8), 832 (Aug 2019).
15. Chang, C.H., Tan, S., Lengerich, B., Goldenberg, A., Caruana, R.: How in-
terpretable and trustworthy are GAMs? CoRR abs/2006.06466 (Jun 2020),
16. Chen, C., Li, O., Tao, D., Barnett, A., Rudin, C., Su, J.: This looks like that:
Deep learning for interpretable image recognition. In: Advances in Neural Infor-
mation Processing Systems 32. vol. 32, pp. 8928–8939 (2019), https://proceedings.
17. Chen, R., Chen, H., Huang, G., Ren, J., Zhang, Q.: Explaining neu-
ral networks semantically and quantitatively. In: Proc. 2019 IEEE/CVF
International Conference on Computer Vision. pp. 9186–9195. IEEE (Oct
2019). https://doi.org/10.1109/ICCV.2019.00928, https://ieeexplore.ieee.org/
18. Chen, Z., Bei, Y., Rudin, C.: Concept whitening for interpretable image recogni-
tion. CoRR abs/2002.01650 (Feb 2020), https://arxiv.org/abs/2002.01650
19. Chromik, M., Schuessler, M.: A taxonomy for human subject evaluation of black-
box explanations in XAI. In: Proc. Workshop Explainable Smart Systems for
Algorithmic Transparency in Emerging Technologies. vol. Vol-2582, p. 7. CEUR-
20. Craven, M.W., Shavlik, J.W.: Extracting tree-structured represen-
tations of trained networks. In: Advances in Neural Information
Processing Systems 8, NIPS, Denver, CO, USA, November 27-30,
1995. pp. 24–30. MIT Press (1995), http://papers.nips.cc/paper/
21. Cropper, A., Dumancic, S., Muggleton, S.H.: Turning 30: New ideas in inductive
logic programming. CoRR abs/2002.11002 (2020), https://arxiv.org/abs/2002.
22. Danilevsky, M., Qian, K., Aharonov, R., Katsis, Y., Kawas, B., Sen, P.: A Survey
of the State of Explainable AI for Natural Language Processing. arXiv:2010.00711
[cs] (Oct 2020), http://arxiv.org/abs/2010.00711
23. Das, A., Rad, P.: Opportunities and Challenges in Explainable Artiﬁcial Intelli-
gence (XAI): A Survey. arXiv:2006.11371 [cs] (Jun 2020), http://arxiv.org/abs/
2006.11371, arXiv: 2006.11371
24. Dey, A.K.: Understanding and using context. Personal and Ubiquitous Computing
5(1), 4–7 (Jan 2001). https://doi.org/10.1007/s007790170019, https://doi.org/
25. Dhurandhar, A., Chen, P.Y., Luss, R., Tu, C.C., Ting, P., Shanmugam, K., Das,
P.: Explanations based on the missing: Towards contrastive explanations with
pertinent negatives. In: Advances in Neural Information Processing Systems
31. pp. 592–603. Curran Associates, Inc. (2018), http://papers.nips.cc/paper/
7340-explanations-based-on-the-missing- towards-contrastive-explanations-with- pertinent-negatives.
26. Donadello, I., Serafini, L., d’Avila Garcez, A.S.: Logic tensor networks for
semantic image interpretation. In: Proc. 26th Int. Joint Conf. Artificial Intelligence.
pp. 1596–1602. ijcai.org (2017). https://doi.org/10.24963/ijcai.2017/221,
27. Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine
learning. CoRR abs/1702.08608 (Feb 2017), http://adsabs.harvard.edu/abs/
28. Esser, P., Rombach, R., Ommer, B.: A disentangling invertible interpre-
tation network for explaining latent representations. In: Proc. 2020 IEEE
Conf. Comput. Vision and Pattern Recognition. pp. 9220–9229. IEEE (Jun
2020). https://doi.org/10.1109/CVPR42600.2020.00924, https://ieeexplore.ieee.
29. Fails, J.A., Olsen Jr, D.R.: Interactive machine learning. In: Proceedings of the
8th international conference on Intelligent user interfaces. pp. 39–45 (2003)
30. Ferreira, J.J., Monteiro, M.S.: What Are People Doing About XAI User Ex-
perience? A Survey on AI Explainability Research and Practice. In: Marcus, A.,
Rosenzweig, E. (eds.) Design, User Experience, and Usability. Design for Contem-
porary Interactive Environments. pp. 56–73. Lecture Notes in Computer Science,
Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-
31. Fong, R., Vedaldi, A.: Net2Vec: Quantifying and explaining how concepts are
encoded by ﬁlters in deep neural networks. In: Proc. 2018 IEEE Conf. Comput.
Vision and Pattern Recognition. pp. 8730–8738. IEEE Computer Society (2018).
content cvpr 2018/html/Fong Net2Vec Quantifying and CVPR 2018 paper.
32. Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful
perturbation. In: Proc. 2017 IEEE Intern. Conf. on Comput. Vision. pp. 3449–
3457. IEEE Computer Society (2017). https://doi.org/10.1109/ICCV.2017.371,
33. Friedman, J.H.: Greedy function approximation: A gradi-
ent boosting machine. The Annals of Statistics 29(5), 1189–
1232 (Oct 2001). https://doi.org/10.1214/aos/1013203451, https:
34. Ghorbani, A., Wexler, J., Zou, J.Y., Kim, B.: Towards automatic
concept-based explanations. In: Advances in Neural Information Pro-
cessing Systems 32. pp. 9273–9282 (2019), http://papers.nips.cc/paper/
35. Gilpin, L.H., Bau, D., Yuan, B.Z., Bajwa, A., Specter, M., Kagal, L.: Explaining
explanations: An overview of interpretability of machine learning. In: Proc. 5th
IEEE Int. Conf. Data Science and Advanced Analytics. pp. 80–89. IEEE (2018).
36. Goldstein, A., Kapelner, A., Bleich, J., Pitkin, E.: Peeking Inside the Black
Box: Visualizing Statistical Learning With Plots of Individual Conditional Ex-
pectation. Journal of Computational and Graphical Statistics 24(1), 44–65 (Jan
2015). https://doi.org/10.1080/10618600.2014.907095, https://doi.org/10.1080/
37. Goodman, B., Flaxman, S.: European union regulations on algorithmic
decision-making and a “right to explanation”. AI Magazine 38(3), 50–
57 (Oct 2017). https://doi.org/10.1609/aimag.v38i3.2741, https://ojs.aaai.org/
38. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.:
A survey of methods for explaining black box models. ACM Comput. Surv. 51(5),
93:1–93:42 (Aug 2018). https://doi.org/10.1145/3236009
39. Gunning, D.: Explainable artiﬁcial intelligence (xai). Defense Advanced Research
Projects Agency (DARPA), nd Web 2(2) (2017)
40. Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., Yang, G.Z.:
XAI—Explainable artificial intelligence. Science Robotics 4(37) (Dec 2019).
41. Hailesilassie, T.: Rule extraction algorithm for deep neural networks: A review.
CoRR abs/1610.05267 (2016), http://arxiv.org/abs/1610.05267
42. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm.
Journal of the Royal Statistical Society. Series C (Applied Statistics) 28(1),
100–108 (1979). https://doi.org/10.2307/2346830, https://www.jstor.org/stable/
43. Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.:
Generating visual explanations. In: Computer Vision – ECCV 2016. pp. 3–19.
Lecture Notes in Computer Science, Springer International Publishing (2016).
44. Hendricks, L.A., Hu, R., Darrell, T., Akata, Z.: Grounding visual explanations. In:
Proceedings of the European Conference on Computer Vision (ECCV) (Septem-
45. Henne, M., Schwaiger, A., Roscher, K., Weiss, G.: Benchmarking uncertainty esti-
mation methods for deep learning with safety-related metrics. In: Proc. Workshop
Artiﬁcial Intelligence Safety. CEUR Workshop Proceedings, vol. 2560, pp. 83–90.
CEUR-WS.org (2020), http://ceur-ws.org/Vol-2560/paper35.pdf
46. Heuillet, A., Couthouis, F., Díaz-Rodríguez, N.: Explainability in deep
reinforcement learning. Knowledge-Based Systems 214, 106685 (Feb 2021).
47. Holzinger, A., Biemann, C., Pattichis, C.S., Kell, D.B.: What do we need to build
explainable AI systems for the medical domain? arXiv:1712.09923 [cs, stat] (Dec
2017), http://arxiv.org/abs/1712.09923, arXiv: 1712.09923
48. Islam, S.R., Eberle, W., Ghafoor, S.K., Ahmed, M.: Explainable artiﬁcial intelli-
gence approaches: A survey. arXiv:2101.09429 [cs] (Jan 2021), http://arxiv.org/
49. ISO/TC 22 Road vehicles: ISO/TR 4804:2020: Road Vehicles — Safety and
Cybersecurity for Automated Driving Systems — Design, Veriﬁcation and Val-
idation. International Organization for Standardization, ﬁrst edn. (Dec 2020),
50. ISO/TC 22/SC 32: ISO 26262-6:2018(En): Road Vehicles — Functional Safety
— Part 6: Product Development at the Software Level, ISO 26262:2018(En),
vol. 6. International Organization for Standardization, second edn. (Dec 2018),
51. Janizek, J.D., Sturmfels, P., Lee, S.I.: Explaining explanations: Axiomatic feature
interactions for deep networks. CoRR abs/2002.04138 (2020), https://arxiv.
52. Jolliffe, I.T.: Principal Component Analysis. Springer Series in Statistics,
Springer-Verlag, second edn. (2002). https://doi.org/10.1007/b98835, https://
53. Kauffmann, J., Esders, M., Montavon, G., Samek, W., Müller, K.R.: From
Clustering to Cluster Explanations via Neural Networks. arXiv:1906.07633 [cs, stat]
(Jun 2019), http://arxiv.org/abs/1906.07633, arXiv: 1906.07633
54. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep
learning for computer vision? In: Advances in Neural Information Pro-
cessing Systems 30. pp. 5580–5590 (2017), http://papers.nips.cc/paper/
7141-what-uncertainties-do-we-need-in- bayesian-deep-learning- for-computer-vision
55. Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., Sayres,
R.: Interpretability beyond feature attribution: Quantitative testing with concept
activation vectors (TCAV). In: Proc. 35th Int. Conf. Machine Learning. Proceed-
ings of Machine Learning Research, vol. 80, pp. 2668–2677. PMLR (Jul 2018),
56. Kim, J., Canny, J.F.: Interpretable learning for self-driving cars by visualizing
causal attention. In: Proc. 2017 IEEE Int. Conf. Comput. Vision. pp. 2961–2969.
IEEE Computer Society (2017). https://doi.org/10.1109/ICCV.2017.320, http:
57. Kim, J., Rohrbach, A., Darrell, T., Canny, J.F., Akata, Z.: Textual explanations
for self-driving vehicles. In: Proc. 15th European Conf. Comput. Vision, Part II.
Lecture Notes in Computer Science, vol. 11206, pp. 577–593. Springer (2018).
58. Kindermans, P.J., Schütt, K.T., Alber, M., Müller, K.R., Erhan, D., Kim, B.,
Dähne, S.: Learning how to explain neural networks: PatternNet and
PatternAttribution. In: Proc. 6th Int. Conf. on Learning Representations (Feb 2018),
59. Koh, P.W., Liang, P.: Understanding Black-box Predictions via Inﬂuence Func-
tions. In: Proc. 34th Int. Conf. Machine Learning. pp. 1885–1894. PMLR (Jul
60. Kulesza, T., Stumpf, S., Burnett, M., Wong, W.K., Riche, Y., Moore, T., Oberst,
I., Shinsel, A., McIntosh, K.: Explanatory debugging: Supporting end-user de-
bugging of machine-learned programs. In: 2010 IEEE Symposium on Visual Lan-
guages and Human-Centric Computing. pp. 41–48. IEEE (2010)
61. Langer, M., Oster, D., Speith, T., Hermanns, H., Kästner, L., Schmidt, E.,
Sesing, A., Baum, K.: What Do We Want From Explainable Artificial Intelligence
(XAI)? – A Stakeholder Perspective on XAI and a Conceptual Model
Guiding Interdisciplinary XAI Research. Artificial Intelligence p. 103473 (Feb
2021). https://doi.org/10.1016/j.artint.2021.103473, http://arxiv.org/abs/2102.
07817, arXiv: 2102.07817
62. Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W.,
Müller, K.R.: Unmasking Clever Hans predictors and assessing what
machines really learn. Nature Communications 10(1), 1096 (Mar 2019).
63. Li, X.H., Shi, Y., Li, H., Bai, W., Song, Y., Cao, C.C., Chen, L.: Quantitative
Evaluations on Saliency Methods: An Experimental Study. arXiv:2012.15616 [cs]
(Dec 2020), http://arxiv.org/abs/2012.15616, arXiv: 2012.15616
64. Linardatos, P., Papastefanopoulos, V., Kotsiantis, S.: Explainable AI: A review
of machine learning interpretability methods. Entropy 23(1), 18 (Jan 2021).
65. Lipton, Z.C.: The mythos of model interpretability. Queue 16(3), 31–
57 (Jun 2018). https://doi.org/10.1145/3236386.3241340, http://arxiv.org/abs/
66. Losch, M., Fritz, M., Schiele, B.: Interpretability beyond classiﬁcation output:
Semantic bottleneck networks. In: Proc. 3rd ACM Computer Science in Cars
Symp. Extended Abstracts (Oct 2019), https://arxiv.org/pdf/1907.10882.pdf
67. Lundberg, S.M., Lee, S.I.: A uniﬁed approach to interpreting model
predictions. In: Advances in Neural Information Processing Systems 30.
pp. 4765–4774. Curran Associates, Inc. (2017), http://papers.nips.cc/paper/
68. van der Maaten, L., Hinton, G.: Visualizing Data using t-SNE. Journal of Ma-
chine Learning Research 9(86), 2579–2605 (2008), http://jmlr.org/papers/v9/
69. McAllister, R., Gal, Y., Kendall, A., van der Wilk, M., Shah, A., Cipolla, R.,
Weller, A.: Concrete problems for autonomous vehicle safety: Advantages of
Bayesian deep learning. In: Proc. 26th Int. Joint Conf. Artiﬁcial Intelligence.
pp. 4745–4753 (2017), https://doi.org/10.24963/ijcai.2017/661
70. Molnar, C.: Interpretable Machine Learning. Lulu.com (2020), https://
71. Muddamsetty, S.M., Jahromi, M.N.S., Ciontos, A.E., Fenoy, L.M., Moeslund,
T.B.: Introducing and assessing the explainable AI (XAI) method: SIDU.
arXiv:2101.10710 [cs] (Jan 2021), http://arxiv.org/abs/2101.10710
72. Mueller, S.T., Hoffman, R.R., Clancey, W., Emrey, A., Klein, G.: Explanation
in human-AI systems: A literature meta-review, synopsis of key ideas and
publications, and bibliography for explainable AI. arXiv:1902.01876 [cs] (Feb 2019),
73. Mueller, S.T., Veinott, E.S., Hoffman, R.R., Klein, G., Alam, L., Mamun,
T., Clancey, W.J.: Principles of explanation in human-AI systems. CoRR
abs/2102.04972 (Feb 2021), https://arxiv.org/abs/2102.04972
74. Muggleton, S.H., Schmid, U., Zeller, C., Tamaddoni-Nezhad, A., Besold, T.:
Ultra-strong machine learning: comprehensibility of programs learned with ilp.
Machine Learning 107(7), 1119–1140 (2018)
75. Nori, H., Jenkins, S., Koch, P., Caruana, R.: InterpretML: A uniﬁed framework
for machine learning interpretability. CoRR abs/1909.09223 (Sep 2019), http:
76. Nunes, I., Jannach, D.: A systematic review and taxonomy of explanations in deci-
sion support and recommender systems. User Modeling and User-Adapted Inter-
action 27(3-5), 393–444 (Dec 2017). https://doi.org/10.1007/s11257-017-9195-0,
77. Olah, C., Mordvintsev, A., Schubert, L.: Feature visualization. Distill 2(11),
e7 (Nov 2017). https://doi.org/10.23915/distill.00007, https://distill.pub/2017/
78. Petsiuk, V., Das, A., Saenko, K.: RISE: Randomized input sampling for explana-
tion of black-box models. In: Proc. British Machine Vision Conf. p. 151. BMVA
Press (2018), http://bmvc2018.org/contents/papers/1064.pdf
79. Petsiuk, V., Jain, R., Manjunatha, V., Morariu, V.I., Mehra, A., Ordonez,
V., Saenko, K.: Black-box explanation of object detectors via saliency maps.
arXiv:2006.03204 [cs] (Jun 2020), http://arxiv.org/abs/2006.03204
80. Pocevičiūtė, M., Eilertsen, G., Lundström, C.: Survey of XAI in digital
pathology. Lecture Notes in Computer Science 2020, 56–88 (2020).
https://doi.org/10.1007/978-3-030-50402-1_4, http://arxiv.org/abs/2008.06353
81. Puiutta, E., Veith, E.M.S.P.: Explainable reinforcement learning: A survey. In:
Machine Learning and Knowledge Extraction - 4th IFIP TC 5, TC 12, WG
8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2020,
Dublin, Ireland, August 25-28, 2020, Proceedings. Lecture Notes in Computer
Science, vol. 12279, pp. 77–95. Springer (2020). https://doi.org/10.1007/978-3-
030-57321-8 5, https://doi.org/10.1007/978-3-030-57321-8 5
82. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann
Series in Machine Learning, Morgan Kaufmann (1993), https://kupdf.
83. Rabold, J., Schwalbe, G., Schmid, U.: Expressive explanations of DNNs by com-
bining concept analysis with ILP. In: KI 2020: Advances in Artiﬁcial Intelligence.
pp. 148–162. Lecture Notes in Computer Science, Springer International Publish-
ing (2020). https://doi.org/10.1007/978-3-030-58285-2 11
84. Rabold, J., Siebers, M., Schmid, U.: Explaining black-box classiﬁers with ILP
– empowering LIME with Aleph to approximate non-linear decisions with re-
lational rules. In: Proc. Int. Conf. Inductive Logic Programming. pp. 105–
117. Lecture Notes in Computer Science, Springer International Publishing
(2018). https://doi.org/10.1007/978-3-319-99960-9 7, https://link.springer.com/
chapter/10.1007/978-3-319- 99960-9 7
85. Renard, X., Woloszko, N., Aigrain, J., Detyniecki, M.: Concept tree: High-level
representation of variables for more interpretable surrogate decision trees. In:
Proc. 2019 ICML Workshop Human in the Loop Learning. arXiv.org (2019),
86. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": Explaining the predictions of any classifier. In: Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining. pp. 1135–1144. KDD '16, ACM (2016)
87. Rieger, I., Kollmann, R., Finzel, B., Seuss, D., Schmid, U.: Verifying deep learning-
based decisions for facial expression recognition (accepted). In: Proceedings of the
ESANN Conference 2020 (2020)
88. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between cap-
sules. In: Advances in Neural Information Processing Systems 30. pp.
3856–3866. Curran Associates, Inc. (2017)
89. Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., Müller, K.R.: Toward
interpretable machine learning: Transparent deep neural networks and beyond.
CoRR abs/2003.07631 (2020), https://arxiv.org/abs/2003.07631
90. Samek, W., Müller, K.R.: Towards explainable artificial intelligence. In: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Lecture Notes in Computer Science, vol. 11700, pp. 5–22. Springer (2019). https://doi.org/10.1007/978-3-030-28954-6_1
91. Schmid, U., Finzel, B.: Mutual explanations for cooperative decision making in
medicine. KI - Künstliche Intelligenz pp. 1–7 (2020)
92. Schmid, U., Zeller, C., Besold, T., Tamaddoni-Nezhad, A., Muggleton, S.: How does predicate invention affect human comprehensibility? In: International Conference on Inductive Logic Programming. pp. 52–67. Springer (2016)
93. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In: Proc. 2017 IEEE Int. Conf. Computer Vision. pp. 618–626. IEEE (Oct 2017). https://doi.org/10.1109/ICCV.2017.74
94. Shwartz-Ziv, R., Tishby, N.: Opening the Black Box of Deep Neural Networks via
Information. CoRR abs/1703.00810 (2017), http://arxiv.org/abs/1703.00810
95. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks:
Visualising image classification models and saliency maps. In: 2nd Intern. Conf.
Learning Representations, Workshop Track Proceedings. vol. abs/1312.6034.
CoRR (2014), http://arxiv.org/abs/1312.6034
96. Singh, A., Sengupta, S., Lakshminarayanan, V.: Explainable Deep Learn-
ing Models in Medical Image Analysis. Journal of Imaging 6(6), 52
(Jun 2020). https://doi.org/10.3390/jimaging6060052
97. Smilkov, D., Thorat, N., Kim, B., Viégas, F.B., Wattenberg, M.: SmoothGrad: Removing noise by adding noise. CoRR abs/1706.03825 (2017)
98. Spinner, T., Schlegel, U., Schäfer, H., El-Assady, M.: explAIner: A visual analytics framework for interactive and explainable machine learning. IEEE Transactions on Visualization and Computer Graphics 26, 1064–1074 (2020). https://doi.org/10.1109/TVCG.2019.2934629
99. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.A.: Striving for sim-
plicity: The all convolutional net. In: 3rd Intern. Conf. Learning Representations,
ICLR 2015, Workshop Track Proceedings. vol. abs/1412.6806. CoRR (2015)
100. Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks.
In: Proc. 34th Int. Conf. Machine Learning. Proceedings of Machine Learning
Research, vol. 70, pp. 3319–3328. PMLR (2017)
101. Teso, S., Kersting, K.: Explanatory interactive machine learning. In: Proceedings
of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. pp. 239–245 (2019)
102. Thrun, S.: Extracting rules from artificial neural networks with distributed representations. In: Advances in Neural Information Processing Systems 7. pp. 505–512. MIT Press (1995)
103. Tjoa, E., Guan, C.: A survey on explainable artificial intelligence (XAI): Toward medical XAI. IEEE Transactions on Neural Networks and Learning Systems pp. 1–21 (2020). https://doi.org/10.1109/TNNLS.2020.3027314
104. Vilone, G., Longo, L.: Explainable artificial intelligence: A systematic review.
CoRR abs/2006.00093 (2020), https://arxiv.org/abs/2006.00093
105. von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing
17(4), 395–416 (Dec 2007). https://doi.org/10.1007/s11222-007-9033-z
106. Wan, A., Dunlap, L., Ho, D., Yin, J., Lee, S., Jin, H., Petryk, S., Bargal, S.A.,
Gonzalez, J.E.: NBDT: Neural-backed decision trees. arXiv:2004.00221 [cs] (Jun 2020)
107. Wang, H.: ReNN: Rule-embedded neural networks. In: Proc. 24th Int.
Conf. Pattern Recognition. pp. 824–829. IEEE Computer Society (2018).
108. Wang, Q., Zhang, K., Ororbia II, A.G., Xing, X., Liu, X., Giles, C.L.: A comparative study of rule extraction for recurrent neural networks. CoRR abs/1801.05420 (2018)
109. Wang, Q., Zhang, K., Ororbia II, A.G., Xing, X., Liu, X., Giles, C.L.: An empirical
evaluation of rule extraction from recurrent neural networks. Neural Computation
30(9), 2568–2591 (Jul 2018). https://doi.org/10.1162/neco_a_01111
110. Weitz, K.: Applying Explainable Artificial Intelligence for Deep Learning Networks to Decode Facial Expressions of Pain and Emotions. PhD Thesis, Otto-Friedrich-University Bamberg (Aug 2018)
111. Xie, N., Ras, G., van Gerven, M., Doran, D.: Explainable deep learning: A field guide for the uninitiated. CoRR abs/2004.14545 (2020)
112. Yang, Y., Feng, C., Shen, Y., Tian, D.: FoldingNet: Interpretable unsupervised learning on 3D point clouds. CoRR abs/1712.07262 (2017)
113. Yao, J.: Knowledge extracted from trained neural networks: What’s next? In:
Data Mining, Intrusion Detection, Information Assurance, and Data Networks
Security 2005, Orlando, Florida, USA, March 28-29, 2005. SPIE Proceedings,
vol. 5812, pp. 151–157. SPIE (2005). https://doi.org/10.1117/12.604463
114. Yeh, C.K., Kim, B., Arik, S.O., Li, C.L., Pfister, T., Ravikumar, P.: On
completeness-aware concept-based explanations in deep neural networks. CoRR
abs/1910.07969 (Feb 2020), http://arxiv.org/abs/1910.07969
115. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks.
In: 13th European Conf. Computer Vision - Part I. Lecture Notes in Com-
puter Science, vol. 8689, pp. 818–833. Springer International Publishing (2014).
116. Zhang, Q., Cao, R., Shi, F., Wu, Y.N., Zhu, S.C.: Interpreting CNN knowledge via
an explanatory graph. In: Proc. 32nd AAAI Conf. Artificial Intelligence. pp. 4454–4463. AAAI Press (2018)
117. Zhang, Q., Zhu, S.C.: Visual interpretability for deep learning: A survey. Frontiers
of IT & EE 19(1), 27–39 (2018). https://doi.org/10.1631/FITEE.1700808
118. Zhang, Y., Chen, X.: Explainable recommendation: A survey and new per-
spectives. Foundations and Trends in Information Retrieval 14(1), 1–101
(2020). https://doi.org/10.1561/1500000066
119. Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: 2016 IEEE Conf. Comput. Vision and Pattern Recognition. pp. 2921–2929. IEEE Computer Society (2016).
120. Zhou, B., Sun, Y., Bau, D., Torralba, A.: Interpretable basis decomposition for visual explanation. In: Computer Vision – ECCV 2018. pp. 122–138. Lecture Notes in Computer Science, Springer International Publishing (2018). https://doi.org/10.1007/978-3-030-01237-3_8
121. Zhou, J., Gandomi, A.H., Chen, F., Holzinger, A.: Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics 10(5), 593 (Jan 2021). https://doi.org/10.3390/electronics10050593
122. Zilke, J.R., Loza Mencía, E., Janssen, F.: DeepRED – rule extraction from deep neural networks. In: Proc. 19th Int. Conf. Discovery Science. pp. 457–473. Lecture Notes in Computer Science, Springer International Publishing (2016). https://doi.org/10.1007/978-3-319-46307-0_29