Actionable explanations for contestable AI
Jasper S. van der Waa jasper.vanderwaa@tno.nl
Jurriaan van Diggelen jurriaan.vandiggelen@tno.nl
TNO, Human Machine Teaming,
Kampweg 55, 3769 DE Soesterberg, The Netherlands
Mark A. Neerincx mark.neerincx@tno.nl
Technical University Delft, Interactive Intelligence,
Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands, and
TNO, Human Machine Teaming,
Kampweg 55, 3769 DE Soesterberg, The Netherlands
Catholijn M. Jonker C.M.Jonker@tudelft.nl
Technical University Delft, Interactive Intelligence,
Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands, and
University Leiden, LIACS,
Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
Abstract
Humans should have the ability to contest an AI agent's decision. Such an AI agent can support that ability by providing actionable explanations, making clear how its decisions can be altered through appropriate action. This paper formally defines six properties that make an explanation actionable. This formalisation enables univocal comparison of, and argumentation about, explanation theories or models that contribute to contesting AI agents' decisions, and provides the stepping stones for the development and testing of methods to generate such explanations. A literature review showed that not all "actionable properties" are being addressed appropriately. In particular, current explanations are faithful to the AI agent's functioning and of a counterfactual nature, with increasing attention to the explanation's interpretability. However, current actionable explanations do not convey explicit and feasible action suggestions that acknowledge human preferences. We conclude with a call to the research community to address these research gaps more actively, such that humans can maintain their autonomy even when subjected to decisions made by an AI agent.
1. Introduction
Within the field of Explainable AI (XAI), the most commonly cited purpose of explanations is to calibrate trust by justifying an AI agent's decision or behaviour; see for instance Miller [2019], Hoffman et al. [2018], Shin [2021], Neerincx et al. [2018] and van der Waa et al. [2020].
Trust calibration is important when the human and AI agent collaborate to make the best possible decision Neerincx et al. [2018]. However, AI agents are increasingly used in cases
where they make a decision about a human who is then subjected to that decision. Examples
of such use cases are the automated processing of loan applications Ustun et al. [2019] or
the initial filtering of job-applicants Hmoud et al. [2019]. Even in use cases where human
and AI agent collaborate in their decision, there is often another human subjected to that
decision, for example when a doctor is aided in their diagnosis of a patient with the help of an AI agent Ploug and Holm [2020]. In these examples the life and autonomy of a human is affected, in part, by what an AI agent determines.
However, there is an intrinsic human value for humans to retain autonomy over their
own lives, even when parts of their lives are governed by AI agents Venkatasubramanian
and Alfano [2020]. Autonomy, or self-determination, can be retained by enabling AI agents
that are contestable by those who are subject to the AI agent’s decisions. The field of
XAI can help support this contestability of AI agents by providing explanations that are actionable Walmsley [2021]; the term "algorithmic recourse" is also at times used to refer to contestability Wachter et al. [2017]. This implies that the explanation conveys the understanding humans need to identify the appropriate action to contest the AI agent's decision, such that this AI agent makes a more preferable decision about them. Such action might entail directly influencing the situation such that the AI agent makes a different decision Wachter et al. [2017]. However, it might also entail actions where humans are supported in filing sufficiently motivated complaints or requests to an appropriate oversight committee, which can decide to recall and improve the AI agent Mulligan et al. [2019]. The former is useful when the AI agent uses information that can be corrected or otherwise influenced by the human, whereas the latter is useful when the AI agent is perceived as being unfair or malfunctioning.
To provide humans with the ability to effectively contest and alter an AI agent’s deci-
sions, they need an understanding about how such decisions are made and can be influenced
through action Boulicaut and Besson [2008], Goodman and Flaxman [2017]. In other words,
humans require actionable knowledge about an AI agent’s internal decision making to con-
test its decisions. Not only could such actionable knowledge improve the collaboration with, and acceptance of, AI agents; the need for contestability is also emphasized in upcoming regulations proposed by the European Union to regulate the application of AI agents.1 Although the field of XAI pays increasing attention to supporting human contestability of AI agents through explanations, there is no consensus on what makes an explanation support contestability. In particular, there is no clear definition of what makes an explanation convey the actionable understanding needed to support contestability; we refer to such explanations as actionable explanations. This lack of a clear definition limits us in determining whether such explanations actually support a human's contestability. This in turn prohibits the creation of a shared research agenda and the evaluation of whether such explanations make an AI agent adhere to regulations requiring AI agents to be contestable.
This work aims to remedy this by formally defining six properties that make an explanation actionable and thus supportive of contestability (see Figure 1 for an overview). We do so by taking a socio-technical system (STS) perspective Mumford [1987]. Such a perspective includes the AI agent, the human agent and their shared context. These three components are needed to address the challenge of contestability and to help formalize the notion of actionable explanations, as only by recognizing the effects an AI agent has on human agents, and how such human agents can influence their shared situation, can we fully address contestability.
1. https://www.europarl.europa.eu/thinktank/en/document/EPRS_BRI(2019)640163
We present a formal framework to define the role and purpose of an explanation gen-
erated and communicated by an AI agent. Next, this framework is used to formalize the
six proposed properties. This formalization aims to remove ambiguity in their definitions, to foster scientific discussion, to support the development of explanation-generating methods, and to enable the derivation of metrics to measure such properties. Finally, we perform a literature
review of explanation generating methods whose explanations are referred to as supporting
contestability. Each method is reviewed on whether it adheres to any of the six properties,
with the aim to identify potential research gaps. This review shows the current state of the
art on actionable explanations and open research challenges that still need to be resolved
to truly support humans’ contestability of AI agents.
Throughout this work we will illustrate our reasoning with a concrete example of an AI
agent that functions as a vaccination planning tool during a pandemic. The tool's purpose is to plan a person's vaccination date given their medical records and lifestyle. Imagine someone receiving a date three months from now. That person, however, expected to be vaccinated within this month, believing they should receive priority over others. Contestability in this example means that this person is capable of identifying the most effective action that would result in a more favourable vaccination date or a tool more in line with their own values. We will use this example of a vaccination planning tool throughout the paper. See below for an actionable explanation that adheres to all six proposed properties:
“Your vaccination is in three months due to your good health. Your records
indicate you previously risked obesity. If this would still be the case with no
other changes, your vaccination date would be in two weeks. If you believe
you still risk obesity, contact your general physician to verify this and update
your medical records accordingly, which would initiate a reschedule from three
months to possibly two weeks.”
This paper is structured as follows. First, in Section 2 we introduce our socio-technical
perspective towards contestability and actionable explanations. We then propose in Section
3 a formal framework to formalize explanations between the AI and human agent from such
a perspective. This framework is then used to formalize each of the six proposed properties, separated into three consecutive levels: an actionable explanation should be accurate
(Section 4), indicative (Section 5), and personalized (Section 6). Next, Section 7 presents
our literature review and identifies future research directions. Finally, we discuss and reflect
on our work in Section 8 followed by our conclusions in Section 9.
2. Actionable explanations in a socio-technical system
See Figure 1 for an overview of the six proposed properties. They are subdivided into
three levels. The first level states that the explanation should be accurate, which implies
that the explanation should be faithful to the AI agent’s reasoning and inner working as
well as interpretable to the human agent. The second level describes that the explanation
should be indicative. We will argue that this implies that the explanation should describe
alternative situations in which different decisions will be made (e.g., counterfactuals) as well
as suggest explicit actions to arrive at such alternative situations. Lastly, we will argue that
this explanation should contain actions that are feasible for the human agent to perform and that result in an AI agent's decision deemed preferable by that human agent. Thus, the third and final level describes the personalization of the explanation. Together, these properties define what would make an explanation actionable to support one's contestability.

Figure 1: An overview of the proposed six properties that define an explanation as actionable. An actionable explanation is an explanation that provides humans with the understanding needed to effectively contest an AI agent's decision. These properties are separated into three levels of increasing complexity.
The six properties are motivated and defined through a socio-technical system perspec-
tive. Even when a system is technologically sound, its functioning upon deployment is not guaranteed Mumford [1987]. This typically occurs when the social and organisational context of the system's application is omitted from its design. A socio-technical systems (STS) perspective tries to remedy this, as it ensures consideration of the application context in system design Trist
[1981]. It supports the reasoning about the effects a system brings about in the application
context, whether such effects are preferred, and if not, how they should be addressed.
With the notion of contestability, the AI agent, human agent and their shared context
come together. Contestability is the human’s ability to effectively identify the actions needed
to alter the shared context to induce a more favourable decision from the AI agent Walmsley
[2021]. This ability is also referred to as algorithmic recourse Wachter et al. [2017]. Current
research towards actionable explanations to support contestability focuses on conveying
counterfactuals. A counterfactual explanation conveys alternative situations where the AI
agent would make a different decision Wachter et al. [2017]. It is hypothesized that this
enables human agents to select the alternative situation with a more favourable outcome
and infer which actions to take to arrive at that situation. Although the communication of counterfactuals recognizes the relation of the AI agent with its own situational context, it foregoes the human agent and its context, which risks resulting in sub-optimal actionable explanations, for example when the human agent is incapable of correctly interpreting the counterfactuals or of inferring an effective contesting action from them.
Figure 2: A socio-technical system (STS) perspective on how explanations can support
the explainee to take appropriate action when an AI agent’s decision is deemed
unfavourable.
We argue that an STS perspective towards actionable explanations results in better explanations to support contestability. Specifically, we argue that recognizing the AI and human agent as well as their shared context allows the field of XAI to design more effective
actionable explanations. In Figure 2 we illustrate the STS focused on contestability and
the role of explanations therein.
This figure shows the AI agent, referred to as the explainer, and the human agent,
referred to as the explainee, interacting with each other in a shared world. Both the ex-
plainer and explainee observe this world, inferring their own unique set of observations.
The explainer uses these observations to arrive at a decision or advice, which it attempts to explain. The explainer communicates both to the explainee, who interprets the explanation, adding to its understanding of how the explainer makes decisions, and appraises the decision, reflecting how pleased the explainee is with it. The explainee combines
this interpretation, appraisal and its observations of the world to determine whether action
is required, useful or possible to contest the explainer's decision. This is the process where the explainee decides whether a contesting action is required and, if so, which action should be chosen to alter the explainer's decision into a more favourable one. When such an action is
determined, the explainee performs it, changing the world in a way that ideally results in
the intended effects and a more favourable decision.
In our running example the vaccination planner tool is the explainer, and the person
querying it for a vaccination date the explainee. The process of observing the world means
for both that they know the explainee's medical records and lifestyle. The vaccination planner tool uses these observations to plan a date three months from now, a decision that affects the explainee's life. The explainee subsequently appraises the decided date and feels it is unfavourable. If no explanation were given, the explainee could only rely on its own experience, knowledge and assumptions to decide upon an action to try and change this
decision. However, the tool does provide an explanation which the explainee can interpret.
This explanation becomes actionable when it directly supports the explainee to determine
which action to take to contest the explainer’s decision.
In the explanation example given in the introduction, the planner tool justifies its de-
cision by referring to the observation that the explainee is in good health. The proposed properties of faithfulness and interpretability state that this justification adheres to the tool's
reasoning and is interpretable by the explainee. The explanation is also a counterfactual
explanation, the third property. In this example the communicated counterfactual is the
alternative context when the explainee would still suffer from a risk for obesity. This would
cause the vaccination date to fall within two weeks instead of three months. The expla-
nation also contains explicit action suggestions, thus adhering to the fourth property. It
suggests the action to contact its physician to report the risk of obesity, which could result
in an updated medical record causing a rescheduling of the vaccination date. Finally, the
explanation is feasible and preferable since the explainee is able to contact its physician
and would favour a vaccination date within two weeks over three months. These final two
properties limit the number of relevant counterfactuals to be communicated to only those
which the explainee can act upon.
In the next sections we formally define each of these six properties to explicate their
meaning and remove any ambiguity. We do so given a formal framework based on the
socio-technical system perspective outlined in Figure 2. The next section provides this
framework.
3. Formal framework for explanations
Below we describe the framework used to formalize the proposed properties. First, we list the sets used in our formalization.
Worlds. $W$ is the set of possible worlds, where $w, w' \in W$ are variables ranging over worlds. For example, two worlds might differ in the person querying the planner tool, with different medical records and lifestyles.

Agents. $A$ is the finite set of agents, where $a, a' \in A$ are variables ranging over agents. For example, the vaccination planner AI and the human querying it could be agents in $A$.

Decisions. $D$ is the finite set of all decisions that agents can make, where $D_a \subseteq D$ is the finite set of decisions agent $a$ can take. $d \in D$ is a variable ranging over $D$, and similarly $d_a \in D_a$ is a variable ranging over the possible decisions of agent $a$. For example, the decision of the scheduled date for vaccination by the vaccination planner agent.

Contesting actions. $\Pi$ is the finite set of all possible actions agents can undertake in the world to contest an agent's decision, with $\Pi_{a'} \subseteq \Pi$ the set of all actions agent $a' \in A$ can undertake. $\pi_{a'} \in \Pi_{a'}$ is a variable ranging over single actions of $a'$. For example, the action to contact one's physician to update the used medical records. We write $\emptyset \in \Pi_{a'}$ to signify taking no action.

Decision-making functions. $Q$ is the set of decision-making functions that describe the decision-making processes of agents. Let $Q_a \subseteq Q$ denote the set of decision-making functions of agent $a$. $q \in Q$ is a variable ranging over $Q$, and similarly $q_a \in Q_a$ is a variable ranging over all decision-making functions of agent $a$. For example, the rule to prioritize people with a risk for obesity for vaccination. Similarly, a counterfactual is also such a function, as it is a specific set of observations (medical records) with its associated decision (vaccination in three months).

For any $Q' \subseteq Q$ we use $\models_{Q'}$ to express that a decision-making process satisfies the decision-making functions of $Q'$. Formally, we define this recursively as follows. The base case is: $\forall q, w, d_a : w \models_{\{q\}} d_a$ iff $d_a \in q(w)$. If no confusion is possible, we write $\models_q$ instead of $\models_{\{q\}}$. The general case is: $\forall Q' \subseteq Q, \forall w, d : w \models_{Q'} d$ iff $\forall q \in Q' : w \models_q d$.

In general, we assume that for any agent $a$, the set $Q_a$ is consistent. Consistency means that in the same state there are no applicable decision functions that would lead the agent to make decisions that contradict each other.

Explanations. $E$ is the set of possible explanations, with $E = 2^Q$, which denotes that explanations consist of one or more decision-making functions. $E_a^{d_a} \in E$ is the explanation from agent $a$ about a decision $d_a$. For example, $E_a^{d_a}$ can contain the decision rule that people risking obesity are prioritized for vaccination, when explaining the decision to plan the vaccination in three months.
These sets will be used to formalize each of the distinct steps shown in Figure 2. We do so following a multi-agent system formalization that makes use of the notion of beliefs. Beliefs represent (possibly) imperfect information which is believed to be true by some agent Georgeff and Rao [1991]. For instance, an agent can have beliefs about the world state (e.g., observations Georgeff and Rao [1991]), about its own decision making (e.g., introspective beliefs Souza [2016]), and about other agents (e.g., mental models Jonker et al. [2010]). This offers a basis to formally define explanations and their properties, as explanations can be viewed as formulating, conveying and interpreting beliefs between agents for a certain purpose.

The notion of beliefs is used in combination with the above sets to formalize the processes from Figure 2. We denote $Bel_a(\cdot)$ as agent $a \in A$ believing that a statement $\cdot$ is true. For example, $Bel_{a'}(q \in Q_a)$ signifies that $a'$ believes that $a$ makes decisions according to decision-making function $q$. This $q$ can be communicated in an explanation. We can then use this to formalize the interpretation of an explanation.
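To make these sets and the satisfaction relation concrete, the following minimal Python sketch (illustrative only; names such as vaccination_rule and satisfies are our assumptions and not part of the formalization) represents worlds as attribute dictionaries and decision-making functions as callables returning sets of decisions:

```python
# A minimal sketch of the formal framework; names and representations are
# illustrative assumptions, not part of the paper's formalisation.

# A world w in W is represented as a dictionary of attributes.
w = {"risk_obesity": False, "age": 45}

# A decision-making function q in Q maps a world to a set of decisions (subset of D).
def vaccination_rule(world):
    """Prioritize people with a risk for obesity."""
    return {"two_weeks"} if world["risk_obesity"] else {"three_months"}

def satisfies(world, q_prime, decision):
    """w |=_{Q'} d iff d is in q(w) for every q in Q' (general case);
    the base case is Q' = {q}."""
    return all(decision in q(world) for q in q_prime)

# An explanation E_a^{d_a} is a set of decision-making functions (E = 2^Q).
explanation = {vaccination_rule}

print(satisfies(w, {vaccination_rule}, "three_months"))  # True
print(satisfies(w, explanation, "two_weeks"))            # False
```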
Below we formally define the processes from Figure 2, starting with those of the explainer.
3.1 Explainer processes
The explainer $a \in A$ from Figure 2 makes use of three processes: observing the world, deciding on a decision or advice, and explaining that decision to an explainee. Below we list the definition of each, making use of the previously discussed sets.
Definition 3.1 (Observe).
Let $a \in A$ be an agent, then $O_a : W \mapsto W$ is the agent's observation function. If $w$ is the current world state, $O_a(w)$ denotes the agent's current observations. The agent believes what it observes, i.e., $Bel_a(O_a(w))$. Ideally, the observations are correct, such that $O_a(w) \subseteq w$. We write $O_a$ to denote the current observations by the agent if no confusion is possible.
The planner tool from our example observes the medical records of anyone querying it. Those records are part of the world, and are the observations the planner tool agent makes about it. Note that this process also occurs for the explainee, $a' \in A$, whose current observations are denoted as $O_{a'}$.
Definition 3.2 (Decide).
Let $a \in A$ be an agent, then $O_a \models_{Q_a} d_a$ is its decision-making process based on its own current observations and decision-making functions, with $d_a \in D_a$ as the made decision.
In our running example, vaccination in three months or in two weeks are possible decisions. The planner tool makes these decisions based on its observed medical records and according to its set of decision-making functions, which can be anything, e.g., a set of rules or a deep neural network.
Definition 3.3 (Explain).
Explaining leads to an explanation $E_a^{d_a} \in E$ about the decision $d_a$ provided by agent $a$. We assume that agent $a$ believes its explanations in the sense that it believes that the decision $d_a$ is made according to $E_a^{d_a}$. Formally, $Bel_a(O_a \models_{E_a^{d_a}} d_a)$.
Recall that explanations consist of one or more decision-making functions ($E = 2^Q$). In our example the explanation consists of two such functions. First, the planner tool conveys that people risking obesity are prioritized (a decision rule, which is a decision-making function). Secondly, if nothing changed except for the querying person having that risk, its vaccination would be in two weeks instead of three months (a counterfactual, a very specific decision-making function). Note that the planner tool's actual decisions do not necessarily have to be made according to these two, for instance when the tool's explanation or decision process is faulty. We only assume that the planner tool believes it makes decisions according to them.
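A minimal sketch, continuing the illustrative Python representation used above (all function and variable names are assumptions), of how the explainer processes from Definitions 3.1-3.3 could be operationalised for the vaccination planner:

```python
# Illustrative sketch of the explainer's processes (Definitions 3.1-3.3).
# Worlds and observations are attribute dictionaries; decision functions
# map observations to a set of decisions.

def prioritize_obesity_risk(obs):
    return {"two_weeks"} if obs.get("risk_obesity") else {"three_months"}

def observe(world):
    """O_a: the planner only observes the medical-record part of the world."""
    return {"risk_obesity": world["risk_obesity"]}

def decide(observations, decision_functions):
    """O_a |=_{Q_a} d_a: pick a decision every decision function agrees on."""
    candidates = set.intersection(*(q(observations) for q in decision_functions))
    return next(iter(candidates))

def explain(decision_functions):
    """E_a^{d_a}: the explanation is a subset of the decision-making functions
    the explainer believes it used (here: all of them)."""
    return set(decision_functions)

world = {"risk_obesity": False, "lifestyle": "active"}
Q_a = {prioritize_obesity_risk}
O_a = observe(world)
d_a = decide(O_a, Q_a)
E_a = explain(Q_a)
print(d_a)  # 'three_months'
```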
3.2 Explainee processes
The explainee $a' \in A$ from Figure 2 makes use of four unique processes: interpreting an explanation, appraising the decision from the explainer, deciding if action is needed, and performing that action if so. We define these as follows:
Definition 3.4 (Interpret).
The interpretation by agent $a'$ of some explanation is done by the function $I_{a'} : D \times E \mapsto E$. Given an $E_a^{d_a}$ from agent $a$ about decision $d_a$, the interpretation $I_{a'}(E_a^{d_a}, d_a)$ is the set of all decision-making functions of which agent $a'$ believes that $a$ uses them to make its decision $d_a$; $Bel_{a'}(I_{a'}^{d_a} \subseteq Q_a)$. We abbreviate $I_{a'}(E_a^{d_a}, d_a)$ to $I_{a'}^{d_a}$.
Ideally, $I_{a'}^{d_a} = E_a^{d_a}$, but, in general, this need not be the case. In the running example, the explanation conveys the decision rule that people risking obesity are prioritised. However, the person querying the planner tool (the explainee) might have a different interpretation of what constitutes "risking obesity" than the planner tool (the explainer). For instance, the interpretation could become that people weighing more than the explainee receive priority, which is not necessarily a correct interpretation of the explanation.
Definition 3.5 (Appraise).
Appraising a decision leads to the appraisal $v \in \mathbb{R}$ of a decision $d_a$ by the explainee $a'$. Let $v'$ be the appraisal for any $d'_a \in D_a$ where $d'_a \neq d_a$. Then $v > v'$ denotes that $d_a$ is favored over $d'_a$ and $v < v'$ denotes that $d_a$ is favored less than $d'_a$ by agent $a'$. The explainee's appraisal thus defines $D_a$ as a totally ordered set unique to that explainee.
The above definition relies on the fact that people have a certain preference for decisions
and that they are (dis)pleased with a decision when a certain threshold is reached. In the
example, this could mean that the explainee would favor any vaccination date within 2 months equally well, but any date later than that would be less and less preferable.
Definition 3.6 (Decide on contesting action).
The decision for a contesting action leads to a given action $\pi_{a'} \in \Pi_{a'}$ by $a'$ based on the appraisal $v$ of the current decision $d_a$, where $\pi_{a'} \neq \emptyset$ iff $v < t$, with $t \in \mathbb{R}$ being an arbitrary threshold, meaning that the explainee decides on an action when the current decision is deemed unfavourable. We assume that the decision for $\pi_{a'}$ is made on the basis of the explainee's interpretation $I_{a'}^{d_a}$ and observations $O_{a'}$.
Combined with the appraisal, this definition follows the notion that at some point a decision is unfavourable enough to warrant action. Recall that $\emptyset$ was the "empty" action, signifying a decision to take no action. In the example this might have been the case if the vaccination date was within two months. Instead it is planned in three months, thus triggering the process of deciding on an actual action, which the explainee does based on its appraisal of the decision (e.g., reflecting its motivation), its interpretation of the explanation (e.g., if risking obesity the date would be in two weeks), and its own observations (e.g., currently risking obesity), potentially deciding to indeed contact its physician as the explanation suggests.
Definition 3.7 (Perform contesting action).
By performing a contesting action $\pi_{a'}$, a new world $w'$ is achieved following from the current world $w$. This is expressed with the transition function $w' = Act(\pi_{a'}, w)$.
If the person querying our example planning tool decided to contact its physician, a
world might be achieved where that person indeed runs the risk for obesity according to its
medical records.
In this treatise, we assume that the human agent, the explainee, decides to take action based on its own observations, its appraisal of the decision and its interpretation of the explanation. Under this assumption, it is thus vital that the offered explanation supports the explainee in taking the correct action. When it offers this support, that explanation is referred to as an actionable explanation.
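The explainee's side of Figure 2 (Definitions 3.4-3.7) can be sketched in the same illustrative style; the threshold, appraisal values and action effects below are assumptions made for the running example:

```python
# Illustrative sketch of the explainee's processes (Definitions 3.4-3.7).

def interpret(explanation, decision):
    """I_{a'}: here the explainee simply adopts the conveyed functions;
    in general the interpretation may differ from the explanation."""
    return set(explanation)

def appraise(decision):
    """v: a made-up appraisal of vaccination dates, higher is better."""
    return {"two_weeks": 1.0, "three_months": 0.2}[decision]

def decide_contesting_action(v, threshold, available_actions):
    """pi_{a'} is not the empty action iff v < t."""
    if v < threshold:
        return next(iter(available_actions))   # pick some contesting action
    return None                                # the 'empty' action

def act(action, world):
    """Act(pi_{a'}, w): performing the action changes the world."""
    new_world = dict(world)
    if action == "contact_physician":
        new_world["risk_obesity"] = True       # the record is updated
    return new_world

world = {"risk_obesity": False}
v = appraise("three_months")
action = decide_contesting_action(v, threshold=0.5,
                                  available_actions={"contact_physician"})
print(action, act(action, world))  # contact_physician {'risk_obesity': True}
```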
Below we formally define each of our six proposed properties that would make an expla-
nation actionable. We make use of the above defined sets and processes of both explainer and explainee, which we typically denote as $a$ and $a'$ respectively.
4. Level 1: Accuracy
The first two properties for an actionable explanation revolve around the explanation’s ac-
curacy. To best support contestability, an explanation needs first and foremost to be accurate, hence we refer to this as a Level 1 actionable explanation. However, explanation accuracy is an ambiguous concept with many different interpretations and perspectives Lipton [2018]. To remove this ambiguity we distinguish between two aspects of accuracy, namely faithfulness and interpretability. The former addresses how faithful the explanation is to the explainer's decision-making. The latter addresses how closely the explainee's interpretation resembles the explanation. We argue that both are required to identify an explanation as accurate.
In theory, when an explanation is both faithful and interpretable, the explainee could
rely on that information to decide if and how to act. However, the explainee is limited in estimating the faithfulness and interpretability of an explanation, as it is unlikely that the explainee understands the explainer sufficiently to identify an explanation as unfaithful. It is even less likely that the explainee would be aware of an incorrect interpretation on its own
end. As such, assuring that explanations are faithful and interpretable is the responsibility
of the explanations’ designer.
4.1 Explanation faithfulness
Informally, a faithful explanation is an explanation whose conveyed description of a decision-
making process is correct compared to that same process: the explanation contains no
falsehoods. If an explanation is not faithful, it cannot be relied upon to incite a good
understanding nor the inference of an appropriate action, which reduces the ability for the
explainee to contest the explainer’s decision based on this explanation.
If, in our example, the planner tool actually made its decision based on age instead of the risk for obesity, the communicated explanation would not be faithful. The supported contestability is reduced, as even when the explainee decides to contact its physician and it is confirmed that the explainee indeed risks obesity, nothing would change. Thus it is vital
that explanations are faithful due to an explainee’s dependency on them.
We formalize the faithful property as follows:
Property 4.1 (Faithful).
Let $E_a^{d_a}$ be an explanation from explainer $a$ about some decision $d_a$. Then $Faithful(E_a^{d_a})$ iff $O_a \models_{E_a^{d_a}} d_a$, where $O_a$ is the current set of observations made by the explainer.

In other words, an explanation is Faithful when all the descriptive decision-making functions the explanation contains lead to the explainer's decision. Given that the explainer only adds decision-making functions to its explanations if it believes that these apply, this definition implies that in case of a faithful explanation these beliefs are indeed true.
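Under the illustrative representation used earlier (explanations as sets of decision-making functions over the explainer's observations), the Faithful property reduces to a direct check of $O_a \models_{E_a^{d_a}} d_a$; a minimal sketch, with all names assumed for illustration:

```python
# Sketch of Property 4.1: an explanation is Faithful iff every decision-making
# function it contains leads to the explainer's actual decision on O_a.

def is_faithful(explanation, observations, decision):
    return all(decision in q(observations) for q in explanation)

def prioritize_obesity_risk(obs):
    return {"two_weeks"} if obs.get("risk_obesity") else {"three_months"}

def prioritize_by_age(obs):
    # A rule that does not support the claimed decision on these observations.
    return {"two_weeks"} if obs.get("age", 0) > 70 else {"three_months"}

O_a = {"risk_obesity": False, "age": 45}
print(is_faithful({prioritize_obesity_risk}, O_a, "three_months"))  # True
print(is_faithful({prioritize_by_age}, O_a, "two_weeks"))           # False
```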
4.2 Explanation interpretability
Even a faithful explanation can be misinterpreted, resulting in incorrect inferences from that explanation by the explainee. When this happens while inferring which action to take to contest the explainer's decision, the supported contestability is reduced. Informally, we define an explanation to be interpretable when its interpretation by the explainee results in exactly the same information as was conveyed by the explainer.

In the example, an incorrect interpretation would be to infer that all who weigh more than the person querying the planning tool have priority, as this does not necessarily follow a medical definition of risking obesity. This might incite a different kind of action, one based on a feeling of unfairness. Hence, it does not matter how faithful an explanation is: if its interpretation is not sound, the support for contesting the explainer's decision is limited.
We define an interpretable explanation as follows:
Property 4.2 (Interpretable).
Let $E_a^{d_a}$ be an explanation from explainer $a$ about decision $d_a$. Then $Interpretable(E_a^{d_a})$ iff $I_{a'}^{d_a} \equiv E_a^{d_a}$.3

3. $\equiv$ means here that $\forall w \in W, d \in D : w \models_{E_a^{d_a}} d \Leftrightarrow w \models_{I_{a'}^{d_a}} d$.
Note that this definition does not explicitly require that the explainee has the exact
interpretation of what was conveyed. Instead, an equivalent interpretation suffices. For
instance, the interpretation "people with a weight above mine are prioritized" might be a correct interpretation of "people running a risk for obesity are prioritized", if that person runs the risk themselves, which would make the two decision rules equivalent. Furthermore, as
long as the interpretation is equivalent, the medium with which the explanation is conveyed
does not matter. Whether visualized or textual, the above definition supports that both
can result in the same interpretation.
To summarize, ideally we would like explanations to be both faithful and interpretable.
This ensures that the knowledge that is communicated can reliably be used to infer any
contesting actions.
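For finite sets of worlds and decisions, the equivalence $I_{a'}^{d_a} \equiv E_a^{d_a}$ behind the Interpretable property can be checked exhaustively. A minimal sketch under the illustrative representation used so far, enumerating a small hypothetical world space (all names are assumptions):

```python
# Sketch of Property 4.2: interpretation and explanation are equivalent iff
# they satisfy exactly the same world/decision pairs.

def satisfies(world, functions, decision):
    return all(decision in q(world) for q in functions)

def is_interpretable(explanation, interpretation, worlds, decisions):
    return all(
        satisfies(w, explanation, d) == satisfies(w, interpretation, d)
        for w in worlds for d in decisions
    )

def conveyed_rule(w):      # "people risking obesity are prioritized"
    return {"two_weeks"} if w["risk_obesity"] else {"three_months"}

def interpreted_rule(w):   # the explainee's (here equivalent) reading
    return {"two_weeks"} if w["risk_obesity"] else {"three_months"}

worlds = [{"risk_obesity": True}, {"risk_obesity": False}]
decisions = ["two_weeks", "three_months"]
print(is_interpretable({conveyed_rule}, {interpreted_rule}, worlds, decisions))  # True
```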
5. Level 2: Indicative
Besides being accurate, for an explanation to be actionable, it should specifically aim to
support the explainee’s ability to take action. Thus, we argue that actionable explanations
should contain counterfactuals and be explicit in the contesting actions to achieve those.
The former ensures that alternative observation sets and their subsequent decisions are communicated, which helps the explainee infer which decisions are favourable and when they are given. The latter ensures that the explanation also contains actions that result in the aforementioned counterfactuals. This removes an additional inference step for the explainee.
These two properties create a more indicative explanation. Such an explanation better indicates what options are available to the explainee and how to achieve them, improving the support to the explainee's ability to contest the explainer's decisions. These proposed counterfactuals and actions should, however, be faithful as well as correctly interpreted. Therefore, we refer to this as the second level of actionable explanation. It builds on the former level of requiring accurate explanations.
5.1 Counterfactual
The purpose of an explanation is to explain why a decision was made, for example through providing decision rules or behavioural examples. Counterfactuals are such example-based explanations. A counterfactual consists of 1) hypothetical observations different from the current observations and 2) the decision the explainer believes it would make in that hypothetical situation. Counterfactual explanations are viewed as a way to support contestability Wachter et al. [2017]. In a sense, a counterfactual tells the explainee: "if this and that would be observed, I would make this decision instead". Only the differences between the hypothetical observations and the current observations tend to be explained Wachter et al. [2017]. This is done to minimize the explained information to only that which is vital Miller [2018].
We can formalize a counterfactual using our framework. We denote the hypothetical observations as $O'_a$. Then, using the decision-making functions $Q_a$ that the explainer follows, we can write $Bel_a(O'_a \models_{Q_a} d'_a)$. This expresses that the explainer believes it would make decision $d'_a$ when observing $O'_a$, through the use of $Q_a$. We denote $(O'_a \Rightarrow d'_a) \in W \times D_a$ as the counterfactual: the observations $O'_a$ which the explainer believes would lead to some decision $d'_a$.

The counterfactual $(O'_a \Rightarrow d'_a)$ is a specific type of decision-making function, as it explicitly describes which decision will be made with what observations. Thus we can say that $(O'_a \Rightarrow d'_a) \in Q$, and that $Bel_a((O'_a \Rightarrow d'_a) \in Q_a)$.

For an explanation to support contestability, we thus want it to contain counterfactuals. We define this as the property Counterfactual, denoting that the explanation contains counterfactuals. Formally:
Property 5.1 (Counterfactual).
Let $CF_a(d_a) = \{(O'_a \Rightarrow d'_a) \in W \times D_a \mid Bel_a(O'_a \models_{Q_a} d'_a) \wedge d'_a \neq d_a\}$, which represents the set of all possible counterfactuals whose decisions are different from the current decision $d_a$.

Then $Counterfactual(E_a^{d_a})$ iff $CF_a(d_a) \cap E_a^{d_a} \neq \emptyset$.
This defines any explanation as being Counterfactual when it contains at least one counterfactual from $CF_a(d_a)$, the set of all counterfactuals with a different decision. This property already limits the counterfactuals to only those with a different decision than the current one, with the next properties limiting this set further. In our running example, the explanation contains one such counterfactual, although only the actual differences with the current observations are communicated. Namely, it contains the counterfactual where the explainee would have the same medical records except for an added risk for obesity. With those observations, the planner tool would plan the vaccination in two weeks instead of three months.
The difficulty with this property is that so far it only limits the counterfactuals in an
explanation based on them having a different decision than the current one. Besides that,
there is no limit to the number of counterfactuals in the explanation. The properties in the
next section will remedy this; for now, the Counterfactual property simply states that an explanation should contain counterfactuals whose decision is different.
These counterfactuals enable the explainee to appraise each different decision, and select
one decision that is more acceptable. In this case, when deciding on an action, the explainee
only needs to infer which actions would lead to the explainer making those observations.
In our example this would be the action to visit a physician and let that physician note a
potential risk of obesity in the explainee’s records. As such, the Counterfactual property is
a step towards an actionable explanation that supports contestability.
This property relies on the properties Faithful and Interpretable. When the explanation is Faithful it means that a counterfactual $(O'_a \Rightarrow d'_a) \in E_a^{d_a}$ is sound, meaning that the explainer indeed makes the decision $d'_a$ when observing $O'_a$. It relies on the Interpretable property in the sense that the explainee should understand both the observations and the different decision of a counterfactual $(O'_a \Rightarrow d'_a) \in E_a^{d_a}$. Otherwise the inference of a contesting action might be invalid. For example, if the explainee does not understand what it means to have "a risk for obesity", it cannot infer whether it has that risk but it is simply missing from the medical records.
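A minimal sketch of Property 5.1 under the same illustrative representation: the set $CF_a(d_a)$ is enumerated from a small, assumed space of hypothetical observation sets, and an explanation is Counterfactual when it shares at least one element with that set.

```python
# Sketch of Property 5.1: generate CF_a(d_a) and check whether the explanation
# contains at least one counterfactual with a different decision.

def planner(obs):
    return "two_weeks" if obs["risk_obesity"] else "three_months"

def counterfactual_set(current_obs, candidate_obs, decide):
    """All (O'_a => d'_a) pairs whose decision differs from the current one."""
    d_a = decide(current_obs)
    return {
        (tuple(sorted(obs.items())), decide(obs))
        for obs in candidate_obs
        if decide(obs) != d_a
    }

current = {"risk_obesity": False}
candidates = [{"risk_obesity": True}, {"risk_obesity": False}]
cf_set = counterfactual_set(current, candidates, planner)

# The explanation contains exactly the counterfactual from the running example.
explanation = {(tuple(sorted({"risk_obesity": True}.items())), "two_weeks")}
print(bool(cf_set & explanation))  # True: the explanation is Counterfactual
```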
5.2 Action explicitness
Even if the explanation contains counterfactuals, the explainee still has to infer suitable
contesting actions that result in the explainer making those other observations. This can
be difficult if the explainee is not a domain expert or may not understand how the explainer
observes the world. Thus we argue that the explainer should explicitly convey action sug-
gestions that could lead to a communicated counterfactual. Such suggestions simplify the
explainee’s inference for a suitable action.
We refer to an explanation conveying contesting action suggestions as being Explicit. Interestingly, such suggestions do not explain anything about the explainer's decision-making; instead they explain how to achieve a more desirable decision from that explainer. The result is that the Explicit property implies an extension of the definition of an explanation. Where an explanation $E_a^{d_a}$ was defined as only containing decision-making functions, $E_a^{d_a} \subseteq Q$, we now extend this to also include actions. Any explanation that aims to support an explainee's contestability should thus contain more than just an explanation of how decisions are made. It should also explicitly suggest actions the explainee might want to take to achieve a different decision.
To formally define the property Explicit, we first have to define the process of selecting one or more actions as suggestions related to a counterfactual $(O'_a \Rightarrow d'_a)$. We refer to this process or function as $\rho$, and we define it as follows:
Definition 5.1 (Action suggestion identification).
Let the function $\rho : CF_a(d_a) \mapsto 2^{\Pi}$ determine the set of contesting actions for a counterfactual, defined by $\rho((O'_a \Rightarrow d'_a)) = \{\pi \in \Pi \mid Bel_a(O'_a \subseteq Act_{a'}(\pi, w))\}$.

Less formally, this function $\rho$ takes a counterfactual $(O'_a \Rightarrow d'_a)$ and selects one or more contesting actions. The explainer believes that when any of these actions is performed by the explainee, the world changes in a way that causes the explainer to observe $O'_a$ instead. In our example, this function $\rho$ identifies the action to contact a physician to achieve a change, a risk for obesity, in the explainee's medical records, which is the action related to the counterfactual where, with a risk for obesity, the vaccination is planned in two weeks.
We can use this function $\rho$ to formalize the Explicit property. Specifically, we can use it to obtain and add action suggestions for every counterfactual in the explanation. More formally:
Property 5.2 (Explicit).
Let $E_a^{d_a}$ be a counterfactual explanation from explainer $a$ about decision $d_a$.
Let the function $X : E \mapsto E \cup (2^{\Pi} \times E)$ be defined as follows:

$X(E_a^{d_a}) = \{q \in E_a^{d_a} \mid q \notin CF_a(d_a)\} \cup \{\langle \rho((O'_a \Rightarrow d'_a)), (O'_a \Rightarrow d'_a)\rangle \mid (O'_a \Rightarrow d'_a) \in E_a^{d_a} \wedge (O'_a \Rightarrow d'_a) \in CF_a(d_a)\}$

This function adds action suggestions to every counterfactual in the explanation. We call $X(E_a^{d_a})$ an Explicit explanation.
This function $X$ takes an explanation and makes all of its counterfactuals explicit by applying $\rho$ to get the suggested contesting actions. These actions are combined with the counterfactual. So for an arbitrary counterfactual $(O'_a \Rightarrow d'_a) \in E_a^{d_a}$, the function $X$ extends it to $\langle \{\pi_{a'}\}, (O'_a \Rightarrow d'_a)\rangle$ if $\{\pi_{a'}\} = \rho((O'_a \Rightarrow d'_a))$. In our running example, $\langle \{\pi_{a'}\}, (O'_a \Rightarrow d'_a)\rangle$ is the counterfactual with a date in two weeks when a risk of obesity is in the medical records, and $\pi_{a'}$ is the suggestion to contact a physician to check and add such a risk to the records.

When an explanation is Explicit it specifically supports the explainee's ability to contest decisions through actions, and thus it becomes a more actionable explanation. It reduces the amount of inference the explainee has to perform to decide upon a contesting action, as it can review each suggestion and decide whether it is worth pursuing.
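A minimal sketch of Definition 5.1 and Property 5.2 under the illustrative representation: a hypothetical mapping rho links each counterfactual to the actions believed to bring about its observations, and make_explicit plays the role of the function $X$, pairing every counterfactual in the explanation with its action suggestions.

```python
# Sketch of the rho function and the X function (Property 5.2).

# Counterfactuals are (observations, decision) pairs; observations are
# represented as hashable tuples of attribute/value pairs.
cf_obesity = ((("risk_obesity", True),), "two_weeks")

def rho(counterfactual):
    """Actions the explainer believes would lead it to observe O'_a."""
    suggestions = {
        (("risk_obesity", True),): {"contact_physician"},
    }
    observations, _decision = counterfactual
    return suggestions.get(observations, set())

def make_explicit(explanation, cf_set):
    """X(E): keep non-counterfactual content, pair counterfactuals with rho."""
    explicit = set()
    for item in explanation:
        if item in cf_set:
            explicit.add((frozenset(rho(item)), item))
        else:
            explicit.add(item)
    return explicit

explanation = {cf_obesity}
print(make_explicit(explanation, {cf_obesity}))
# {(frozenset({'contact_physician'}), ((('risk_obesity', True),), 'two_weeks'))}
```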
6. Level 3: Personalized
An issue with both a Counterfactual and an Explicit explanation is the large number of possible counterfactuals and action suggestions. A selection needs to be made that most effectively supports the explainee in selecting the most appropriate and effective action. The communicated action suggestions should be limited to those that are Feasible for the explainee to perform, whereas the communicated counterfactuals should be limited to the alternative decisions that are Preferable to the explainee. We argue that these two properties lead to a natural way of reducing the number of communicated counterfactuals and action suggestions to those that effectively support contestability.

The properties Feasible and Preferable lead to a personalized explanation: ideally, one with a single counterfactual that leads to the most preferred alternative decision, combined with action suggestions the explainee is capable of performing.
6.1 Feasible
The idea of a feasible explanation is not novel in the literature on creating explanations supporting contestability Poyiadzi et al. [2020]. This property is often implemented as the similarity between the current observations $O_a$ and the alternative observations $O'_a$ from $(O'_a \Rightarrow d'_a)$. The greater the similarity, the more feasible the counterfactual is assumed to be for an explainee to accomplish. However, this notion is being critiqued, as observational differences need not be in proportion to effort Mahajan et al. [2019]. Some observations might be changed relatively easily by the explainee (e.g., correcting a mistake in the medical records), others take more effort (e.g., adjusting one's lifestyle) and others can be impossible to change (e.g., changing one's age). We agree with these criticisms: the feasibility of achieving a counterfactual is the degree to which the explainee is capable of performing the required actions, not the differences those actions effectuate in the explainer's observations.
We thus define a Feasible explanation as an explanation containing action suggestions that are part of the set of contesting actions the explainee can take. To formally define this, we introduce a function $\Gamma$ which identifies all such action suggestions in an (explicit) explanation. If all these actions are part of the explainee's actions $\Pi_{a'}$, the explanation is deemed feasible. More formally:
Property 6.1 (Feasible).
Let $X(E_a^{d_a})$ be an explicit explanation from the explainer $a$ about some decision $d_a$.
We define the function $\Gamma : E \cup (2^{\Pi} \times E) \mapsto 2^{\Pi}$ inductively as follows:

- $\Gamma(\emptyset) = \emptyset$
- $\Gamma(\langle \rho((O'_a \Rightarrow d'_a)), (O'_a \Rightarrow d'_a) \rangle) = \rho((O'_a \Rightarrow d'_a))$
- $\Gamma(X(E_a^{d_a})) = \bigcup \{\rho((O'_a \Rightarrow d'_a)) \mid \langle \rho((O'_a \Rightarrow d'_a)), (O'_a \Rightarrow d'_a) \rangle \in X(E_a^{d_a})\}$

This function identifies and extracts all contesting actions within an explicit explanation.

Then $Feasible(X(E_a^{d_a}))$ iff $\Gamma(X(E_a^{d_a})) \subseteq \Pi_{a'}$.
This property dictates that for an explicit explanation to be feasible, all its proposed contesting actions should be possible for the explainee to perform. In our running example the explainee is deemed capable of contacting its physician to update its medical records with a risk for obesity if applicable. Another suggested action could have been to adjust the explainee's lifestyle such that this risk of obesity is guaranteed. Both actions might have the same result, a risk for obesity, causing the explainee's vaccination to be prioritized. However, the latter action is not part of the explainee's set of actions, as the explainee already believes it has a risk for obesity and is not willing to change its lifestyle to increase that risk.

Previously we argued that an explicit explanation improves contestability. With the given explicit action suggestions an explainee only has to choose among them instead of also inferring them. When these action suggestions are also all guaranteed to be Feasible, contestability is further improved, as the number of suggestions is reduced to those that matter for the explainee.
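A minimal sketch of Property 6.1 under the same representation: extract_actions plays the role of $\Gamma$, collecting all suggested actions from an explicit explanation, after which feasibility is a subset check against the explainee's action repertoire $\Pi_{a'}$ (all names are assumptions).

```python
# Sketch of Property 6.1: Gamma extracts all suggested contesting actions from
# an explicit explanation; Feasible iff they are all in the explainee's repertoire.

def extract_actions(explicit_explanation):
    actions = set()
    for item in explicit_explanation:
        # Explicit items are pairs (action suggestions, counterfactual).
        if isinstance(item, tuple) and len(item) == 2 and isinstance(item[0], frozenset):
            actions |= item[0]
    return actions

def is_feasible(explicit_explanation, explainee_actions):
    return extract_actions(explicit_explanation) <= explainee_actions

cf = ((("risk_obesity", True),), "two_weeks")
explicit_explanation = {(frozenset({"contact_physician"}), cf)}

print(is_feasible(explicit_explanation, {"contact_physician"}))  # True
print(is_feasible(explicit_explanation, {"adjust_lifestyle"}))   # False
```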
6.2 Personalized content
The above definition of Feasibility limits the action suggestions to those that the explainee is capable of performing. However, it does not limit the number of counterfactuals that could be communicated. We argue that the natural way of doing so is to account for which decisions the explainee prefers the explainer to make, thus only communicating counterfactuals in the explanation that result in an alternative decision that is preferred more than the current one. Ideally, we would like this to be the most preferred decision the explainer can make. In other words, we want the explanation to be Preferable, which offers a way to intelligibly select counterfactuals while further supporting the explainee's contestability.

For an explanation to be Preferable, all of its counterfactuals should have alternative decisions that are appraised higher than the currently made decision. Formally:
Property 6.2 (Preferable).
Let $E_a^{d_a}$ be a Counterfactual explanation from the explainer $a$ about some decision $d_a$. Also, let $v$ be the appraisal of $d_a$ by $a'$.

Then $Preferable(E_a^{d_a})$ iff $\forall (O'_a \Rightarrow d'_a) \in E_a^{d_a} : v' > v$, where $v'$ is the appraisal of $d'_a$.
According to this definition, an explanation is deemed preferable if all its counterfactuals result in a better appraisal than the currently made decision, which implies that the explainer is aware of what the explainee prefers in decisions. In our running example, this
means that the vaccination planner tool is aware that a vaccination date within two weeks
is preferred more than a date within three months.
The property Preferable offers a way to select counterfactuals from the potentially infinite set of possible counterfactuals. In doing so, it also improves the explainee's contestability as
the explanation only addresses what needs to change to get a more preferred decision.
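A minimal sketch of Property 6.2: given a hypothetical appraisal function for the explainee, an explanation is Preferable when every counterfactual it contains leads to a decision appraised higher than the current one.

```python
# Sketch of Property 6.2: every counterfactual decision must be appraised
# higher than the currently made decision.

def appraise(decision):
    """A made-up appraisal for the running example, higher is better."""
    return {"two_weeks": 1.0, "three_months": 0.2, "six_months": 0.1}[decision]

def is_preferable(counterfactuals, current_decision):
    v = appraise(current_decision)
    return all(appraise(d_alt) > v for _obs, d_alt in counterfactuals)

cf_better = ((("risk_obesity", True),), "two_weeks")
cf_worse = ((("high_travel_risk", True),), "six_months")

print(is_preferable({cf_better}, "three_months"))            # True
print(is_preferable({cf_better, cf_worse}, "three_months"))  # False
```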
6.3 Actionable explanations
To summarize, we introduced three levels of explanation properties: 1) accurate explanations, 2) indicative explanations and 3) personalized explanations. The first level states that explanations should be faithful and interpretable. The second level states that the explanation should contain counterfactuals as well as explicit action suggestions. Finally, the third level states that these counterfactuals should be limited to those with decisions the explainee prefers while the action suggestions are feasible for the explainee to perform.

For an explanation to support the explainee's ability to contest the explainer's decisions, all three levels need to apply to that explanation. We define such an explanation as Actionable, as it not only explains how the explainer makes decisions, but does so in a way that supports the explainee in contesting the explainer's decisions. When explanations are
accurate, their content can be relied upon as well as understood. When explanations are
indicative, they reduce the explainee’s required inference based on knowledge they might
not have. Finally, when explanations are personalized, they recognize the potentially unique needs of the explainee. An actionable explanation thus conveys reliable and interpretable knowledge that allows the
explainee to take control over their life, even when that life is partially governed by decisions
made by AI agents.
7. Future research areas for actionable explanations
A concise literature review was conducted to map state-of-the-art explanation-generating methods to the above defined properties. The aim of this review is to identify the current state of achieving actionable explanations and future research areas.
The review was conducted by combining several literature survey papers on explanation-generating methods, namely the works from Adadi and Berrada [2018], Guidotti et al. [2018b] and Arrieta et al. [2020]. This resulted in a total of 237 papers on methods. These were then pruned to only those methods that state they address the topic of contestability. This resulted in a total of 75 unique methods that aim to generate an explanation to help the human contest the AI agent's decisions.
In Table 1 we show, for each property, the percentage of the 75 reviewed methods in which that property is addressed, mentioned or not present. Table 2 shows the complete overview for each method. No method currently seems to adhere to all six properties.
Most of the reviewed methods (81%) aim to identify or generate counterfactuals from
a given set of alternative situations. These methods can often be traced back to the approach proposed by Wachter et al. [2017], who propose the use of a loss or optimisation function that scores a given counterfactual on several measurable aspects. Originally the only two aspects accounted for were whether the counterfactual caused a different decision and the proximity of that counterfactual to the current situation. The former could be viewed as the faithfulness of the explanation, and most of the reviewed papers perform such an analysis (59%). However, the latter only roughly approximates the idea behind our property of feasibility. Our definition of feasibility applies to how feasible
the explicitly proposed actions are to perform by the human explainee. Since most methods
do not propose actions to achieve their identified counterfactual, they cannot adhere to the
property of feasibility.
In fact, only very few of the reviewed methods address the property of generating explicit explanations (4% do). All of these methods introduce the notion of linking particular actions to a change in a situational attribute. In particular the method by Ramakrishnan et al. [2020] stands out. They describe the idea of linking actions to situational changes and take this a step further by also assigning weights to such actions. This allows their method to personalize explanations towards what is feasible for the human to perform. A lower weight would directly model a more feasible action and allow feasibility to be modelled in
the loss or optimisation function used so commonly to identify counterfactuals.
The most underrepresented property in the reviewed methods (3%) is that of using
counterfactuals that result only in a preferable decision for that particular human. Inter-
estingly it is the method proposed by Wachter et al. [2017] that attempts to adhere to this
property. Although their work formed the basis of most of the reviewed methods, none
of the reviewed methods continued with this aspect of their work. They argued that the loss or optimisation function used to identify counterfactuals should be tailored to include what the human deems a more favourable decision. The other reviewed methods only
required the decision to be different, not necessarily favourable (or their exemplar use case
only involved two potential decisions).
Finally, the property of generating interpretable explanations is not often addressed
in the reviewed methods (only 16% did so). Those that did performed a limited user study to assess this property, limited in the sense that they presented participants with a proxy task and relied upon subjective measurements only. Doshi-Velez and Kim [2017] argue that such an approach mostly evaluates a method's face validity but does not contribute to the field of XAI with a more theoretical insight into why certain explanations are more interpretable than others. Nonetheless, these works did illustrate the interpretability
of their proposed methods.
To summarize our limited literature review, we noticed a research trend that does not seem to recognize the human in an explicit sense. Attention is given to the identification or generation of counterfactuals, whereas little attention is given to more human-centered properties, such as the communication of explicit actions, their feasibility, and which decisions are in fact preferable to the human. Following our proposed properties of what makes an explanation actionable, the approach proposed by Ramakrishnan et al. [2020], which links counterfactual changes to actions and those actions to weights, seems to be the most promising method to generate actionable explanations so far. There also seems to be no conflict between this approach and the general approach of generating actionable explanations through the identification of counterfactuals based on a loss function. In fact, these two can easily be combined in a multi-objective loss function Dandl et al. [2020], as sketched below.
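To illustrate how these ideas could fit together, the sketch below combines a Wachter-style search over candidate counterfactuals with Ramakrishnan-style action weights and a preference constraint. It is a simplified, hypothetical reconstruction of that combination, not the methods as published; all feature names, weights and the stand-in model are assumptions.

```python
# Hypothetical sketch: selecting an actionable counterfactual by combining
# (1) a different-and-preferred decision constraint and (2) a feasibility-weighted
# cost over the actions needed to realise the change (cf. Wachter et al. 2017;
# Ramakrishnan et al. 2020; multi-objective as in Dandl et al. 2020).

def model(x):
    """Stand-in for the AI agent's decision function."""
    return "two_weeks" if x["risk_obesity"] else "three_months"

PREFERENCE = {"two_weeks": 1.0, "three_months": 0.2}  # explainee's appraisal

# Each candidate counterfactual is reached by an explicit action with a weight:
# a lower weight models a more feasible action for this explainee.
CANDIDATES = [
    {"change": {"risk_obesity": True}, "action": "contact_physician", "weight": 1.0},
    {"change": {"age": 75},            "action": "change_age",        "weight": float("inf")},
]

def best_actionable_counterfactual(x, current_decision):
    best, best_cost = None, float("inf")
    for cand in CANDIDATES:
        x_cf = {**x, **cand["change"]}
        d_cf = model(x_cf)
        # Keep only counterfactuals with a preferred (hence different) decision.
        if PREFERENCE.get(d_cf, 0.0) <= PREFERENCE.get(current_decision, 0.0):
            continue
        if cand["weight"] < best_cost:  # feasibility-weighted cost
            best, best_cost = (x_cf, d_cf, cand["action"]), cand["weight"]
    return best

x = {"risk_obesity": False, "age": 45}
print(best_actionable_counterfactual(x, model(x)))
# ({'risk_obesity': True, 'age': 45}, 'two_weeks', 'contact_physician')
```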
8. Discussion
Here we discuss the implications of the six proposed properties and their interactions.
We defined that an accurate explanation is not only faithful to the AI agent’s decision
making but that it is also sufficiently interpretable by the human agent. Following our def-
116
Actionable explanation for contestable AI
Table 1: The percentage of the reviewed papers (75 in total) on explanation generating
methods to support the contestability of the decisions made by AI agents. ”Ad-
dressed” means that the property was validated or otherwise proven. ”Mentioned”
means that the property was only discussed or assumed. See Table 2 for a complete
overview per method.
Level 1:
Accurate
explanations
Level 2:
Indicative
explanations
Level 3:
Personalized
explanations
Faithful Interpretable Counterfactual Explicit Feasible Preferable
Addressed 59% 16% 81% 4% 7% 3%
Mentioned 28% 31% 5% 8% 29% 11%
Not present 13% 53% 13% 88% 64% 87%
Following our definition, interpretability is the notion of an explanation's content becoming part of a human agent's mental model. An explanation is said to be interpretable if all of its content is accurately made part of this mental model. According to our definition of faithfulness, this explanation content should accurately reflect the AI agent's decision-making process. Thus an accurate explanation entails the consideration of both the human and the AI agent. Without being interpretable, the explanation might be misinterpreted or parts of it omitted, resulting in an inaccurate mental model of how the AI agent makes decisions and thus hindering an accurate inference of actions. Without being faithful, the explanation contains falsehoods, resulting in a similarly inaccurate mental model.
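In ad hoc notation (deliberately not reusing the formal framework introduced earlier in this paper, whose symbols are not repeated here), this relation can be summarised for an explanation e as:

\mathit{accurate}(e) \;\iff\; \mathit{faithful}(e, \text{AI agent}) \,\wedge\, \mathit{interpretable}(e, \text{human agent})

that is, an explanation is accurate exactly when its content both reflects the AI agent's decision making and is correctly taken up into the human agent's mental model.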
Whether a faithful explanation should be favoured over an interpretable one, or vice versa, is a matter of debate. Páez [2019] argues that a pragmatic view on explanation would show that it is better to have an interpretable explanation the human can understand than a faithful one that cannot be understood. A faithful explanation of a complex decision process would require a completeness that hinders its interpretability, thus reducing the benefit of the explanation. On the other hand, the extreme case of a completely unfaithful explanation that is entirely interpretable is not a beneficial explanation either. A balance should likely be found given a specific domain and use case. This might imply that faithfulness and interpretability are scales rather than binary properties, as defined in this work. However, assuming that for a given use case a correct balance between the two exists, both properties can be treated as binary again.
Table 1 shows that only few of the methods we reviewed validate the interpretability of generated explanations. This is likely because such evaluations require rigorous experiments with human subjects, for which no good designs are currently available (Doshi-Velez and Kim [2017]), nor are there established metrics to measure explanation effects (van der Waa et al. [2020]). However, we also see an increase in research addressing such issues (Schoonderwoerd et al. [2021], Miller et al. [2017], Hoffman et al. [2018], Doshi-Velez and Kim [2017]). Finally, we note that the XAI literature often uses many different terms, adding to the field's ambiguity (Lipton [2018]): terms such as consistency, predictability, reliability, usability, readability and more, which all refer to a particular and measurable element of an explanation's faithfulness or interpretability.
Aside from an explanation's accuracy, we defined an actionable explanation to be both indicative and personalized. Within the XAI literature on contestability, Wachter et al. [2017] proposed the use of counterfactuals and a methodology to find them. This caused a trend of presenting accurate counterfactuals as actionable explanations. Within this work we argue that actionable explanations should not only be accurate and counterfactual, but also explicit, feasible and preferable. This offers a natural way of addressing the issue of identifying which counterfactuals should be provided, if any. Following these properties, a valid counterfactual is one that can be achieved through actions the human agent is capable of, and that results in an AI agent's decision the human prefers over the current decision. By extension, this also implies that if the made decision is already the most preferred one, there is no need for an actionable explanation or contestability, something the AI agent can determine on its own given sufficient awareness of what is feasible and preferred by the human agent.
The feasibility of a counterfactual is relatively often mentioned (29%) within the reviewed works. Many of these mentions come from XAI methods that are designed specifically to identify feasible counterfactuals. However, they do not follow the same definition of what is deemed feasible. Instead, these methods adopt an approximation of feasibility: a counterfactual is often deemed feasible when it is as similar as possible to the current world state (Ramon et al. [2019], Sharma et al. [2019], Wachter et al. [2017]) or when it is very similar to previously observed world states (Poyiadzi et al. [2020], Kanamori et al. [2020]). Both imply that similarity is a suitable proxy for feasibility, which may not be the case (Mahajan et al. [2019]). For example, a counterfactual where one's age is 39 instead of 40 is quite similar to the current world state, but it is not a feasible one. Within this work, we therefore categorized such methods as mentioning the need for feasibility but not addressing it. Instead, we follow the reasoning of Mahajan et al. [2019] and Ramakrishnan et al. [2020], who argue that feasibility should be defined as the human agent's capability of enacting the counterfactual.
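The age example can be made concrete with a small sketch: the similarity score below would rate the change from 40 to 39 as highly feasible, whereas a capability-based check rejects it. The feature layout and the constraint sets are hypothetical and ours, not taken from any reviewed method.

import numpy as np

# Hypothetical feature layout: index 0 = age, index 1 = income, index 2 = education level.
IMMUTABLE = set()        # features the human cannot change at all
MONOTONE_UP = {0, 2}     # features that can only increase over time (e.g. age, education)

def similarity_feasibility(x, x_cf):
    """Similarity-as-proxy: a smaller distance counts as 'more feasible'."""
    return -np.abs(x - x_cf).sum()

def capability_feasibility(x, x_cf):
    """Capability-based check: can the human actually enact every required change?"""
    for i, (old, new) in enumerate(zip(x, x_cf)):
        if i in IMMUTABLE and new != old:
            return False
        if i in MONOTONE_UP and new < old:
            return False   # e.g. age 40 -> 39: very similar, yet not enactable
    return True

x = np.array([40, 30_000, 2])
x_cf = np.array([39, 30_000, 2])
print(similarity_feasibility(x, x_cf))   # -1: nearly identical, so 'feasible' by proximity
print(capability_feasibility(x, x_cf))   # False: the human cannot become younger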
The level of personalization implied by feasible and preferable actionable explanations requires the AI agent to have a sound model of the human agent. According to our definition of an actionable explanation, the AI agent requires 1) a (causal) model of the world to identify correct actions, and 2) a model of the human agent's capabilities and preferences to identify a feasible and preferable counterfactual, if one is needed.
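A minimal sketch of these two ingredients is given below, using hypothetical data structures of our own: a causal world model mapping actions to resulting states, and a human model holding capabilities and a preference ordering over decisions. It also captures the point made above that no actionable explanation is needed when the current decision is already the most preferred one.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class HumanModel:
    capabilities: List[str]      # actions the human agent is able to enact
    preference: Dict[str, int]   # higher value = more preferred decision

@dataclass
class WorldModel:
    # (causal) effect of an action: maps the current state and an action to a new state
    effect: Callable[[dict, str], dict]

def needs_actionable_explanation(decision: str, alternatives: List[str],
                                 human: HumanModel) -> bool:
    """Contesting only makes sense if some alternative decision is preferred."""
    current = human.preference.get(decision, 0)
    return any(human.preference.get(d, 0) > current for d in alternatives)

def feasible_outcomes(state: dict, world: WorldModel, human: HumanModel,
                      decide: Callable[[dict], str]) -> Dict[str, str]:
    """For each action the human can enact, the AI agent's decision it would lead to."""
    return {a: decide(world.effect(state, a)) for a in human.capabilities}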
9. Conclusion
In this work, we formally defined what constitutes an actionable explanation to support humans in contesting decisions made by AI agents. Six properties were defined using a formal framework based on a socio-technical perspective on contestable AI agents. Through these formal definitions we aim to remove ambiguity and to support future research on actionable explanations, both through discussion and through the design of methods capable of generating explanations that adhere to these properties. A literature survey showed that most state-of-the-art methods for generating such explanations address the explanation's faithfulness and communicate counterfactuals, while mentioning the need for their interpretability.
Explanation feasibility is often mentioned as well, but not addressed from the human's perspective. Explicit action suggestions and preferable counterfactuals are rarely addressed or even mentioned in current methods. We propose that the research community actively pursue these research gaps in a joint research agenda towards methods that generate explanations that support humans in maintaining their autonomy and self-determination, by providing them with the actionable knowledge needed to contest an AI agent's decision.
Acknowledgments
This work was partially supported by the TNO funded program Appl.AI FATE, the Eu-
ropean Commission funded projects “Humane AI: Toward AI Systems That Augment and
Empower Humans by Understanding Us, our Society and the World Around Us” (grant
#820437), “Foundations of Trustworthy AI – Integrating Reasoning, Learning and Opti-
mization” (TAILOR) (grant #952215), and by the Dutch Research Council (NWO) project “Hybrid Intelligence” (Grant No. 1136993). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the supporting agencies. The support is gratefully acknowledged.
References
Amina Adadi and Mohammed Berrada. Peeking inside the black-box: A survey on explain-
able artificial intelligence (xai). IEEE Access, 6:52138–52160, 2018.
Carlos Aguilar-Palacios, Sergio Muñoz-Romero, and José Luis Rojo-Álvarez. Cold-start
promotional sales forecasting through gradient boosted-based contrastive explanations.
IEEE Access, 8:137574–137586, 2020.
Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-López, Daniel Molina, Richard
Benjamins, et al. Explainable artificial intelligence (xai): Concepts, taxonomies, oppor-
tunities and challenges toward responsible ai. Information Fusion, 58:82–115, 2020.
André Artelt and Barbara Hammer. On the computation of counterfactual explanations – a survey. arXiv preprint arXiv:1911.07749, 2019a.
André Artelt and Barbara Hammer. Efficient computation of counterfactual explanations of lvq models. arXiv preprint arXiv:1908.00735, 2019b.
André Artelt and Barbara Hammer. Convex density constraints for computing plausible counterfactual explanations. In International Conference on Artificial Neural Networks, pages 353–365. Springer, 2020.
André Artelt and Barbara Hammer. Convex optimization for actionable & plausible counterfactual explanations. arXiv preprint arXiv:2105.07630, 2021.
Emre Ates, Burak Aksar, Vitus J Leung, and Ayse K Coskun. Counterfactual explana-
tions for multivariate time series. In 2021 International Conference on Applied Artificial
Intelligence (ICAPAI), pages 1–8. IEEE, 2021.
Vincent Ballet, Xavier Renard, Jonathan Aigrain, Thibault Laugel, Pascal Frossard, and
Marcin Detyniecki. Imperceptible adversarial attacks on tabular data. arXiv preprint
arXiv:1911.03274, 2019.
Gagan Bansal. Explanatory dialogs: Towards actionable, interactive explanations. In Pro-
ceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 356–357,
2018.
Solon Barocas, Andrew D Selbst, and Manish Raghavan. The hidden assumptions behind
counterfactual explanations and principal reasons. In Proceedings of the 2020 Conference
on Fairness, Accountability, and Transparency, pages 80–89, 2020.
Alejandro Barredo-Arrieta and Javier Del Ser. Plausible counterfactuals: Auditing deep
learning classifiers with realistic adversarial examples. In 2020 International Joint Con-
ference on Neural Networks (IJCNN), pages 1–7. IEEE, 2020.
Leopoldo Bertossi. An asp-based approach to counterfactual explanations for classification.
In International Joint Conference on Rules and Reasoning, pages 70–81. Springer, 2020.
Jean-François Boulicaut and Jérémy Besson. Actionability and formal concepts: A data
mining perspective. In International Conference on Formal Concept Analysis, pages 14–
31. Springer, 2008.
Matt Chapman-Rounds, Marc-Andre Schulz, Erik Pazos, and Konstantinos Geor-
gatzis. Emap: Explanation by minimal adversarial perturbation. arXiv preprint
arXiv:1912.00872, 2019.
Yatong Chen, Jialu Wang, and Yang Liu. Strategic recourse in linear classification. arXiv
e-prints, pages arXiv–2011, 2020.
Furui Cheng, Yao Ming, and Huamin Qu. Dece: Decision explorer with counterfactual expla-
nations for machine learning models. IEEE Transactions on Visualization and Computer
Graphics, 2020.
Zhicheng Cui, Wenlin Chen, Yujie He, and Yixin Chen. Optimal action extraction for
random forests and boosted trees. In Proceedings of the 21th ACM SIGKDD international
conference on knowledge discovery and data mining, pages 179–188, 2015.
Susanne Dandl, Christoph Molnar, Martin Binder, and Bernd Bischl. Multi-objective coun-
terfactual explanations. In International Conference on Parallel Problem Solving from
Nature, pages 448–469. Springer, 2020.
Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan
Shanmugam, and Payel Das. Explanations based on the missing: Towards contrastive
explanations with pertinent negatives. In Advances in Neural Information Processing
Systems, pages 592–603, 2018.
Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine
learning. arXiv preprint arXiv:1702.08608, 2017.
Michael Downs, Jonathan L Chu, Yaniv Yacoby, Finale Doshi-Velez, and Weiwei Pan.
Cruds: Counterfactual recourse using disentangled subspaces. In ICML Workshop on
Human Interpretability in Machine Learning, 2020.
Carlos Fernández-Loría, Foster Provost, and Xintian Han. Explaining data-driven decisions
made by ai systems: the counterfactual approach. arXiv preprint arXiv:2001.07417, 2020.
M Georgeff and A Rao. Modeling rational agents within a bdi-architecture. In Proceedings of the 2nd International Conference on Knowledge Representation and Reasoning (KR’91), pages 473–484. Morgan Kaufmann, 1991.
Azin Ghazimatin, Oana Balalau, Rishiraj Saha Roy, and Gerhard Weikum. Prince:
Provider-side interpretability with counterfactual explanations in recommender systems.
In Proceedings of the 13th International Conference on Web Search and Data Mining,
pages 196–204, 2020.
Oscar Gomez, Steffen Holter, Jun Yuan, and Enrico Bertini. Vice: visual counterfactual
explanations for machine learning models. In Proceedings of the 25th International Con-
ference on Intelligent User Interfaces, pages 531–535, 2020.
Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-
making and a “right to explanation”. AI magazine, 38(3):50–57, 2017.
Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, and Stefan Lee. Counter-
factual visual explanations. In International Conference on Machine Learning, pages
2376–2384. PMLR, 2019.
Rory Mc Grath, Luca Costabello, Chan Le Van, Paul Sweeney, Farbod Kamiab, Zhao
Shen, and Freddy Lecue. Interpretable credit application predictions with counterfactual
explanations. arXiv preprint arXiv:1811.05245, 2018.
Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Dino Pedreschi, Franco Turini, and
Fosca Giannotti. Local rule-based explanations of black box decision systems. arXiv
preprint arXiv:1805.10820, 2018a.
Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and
Dino Pedreschi. A survey of methods for explaining black box models. ACM computing
surveys (CSUR), 51(5):1–42, 2018b.
Riccardo Guidotti, Anna Monreale, Stan Matwin, and Dino Pedreschi. Black box ex-
planation by learning image exemplars in the latent feature space. In Joint European
Conference on Machine Learning and Knowledge Discovery in Databases, pages 189–205.
Springer, 2019.
Masoud Hashemi and Ali Fathi. Permuteattack: Counterfactual explanation of machine
learning credit scorecards. arXiv preprint arXiv:2008.10138, 2020.
Clément Henin and Daniel Le Métayer. A multi-layered approach for tailored black-box
explanations. In Proceedings of the ICPR’2020 Workshop Explainable AI. Springer, 2020.
Bilal Hmoud, Varallyai Laszlo, et al. Will artificial intelligence take over human resources
recruitment and selection. Network Intelligence Studies, 7(13):21–30, 2019.
Robert R Hoffman, Shane T Mueller, Gary Klein, and Jordan Litman. Metrics for explain-
able ai: Challenges and prospects. arXiv preprint arXiv:1812.04608, 2018.
Catholijn M Jonker, M Birna Van Riemsdijk, and Bas Vermeulen. Shared mental models.
In International Workshop on Coordination, Organizations, Institutions, and Norms in
Agent Systems, pages 132–151. Springer, 2010.
Shalmali Joshi, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh.
Towards realistic individual recourse and actionable explanations in black-box decision
making systems. arXiv preprint arXiv:1907.09615, 2019.
Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, and Hiroki Arimura. Dace:
Distribution-aware counterfactual explanation by mixed-integer linear optimization. In
IJCAI, pages 2855–2862, 2020.
Sin-Han Kang, Hong-Gyu Jung, Dong-Ok Won, and Seong-Whan Lee. Counterfac-
tual explanation based on gradual construction for deep networks. arXiv preprint
arXiv:2008.01897, 2020.
Amir-Hossein Karimi, Gilles Barthe, Borja Balle, and Isabel Valera. Model-agnostic coun-
terfactual explanations for consequential decisions. In International Conference on Arti-
ficial Intelligence and Statistics, pages 895–905, 2020a.
Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. Algorithmic recourse: from counterfactual explanations to interventions. arXiv preprint arXiv:2002.06278, 2020b.
Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. Algorithmic recourse: from counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 353–362, 2021.
Mark T Keane and Barry Smyth. Good counterfactuals and where to find them: A case-
based technique for generating counterfactuals for explainable ai (xai). arXiv preprint
arXiv:2005.13997, 2020.
Maxim S Kovalev and Lev V Utkin. Counterfactual explanation of machine learning survival
models. arXiv preprint arXiv:2006.16793, 2020.
Todd Kulesza, Simone Stumpf, Weng-Keen Wong, Margaret M Burnett, Stephen Perona,
Andrew Ko, and Ian Oberst. Why-oriented end-user debugging of naive bayes text clas-
sification. ACM Transactions on Interactive Intelligent Systems (TiiS), 1(1):1–31, 2011.
Todd Kulesza, Simone Stumpf, Margaret Burnett, Sherry Yang, Irwin Kwan, and Weng-
Keen Wong. Too much, too little, or just right? ways explanations impact end users’
mental models. In 2013 IEEE Symposium on visual languages and human centric com-
puting, pages 3–10. IEEE, 2013.
Todd Kulesza, Margaret Burnett, Weng-Keen Wong, and Simone Stumpf. Principles of
explanatory debugging to personalize interactive machine learning. In Proceedings of the
20th international conference on intelligent user interfaces, pages 126–137, 2015.
Michael T Lash, Qihang Lin, Nick Street, Jennifer G Robinson, and Jeffrey Ohlmann. Gen-
eralized inverse classification. In Proceedings of the 2017 SIAM International Conference
on Data Mining, pages 162–170. SIAM, 2017a.
Michael T Lash, Qihang Lin, W Nick Street, and Jennifer G Robinson. A budget-constrained
inverse classification framework for smooth classifiers. In 2017 IEEE International Con-
ference on Data Mining Workshops (ICDMW), pages 1184–1193. IEEE, 2017b.
Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, and Marcin De-
tyniecki. Comparison-based inverse classification for interpretability in machine learning.
In International Conference on Information Processing and Management of Uncertainty
in Knowledge-Based Systems, pages 100–111. Springer, 2018.
Benjamin J Lengerich, Sandeep Konam, Eric P Xing, Stephanie Rosenthal, and Manuela M
Veloso. Visual explanations for convolutional neural networks via input resampling. arXiv
preprint arXiv:1707.09641, 2017.
Martin Lindvall, Jesper Molin, and AB Sectra. Verification staircase: a design strategy for
actionable explanations. In ExSS-ATEC@ IUI, 2020.
Zachary C Lipton. The mythos of model interpretability. Queue, 16(3):31–57, 2018.
Shusen Liu, Bhavya Kailkhura, Donald Loveland, and Yong Han. Generative counterfactual
introspection for explainable deep learning. In 2019 IEEE Global Conference on Signal
and Information Processing (GlobalSIP), pages 1–5. IEEE, 2019.
Shusen Liu, Bhavya Kailkhura, Jize Zhang, Anna M Hiszpanski, Emily Robertson, Donald
Loveland, and T Han. Explainable deep learning for uncovering actionable scientific
insights for materials discovery and design. arXiv preprint arXiv:2007.08631, 2020.
Arnaud Van Looveren and Janis Klaise. Interpretable counterfactual explanations guided by
prototypes. In Joint European Conference on Machine Learning and Knowledge Discovery
in Databases, pages 650–665. Springer, 2021.
Ana Lucic, Harrie Oosterhuis, Hinda Haned, and Maarten de Rijke. Focus: Flexible opti-
mizable counterfactual explanations for tree ensembles. arXiv preprint arXiv:1911.12199,
2019.
Ana Lucic, Hinda Haned, and Maarten de Rijke. Why does my model fail? contrastive local
explanations for retail forecasting. In Proceedings of the 2020 Conference on Fairness,
Accountability, and Transparency, pages 90–98, 2020.
Divyat Mahajan, Chenhao Tan, and Amit Sharma. Preserving causal constraints in coun-
terfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277,
2019.
David Martens and Foster Provost. Explaining data-driven document classifications. Mis
Quarterly, 38(1):73–100, 2014.
Tim Miller. Contrastive explanation: A structural-model approach. arXiv preprint
arXiv:1811.03163, 2018.
Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial
Intelligence, 267:1–38, 2019.
Tim Miller, Piers Howe, and Liz Sonenberg. Explainable ai: Beware of inmates running the
asylum or: How i learnt to stop worrying and love the social and behavioural sciences.
arXiv preprint arXiv:1712.00547, 2017.
Jonathan Moore, Nils Hammerla, and Chris Watkins. Explaining deep learning models with
constrained adversarial examples. In Pacific Rim International Conference on Artificial
Intelligence, pages 43–56. Springer, 2019.
Ramaravind K Mothilal, Amit Sharma, and Chenhao Tan. Explaining machine learning
classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Con-
ference on Fairness, Accountability, and Transparency, pages 607–617, 2020.
Deirdre K Mulligan, Daniel Kluttz, and Nitin Kohli. Shaping our tools: Contestability as a
means to promote responsible algorithmic decision making in the professions. Available
at SSRN 3311894, 2019.
Enid Mumford. Socio-technical systems design: Evolving theory and practice. Computers
and democracy, 1987.
Mark A Neerincx, Jasper van der Waa, Frank Kaptein, and Jurriaan van Diggelen. Using
perceptual and cognitive explanations for enhanced human-agent team performance. In
International Conference on Engineering Psychology and Cognitive Ergonomics, pages
204–214. Springer, 2018.
Daniel Nemirovsky, Nicolas Thiebaut, Ye Xu, and Abhishek Gupta. Providing actionable
feedback in hiring marketplaces using generative adversarial networks. In Proceedings of
the 14th ACM International Conference on Web Search and Data Mining, pages 1089–
1092, 2021.
Andrés Páez. The pragmatic turn in explainable artificial intelligence (xai). Minds and
Machines, 29(3):441–459, 2019.
Martin Pawelczyk, Klaus Broelemann, and Gjergji Kasneci. On counterfactual explanations
under predictive multiplicity. In Conference on Uncertainty in Artificial Intelligence,
pages 809–818. PMLR, 2020a.
Martin Pawelczyk, Klaus Broelemann, and Gjergji Kasneci. Learning model-agnostic coun-
terfactual explanations for tabular data. In Proceedings of The Web Conference 2020,
pages 3126–3132, 2020b.
Thomas Ploug and Søren Holm. The four dimensions of contestable ai diagnostics-a patient-
centric approach to explainable ai. Artificial Intelligence in Medicine, 107:101901, 2020.
Rafael Poyiadzi, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. Face:
feasible and actionable counterfactual explanations. In Proceedings of the AAAI/ACM
Conference on AI, Ethics, and Society, pages 344–350, 2020.
Goutham Ramakrishnan, Yun Chan Lee, and Aws Albarghouthi. Synthesizing action se-
quences for modifying model decisions. In Proceedings of the AAAI Conference on Arti-
ficial Intelligence, volume 34, pages 5462–5469, 2020.
Yanou Ramon, David Martens, Foster Provost, and Theodoros Evgeniou. Counterfactual
explanation algorithms for behavioral and textual data. arXiv preprint arXiv:1912.01819,
2019.
Shubham Rathi. Generating counterfactual and contrastive explanations using shap. arXiv
preprint arXiv:1906.09293, 2019.
Kaivalya Rawal and Himabindu Lakkaraju. Beyond individualized recourse: Interpretable
and interactive summaries of actionable recourses. arXiv preprint arXiv:2009.07165, 2020.
Alexis Ross, Himabindu Lakkaraju, and Osbert Bastani. Ensuring actionable recourse via
adversarial training. arXiv preprint arXiv:2011.06146, 2020.
Chris Russell. Efficient search for diverse coherent explanations. In Proceedings of the
Conference on Fairness, Accountability, and Transparency, pages 20–28, 2019.
Tjeerd AJ Schoonderwoerd, Wiard Jorritsma, Mark A Neerincx, and Karel van den Bosch.
Human-centered xai: Developing design patterns for explanations of clinical decision
support systems. International Journal of Human-Computer Studies, page 102684, 2021.
Shubham Sharma, Jette Henderson, and Joydeep Ghosh. Certifai: Counterfactual explana-
tions for robustness, transparency, interpretability, and fairness of artificial intelligence
models. arXiv preprint arXiv:1905.07857, 2019.
Donghee Shin. The effects of explainability and causability on perception, trust, and ac-
ceptance: Implications for explainable ai. International Journal of Human-Computer
Studies, 146:102551, 2021.
Ronal Singh, Paul Dourish, Piers Howe, Tim Miller, Liz Sonenberg, Eduardo Velloso, and
Frank Vetere. Directive explanations for actionable explainability in machine learning
applications. arXiv preprint arXiv:2102.02671, 2021.
Marlo Vieira dos Santos Souza. Choices that make you change your mind: a dynamic
epistemic logic approach to the semantics of BDI agent programming languages. PhD
thesis, Universidade Federal do Rio Grande do Sul, 2016.
Ezzeldin Tahoun and Andre Kassis. Beyond explanations: Recourse via actionable inter-
pretability.
Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, and Mounia Lalmas. Interpretable
predictions of tree-based ensembles via actionable feature tweaking. In Proceedings of the
23rd ACM SIGKDD international conference on knowledge discovery and data mining,
pages 465–474, 2017.
Eric L Trist. The evolution of socio-technical systems, volume 2. Ontario Quality of Working
Life Centre Toronto, 1981.
Berk Ustun, Alexander Spangher, and Yang Liu. Actionable recourse in linear classification.
In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages
10–19, 2019.
J van der Waa, M Robeer, J van Diggelen, M Brinkhuis, and M Neerincx. Contrastive
explanations with local foil trees. In Proceedings of the ICML Workshop on Human
Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden, volume 37, 2018.
Jasper van der Waa, Elisabeth Nieuwburg, Anita Cremers, and Mark Neerincx. Evaluating
xai: A comparison of rule-based and example-based explanations. Artificial Intelligence,
2020.
Suresh Venkatasubramanian and Mark Alfano. The philosophical basis of algorithmic re-
course. In Proceedings of the 2020 Conference on Fairness, Accountability, and Trans-
parency, pages 284–293, 2020.
Tom Vermeire and David Martens. Explainable image classification with evidence counter-
factual. arXiv preprint arXiv:2004.07511, 2020.
Julius von Kügelgen, Amir-Hossein Karimi, Umang Bhatt, Isabel Valera, Adrian Weller, and Bernhard Schölkopf. On the fairness of causal algorithmic recourse. arXiv preprint arXiv:2010.06529, 2020.
Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without
opening the black box: Automated decisions and the gdpr. Harv. JL & Tech., 31:841,
2017.
Joel Walmsley. Artificial intelligence and the value of transparency. AI & SOCIETY, 36
(2):585–595, 2021.
Pei Wang and Nuno Vasconcelos. Scout: Self-aware discriminant counterfactual explana-
tions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 8981–8990, 2020.
James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viégas,
and Jimbo Wilson. The what-if tool: Interactive probing of machine learning models.
IEEE transactions on visualization and computer graphics, 26(1):56–65, 2019.
Adam White and Artur d’Avila Garcez. Measurable counterfactual local explanations for
any classifier. arXiv preprint arXiv:1908.03020, 2019.
Xin Zhang, Armando Solar-Lezama, and Rishabh Singh. Interpreting neural network judg-
ments via minimal, stable, and symbolic corrections. arXiv preprint arXiv:1802.07384,
2018.
Yunxia Zhao. Fast real-time counterfactual explanations. arXiv preprint arXiv:2007.05684,
2020.
Table 2: An overview of all reviewed explanation generating methods used in the literature review. For each property and method, it is illustrated whether the method addressed that property directly (denoted by a +) or only referred to it (denoted by a ∼). Note that the table is split in two, with the right table continuing where the left table ended.
Level 1 Level 2 Level 3
Reference Faithful Interpretable Counterfactual Explicit Feasible Preferable
Ramakrishnan et al. [2020] + + + +
Cheng et al. [2020] + + + ∼ ∼
Dandl et al. [2020] + +∼ ∼
Gomez et al. [2020] + + ∼ ∼
Ghazimatin et al. [2020] + + + +
Kanamori et al. [2020] + + +
Lash et al. [2017b] + + +
Wachter et al. [2017] + + +
Chen et al. [2020] + +
Lash et al. [2017a] ++
Liu et al. [2020] + +
Rawal and Lakkaraju [2020] + +
Downs et al. [2020] +∼ ∼
Joshi et al. [2019] ∼ ∼ +
Mahajan et al. [2019] +∼ ∼
Singh et al. [2021] +∼ ∼
Mothilal et al. [2020] + + +
Tolomei et al. [2017] + + +
Wexler et al. [2019] + + +
Zhang et al. [2018] + + +
Aguilar-Palacios et al. [2020] + +
Artelt and Hammer [2019b] + +
Ates et al. [2021] + +
Chapman-Rounds et al. [2019] + +
Cui et al. [2015] + +
Dhurandhar et al. [2018] + +
Goyal et al. [2019] + +
Grath et al. [2018] + +
Karimi et al. [2020b] + +
Lucic et al. [2019] + +
Martens and Provost [2014] + +
Moore et al. [2019] + +
Pawelczyk et al. [2020a] + +
Poyiadzi et al. [2020] + +
Ross et al. [2020] + +
Ustun et al. [2019] + + ∼ ∼
Hashemi and Fathi [2020] ∼ ∼ +
Kang et al. [2020] ∼ ∼ +
Level 1 Level 2 Level 3
Reference Faithful Interpretable Counterfactual Explicit Feasible Preferable
Looveren and Klaise [2021] ∼ ∼ +
Lucic et al. [2020] +
Tahoun and Kassis +
Vermeire and Martens [2020] ∼ ∼ +
von Kügelgen et al. [2020] +
Wang and Vasconcelos [2020] + ∼ ∼
Keane and Smyth [2020] ∼ ∼
Ballet et al. [2019] + +
Fern´andez-Lor´ıa et al. [2020] + +
Guidotti et al. [2018a] + +
Guidotti et al. [2019] + +
Karimi et al. [2020a] + +
Kulesza et al. [2011] + +
Kulesza et al. [2013] + +
Kulesza et al. [2015] + +
Pawelczyk et al. [2020b] + +
Ramon et al. [2019] + +
White and Garcez [2019] + +
Artelt and Hammer [2020] ∼ ∼ +
Artelt and Hammer [2021] +
Bansal [2018] +
Barredo-Arrieta and Del Ser [2020] +
Bertossi [2020] +
Henin and Le M´etayer [2020] +
Kovalev and Utkin [2020] +
Laugel et al. [2018] +
Lengerich et al. [2017] +
Nemirovsky et al. [2021] +
Russell [2019] +
van der Waa et al. [2018] +
Zhao [2020] +
Barocas et al. [2020] ∼ ∼
Karimi et al. [2021] ∼ ∼
Artelt and Hammer [2019a] +
Lindvall et al. [2020] +
Liu et al. [2019] +
Rathi [2019] +