Rationalization: A Neural Machine Translation Approach to Generating Natural
Language Explanations
Upol Ehsan∗†, Brent Harrison∗‡, Larry Chan, and Mark Riedl
†Georgia Institute of Technology, Atlanta, GA, USA
‡University of Kentucky, Lexington, KY, USA
Abstract

We introduce AI rationalization, an approach for generating
explanations of autonomous system behavior as if a human
had performed the behavior. We describe a rationalization
technique that uses neural machine translation to translate
internal state-action representations of an autonomous agent
into natural language. We evaluate our technique in the Frog-
ger game environment, training an autonomous game playing
agent to rationalize its action choices using natural language.
A natural language training corpus is collected from human
players thinking out loud as they play the game. We motivate
the use of rationalization as an approach to explanation gen-
eration and show the results of two experiments evaluating
the effectiveness of rationalization. Results of these evalua-
tions show that neural machine translation is able to accu-
rately generate rationalizations that describe agent behavior,
and that rationalizations are more satisfying to humans than
other alternative methods of explanation.
Introduction

Autonomous systems must make complex sequential decisions in the face of uncertainty. Explainable AI refers to
artificial intelligence and machine learning techniques that
can provide human understandable justification for their be-
havior. With the proliferation of AI in everyday use, ex-
plainability is important in situations where human opera-
tors work alongside autonomous and semi-autonomous sys-
tems because it can help build rapport, confidence, and un-
derstanding between the agent and its operator. For instance,
a non-expert human collaborating with a robot for a search
and rescue mission requires confidence in the robot’s actions.
In the event of failure—or if the agent performs unexpected
behaviors—it is natural for the human operator to want to
know why. Explanations help the human operator understand
why an agent failed to achieve a goal or the circumstances
whereby the behavior of the agent deviated from the expec-
tations of the human operator. They may then take appropri-
ate remedial action: trying again, providing more training to
machine learning algorithms controlling the agent, reporting
bugs to the manufacturer, etc.
∗Harrison and Ehsan contributed equally to this work.
Copyright © 2018, Association for the Advancement of Artificial Intelligence. All rights reserved.
Explanation differs from interpretability, which is a fea-
ture of an algorithm or representation that affords inspection
for the purposes of understanding behavior or results. While
there has been work done recently on the interpretability
of neural networks (Yosinski et al. 2015; Zeiler and Fergus
2014), these studies mainly focus on interpretability for ex-
perts on non-sequential problems. Explanation, on the other
hand, focuses on sequential problems, is grounded in natural
language communication, and is theorized to be more use-
ful for non-AI-experts who need to operate autonomous or
semi-autonomous systems.
In this paper we introduce a new approach to explainable
AI: AI rationalization. AI rationalization is a process of pro-
ducing an explanation for agent behavior as if a human had
performed the behavior. AI rationalization is based on the
observation that there are times when humans may not have
full conscious access to reasons for their behavior and conse-
quently may not give explanations that literally reveal how
a decision was made. In these situations, it is more likely
that humans create plausible explanations on the spot when
pressed. However, we accept human-generated rationaliza-
tions as providing some lay insight into the mind of the other.
AI rationalization has a number of potential benefits over
other explainability techniques: (1) by communicating like
humans, rationalizations are naturally accessible and intu-
itive to humans, especially non-experts; (2) humanlike com-
munication between autonomous systems and human oper-
ators may afford human factors advantages such as higher
degrees of satisfaction, confidence, rapport, and willingness
to use autonomous systems; (3) rationalization is fast, sac-
rificing absolute accuracy for real-time response, appropri-
ate for real-time human-agent collaboration. Should deeper,
more accurate explanations or interpretations be necessary,
rationalizations may need to be supplemented by other ex-
planation, interpretation, or visualization techniques.
We propose a technique for AI rationalization that treats
the generation of explanations as a problem of translation
between ad-hoc representations of states and actions in an
autonomous system’s environment and natural language. To
do this, we first collect a corpus of natural language utter-
ances from people performing the learning task. We then
use these utterances along with state information to train an
encoder-decoder neural network to translate between state-
action information and natural language.
To evaluate this system, we explore how AI rationaliza-
tion can be applied to an agent that plays the game Frogger.
This environment is notable because conventional learning
algorithms, such as reinforcement learning, do not learn to
play Frogger like human players, and our target audience
would not be expected to understand the specific informa-
tion about how an agent learns to play this game or why it
makes certain decisions during execution. We evaluate our
approach by measuring how well it can generate rationaliza-
tions that accurately describe the current context of the Frog-
ger environment. We also examine how humans view ratio-
nalizations by measuring how satisfying rationalizations are
compared to other baseline explanation techniques. The contributions of our paper are as follows:

• We introduce the concept of AI rationalization as an approach to explainable AI.
• We describe a technique for generating rationalizations that treats explanation generation as a language translation problem from internal state to natural language.
• We report on an experiment using semi-synthetic data to assess the accuracy of the translation technique.
• We analyze how types of rationalization impact human satisfaction and use these findings to inform design considerations of current and future explainable agents.
Background and Related Work
For a model to be interpretable it must be possible for hu-
mans to explain why it generates certain outputs or behaves
in a certain way. Inherently, some machine learning tech-
niques produce models that are more interpretable than oth-
ers. For sequential decision making problems, there is often
no clear guidance on what makes a good explanation. For
an agent using Q-learning (Watkins and Dayan 1992), for
example, explanations of decisions could range from “the
action had the highest Q value given this state” to “I have
explored numerous possible future state-action trajectories
from this point and deemed this action to be the most likely
to achieve the highest expected reward according to iterative
application of the Bellman update equation.”
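For concreteness, a single Q-learning (Bellman) update of the kind referenced above can be sketched as follows; the states, reward, and learning parameters here are illustrative assumptions, not those of any agent in this paper.

```python
def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward the observed reward
    plus the discounted best value achievable from the next state."""
    best_next = max(Q.get((s2, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]

# Illustrative transition: from state 0, action "up" earns reward 1.0
# and leads to state 1, whose values are still zero.
Q = {}
q_update(Q, 0, "up", 1.0, 1, ["up", "down"])
```

With all next-state values at zero, the update reduces to `old + alpha * r`, i.e. 0.1 here, which is the kind of opaque numeric detail a non-expert operator would struggle to interpret.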
An alternate approach to creating interpretable machine
learning models involves creating separate models of ex-
plainability that are often built on top of black box tech-
niques such as neural networks. These approaches, some-
times called model-agnostic (Ribeiro, Singh, and Guestrin
2016; Zeiler and Fergus 2014; Yosinski et al. 2015) ap-
proaches, allow greater flexibility in model selection since
they enable black-box models to become interpretable.
Other approaches seek to learn a naturally interpretable
model which describes predictions that were made (Krause,
Perer, and Ng 2016) or by intelligently modifying model in-
puts so that resulting models can describe how outputs are
affected (Ribeiro, Singh, and Guestrin 2016).
Explainable AI has been explored in the context of ad-hoc
techniques for transforming simulation logs to explanations
(van Lent, Fisher, and Mancuso 2004), intelligent tutoring
systems (Core et al. 2006), transforming AI plans into nat-
ural language (van Lent et al. 2005), and translating mul-
tiagent communication policies into natural language (An-
dreas, Dragan, and Klein 2017). Our work differs in that the
generated rationalizations do not need to be truly represen-
tative of the algorithm’s decision-making process. This is a
novel way of applying explainable AI techniques to sequen-
tial decision-making in stochastic domains.
AI Rationalization
Rationalization is a form of explanation that attempts to jus-
tify or explain an action or behavior based on how a human
would explain a similar behavior. Whereas explanation im-
plies an accurate account of the underlying decision-making
process, AI rationalization seeks to generate explanations
that closely resemble those that a human would most likely
give were he or she in full control of an agent or robot.
We hypothesize that rationalizations will be more accessible to humans who lack the significant amount of background
knowledge necessary to interpret explanations and that the
use of rationalizations will result in a greater sense of trust
or satisfaction on the part of the user. While rationalizations generated by an autonomous or semi-autonomous sys-
tem need not accurately reflect the true decision-making pro-
cess underlying the agent system, they must still give some
amount of insight into what the agent is doing.
Our approach for translating representations of states and
actions to natural language consists of two general steps.
First, we must create a training corpus of natural language
and state-action pairs. Second, we use this corpus to train an
encoder-decoder network to translate the state-action infor-
mation to natural language (workflow in Figure 1).
Training Corpus
Our technique requires a training corpus that consists of
state-action pairs annotated with natural language explana-
tions. To create this corpus, we ask people to complete the agent’s task in a virtual environment and “think aloud” as
they complete the task. We record the visited states and per-
formed actions along with the natural language utterances
of critical states and actions. This method of corpus cre-
ation ensures that the annotations gathered are associated
with specific states and actions. In essence we create paral-
lel corpora, one of which contains state representations and
actions, the other containing natural language utterances.
The precise representation of states and actions in the au-
tonomous system does not matter as long as they can be
converted to strings in a consistent fashion. Our approach
emphasizes that it should not matter how the state represen-
tation is structured and the human operator should not need
to know how to interpret it.
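As a concrete illustration of converting an ad-hoc state-action representation into a consistent string, the sketch below flattens a small grid plus an action into a token sequence; the grid encoding and separator tokens are our own assumptions, not the paper's exact format.

```python
def serialize(state_grid, action):
    """Flatten a grid state and an action into a whitespace-delimited
    token string suitable as input to a sequence-to-sequence model."""
    tokens = []
    for row in state_grid:
        tokens.extend(row)        # one token per grid cell
        tokens.append("<row>")    # row separator keeps the layout recoverable
    tokens.append("<act>")
    tokens.append(action)
    return " ".join(tokens)

grid = [["frog", "car"], ["empty", "log"]]
print(serialize(grid, "up"))
# frog car <row> empty log <row> <act> up
```

Because the network only ever sees such strings, the operator never needs to understand the underlying representation, as the text above emphasizes.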
Translation from Internal Representation to
Natural Language
We use encoder-decoder networks to translate between com-
plex state and action information and natural language ra-
tionalizations. Encoder-decoder networks, which have pri-
marily been used in machine translation and dialogue sys-
tems, are a generative architecture composed of two component networks that learn how to translate an input sequence X = (x1, ..., xT) into an output sequence Y = (y1, ..., yT′).

Figure 1: The workflow for our proposed neural machine translation approach to rationalization generation.

The first component network, the encoder, is a recurrent neural network (RNN) that learns to encode the input vector X into a fixed-length context vector v. This vector is then used as input to the second component network, the decoder, which is an RNN that learns how to iteratively decode this vector into the target output Y. We specifically use an encoder-decoder network with an added attention mechanism (Luong, Pham, and Manning 2015).
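At the core of the Luong-style attention mechanism is a softmax over dot-product scores between the current decoder state and each encoder hidden state; the weighted sum of encoder states forms the context vector. The toy two-dimensional vectors below are purely illustrative, since a trained network would learn these representations.

```python
import math

def luong_attention(encoder_states, decoder_state):
    """Dot-product (Luong-style) attention: score each encoder hidden
    state against the decoder state, softmax the scores, and return
    the attention weights plus the weighted context vector."""
    scores = [sum(h_i * s_i for h_i, s_i in zip(h, decoder_state))
              for h in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

weights, context = luong_attention(
    encoder_states=[[1.0, 0.0], [0.0, 1.0]],
    decoder_state=[1.0, 0.0])
```

The first encoder state aligns with the decoder state, so it receives the larger weight; in the full model this lets the decoder focus on the parts of the state-action string most relevant to the next output word.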
In this work, we test the following two hypotheses:
1. Encoder-Decoder networks can accurately generate ratio-
nalizations that fit the current situational context of the
learning environment and
2. Humans will find rationalizations more satisfying than
other forms of explainability
To test these hypotheses, we perform two evaluations in an
implementation of the popular arcade game, Frogger. We
chose Frogger as an experimental domain because computer
games have been demonstrated to be good stepping stones
toward real-world stochastic environments (Laird and van
Lent 2001; Mnih et al. 2015) and because Frogger is fast-
paced, has a reasonably rich state space, and yet can be
learned optimally without too much trouble.
Rationalization Generation Study Methodology
Evaluating natural language generation is challenging; ut-
terances can be “correct” even if they do not exactly match
known utterances from a testing dataset. To facilitate the
assessment of rationalizations generated by our technique,
we devised a technique whereby semi-synthetic natural lan-
guage was paired against state-action representations inter-
nal to an autonomous system. The semi-synthetic language
was produced by observing humans “thinking out loud”
while performing a task and then creating grammar that re-
produced and generalized the utterances (described below).
This enables us to use the grammar to evaluate the accu-
racy of our system since we can compare the rationaliza-
tions produced by our system to the most likely rule that
would have generated that utterance in the grammar. Similar
approaches involving the use of semi-synthetic corpora have
been adopted in scenarios, such as text understanding (Weston et al. 2015), where ground truth is necessary to evaluate the system.

Figure 2: Testing and training maps made with 25% obstacles (left), 50% obstacles (center), and 75% obstacles (right).
We conducted the experiments by generating rationaliza-
tions for states and actions in a custom implementation of
the game Frogger. In this environment, the agent must nav-
igate from the bottom of the map to the top while avoiding
obstacles in the environment. The actions available to the
agent in this environment are movement actions in the four cardinal directions and an action for standing still.
We evaluate our rationalization technique against two
baselines. The first baseline, the random baseline, randomly
selects any sentence that can be generated by the testing
grammar as a rationalization. The second baseline, the ma-
jority vote baseline, always selects sentences associated with
the rule that is most commonly used to generate rationaliza-
tions on a given map.
Below we will discuss the process for creating the gram-
mar, our training/test sets, and the results of this evaluation
in more detail.
Grammar Creation In order to translate between state in-
formation and natural language, we first need ground truth
rationalizations that can be associated explicitly with state
and action information. To generate this information, we
used crowdsourcing to gather a set of gameplay videos of 12 human participants from 3 continents playing Frogger while engaging in a think-aloud protocol, following prior work (Dorst and Cross 2001; Fonteyn, Kuipers, and Grobe 1993).
After players completed the game, they uploaded their
gameplay video to an online speech transcription service
and assigned their own utterances to specific actions. This
layer of self-validation in the data collection process facili-
tates the robustness of the data. This process produced 225
action-rationalization trace pairs of gameplay.
We then used these action-rationalization annotations to
construct a grammar for generating synthetic sentences,
grounded in natural language. This grammar uses a set of
rules based on in-game behavior of the Frogger agent to gen-
erate rationalizations that resemble the crowdsourced data
gathered previously. Since the grammar contains the rules
that govern when certain rationalizations are generated, it al-
lows us to compare automatically generated rationalizations
against a ground-truth that one would not normally have if
the entire training corpus was crowdsourced.
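The grammar's role can be illustrated with a minimal sketch: each rule pairs a predicate over the (s1, a, s2) triple with a template utterance, so every generated sentence is traceable to the rule that produced it. The specific rules, predicates, and wording below are invented for illustration and are not the study's actual grammar.

```python
# Each rule: (name, predicate over (s1, a, s2), template rationalization).
# States here are simplified to (x, y) agent positions; the study's
# grammar also inspects the grid layout.
RULES = [
    ("advance", lambda s1, a, s2: a == "up" and s2[1] > s1[1],
     "I moved up to get closer to the goal."),
    ("dodge", lambda s1, a, s2: a in ("left", "right"),
     "I moved sideways to avoid an obstacle."),
    ("wait", lambda s1, a, s2: s1 == s2,
     "I stayed still until it was safe to move."),
]

def rationalize(s1, a, s2):
    """Return (rule name, utterance) for the first matching rule."""
    for name, pred, utterance in RULES:
        if pred(s1, a, s2):
            return name, utterance
    return "unknown", "I am not sure why I did that."

print(rationalize((3, 1), "up", (3, 2)))
```

Because the rule name travels with each utterance, generated rationalizations can later be scored against the rule that should have fired, which is exactly the ground truth the evaluation relies on.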
Training and Test Set Generation Since we use a gram-
mar to produce ground truth rationalizations, one can in-
terpret the role of the encoder-decoder network as learning
to reproduce the grammar. In order to train the network to
do this, we use the grammar to generate rationalizations for
each state in the environment. The rules that the grammar
uses to generate rationalizations are based on a combina-
tion of the world state and the action taken. Specifically, the
grammar uses the following triple to determine which ra-
tionalizations to generate: (s1, a, s2). Here, s1 is the initial state, a is the action performed in s1, and s2 is the resulting state of the world after action a is executed. States s1 and s2 consist of the (x, y) coordinates of the agent and the current layout of the grid environment. We use the grammar to generate a rationalization for each possible (s1, a, s2) triple in the environment and then group these examples according to their
associated grammar rules. For evaluation, we take 20% of
the examples in each of these clusters and set them aside for
testing. This ensures that the testing set contains a represen-
tative sample of the parent population while still containing
example triples associated with each rule in the grammar.
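The per-rule stratified split described above can be sketched as follows; the example clusters and the fixed seed are assumptions for illustration.

```python
import random

def stratified_split(examples_by_rule, test_frac=0.2, seed=0):
    """Hold out test_frac of the examples within each grammar-rule
    cluster, so every rule is represented in the test set."""
    rng = random.Random(seed)
    train, test = [], []
    for rule, examples in examples_by_rule.items():
        shuffled = examples[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_frac))
        test.extend((rule, ex) for ex in shuffled[:n_test])
        train.extend((rule, ex) for ex in shuffled[n_test:])
    return train, test

clusters = {"advance": list(range(10)), "dodge": list(range(5))}
train, test = stratified_split(clusters)
```

Splitting within each cluster, rather than over the pooled examples, is what guarantees that rare rules still appear in the test set.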
To aid in training we duplicate the remaining training ex-
amples until the training set contains 1000 examples per
grammar rule and then inject noise into these training sam-
ples in order to help avoid overfitting. Recall that the in-
put to the encoder-decoder network is a triple of the form
(s1, a, s2), where s1 and s2 are states. To inject noise, we
randomly select 30% of the rows in this map representation
for both s1and s2and redact them by replacing them with a
dummy value.
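The noise-injection step can be sketched like this; the dummy token and the list-of-rows state format are illustrative assumptions.

```python
import random

def redact_rows(state_rows, frac=0.3, dummy="<unk>", rng=None):
    """Randomly replace frac of a state's rows with a dummy value so
    the network cannot overfit to exact map layouts."""
    rng = rng or random.Random(0)
    rows = state_rows[:]                     # leave the original intact
    n_redact = int(len(rows) * frac)
    for i in rng.sample(range(len(rows)), n_redact):
        rows[i] = dummy
    return rows

s1 = [f"row{i}" for i in range(10)]
noisy = redact_rows(s1)
```

Redacting whole rows (rather than individual cells) forces the network to rely on the remaining context, which helps it generalize to unseen map configurations.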
To evaluate how our technique for rationalization per-
forms under different environmental conditions, we devel-
oped three different maps. The first map was randomly gen-
erated by filling 25% of the bottom with car obstacles and
filling 25% of the top with log platforms. The second map
was 50% cars/logs and the third map was 75% cars/logs (see
Figure 2). For the remainder of the paper, we refer to these
maps as the 25% map, the 50% map, and the 75% map re-
spectively. We also ensured that it was possible to complete
each of these maps to act as a loose control on map quality.
Training and Testing the Network The parallel corpus of
state-action representations and natural language is used to
train an encoder-decoder neural translation algorithm based
on (Luong, Pham, and Manning 2015). We use a 2-layered
encoder-decoder network with attention using long short-
term memory (LSTM) nodes with a hidden node size of 300.
We train the network for 50 epochs and then use it to gener-
ate rationalizations for each triple in the testing set.
To evaluate the accuracy of the encoder-decoder network,
we need to have a way to associate the sentence generated by
our model with a rule that exists in our grammar. The gen-
erative nature of encoder-decoder networks makes this dif-
ficult as its output may accurately describe the world state,
but not completely align with the test example’s output. To
determine the rule most likely to be associated with the gen-
erated output, we use BLEU score (Papineni et al. 2002) to
calculate sentence similarity between the sentence generated
by our predictive model with each sentence that can be gen-
erated by the grammar and record the sentence that achieves
the highest score. We then identify which rule in the gram-
mar could generate this sentence and use that to calculate
accuracy. If this rule matches the rule that was used to pro-
duce the test sentence then we say that it was a match.
Accuracy is defined as the percentage of the predictions
that matched their associated test example. We discard any
predicted sentence with a BLEU score below 0.7 when com-
pared to the set of all generated sentences. This threshold
is put in place to ensure that low quality rationalizations in
terms of language syntax do not get erroneously matched to
rules in the grammar.
It is possible for a generated sentence to be associated
with more than one rule in the grammar if, for example,
multiple rules achieve the same, highest BLEU score. If the
rule that generated the testing sentence matches at least one
of the rules associated with the generated sentence, then we
count this as a match.
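The BLEU-based rule matching can be sketched as below. Note this uses a simplified sentence-level BLEU (clipped n-gram precisions up to bigrams with a brevity penalty) rather than the exact formulation of Papineni et al., and the grammar sentences and threshold usage are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_counts, r_counts = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        precisions.append(overlap / max(1, sum(c_counts.values())))
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_mean)

def match_rule(generated, grammar_sentences, threshold=0.7):
    """Return the rule whose sentence best matches the generated
    rationalization, or None if the best BLEU falls below threshold."""
    best_rule, best_score = None, 0.0
    for rule, sentence in grammar_sentences:
        score = bleu(generated, sentence)
        if score > best_score:
            best_rule, best_score = rule, score
    return best_rule if best_score >= threshold else None

sentences = [("advance", "i moved up to reach the goal"),
             ("dodge", "i moved left to avoid the car")]
best = match_rule("i moved up to reach the goal", sentences)
```

The 0.7 threshold plays the same gatekeeping role as in the text: syntactically poor generations score low against every grammar sentence and are discarded instead of being force-matched to a rule.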
Rationalization Generation Results The results of our
experiments validating our first hypothesis can be found in Table 1. As can be seen in the table, the encoder-decoder
network was able to consistently outperform both the ran-
dom baseline and majority baseline models. Comparing the
maps to each other, the encoder-decoder network produced
the highest accuracy when generating rationalizations for the
75% map, followed by the 25% map and the 50% map re-
spectively. To evaluate the significance of the observed dif-
ferences between these models, we ran a chi-squared test
between the models produced by the encoder-decoder net-
work and random predictor as well as between the encoder-
decoder network models and the majority classifier. Each
difference was deemed to be statistically significant (p <
0.05) across all three maps.
Rationalization Generation Discussion The models pro-
duced by the encoder-decoder network significantly outper-
formed the baseline models in terms of accuracy percent-
age. This means that this network was able to better learn
when it was appropriate to generate certain rationalizations
when compared to the random and majority baseline models.
Given the nature of our test set, this gives evidence to the claim that these models can generalize to unseen states as well.

Table 1: Accuracy values for Frogger environments with different obstacle densities. Accuracy values for sentences produced by the encoder-decoder network (full) significantly outperform those generated by a random model and a majority classifier as determined by a chi-square test.

Map            Full    Random   Majority vote
25% obstacles  0.777   0.00     0.168
50% obstacles  0.687   0.00     0.204
75% obstacles  0.80    0.00     0.178

While it is not surprising that encoder-decoder networks were able to outperform these baselines, the margin of difference between these models is worth noting. The performances of both the random and majority classifiers are a testament to the complexity of this problem.
These results give strong support to our claim that our
technique for creating AI rationalizations using neural ma-
chine translation can accurately produce rationalizations that
are appropriate to a given situation.
Rationalization Satisfaction Study Methodology
The results of our previous study indicate that our tech-
nique is effective at producing appropriate rationalizations.
This evaluation is meant to validate our second hypothe-
sis that humans would find rationalizations more satisfying
than other types of explanation for sequential decision mak-
ing problems. To do this, we asked people to rank and jus-
tify their relative satisfaction with explanations generated by
three agents (described below) as each performs the same
task in identical ways, only differing in the way they express
themselves. The three agents are:

• The rationalizing robot uses our neural translation approach to generate explanations.
• The action-declaring robot states its action without any justification. For instance, it states “I will move right”.
• The numerical robot simply outputs utility values with no natural language rationalizations.
We will discuss our human subjects protocol and experimen-
tal results below.
Participants Fifty-three adults (age range = 22 – 64
years, M = 34.1, SD = 9.38) were recruited from Ama-
zon Mechanical Turk (AMT) through a management ser-
vice called TurkPrime (Litman, Robinson, and Abberbock
2017). Twenty-one percent of the participants were women,
and only three countries were reported when the participants
were asked what country they reside in. Of these, 91% of
people reported that they live in the United States.
Procedure After reading a brief description of our study
and consenting to participate, participants were introduced
to a hypothetical high-stakes scenario. In this scenario, the
participant must remain inside a protective dome and rely on
autonomous agents to retrieve food packages necessary for
survival. The environment is essentially a “re-skinned” ver-
sion of Frogger (see figure 3) that is contextually appropriate
for the high-stakes hypothetical scenario. To avoid effects of
Figure 3: The rationalizing robot navigating the modified
Frogger environment
Figure 4: Count of 1st,2nd , and 3rd place ratings given to each
robot. The rationalization robot received the most 1st place,
the action-declaring robot received the most 2nd place, and
the numeric robot received the most 3rd place ratings.
preconceived notions, we did not use the agents’ descrip-
tive names in the study; we introduced the agents as “Robot
A” for the rationalizing robot, “Robot B” for the action-
declaring robot, and “Robot C” for the numerical robot.
Next, the participants watched a series of six videos in
two groups of three: three depicting the agents succeeding
and three showing them failing. Participants were quasi-
randomly assigned to one of the 12 possible presentation
orderings, such that each ordering was designed to have the
same number of participants. After watching the videos, par-
ticipants were asked to rank their satisfaction with the ex-
pressions given by each of the three agents and to justify
their choices in their own words.
Satisfaction Results and Analysis Figure 4 shows that
the rationalizing robot (Robot A) received the most 1st place
ratings, the action-declaring robot (Robot B) received the
most 2nd place ratings, and the numerical robot (Robot C)
received the most 3rd place ratings. To determine whether
any of these differences in satisfaction ratings were signifi-
cant, we conducted a non-parametric Friedman test of differ-
ences among repeated measures. This yielded a Chi-square
value of 45.481, which was significant (p < 0.001).
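The Friedman test statistic used above can be computed directly from the rank data; the sample rankings below are illustrative, not the study's data (which had 53 participants and yielded a chi-square value of 45.481).

```python
def friedman_statistic(rankings):
    """Friedman chi-square for n subjects each ranking k treatments:
    chi2 = 12 / (n*k*(k+1)) * sum_j R_j^2 - 3*n*(k+1),
    where R_j is the total rank assigned to treatment j.
    (No tie correction; assumes each subject gives a full ranking.)"""
    n, k = len(rankings), len(rankings[0])
    rank_sums = [sum(row[j] for row in rankings) for j in range(k)]
    return (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))

# Illustrative data: 4 subjects all rank treatment A first (rank 1),
# B second, C third -- maximal agreement.
chi2 = friedman_statistic([[1, 2, 3]] * 4)
```

With perfect agreement the statistic reaches its maximum of n(k − 1); large values are then compared against a chi-square distribution with k − 1 degrees of freedom to obtain the p-value.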
To determine which of the ratings differences were
significant, we made pairwise comparisons between the
agents, using the Wilcoxon-Nemenyi-McDonald-Thompson
test (Hollander, Wolfe, and Chicken 2013). All three com-
parisons yielded a significant difference in ratings. The sat-
isfaction ratings for the rationalization robot were signifi-
cantly higher than those for both the action-declaring robot
(p= 0.0059) as well as the numerical robot (p < 0.001).
Furthermore, the ratings for the action-declaring robot were significantly higher than those for the numeric robot.

We also analyzed the justifications that participants pro-
vided for their rankings using approaches inspired by the-
matic analysis (Aronson 1995) and grounded theory (Strauss
and Corbin 1994). Starting with an open coding scheme, we
developed a set of codes that covered various reasonings be-
hind the ranking of the robots. Using the codes as analytic
lenses, we clustered them under emergent themes, which
shed light into the dimensions of satisfaction. Through an
iterative process performed until consensus was reached, we
distilled the most relevant themes into insights that can be
used to understand the “whys” behind satisfaction of expla-
nations. In our discussion of these responses, we refer to par-
ticipants using the following abbreviation: P1 is used to refer
to participant 1, P2 is used to refer to participant 2, etc.
Findings and Discussion As we hypothesized, the ratio-
nalizing agent’s explanations were rated higher than were
those of the other two agents, implying that rationaliza-
tion enhances satisfaction over action-declaring natural language descriptions and over numeric expressions.
In addition to the preference for a natural language sub-
strate, four attributes emerged from our thematic analy-
sis that characterize prototypical satisfactory rationalization:
explanatory power, relatability, ludic nature, and adequate
detail. These same attributes can be used to distinguish the
rationalizing robot from the action-declaring robot.
In terms of explanatory power, the rationalizing robot’s
ability to explain its actions was the most cited reason for its
superior placement in the satisfaction rankings. Human ra-
tionalizations allow us to form a theory of mind for the other
(Goldman and others 2012), enabling us to better understand
motivations and actions of others. Similarly, the rationaliz-
ing robot’s ability to show participants “. . . what it’s doing
and why” (P6) enabled them to “. . . get into [the rational-
izing robot’s] mind” (P17), boosting satisfaction and confi-
dence. Despite using natural language, the action declaring
robot yielded dissatisfaction. As P38 puts it, “[The action-
declaring robot] explained almost nothing. . .which was dis-
appointing.” The explanatory attribute of the rationalizing
robot reduces friction of communication and results in im-
proved satisfaction.
With respect to relatability, the personality expressed
through rationalizing robot’s explanation allowed partici-
pants to relate to it:
[The rationalizing robot] was relatable. He felt like a
friend rather than a robot. I had a connection with [it]
that would not be possible with the other 2 robots be-
cause of his built-in personality. (P21)
Participants also engaged with the rationalizing robot’s ludic
quality, expressing their appreciation of its perceived play-
fulness: “[The rationalizing robot] was fun and entertaining.
I couldn’t wait to see what he would say next!” (P2).
A rationalization yields higher satisfaction if it is ade-
quately detailed. The action-declaring robot, despite its lack
of explainability, received some positive comments. People
who preferred the action-declaring robot over the rational-
izing robot claimed that “[the rationalizing robot] talks too
much” (P47), the action-declaring robot is “nice and simple”
(P48), and that they “would like to experience a combination
of [the action-declaring robot] and [the rationalizing robot]”
(P41). Context permitting, there is a need to balance level of
detail with information overload.
These findings also align with our proposed benefits of AI rationalization, especially in terms of accessible explanations that are intuitive to the non-expert. We also observed how the human-centered communication style facilitates higher degrees of rapport. The insights not only help evaluate the quality of responses generated by our system, but also shed light on design considerations that can be used to build the next generation of explainable agents.
Future Work
Our next step is to build on our current work and investi-
gate hypotheses about how types of rationalizations impact
human preferences of AI agents in terms of confidence, per-
ceived intelligence, tolerance to failure, etc. To address these
questions, it will be necessary to conduct experiments simi-
lar to the one described above. It will be interesting to see
how inaccurate rationalizations can be before feelings of
confidence and rapport are significantly affected. Our experimental methodology can be adapted to inject increasing amounts of error into the rationalizations and to measure how human responses change.
Conclusions
AI rationalization provides a new lens through which we can
explore the realms of Explainable AI. As society and AI integrate further, we envision a growing number of human operators
who will want to know why an agent does what it does in an
intuitive and accessible manner.
We have shown that neural machine translation techniques can generate rationalizations with accuracies above baseline methods. We have also shown that rationalizations produced with this technique were more satisfying to humans than alternative means of explanation.
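As a concrete illustration of this translation framing, the sketch below serializes a hypothetical grid state and action into the token sequence that an attention-based encoder-decoder would consume as its “source sentence”; the grid encoding and token names here are assumptions for illustration, not the exact representation used in our experiments.

```python
def serialize_state_action(grid, frog_pos, action):
    """Flatten a Frogger-like grid state and the chosen action into a
    space-separated token sequence, suitable as the source side of a
    sequence-to-sequence translation model. (Hypothetical encoding.)"""
    tokens = []
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell != ".":          # encode only occupied cells
                tokens.append(f"{cell}_{r}_{c}")
    fr, fc = frog_pos
    tokens.append(f"frog_{fr}_{fc}")  # agent position token
    tokens.append(f"act_{action}")    # chosen action token
    return " ".join(tokens)

grid = [
    [".", "C", "."],  # C = car
    [".", ".", "L"],  # L = log
    [".", ".", "."],
]
src = serialize_state_action(grid, (2, 1), "up")
# src == "C_0_1 L_1_2 frog_2_1 act_up"
```

A trained translation model would then map such a source sequence to a target rationalization, for example “I moved up because the lane above me was clear” (hypothetical output).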
Rationalization allows autonomous systems to be relatable and human-like in their decision-making even when their internal processes are non-intuitive. We believe that AI rationalization can be an important step towards the democratization of real-world commercial robotic systems in healthcare, accessibility, personal services, and military teamwork.
References
Andreas, J.; Dragan, A. D.; and Klein, D. 2017. Translating neuralese. CoRR abs/1704.06960.
Aronson, J. 1995. A pragmatic view of thematic analysis. The Qualitative Report 2(1):1–3.
Core, M.; Lane, H. C.; van Lent, M.; Gomboc, D.; Solomon,
S.; and Rosenberg, M. 2006. Building Explainable Artificial
Intelligence Systems. In Proceedings of the 18th Innovative
Applications of Artificial Intelligence Conference.
Dorst, K., and Cross, N. 2001. Creativity in the design process: co-evolution of problem–solution. Design Studies 22(5):425–437.
Fonteyn, M. E.; Kuipers, B.; and Grobe, S. J. 1993. A de-
scription of think aloud method and protocol analysis. Qual-
itative Health Research 3(4):430–441.
Goldman, A. I., et al. 2012. Theory of mind. The Oxford Handbook of Philosophy of Cognitive Science, 402–424.
Hollander, M.; Wolfe, D. A.; and Chicken, E. 2013. Non-
parametric statistical methods. John Wiley & Sons.
Krause, J.; Perer, A.; and Ng, K. 2016. Interacting with
predictions: Visual inspection of black-box machine learn-
ing models. In Proceedings of the 2016 CHI Conference on
Human Factors in Computing Systems, 5686–5697. ACM.
Laird, J., and van Lent, M. 2001. Human-level AI’s killer application: Interactive computer games. AI Magazine 22(2):15–25.
Litman, L.; Robinson, J.; and Abberbock, T. 2017.
TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior Research Methods 49(2):433–442.
Luong, M.-T.; Pham, H.; and Manning, C. D. 2015. Ef-
fective approaches to attention-based neural machine trans-
lation. In Proceedings of the 2015 Conference on Empirical
Methods in Natural Language Processing, 1412–1421. Lis-
bon, Portugal: Association for Computational Linguistics.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.; Veness,
J.; Bellemare, M.; Graves, A.; Riedmiller, M.; Fidjeland,
A.; Ostrovski, G.; Petersen, S.; Beattie, C.; Sadik, A.;
Antonoglou, I.; King, H.; Kumaran, D.; Wierstra, D.; Legg,
S.; and Hassabis, D. 2015. Human-level control through
deep reinforcement learning. Nature 518(7540):529–533.
Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318. Association for Computational Linguistics.
Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144. ACM.
Strauss, A., and Corbin, J. 1994. Grounded theory method-
ology. Handbook of qualitative research 17:273–85.
van Lent, M.; Carpenter, P.; McAlinden, R.; and Brobst, P.
2005. Increasing replayability with deliberative and reactive
planning. In 1st Conference on Artificial Intelligence and
Interactive Digital Entertainment (AIIDE), 135–140.
van Lent, M.; Fisher, W.; and Mancuso, M. 2004. An ex-
plainable artificial intelligence system for small-unit tactical
behavior. In Proceedings of the 16th Conference on Innovative Applications of Artificial Intelligence.
Watkins, C., and Dayan, P. 1992. Q-learning. Machine
Learning 8(3-4):279–292.
Weston, J.; Bordes, A.; Chopra, S.; Rush, A. M.; van Merriënboer, B.; Joulin, A.; and Mikolov, T. 2015. Towards
ai-complete question answering: A set of prerequisite toy
tasks. arXiv preprint arXiv:1502.05698.
Yosinski, J.; Clune, J.; Fuchs, T.; and Lipson, H. 2015. Understanding neural networks through deep visualization. In ICML Workshop on Deep Learning.
Zeiler, M. D., and Fergus, R. 2014. Visualizing and under-
standing convolutional networks. In European conference
on computer vision, 818–833. Springer.