EXPLAINING EXPLANATION FOR “EXPLAINABLE AI”
Robert R. Hoffman
Institute for Human and Machine Cognition
Pensacola, FL
Gary Klein
Macrocognition, LLC
Washington, DC
Shane T. Mueller
Michigan Technological University
Houghton, MI
ABSTRACT
What makes for an explanation of "black box" AI systems such as Deep Nets? We reviewed the
pertinent literatures on explanation and derived key ideas. This set the stage for our empirical
inquiries, which include conceptual cognitive modeling, the analysis of a corpus of cases of
"naturalistic explanation" of computational systems, computational cognitive modeling, and the
development of measures for performance evaluation. The purpose of our work is to contribute to
the program of research on “Explainable AI.” In this report we focus on our initial synthetic
modeling activities and the development of measures for the evaluation of explainability in
human-machine work systems.
INTRODUCTION
The importance of explanation in AI has been
emphasized in the popular press, with considerable
discussion of the explainability of Deep Nets and
Machine Learning systems (e.g., Kuang, 2017). For
such “black box” systems, there is a need to explain
how they work so that users and decision makers can
develop appropriate trust and reliance. As an
example, referencing Figure 1, a Deep Net that we
created was trained to recognize types of tools.
Figure 1. Some examples of Deep Net classification.
Outlining the axe and overlaying bird silhouettes
on it resulted in a confident misclassification. While a
fuzzy hammer is correctly classified, an embossed
rendering is classified as a saw. Deep Nets can
classify with high hit rates for images that fall within
the variation of their training sets, but are nonetheless
easily spoofed using instances that humans find easy
to classify. Furthermore, Deep Nets must assign
some classification to every input. Thus, a Volkswagen
might be classified as a tulip by a Deep Net trained to
recognize types of flowers. So, if Deep Nets do not
actually possess human-semantic concepts (e.g., that
axes have things that humans call "blades"), what do
the Deep Nets actually "see"? And more directly,
how can users be enabled to develop appropriate trust
and reliance on these AI systems?
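The point that a Deep Net must return some class for any input can be made concrete with a small sketch. The snippet below is purely illustrative (a randomly initialized linear layer stands in for a trained network, and the tool classes are hypothetical): a softmax output layer always spreads probability over the known classes, so even an out-of-distribution input receives a label, sometimes with high confidence.

```python
# Minimal sketch (not the authors' model): a softmax layer always yields a
# full probability distribution over the trained classes, so even an
# out-of-distribution input receives some label, sometimes a confident one.
import numpy as np

rng = np.random.default_rng(0)

CLASSES = ["axe", "hammer", "saw", "wrench"]   # hypothetical tool classes
W = rng.normal(size=(len(CLASSES), 64))        # stand-in for a trained network's final layer
b = rng.normal(size=len(CLASSES))

def classify(features: np.ndarray) -> tuple[str, float]:
    """Return the top class and its softmax 'confidence' for a feature vector."""
    logits = W @ features + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = int(np.argmax(probs))
    return CLASSES[top], float(probs[top])

# A feature vector that resembles nothing in the training set (e.g., a
# Volkswagen presented to a tool classifier) is still mapped onto one of
# the known classes.
out_of_distribution = rng.normal(loc=5.0, size=64)
label, confidence = classify(out_of_distribution)
print(f"Classified as '{label}' with confidence {confidence:.2f}")
```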
Articles in the popular press highlight the
successes of Deep Nets (e.g., the discovery of a
planetary system in Kepler Space Telescope data;
Temming, 2018), and promise diverse applications "...
the recognition of faces, handwriting, speech...
navigation and control of autonomous vehicles... it
seems that neural networks are being used
everywhere" (Lucky, 2018, p. 24).
And yet "models are more complex and less
interpretable than ever... Justifying [their] decisions
will only become more crucial" (Biran and Cotton,
2017, p. 4). Indeed, a proposed regulation before the
European Union (Goodman and Flaxman, 2016)
asserts that users have the "right to an explanation.”
What form must an explanation for Deep Nets take?
This is a challenge in the DARPA "Explainable
AI" (XAI) Program: To develop AI systems that can
engage users in a process in which the mechanisms
and "decisions" of the AI are explained. Our tasks on
the Program are to:
(1). Integrate philosophical studies and psychological
research in order to identify consensus points, key
concepts and key variables of explanatory reasoning,
(2). Develop and validate measures of explanation
goodness, explanation satisfaction, mental models
and human-XAI performance,
(3). Develop and evaluate a computational model of
how people understand computational devices, and
then evaluate the model using the validated
measures,
(4). Generate a corpus of cases in which people try to
explain the workings of complex systems, especially
how computational systems work, and
(5). From the case analysis, create a "naturalistic
decision making" model of explanation that can guide
the development of XAI systems by computer
scientists.
In this presentation we report progress on our
synthesis of ideas and concepts of explanation, the
development of models of the explanation process,
and the development of metrics.
LITERATURE SYNTHESIS
A thorough analysis of the subject of explanation
would have to cover literatures spanning the history
of Western philosophy and disciplines including the
philosophy and psychology of science, cognitive
psychology, psycholinguistics, and expert systems.
The archive we created includes over 700 papers.
Psychology
The challenge of XAI entrains concepts of
representation, modeling, language understanding,
and learning, along with abductive inference, causal
reasoning, mental models, and self-explanation.
Potentially measurable features of explanation
include the various forms of explanation (e.g.,
contrastive explanation, counterfactual reasoning,
mechanistic explanation); the various utilities or uses
of explanation (e.g., diagnosis, prediction); and the
limitations or foibles of explanatory reasoning (e.g.,
people will believe explanations to be good even
when they contain flaws or gaps in reasoning)
(Lombrozo & Carey, 2006).
Many researchers present a list of the features
that are believed to characterize “good” explanations
(e.g., Brezillon and Pomerol, 1997). These include
context- or goal-relevance, reference to cause-effect
covariation and temporal contiguity, and plausibility.
There are also some contradictions in the literature:
some researchers assert that good explanations are
simple, while others assert that they are complete.
This suggests that good explanations must strike a
balance between detail and comprehensibility.
A number of conceptual psychological models of
the explanation process have been presented in the
research literature. The first step in the model of
Krull and Anderson (1997), the noticing of an event,
is reminiscent of the first step in C.S. Peirce's model
of abduction (1891), that is, the observation of
something that is interesting or surprising.
Subsequent steps are Intuitive Explanation, Problem
Formulation and Problem Resolution. The model is
not specific about what is involved in these steps, but
is explicit about the role of motivation and effort.
Johnson and Johnson (1993) studied an
explanation process in which experts explained to
novices the processes of statistical data analysis.
Transcripts of explainer-learner dialogs were
analyzed. A key finding was that the explainer would
present additional declarative or procedural
knowledge at those points in the task tree where sub-
goals had been achieved. The Johnson and Johnson
model is expressed as a chain of events in which the
explainer provides analogies, instructions, and
justifications.
Artificial Intelligence
AI has a history of work on explanation. (A
review of the literature, with a bibliography, is
available from the authors.) Starting with the first
generation of expert systems, it has generally been
held that explanations must present easy-to-
understand coherent stories in order to ensure good
use of the AI or good performance of the human-
machine work system (Biran & Cotton, 2017;
Clancey, 1986).
Attempts to explain Deep Nets have often taken
contrastive approaches. These include occlusion (e.g.,
Zeiler & Fergus, 2014), which shows how
classifications differ as regions are removed from an
image, and counter-examples (e.g., Shafto, Goodman,
& Griffiths, 2014). A limitation of these approaches
is that they conflate explanation and justification. So,
for example, one team of computer scientists might
“explain” how their Deep Net works by showing a
matrix of node weights at the multiple layers within a
network. This works as a justification of the
architecture to computer scientists but does not work
for explaining the Deep Net to a human user who is
not a computer scientist. Furthermore, the focus of
the contrastive approaches is "local" explanation, that
is, explaining why the AI made a particular
determination for a particular case. An example
would be to show the user a heat map that highlights
the eyes and beak of a bird, accompanied by a brief
statement that the beak and eye features make this
bird a sparrow. This is different from "global”
explanation, which is aimed at explaining how an AI
system works in general (e.g., Doshi-Velez and Kim,
2017). Finally, explainability is often conflated with
interpretability, which is a formal/logical notion in
computer science. The fact that a computer system is
interpretable does not mean that it is human
understandable; the formal interpretation has
explanatory value only to computer scientists.
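To make the occlusion approach concrete, the sketch below shows the basic loop behind this style of "local" explanation: mask one image region at a time and record how much the score of the originally predicted class drops, yielding the kind of heat map described above. This is our schematic illustration rather than any particular team's implementation; model_score is an assumed stand-in for an image classifier that returns class probabilities.

```python
# A schematic of occlusion-based "local" explanation in the spirit of the
# Zeiler & Fergus approach: slide a gray patch over the image and record how
# much the model's score for its original prediction drops. `model_score` is
# a hypothetical callable (image -> class probabilities); any image
# classifier could be substituted.
import numpy as np

def occlusion_map(image: np.ndarray, model_score,
                  patch: int = 16, stride: int = 8) -> np.ndarray:
    """Return a heat map where high values mark regions whose occlusion
    most reduces the score of the originally predicted class."""
    baseline = model_score(image)
    target = int(np.argmax(baseline))      # class predicted for the intact image
    h, w = image.shape[:2]
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch] = image.mean()   # gray out one region
            heat[i, j] = baseline[target] - model_score(occluded)[target]
    return heat
```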
From these literatures, we have identified some
key concepts that serve as guidelines to consider in
the development of XAI systems.
KEY CONCEPTS
(1). Explaining is a Continuous Process.
Humans are motivated to “understand the goals,
intent, contextual awareness, task limitations, [and]
analytical underpinnings of the system in an attempt
to verify its trustworthiness” (Lyons, et al., 2017).
One of the consensus points coming from the
philosophy of science is that explanations have a
heuristic function: They guide further inquiry. The
delivery of an explanation is not always an end point.
Indeed, it must be thought of as a continuous process
since the XAI system that provides explanations must
enable the user to develop appropriate trust and
reliance in the AI system with continued experience.
The user must be able to actively explore the states
and choices of the AI, especially when the system is
operating close to its boundary conditions, including
when it makes errors (see Amershi et al., 2015).
How can XAI work in concert with the AI to
empower learning-during-use?
(2). Explaining is a Co-adaptive Process. Many
conceptual models, such as that of Johnson and
Johnson (1993), assume that the explanation process
is a one-way street: The explainer presents
information and instruction to the explainee. In
addition, conceptual models typically assume that an
explanation can be “satisfying,” implying that it is a
process with clear-cut beginning and end points (the
delivery of instructional material that the user simply
assimilates). An alternative view is that explanation
is a collaboration or co-adaptive process involving, in
the case of XAI, the learner/user and the system.
“Explanations improve cooperation, cooperation
permits the production of relevant explanations”
(Brezillon and Pomerol, 1997, p. 7; Moore &
Swartout, 1991). This is the concept of “participatory
explanation,” similar to the notion of “recipient
design" in the conversation analysis literature, i.e.,
that messages must be composed so as to be sensitive
to what the recipient of the message is understanding
(Sacks & Schegloff, 1974). An assumption in some
of the first generation of AI-explanation and
intelligent tutoring systems was that it is only the
human who has to learn, or change, as a result of
explanations offered by the machine.
(3). Explanation Triggers. Not everything needs
to be explained, and explanations are quite often
triggered by violations of expectation. Explanations
among people serve the purpose of clarifying
unexpected behavior, and so a good explainable
system needs to recognize the appropriate triggers
for explanation.
(4). Self-explanation. Psychological research has
demonstrated that self-explanation improves learning
and understanding. This finding holds for both self-
motivated explanation and self-explanation that is
prompted by the instructor (Chi, Leeuw, Chiu, &
LaVancher, 1994).
(5). Explanation as Exploration. An important
mode of explanation is helping the user understand
the boundaries of the intelligent system (Mueller &
Klein, 2011). System developers are often reluctant to
tell users what the system cannot do, until users
misuse it. Famously, Tesla's Autopilot system is
touted as self-driving, except when accidents occur
and the user is blamed for operating it in
circumstances for which it was not intended to be
used. Clarifying boundary conditions can help
produce appropriate trust, so that the user knows
when to rely on the system, and when to take over.
(6). Contrast Cases. When forming explanations
of intelligent systems, it can be as important to tell
what is not being done as to tell what is being done.
Contrastive reasoning has been identified as central
to all explanation (e.g., Miller, Howe, & Sonenberg,
2017) and it can be an effective way to help the user
understand why an expectation was violated. For
example, an explainable GPS system might explain
why a turn was made by describing why a (normally
shorter) route was not taken.
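As an illustration of the route example, the sketch below generates a contrastive explanation that states both why the chosen route was taken and why the normally shorter route was not. The route data and the simple cost model are hypothetical.

```python
# A toy contrastive explanation for a routing choice: explain why route A was
# chosen *and why the normally shorter route B was not*. The route data and
# cost model are invented for illustration.
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    distance_km: float
    delay_min: float       # current incident/traffic delay

def travel_time(route: Route, speed_kmh: float = 60.0) -> float:
    """Estimated travel time in minutes under a constant-speed assumption."""
    return route.distance_km / speed_kmh * 60 + route.delay_min

def contrastive_explanation(chosen: Route, foil: Route) -> str:
    """Explain the choice of `chosen` by contrasting it with the expected `foil`."""
    return (f"Took {chosen.name} ({travel_time(chosen):.0f} min) instead of "
            f"{foil.name} because {foil.name}, although shorter "
            f"({foil.distance_km} km vs {chosen.distance_km} km), currently has a "
            f"{foil.delay_min:.0f}-minute delay ({travel_time(foil):.0f} min total).")

print(contrastive_explanation(
    chosen=Route("Route A", distance_km=14.0, delay_min=0.0),
    foil=Route("Route B", distance_km=10.0, delay_min=25.0)))
```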
MEASURES
One purpose of our cognitive modeling is to
highlight the key concepts that must be mated with
measures and metrics. The creation of AI systems
that can explain themselves will require a number of
types of measures.
Explanations generated by the AI can be
evaluated against the criteria that the research
literature identifies as making explanations good.
From a roster of those criteria we developed an
"Explanation Satisfaction Scale," which was evaluated
using the Content Validity Ratio method (Lawshe,
1975), followed by a test of discriminant validity that
yielded a Cronbach's alpha of approximately .80. The
final scale consists of seven Likert items that reference
understandability, satisfyingness, detail, accuracy,
completeness, usability, usefulness, and trustworthiness.
The scale is intended for use by AI researchers in the
XAI Program to evaluate the explanations their systems
produce, but it might be used in other applications as well.
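For reference, the sketch below shows standard textbook formulas for two statistics mentioned in this paragraph, Lawshe's (1975) Content Validity Ratio and coefficient (Cronbach's) alpha. It is a generic illustration of the formulas, not the project's actual analysis code.

```python
# Sketches of two statistics named above, under standard textbook definitions
# rather than the exact analysis pipeline used in the project.
import numpy as np

def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's (1975) CVR for one item: (n_e - N/2) / (N/2),
    where n_e panelists rated the item 'essential' out of N."""
    half = n_panelists / 2
    return (n_essential - half) / half

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of Likert ratings."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

print(content_validity_ratio(n_essential=9, n_panelists=10))   # 0.8
```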
Effective use of intelligent systems depends on
user mental models (Kass & Finin, 1988). These have
to be elicited and evaluated. In the XAI Program they
can be elicited using structured interviews in which
users express their understanding
of the AI system, with the protocols compared for
their propositional concordance with explanations
provided by experts. Based on the literature, we have
developed a guidebook that details a variety of
methods for eliciting mental models.
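One simple way to operationalize such a comparison is sketched below: the user's elicited statements and the expert reference explanation are coded as sets of propositions and scored for overlap. The proposition format and the example propositions are hypothetical; richer coding schemes would be needed in practice.

```python
# A hypothetical scoring scheme for "propositional concordance": treat the
# user's elicited mental-model statements and the expert reference explanation
# as sets of simple (subject, relation, object) propositions and measure
# overlap. The propositions shown are invented for illustration.

def concordance(user_props: set[tuple], expert_props: set[tuple]) -> float:
    """Share of expert propositions that also appear in the user's protocol."""
    if not expert_props:
        return 0.0
    return len(user_props & expert_props) / len(expert_props)

expert = {("network", "uses", "pixel patterns"),
          ("network", "lacks", "concept of blade"),
          ("occlusion", "reveals", "influential regions")}
user = {("network", "uses", "pixel patterns"),
        ("network", "recognizes", "blades")}          # a flawed belief

print(f"Concordance: {concordance(user, expert):.2f}")   # 0.33
```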
Finally, the evaluation of XAI systems will
measure the change in performance attributable to the
explaining process, via controlled experimentation.
Performance can be evaluated in a number of ways.
Good explanations should enable the user to:
• Efficiently and effectively use the AI in their work, for the purposes that the AI is intended to serve.
• Correctly predict what the AI system will do for given cases, including cases that the AI gets right and cases it gets wrong (e.g., failures, anomalies).
• Explain how the AI works to other people.
• Correctly assess whether a system determination is correct, and thereby have appropriate trust.
• Judge when and how to rely on the AI, knowing the boundary conditions of the AI's competence, and thereby have appropriate reliance.
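By way of illustration, the sketch below scores two of the criteria listed above, prediction of the AI's behavior and appropriate reliance, from a hypothetical trial log. The trial structure and the measures are simplified stand-ins, not the XAI Program's evaluation protocol.

```python
# A sketch of two of the performance measures listed above, computed from a
# hypothetical trial log: how well users predict the AI's output, and how
# often they rely on it appropriately (rely when it is right, override when
# it is wrong).
from dataclasses import dataclass

@dataclass
class Trial:
    ai_correct: bool         # was the AI's determination actually correct?
    user_predicted_ai: bool  # did the user correctly predict what the AI would do?
    user_relied_on_ai: bool  # did the user accept the AI's determination?

def prediction_accuracy(trials: list[Trial]) -> float:
    """Proportion of trials on which the user predicted the AI's behavior."""
    return sum(t.user_predicted_ai for t in trials) / len(trials)

def appropriate_reliance(trials: list[Trial]) -> float:
    """Proportion of trials on which reliance matched the AI's correctness."""
    appropriate = sum(t.user_relied_on_ai == t.ai_correct for t in trials)
    return appropriate / len(trials)

log = [Trial(True, True, True), Trial(False, True, False),
       Trial(True, False, True), Trial(False, False, True)]
print(prediction_accuracy(log), appropriate_reliance(log))   # 0.5 0.75
```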
Experiments will have to evaluate the learning
that occurs during training as well as during
performance. These experiments will have to take
into account the difference between global and local
explanations. These key variables are modeled in
Figure 2, which appears following the References.
DEVELOPING A NATURALISTIC MODEL
Another aspect of our effort in XAI is to
develop a “naturalistic” model of explanation based
on the analysis of a corpus of cases in which people
create explanations of complex situations or systems.
Several observations stand out. First, the trigger
for local explanations is typically a violated
expectancy. "Why did it do that?" signals a surprise
and calls for an account that revises the violated
expectancy. This process requires the explainer to
diagnose which user expectations need revision, that
is, where the learner's mental model is flawed or
incomplete. Second, many AI systems start
with a complete account and then try to whittle this
account down into something manageable, but if the
trigger for a local explanation is a violated
expectancy then the process of explaining is aimed at
the flawed expectancy, and no whittling down is
needed. Third, what is the stopping point for
explaining something? AI systems do not have a
clear stopping point whereas our initial review of
naturalistic cases suggests that the stopping point is a
perspective shift in which the user moves from “Why
did it do that?” to “Now I see that in this situation I
would have done the same.” The current state of the
art in AI systems does not take perspective shifts into
account.
ACKNOWLEDGEMENT
This material is based on research sponsored by DARPA
under agreement number FA8650-17-2-7711. The U.S.
Government is authorized to reproduce and distribute
reprints for Governmental purposes notwithstanding any
copyright notation thereon. The views and conclusions
contained herein are those of the authors and should not be
interpreted as necessarily representing the official policies
or endorsements, either expressed or implied, of DARPA,
AFRL or the U.S. Government.
REFERENCES
Amershi, S., Chickering, M., Drucker, S.M., Lee, B., Simard, P., &
Suh, J. (2015). Modeltracker: Redesigning performance analysis
tools for machine learning. In Proceedings of the 33rd Annual
ACM Conference on Human Factors in Computing Systems (pp.
337–346). New York: Association for Computing Machinery.
Biran, O., & Cotton, C. (2017). Explanation and Justification in
Machine Learning: A Survey. IJCAI-17 Workshop on Explainable
Artificial Intelligence (XAI).
Brézillon, P., & Pomerol, J.-C. (1997). Joint cognitive systems,
cooperative systems and decision support systems: A cooperation
in context. In Proceedings of the European Conference on
Cognitive Science, Manchester (pp. 129–139).
Chi, M.T., Leeuw, N., Chiu, M.-H., & LaVancher, C. (1994).
Eliciting self-explanations improves understanding. Cognitive
Science, 18(3), 439–477.
Clancey, W.J. (1986). From GUIDON to NEOMYCIN and
HERACLES in twenty short lessons. AI Magazine, 7(3), 40.
Doshi-Velez, F., & Kim, B. (2017). A Roadmap for a Rigorous
Science of Interpretability. ArXiv Preprint ArXiv:1702.08608.
Retrieved from https://arxiv.org/abs/1702.08608
Goodman, B., & Flaxman, S. (2016). European Union regulations
on algorithmic decision-making and a “right to explanation.”
Presented at the ICML Workshop on Human Interpretability in
Machine Learning, New York, NY.
Johnson, H., & Johnson, P. (1993). Explanation Facilities and
Interactive Systems. In Proceedings of the 1st International
Conference on Intelligent User Interfaces (pp. 159–166). New
York: Association for Computing Machinery.
Kass, R., & Finin, T. (1988). The Need for User Models in
Generating Expert System Explanation. International Journal of
Expert Systems, 1(4), 345–375.
Krull, D. S., & Anderson, C. A. (1997). The process of explanation.
Current Directions in Psychological Science, 6(1), 1–5.
Kuang, C. (2017, 21 November). Can A.I. be taught to explain
itself? The New York Times. Retrieved from
https://www.nytimes.com/2017/11/21/magazine/can-ai-be-taught-
to-explain-itself.html
Lawshe, C. H. (1975). A quantitative approach to content validity.
Personnel Psychology, 28, 563–575.
Lombrozo, T., & Carey, S. (2006). Functional explanation and the
function of explanation. Cognition, 99, 167–204.
https://doi.org/10.1016/j.cognition.2004.12.009
Lucky, R.W. (2018, January). The mind of neural networks. IEEE
Spectrum, p. 24.
Lyons, J.B., Clark, M.A., Wagner, A.R., & Schuelke, M.J. (2017).
Certifiable trust in autonomous systems: Making the intractable
tangible. AI Magazine, 38(3), 37–49.
Miller, T., Howe, P., & Sonenberg, L. (2017). Explainable AI:
Beware of Inmates Running the Asylum. In Proceedings of the
International Joint Conference on Artificial Intelligence (IJCAI-17)
Workshop on Explainable Artificial Intelligence (XAI).
Moore, J. D., & Swartout, W. R. (1991). A reactive approach to
explanation: taking the user’s feedback into account. In C. Paris,
W.R. Swartyout, & W.C. Mann (Eds.), Natural language
generation in artificial intelligence and computational linguistics
(pp. 3–48). New York: Springer.
Mueller, S.T., & Klein, G. (2011, March-April). Improving users’
mental models of intelligent software tools. IEEE Intelligent
Systems, 26(2), 77-83.
Pirolli, P., & Card, S. (2005). The sensemaking process and
leverage points for analyst technology as identified through
cognitive task analysis. In Proceedings of International
Conference on Intelligence Analysis (pp. 2–4). Washington, DC:
Office of the Assistant Director of Central Intelligence for
Analysis and Production.
Sacks, H., & Schegloff, E. (1974). A simplest systematics for the
organization of turn-taking for conversation. Language, 50, 696-735.
Shafto, P., & Goodman, N. (2008). Teaching games: Statistical
sampling assumptions for learning in pedagogical situations. In
Proceedings of the 30th annual conference of the Cognitive
Science Society (pp. 1632–1637). Austin, TX: Cognitive Science
Society Austin.
Temming, M. (2018, 20 January). AI has found an 8-planet system
like ours in Kepler data. Science News, p. 12.
Zeiler, M. D., Krishnan, D., Taylor, G. W., & Fergus, R. (2010).
Deconvolutional networks. In Proceedings of the 2010 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR)
(pp. 2528–2535). New York: IEEE.
Figure 2. Explanation spans the training and performance contexts, but
in doing so requires different kinds of explanation.