PERSPECTIVE
Explaining autonomous drones: An XAI journey
Mark Stefik¹ | Michael Youngblood¹ | Peter Pirolli² | Christian Lebiere³ | Robert Thomson⁴ | Robert Price¹ | Lester D. Nelson¹ | Robert Krivacic¹ | Jacob Le¹ | Konstantinos Mitsopoulos³ | Sterling Somers³ | Joel Schooler²

¹ Intelligent Systems Laboratory, PARC, A Xerox Company, Palo Alto, California, USA
² The Institute for Human and Machine Cognition, Pensacola, Florida, USA
³ Human-Computer Interaction Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
⁴ Army Cyber Institute, United States Military Academy, New York, New York, USA
Correspondence
Mark Stefik, PARC, Intelligent Systems Laboratory, 3333 Coyote Hill Road, Palo Alto, CA 94304.
Email: stefik@parc.com
Funding information
Defense Advanced Research Projects
Agency, Grant/Award Number:
FA8650-17-C-7710
Abstract
COGLE (COmmon Ground Learning and Explanation) is an explainable artificial intelligence (XAI) system where autonomous drones deliver supplies to field units in mountainous areas. The mission risks vary with topography, flight decisions, and mission goals. The missions engage a human plus AI team where users determine which of two AI-controlled drones is better for each mission. This article reports on the technical approach and findings of the project and reflects on challenges that complex combinatorial problems present for users, machine learning, user studies, and the context of use for XAI systems. COGLE creates explanations in multiple modalities. Narrative "What" explanations compare what each drone does on a mission, and "Why" explanations are based on drone competencies determined from experiments using counterfactuals. Visual "Where" explanations highlight risks on maps to help users to interpret flight plans. One branch of the research studied whether the explanations helped users to predict drone performance. In this branch, a model induction user study showed that post-decision explanations had only a small effect in teaching users to determine by themselves which drone is better for a mission. Subsequent reflection suggests that supporting human plus AI decision making with pre-decision explanations is a better context for benefiting from explanations on combinatorial tasks.
KEYWORDS
autonomous systems, explainable artificial intelligence, hierarchical multifunction
modeling
Material within this technical publication is based upon the work supported by the Defense Advanced Research Projects Agency (DARPA) under
contract FA8650-17-C-7710. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do
not necessarily reflect the official policy or position of the Department of Defense or the U.S. Government. Approved for Public Release, Distribution
Unlimited.
Received: 5 June 2021 Revised: 16 November 2021 Accepted: 17 November 2021
DOI: 10.1002/ail2.54
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided
the original work is properly cited.
© 2021 PARC, a Xerox Company. Applied AI Letters published by John Wiley & Sons Ltd. This article has been contributed to by US Government employees and their
work is in the public domain in the USA.
Applied AI Letters. 2021;2:e54. wileyonlinelibrary.com/journal/ail2 https://doi.org/10.1002/ail2.54
1 | INTRODUCTION

COGLE (COmmon Ground Learning and Explanation) is an explainable artificial intelligence (XAI) system for autonomous drones that deliver supplies to field units in mountainous areas. Missions in COGLE take place in a simulated world with mountainous and forested settings, bodies of water, and structures. Figure 1 shows a mission map and a flight plan for an AI-controlled drone. The yellow stick figure shows where the hiker is. The curved arrow shows a drone's flight plan. The timeline below the map shows the altitude of the drone along its flight plan. Symbols on the maps indicate objects. The pointy symbols are high mountains that are too high to fly over. The curve-topped symbols are low and high foothills. Green areas are meadows. The tree-shaped symbols represent forests.
Initially, we used ArduPilot SITL,¹ which simulates low-level craft actions at high fidelity. The computational resources required for ArduPilot's detailed simulations proved unwieldy and unnecessary for the strategic planning of missions. Low-level flight control is widely implemented in commercial autopilots and hobbyist drones. To focus on mission planning, we developed a lower precision simulation model (ArduPilot Light) with five levels of altitude and eight unique directions in a turn-based grid world. We modeled the application programming interface (API) of ArduPilot Light on the API of ArduPilot SITL to keep the two compatible. Figure 2 illustrates the coarse granularity of COGLE's simulation grid world for mission planning.
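The following minimal sketch (in Python) suggests how such a turn-based grid-world flight model can be represented. The state fields, action names, and movement rules here are illustrative assumptions rather than the project's actual ArduPilot Light API.

```python
# Illustrative sketch of a turn-based grid-world flight model in the spirit of
# ArduPilot Light. Names and rules are assumptions, not the project's API.
from dataclasses import dataclass

ALTITUDES = range(5)                                  # five discrete altitude levels
HEADINGS = [(0, 1), (1, 1), (1, 0), (1, -1),
            (0, -1), (-1, -1), (-1, 0), (-1, 1)]      # eight compass directions


@dataclass
class DroneState:
    x: int
    y: int
    altitude: int
    heading: int                                      # index into HEADINGS


def step(state: DroneState, action: str) -> DroneState:
    """Advance one turn: move forward, turn, or change altitude."""
    if action == "forward":
        dx, dy = HEADINGS[state.heading]
        return DroneState(state.x + dx, state.y + dy, state.altitude, state.heading)
    if action == "turn_left":
        return DroneState(state.x, state.y, state.altitude, (state.heading - 1) % 8)
    if action == "turn_right":
        return DroneState(state.x, state.y, state.altitude, (state.heading + 1) % 8)
    if action == "climb" and state.altitude < max(ALTITUDES):
        return DroneState(state.x, state.y, state.altitude + 1, state.heading)
    if action == "descend" and state.altitude > min(ALTITUDES):
        return DroneState(state.x, state.y, state.altitude - 1, state.heading)
    return state                                      # invalid actions leave the state unchanged
```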
FIGURE 1 Example map for a mission in COmmon Ground Learning and Explanation (COGLE's) domain
FIGURE 2 Illustrations from COGLE's flight school illustrating the model with five discrete altitudes and the increasing spread of the drop zone when packages are dropped from different altitudes
When drones fly too close to obstacles that are at the same altitude or higher, they risk crashing. If a drone releases its package above forests, high foothills, or water, then its package may be damaged. A package may become inaccessible by landing in a river, trees, or high foothills. The higher a drone flies, the further its packages may drift during the parachute drop. An AI pilot might take risks at the beginning, middle, or end of a mission. A pilot's early decisions on a mission can interact in subtle ways with later ones. For example, choices about how to avoid an obstacle early in a flight plan can lead to a route that precludes safe approaches to a chosen point to drop a package much later.
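As a rough illustration of how such local risk rules can be encoded, the sketch below scores craft proximity risk and package drop risk. The terrain categories, weights, and drift radii are assumptions for illustration, not COGLE's actual values.

```python
# A minimal sketch of the kinds of local outcome rules described above.
# Terrain codes, risk weights, and drift radii are illustrative assumptions.
def proximity_risk(drone_alt: int, obstacle_alt: int, distance: int) -> float:
    """Flying close to obstacles at the same altitude or higher adds craft risk."""
    if obstacle_alt >= drone_alt and distance <= 1:
        return 0.5
    return 0.0


def drop_risk(terrain: str, drop_alt: int) -> dict:
    """Package risks depend on the landing terrain and the release altitude."""
    damage = 0.4 if terrain in ("forest", "high_foothill", "water") else 0.0
    inaccessible = 0.6 if terrain in ("river", "forest", "high_foothill") else 0.0
    drift_cells = drop_alt                 # higher release -> wider parachute drift
    return {"package_damage": damage,
            "package_inaccessibility": inaccessible,
            "drift_radius": drift_cells}
```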
Using an early version of COGLE, we carried out self-explanation studies with users, as described² by Gary Klein, Robert Hoffman, and Shane Mueller among others. Such studies can provide a lens on the kinds of explanations that participants want and use for themselves. The AI pilots used for the drones were based on our early deep reinforcement learner (RL). They exhibited strange and suboptimal looping behaviors on very simple missions. The study participants cited observed patterns in the drone's behavior, referring to inferred goals, utility, and drone preferences.
When asked to make predictions during the study, a frequent participant response was "I don't know." Study participants were creative in their self-explanations ("It is afraid of water!"), but they had no reliable basis for determining whether their interpretations were correct. It turned out that the strange behaviors of our early AI-controlled drones were due to their limited training.
2 | CREATING COMPETENT AI PILOTS
Training a drone to a high level of competence required architecting a neural network to generalize more effectively,
generating enough earth-realistic training examples to cover the diverse topographical configurations of missions, and
arranging training examples in a staged curriculum to facilitate machine learning.
The deep RL used a convolutional neural network (CNN).³ Agents were trained using A2C,⁴ a synchronous advantage actor-critic. The deep RL architecture had two separate pipelines encoding two different image views of the drone's activity, as shown in Figure 3. The allocentric view, with two fully connected blue layers, is used in the top pipeline. The egocentric view, with two fully connected green layers, is used in the bottom pipeline. The two pipelines are concatenated in a fully connected layer shown in purple. The network has two outputs: the expected reward/outcome V(s_t) from a given state (ie, how much reward the drone should expect from the current state), and the learned policy π(a_t | s_t) that maps a state s_t to a probability distribution over actions. The approach described above is a model-free RL approach. The CNN and action layers learn in concert. The whole network learns and encodes competencies to maximize reward.
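A compact sketch of such a two-pipeline actor-critic network is shown below, written with PyTorch. The grid size, channel counts, and layer widths are assumptions for illustration; only the overall structure (two view-specific encoders, a fused fully connected layer, and separate value and policy heads) mirrors Figure 3.

```python
# Sketch of a two-pipeline actor-critic network in the spirit of Figure 3.
# Input sizes, channel counts, and layer widths are illustrative assumptions.
import torch
import torch.nn as nn


class TwoViewActorCritic(nn.Module):
    def __init__(self, n_actions: int, grid: int = 21, channels: int = 8):
        super().__init__()

        def pipeline():
            # Convolutional encoder followed by two fully connected layers
            return nn.Sequential(
                nn.Conv2d(channels, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(32 * grid * grid, 256), nn.ReLU(),
                nn.Linear(256, 128), nn.ReLU(),
            )

        self.allocentric = pipeline()                 # map-centered view
        self.egocentric = pipeline()                  # drone-centered view
        self.fusion = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
        self.value_head = nn.Linear(128, 1)           # V(s_t)
        self.policy_head = nn.Linear(128, n_actions)  # logits for pi(a_t | s_t)

    def forward(self, allo, ego):
        # Concatenate the two view encodings, then fuse and branch into heads
        h = torch.cat([self.allocentric(allo), self.egocentric(ego)], dim=-1)
        h = self.fusion(h)
        value = self.value_head(h)
        policy = torch.distributions.Categorical(logits=self.policy_head(h))
        return value, policy
```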
CNNs are typically used to analyze visual imagery. CNNs support shift invariance, that is, they learn to detect visual features no matter where they appear in the image. With shift invariance, the CNN does not need more training examples to learn to recognize a feature everywhere it could appear.
FIGURE 3 The architecture of COmmon Ground Learning and Explanation (COGLE's) deep reinforcement learner
There are additional invariants and abstractions that people use in free space navigation. They learn these abstractions when they are very young and do not notice them later. Within the COGLE project, we referred to these abstractions and competencies informally as "toddler common sense" (TCS). Although such informal language is not typically part of the terminology of the machine learning research community, we found it useful for describing what the agents needed to learn.
Some of the TCS competencies about navigation in free space are suggested in the panels of Figure 4. The top row in panel (A) shows a craft flying east towards a hiker with no blocking obstacles. This case is easy because only two moves are required to go from the craft location to the hiker. Once a deep RL learns this case for flying east in the first row, it can handle a similar case in the second row because it has learned shift invariance and feature invariance. However, typical CNNs do not have rotational invariance. The case in the rightmost column flies the drone north rather than east. Flying north requires additional learning cases, as does flying in other directions, including changes of altitude in a three-dimensional (3D) space. The craft in the bottom row flies east but requires more steps to reach the destination. The case shown in the right tile board requires the craft not only to reach a specific location, but also to arrive with a specific heading.
Panel (B) gives examples of navigation competencies for going over or around obstacles. An RL-based system needs
to generalize its policy to apply to various kinds of obstacles, such as trees, watch towers, and so on. Panel (C) shows
examples of competencies for choosing safe package release points. The bottom case in panel (C) is about penalizing
the dropping of packages on the far side of a river that acts as a blocking object for the hiker. An AI pilot needs to be
able to make successful plans for any arrangement of mountains, foothills, forests, water bodies, and hiker locations
that can appear in a mission setting. Flight planning is sensitive to small changes in terrain. For example, adding an
obstacle in a canyon or other tight space can make flying through it or turning around impossible or prevent descent to
a low package dropping altitude, requiring routing changes to earlier parts of a plan.
FIGURE 4 Example cases where changes to the convolutional neural network (CNN) architecture were needed to pick up abstractions beyond shift invariance

FIGURE 5 COmmon Ground Learning and Explanation (COGLE's) procedural content generator used earth-realistic generative models with layers for terrain, rainfall, biome, and roads and buildings

We developed a procedural content generator to create diverse and realistic terrain examples. COGLE's procedural generator draws on technical approaches developed for the movie and video game industries to generate realistic artificial and imagined landscapes. Roland Fischer et al. review⁵ this technology. Figure 5 shows how COGLE's generator created and composed layers representing terrain, rainfall and water bodies, biomes and terrain types, roads, and buildings.
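The sketch below suggests how layered generation of this kind can work: noise-based elevation and rainfall layers are combined into a biome layer by thresholding. The noise construction and thresholds are illustrative assumptions, not the generator's actual algorithm.

```python
# A hedged sketch of layered procedural terrain generation in the spirit of
# Figure 5. The noise functions and thresholds are illustrative assumptions.
import numpy as np


def fractal_noise(shape, octaves=4, seed=0):
    """Cheap fractal noise: sum of random grids at several spatial scales."""
    rng = np.random.default_rng(seed)
    out = np.zeros(shape)
    for o in range(octaves):
        coarse = rng.random((shape[0] // 2**o + 1, shape[1] // 2**o + 1))
        # Upsample the coarse grid by repeating cells, then crop to shape.
        up = np.kron(coarse, np.ones((2**o, 2**o)))[:shape[0], :shape[1]]
        out += up * (2**o)                 # coarser layers get larger amplitude
    return (out - out.min()) / (out.max() - out.min())


def generate_map(size=64, seed=0):
    elevation = fractal_noise((size, size), seed=seed)        # terrain layer
    rainfall = fractal_noise((size, size), seed=seed + 1)     # rainfall layer
    biome = np.full((size, size), "meadow", dtype=object)     # biome layer
    biome[elevation > 0.8] = "mountain"
    biome[(elevation > 0.6) & (elevation <= 0.8)] = "foothill"
    biome[(elevation <= 0.6) & (rainfall > 0.7)] = "forest"
    biome[(elevation < 0.2) & (rainfall > 0.5)] = "water"
    return {"elevation": elevation, "rainfall": rainfall, "biome": biome}
```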
The low-level competencies of our retooled machine learning approach are far removed from the concerns and
interests of COGLE's intended end users and study participants. When our earlier self-explanation study participants
tried to understand the strange behavior of the under-trained drones, they were used to reasoning about adults who
possess mature navigational competencies. The low-level competencies needed to be acquired for the AI agents to be
worth using.
3 | EXPERIMENTS WITH A COGNITIVE ARCHITECTURE
Like other model-free deep RL systems, COGLE represents what it learns in a distributed neural network. Causal explanations of an AI's behavior use symbolic representations. They are hierarchical and compositional. Producing such explanations requires bridging the gap from distributed representations to symbolic representations.
This section describes our experiments with a cognitive architecture⁶ to bridge this gap. A cognitive architecture is a computational framework for representing cognitive processes including attention, memory, and pattern matching, organized in a structure analogous to that of the human brain. We used the ACT-R architecture⁷ developed by John Anderson and colleagues. ACT-R stands for Adaptive Control of Thought, and the "R" stands for "Rational." Anderson developed ACT-R to explain how the brain works when it is learning. ACT-R is organized into modules for declarative memory, procedural knowledge, vision, audition, and motor control. In previous work,⁸ Lebiere et al. showed how an ACT-R cognitive architecture can replicate human sensemaking processes to transform loosely structured data from heterogeneous sources into a structured symbolic form.
Building on an approach that some of us had used previously to capture human reasoning, we set up drone missions to be annotated by human users using a domain ontology. COGLE agents would then plan the same missions. The next steps involved training a cognitive mapper and using the mapper to generate appropriate symbol structures for explanations. To train the mapper, we would analyze patterns of activations in the nodes in the upper levels of the neural network for missions, and then correlate those activation patterns with activation patterns in an ACT-R cognitive model of how humans solved the same missions. This training would provide a foundation for generating XAI explanations.
When we started developing this mapping approach, COGLE's learning system was not yet competent at planning drone missions. We opted to first develop a proof-of-concept system for the cognitive architecture approach using simple problems and missions in StarCraft 2 (SC2).⁹ SC2 is a real-time strategy game where players control units from a third-person perspective with the aim of eliminating opponents. We used a system configuration for our experiments that included a neural network architecture like COGLE's. The focus was to understand what an analysis would tell us about the ascribed mental states of the learned model.
The analyses of our experiments and the sample problems in SC2 are detailed elsewhere.¹⁰,¹¹ While the research progress on analyzing the cognitive activity of the neural networks was encouraging, there were too many engineering tasks to use this approach for our COGLE end-user studies. We continued this research but developed an alternative explanation approach in parallel.
4 | FROM "INTROSPECTIVE" TO "EXTROSPECTIVE" EXPLANATIONS
The cognitive architecture approach to explanation is "introspective" in that it analyzes patterns of activity in the agents' neural networks. We pivoted to an alternative "extrospective" approach where the explainer employs the simulator to conduct experiments that analyze the performance of agents without inspecting their internal representations.
To create shared and precise language about COGLE's decisions and rationale, we adopted practices and terminology from formal games,¹² with game rules, outcomes, strategies, and strategic interdependence. In this approach, a game is any rule-governed situation with a well-defined outcome. Outcomes are game states that result from player actions. Strategy refers to a plan of action intended to lead to a desired terminal outcome. Strategic interdependence means that outcomes depend on the strategies used by the players.
COGLE's simulator has rules for actions and rules for winning. Action rules govern how a drone, hiker, or package can move in the world model as depicted in Figures 1 and 2. The outcome rules determine changes of state in a turn, including changes to location, passage of time, and risk. The data describing the outcomes are stored in simulator state variables. For example, local outcome rules define the increased risks for a drone when it flies too close to objects at the same or higher altitude. Additional state variables represent delays experienced by the hiker when a package lands out of sight or the hiker needs to climb a hill, walk around an obstacle, or swim in a body of water. Figure 6 shows the hierarchies of state variables representing risks and delay times. The "rules for winning" are that the drone whose plan incurs the lowest risk wins if it does not violate any time or risk constraint.
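A minimal sketch of this winning rule is shown below; the state-variable names and constraint limits are simplified assumptions rather than COGLE's actual values.

```python
# Sketch of the "rules for winning" described above: the drone whose plan
# accumulates the lowest total risk wins, provided it violates no constraint.
def mission_winner(outcomes: dict, max_risk: float = 1.0, max_time: float = 120.0):
    """outcomes maps drone name -> {'risk': .., 'time': ..} simulator outcome state."""
    feasible = {name: o for name, o in outcomes.items()
                if o["risk"] <= max_risk and o["time"] <= max_time}
    if not feasible:
        return None                              # every plan violates a constraint
    return min(feasible, key=lambda name: feasible[name]["risk"])


# Example: Eagle wins because Falcon exceeds the risk constraint.
print(mission_winner({"Eagle": {"risk": 0.3, "time": 90},
                      "Falcon": {"risk": 1.2, "time": 70}}))
```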
The panels in Figure 7 show block diagrams for two configurations of COGLE, with a simulator, an explainer, and AI pilots based on deep reinforcement learning or symbolic planning. The simulator models the world. AI pilots make mission decisions. The explainer analyzes flight plans and compares the pilots and their plans. Programmatic interfaces between the blocks enable a learning system to develop a policy, an AI pilot to take actions in the world, and an explainer to set up mission situations as experiments with the drones so that it can compare plans and create explanations.
Panel (A) shows the configuration with the deep RL. A learning phase precedes the human-machine task of choosing the better drone for a mission. The deep RL learner considers mission cases generated by the procedural content generator. It builds up from TCS-level concerns to task-level concerns, incrementally improving the competence of an AI pilot based on feedback about outcomes in the simulated world. In the performance phase, the learned policy is fixed and functions as an AI pilot. Presented with a challenge mission, the AI pilot chooses a sequence of actions.
FIGURE 6 Accessible state variables in the simulator corresponded to factors for risk and time

FIGURE 7 Block diagram showing two versions of COmmon Ground Learning and Explanation (COGLE's) system architecture. One version uses learned policies to fly the drone. The other uses a variation of a simple A* planner. In both configurations, the explainer produces explanations by comparing two drone performances in the simulated world

Panel (B) shows an analogous configuration using a planner. We used a variation of the A* search algorithm¹³ with the provision that any path that violates a constraint is rejected. A* searches for an optimal solution for the given drone on the given mission. In our implementation, this computation typically takes about 20 minutes, making this the more convenient configuration for experimenting with game rules without days of training.
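The sketch below shows a generic A* search with the added provision that constraint-violating states are pruned; the callback interfaces are assumptions for illustration, not the project's planner code.

```python
# A minimal sketch of A* search that rejects any partial path passing through
# a constraint-violating state. Grid, costs, and constraints are assumptions.
import heapq
import itertools


def a_star(start, goal, neighbors, step_cost, heuristic, violates):
    """neighbors(s) -> iterable of states; violates(s) -> True to prune a state."""
    counter = itertools.count()          # tie-breaker so the heap never compares states
    frontier = [(heuristic(start, goal), next(counter), 0.0, start, [start])]
    seen = set()
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        if state in seen:
            continue
        seen.add(state)
        for nxt in neighbors(state):
            if nxt in seen or violates(nxt):   # reject constraint-violating paths
                continue
            g2 = g + step_cost(state, nxt)
            heapq.heappush(frontier,
                           (g2 + heuristic(nxt, goal), next(counter), g2, nxt, path + [nxt]))
    return None, float("inf")
```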
The simulator provides a set of terms with consistent meanings that can be used in explanations to users. An example of a narrative explanation comparing the flight plans of two agents is:

"Drone 1 drops the package where there is lower package inaccessibility risk, because only it can safely fly close to obstacles."

The term "lower package inaccessibility risk" corresponds to the aggregated state variable "package inaccessibility risk." It is computed by the simulator as part of the outcome state representing where a package landed. Similarly, observations about drone performance such as "only it can safely fly close to obstacles" (or, paraphrased, "it incurs lower risks when flying close to obstacles") reference the state variable "craft safety risk."
Martin Erwig et al. describe¹⁴ a method for decomposing rewards or penalties into different types and explaining choices of actions by a minimal sufficient explanation (MSE). An MSE is a minimal set of rewards that are sufficient for distinguishing the benefit of one action over another. Extending this work, we developed a hierarchical multi-factor (HMF) framework for decision problems. The HMF framework aggregates factors. For example, one drone may incur greater risks to craft safety to reach a drop point where a package can be released to land with lower inaccessibility risk, lower package safety risk, or lower package delivery time. The different kinds of risks, such as package risk or craft safety risk, correspond to the types of rewards or penalties described in the paper by Erwig et al., but extended to have type hierarchies. Risk factors quantitatively relate to probabilities of damage that imperil the success of a mission. Time factors correspond to the time it takes a drone or hiker to reach a destination. In HMF, the conditions for winning are described by a combination of constraints on state variables and optimization criteria. In COGLE, the drone with the lowest total risk penalty wins unless it violates a time constraint.
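The following sketch suggests how hierarchical factor aggregation and the selection of the largest differing factor can be computed; the factor hierarchy and values are illustrative assumptions rather than COGLE's actual factor tree.

```python
# Sketch of hierarchical multi-factor (HMF) aggregation: leaf risk factors roll up
# into parent factors, and the largest aggregated difference between two plans is
# selected as the explanation factor. Factor names are illustrative assumptions.
HIERARCHY = {
    "package inaccessibility risk": ["landed in water", "landed in trees",
                                     "landed on high foothill"],
    "craft safety risk": ["flew close to obstacle", "flew over high terrain"],
}


def aggregate(leaf_risks: dict) -> dict:
    """Sum leaf factor values into their parent factors."""
    return {parent: sum(leaf_risks.get(leaf, 0.0) for leaf in leaves)
            for parent, leaves in HIERARCHY.items()}


def biggest_difference(plan_a: dict, plan_b: dict) -> tuple:
    """Return the parent factor with the largest risk difference between two plans."""
    a, b = aggregate(plan_a), aggregate(plan_b)
    factor = max(a, key=lambda f: abs(a[f] - b[f]))
    return factor, a[factor] - b[factor]
```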
Figure 8 shows COGLE's narrative and visual explanations that are intended to help users to understand the drones and their plans. The explanation system creates narrative explanations using terminology from the HMF hierarchies. The narrative explanations explain advantages for what the drones did and competency differences as reasons for why the drones could do it.

FIGURE 8 Example mission with explanations with two candidate drones and their different flight plans. Narrative explanations say what the winning drone did and why it had an advantage. The interactive visual explanation shows where on the map the crafts encountered risks. As a user scrubs through the timeline, obstacles on the map light up with increasing intensity, indicating greater risk

COGLE's computation of the narrative explanations combines MSEs with counterfactual cases. The "what" phrase in the narrative explanation is the factor with the biggest risk difference. To simplify the narrative, sibling factors low in the hierarchy can be combined and accounted for in terms of their parent risk, unless the contribution is mostly from one low-level risk. For example, multiple low-level package accessibility factors can be aggregated to the parent "package accessibility" factor.
Consider the earlier explanation example:

"Drone 1 drops the package where there is lower package inaccessibility risk, because only it can safely fly close to obstacles."

The "what" phrase about lower package inaccessibility risk identifies which action resulted in the greatest risk difference. The explanation generator assigns credit to a single factor if it is responsible for a clear majority of the risk difference. When the risk is aggregated from several contributing sibling risks in the factor hierarchy, it simplifies the explanation by assigning credit to their parent risk. In this example, the explanation generator associates the action of dropping a package at a location with the factor for lower package inaccessibility risk. When a drone wins because the other drone violates a constraint on a factor, the competency associated with that factor is assigned the credit.
Producing the "why" phrase requires collecting counterfactual data by using the simulator to conduct experiments. The explainer instructs the simulator to fly the losing drone using the flight plan of the winning drone. Low-level craft flight differences show up as different amounts of risk for the two drones taking the same actions. In this example, the losing drone (Falcon) flying Eagle's plan would incur excessive craft risk. This is "why" it did not choose Eagle's plan and why it was forced to choose an inferior location to drop the package.
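A hedged sketch of this counterfactual computation appears below; the simulator interface (a hypothetical fly method returning a dictionary of risk factors) and the phrasing rules are assumptions for illustration.

```python
# Sketch of the counterfactual "why" computation: re-fly the losing drone on the
# winning drone's plan and compare risks. The simulator interface is an assumption.
def why_explanation(simulator, winner, loser, winning_plan, risk_limit=1.0):
    """Return a 'why' phrase based on a counterfactual simulator experiment."""
    counterfactual = simulator.fly(loser, winning_plan)   # losing drone, winning plan
    actual = simulator.fly(winner, winning_plan)
    if counterfactual["craft safety risk"] > risk_limit:
        return (f"because only {winner} can safely fly close to obstacles; "
                f"{loser} flying the same plan would incur excessive craft risk")
    # Otherwise attribute the advantage to whichever factor differs most.
    factor = max(counterfactual,
                 key=lambda f: counterfactual[f] - actual.get(f, 0.0))
    return f"because {winner} incurs lower {factor} on this plan"
```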
To help users understand these flight differences and risks in the context of the mission map, the narrative explanations are augmented by visual explanations. The visual explanations overlay the map with highlights to call attention to places where risks occur. The risky obstacles are identified and displayed on the map by the simulator. As a participant scrubs the timeline, the drones move, and the simulator highlights the risks accumulated so far in the flight plan.
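A small sketch of this accumulation-and-highlighting step is shown below; the data layout (a list of per-step cell risks) is an assumption for illustration.

```python
# Sketch of the timeline-scrubbing visual explanation: accumulate per-step risks
# up to the scrubbed time and map them to highlight intensities.
def highlights_at(step_risks, scrub_index, max_intensity=1.0):
    """step_risks: list of (map_cell, risk) per time step; returns cell -> intensity."""
    intensity = {}
    for cell, risk in step_risks[:scrub_index + 1]:
        intensity[cell] = intensity.get(cell, 0.0) + risk
    top = max(intensity.values(), default=0.0) or 1.0     # avoid division by zero
    return {cell: max_intensity * value / top for cell, value in intensity.items()}
```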
Drones can also win because of superior information processing. Differences in information processing competencies
show up as differences in state variables representing the information that the drone uses.
To test the effectiveness of our XAI explanations, we set up a model induction user task as described in the Appendix. The weak model induction improvements with post-decision explanations motivated us to pivot to a decision support context with pre-decision explanations.
5 | SUMMARY AND REFLECTIONS
One way that AIs can have value is by carrying out important cognitive tasks better than people. Like other multi-factor optimization problems, the nature of COGLE's mission planning is combinatorial. During the XAI project, we came to recognize that combinatorial problems create challenges for people, challenges for training deep reinforcement learners, challenges for using cognitive models to generate XAI explanations, and challenges for carrying out model induction user studies.
Stepping back again, our experience recalled earlier AI examples where humans and machines solve combinatorial puzzles. Based on heuristic search, the DENDRAL program¹⁵ routinely solved structure elucidation problems on mass spectroscopy data. Carl Djerassi¹⁶ reported on a graduate chemistry class that he taught at Stanford University where he assigned students to use DENDRAL to check the correctness of chemistry papers that had been published in scientific journals. They found that practically every published article missed solutions or reported incorrect ones. This showed that even in professional scientific publications, AIs could find errors that were missed by expert authors and reviewers.
Another analogous and widely reported example of superior AI performance is move 37 in the Go match between AlphaGo and Lee Sedol. The rules of Go are well known. The combinatorial challenge is to thoroughly explore possible solutions. As described in an article¹⁷ in Nature, AlphaGo reached a "superhuman" level in the strategy game Go without learning from any human moves. How even world-class human players were surprised by the winning line of play has been analyzed and discussed for several years.
These observations suggest framing the teaming of people with AIs in decision support tasks that exploit human and AI strengths. The role of the AI can be to efficiently solve given problems, and the role of humans can be to oversee and adjust the problem criteria. The XAI explanations help people to check the reasonableness of AI solutions. By analogy, it is easier to check a proof than to create one. In contrast to the post-decision explanations required for a model induction study, pre-decision explanations support human understanding of AI reasoning in a context of decision support. Pre-decision explanations enable a person to check an AI's solution, even when finding the optimal solution is difficult for an unaided person to do routinely. Analogous divisions of labor across human plus computer teams have been studied previously in analytic tools for sensemaking based on interactive information visualization.¹⁸
Appendix. Model Induction User Study
User study participants are presented with a mission and two different AI-controlled drones as shown in Figure 9. The
drones perceive their environment differently, perform differently, and make different flight plans. Participants are asked to
choose the best drone for each mission. After participants pick a drone, the system shows COGLE's explanations.
The structure of our model induction user study is shown in Figure 10. Our hypothesis was that the users receiving XAI explanations after choosing the best drone for a mission would learn to make better choices about the best drone in later missions. The study task missions were arranged in a curated curriculum that introduced domain challenges incrementally. Initially, participants were exposed to situations where one drone's plan was better for a single main reason. In later missions, the terrains were more complex and the reasons for a plan's advantage varied. Missions were randomly ordered as to which drone had the advantage.

FIGURE 9 Users are presented with two drones and their flight plans for a mission. They are asked to select the best drone for a mission

FIGURE 10 Study plan for the user study. In each case, users were asked to select the best drone for a mission and afterwards were shown the correct answer and given the explainable artificial intelligence (XAI) explanations. We removed the qualification step in the final study because flight school did not improve user performance in predicting the performance of the drones
N = 92 participants were recruited for this study from Amazon Mechanical Turk. Of these N = 92 participants, N = 76 successfully completed the study (17% excluded from analysis for incomplete responses). Participants were randomly assigned to either receive explanation first or explanation second. Furthermore, participants were also randomly assigned one of the pairs of drones to see first.
We used a 2 × 2 within-participants (repeated measures) design with a within-participant variable of explanation interface (explanation, no-explanation) and a between-participants variable of order (explanation first, no-explanation first), counterbalanced with drone pairs (Eagle and Falcon first; Osprey and Peregrin first).
Each participant proceeded through a training block followed by two testing blocks. In the training block, we gave participants a description of and training on the domain world and how to select which drone would win a mission. Approximately half the participants, chosen at random, received the explanation interface in testing block 1 and then the no-explanation interface in testing block 2. The remaining participants had the opposite order (no-explanation in testing block 1 followed by explanation in testing block 2). Each testing block consisted of 30 mission trials. In each testing block trial, participants were presented with two different drones on the screen. The first 10 mission trials were designed to have participants experience a successive revelation of drone behavior as described above. Of those 10 missions, the first four were won by Eagle or Osprey. The following two missions were won by Falcon or Peregrin. Then the missions varied as to which drone won and lost. From missions 11 to 30 within a testing block, the terrains were arranged randomly so that different drones had the advantage.
The primary objective of this experiment was to determine whether more people correctly predicted the better of two drone flights in the explanation condition than in the no-explanation condition. We defined success rate to be the proportion of correct predictions over 30 trials in the explanation or no-explanation condition. We ran a paired t-test testing the difference between the explanation condition's success rate (M = .57, SD = .16) and the no-explanation condition's rate (M = .54, SD = .16). The mean of the differences (0.04) suggests that the effect is positive, significant, and small (95% CI [1.78e-03, 0.08], t(75) = 2.08, P < .05; Cohen's d = 0.24, 95% CI [0.01, 0.47]).
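For readers who want to reproduce this style of analysis, the sketch below computes per-participant success rates and a paired t-test on synthetic stand-in data (not the study data); the effect-size formula is the standard paired-samples Cohen's d.

```python
# A sketch of the reported analysis on synthetic stand-in data: per-participant
# success rates in the two conditions compared with a paired t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_participants, n_trials = 76, 30
# Synthetic stand-ins for the per-trial correctness data (not the study data).
explanation = rng.binomial(n_trials, 0.57, n_participants) / n_trials
no_explanation = rng.binomial(n_trials, 0.54, n_participants) / n_trials

t_stat, p_value = stats.ttest_rel(explanation, no_explanation)
diff = explanation - no_explanation
cohens_d = diff.mean() / diff.std(ddof=1)          # paired-samples effect size
print(f"t({n_participants - 1}) = {t_stat:.2f}, p = {p_value:.3f}, d = {cohens_d:.2f}")
```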
Figure 11 shows that there was a small positive model induction effect of explanations as measured by the participants' ability to correctly predict which drone was better for the mission. The explanation condition produced a small but reliable effect on the ability to predict future drone success. It is specific to the drones explained.
Figure 12 illustrates the performance of the participants over the aggregated test conditions using a logistic histogram as described¹⁹ by Jennifer Smart and colleagues. The red line is the fit of a logistic regression to the percentage of accurate predictions. The case numbers refer to different missions and drones in the balanced conditions of the study. Participants in both conditions improved in performance as they trained on more missions. The difference between the explanation and no-explanation conditions contributes approximately as much benefit as one additional case in the 30 cases of each condition.

FIGURE 12 Logistic histogram plot showing prediction accuracy by trial as well as frequency histograms of successes (1) and failures (0) by trial. The y-axis for the histograms is on the right. The top box plot is the distribution of success over trial; the bottom box plot is the distribution of failure over trial. The boxplots can be interpreted in the usual manner: the central notch is the median, the gray box encompasses the second and third quartiles, and the outer whiskers are the extremes (1.5 * inter-quartile range). The red line is the fit of a logistic regression. The y-axis for the fit line is on the left
FIGURE 11 Boxplot showing the explanation and no-explanation success rates (proportion of correct predictions for 30 trials in each
condition)
The model induction user study showed that the effect of the post-decision explanations on improving participants' prediction performance in future missions was weak.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
ORCID
Mark Stefik https://orcid.org/0000-0003-3134-6031
Michael Youngblood https://orcid.org/0000-0001-9490-0015
Peter Pirolli https://orcid.org/0000-0002-9018-4880
Christian Lebiere https://orcid.org/0000-0003-4865-3062
Lester D. Nelson https://orcid.org/0000-0001-8556-7644
Konstantinos Mitsopoulos https://orcid.org/0000-0002-0076-2334
Sterling Somers https://orcid.org/0000-0003-0129-7080
REFERENCES
1. ArduPilot Development Team. ArduPilot SITL (Software in the Loop) Simulator. https://ardupilot.org/dev/docs/sitl-simulator-software-in-the-loop.html
2. Klein G, Hoffman R, Mueller ST. Naturalistic psychological model of explanatory reasoning: how people explain things to others and to themselves. Conference on Naturalistic Decision Making, San Francisco, CA; 2019.
3. Le Cun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Touretzky D, ed. Advances in Neural Information Processing Systems 2. Denver, CO: Morgan Kaufmann; 1990.
4. Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. International Conference on Machine Learning (ICML); 2016:1928-1937.
5. Fischer R, Dittmann P, Weller R, Zachmann G. AutoBiomes: procedural generation of multi-biome landscapes. The Visual Computer. 2020;36:2263-2272.
6. Kotseruba I, Tsotsos JK. 40 years of cognitive architectures: core cognitive abilities and practical applications. Artificial Intelligence Review. 2020;53:17-94.
7. Anderson JR. Rules of the Mind. New York: Psychology Press, Lawrence Erlbaum Associates; 1993.
8. Best BJ, Gerhart N, Lebiere C. Extracting the ontological structure of OpenCyc for reuse and portability of cognitive models. Proceedings of the Behavioral Representation in Modeling and Simulations (BRIMS) Conference. Charleston, SC; 2010.
9. Vinyals O, Ewalds T, Bartunov S, Georgiev P, Vezhnevets AS, Yeo M. StarCraft II: a new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782. 2017.
10. Somers S, Mitsopoulos K, Thomson R, Lebiere C. Explaining decisions of a deep reinforcement learner with a cognitive architecture. Proceedings of the 16th International Conference on Cognitive Modeling (ICCM). Montreal, QC, Canada; 2018:144-149.
11. Mitsopoulos K, Somers S, Thomson R, Lebiere C. Cognitive architectures for introspecting deep reinforcement learning agents. Workshop on Bridging AI and Cognitive Science, at the 8th International Conference on Learning Representations (ICLR); 2020.
12. Gardner R. Games for Business and Economics. John Wiley and Sons; 2003.
13. Hart PE, Nilsson NJ, Raphael B. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics. 1968;4(2):100-107.
14. Erwig M, Fern A, Murali M, Koul A. Explaining deep adaptive programs via reward decomposition. International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Explainable Artificial Intelligence; 2018:40-44. https://www.ijcai.org/proceedings/2018/
15. Lindsay R, Buchanan B, Feigenbaum E, Lederberg J. DENDRAL: a case study of the first expert system for scientific hypothesis formation. Artif Intell. 1993;61(2):209-261.
16. Djerassi C. The Pill, Pygmy Chimps, and Degas' Horse. Lexington, MA: Basic Books; 1993.
17. Gibney E. Self-taught AI is best yet at strategy game Go. Nature. 2017.
18. Heer J, Viégas FB, Wattenberg M. Voyagers and voyeurs: supporting asynchronous collaborative information visualization. Communications of the ACM. 2009;52(1).
19. Smart J, Sutherland WJ, Watkinson AR, Gill JA. A new means of presenting the results of logistic regression. Bull Ecol Soc Am. 2004;85(3):100-102.
How to cite this article: Stefik M, Youngblood M, Pirolli P, et al. Explaining autonomous drones: An XAI
journey. Applied AI Letters. 2021;2(4):e54. doi:10.1002/ail2.54