Empirical investigation of how robot’s pointing gesture influences trust
in and acceptance of heatmap-based XAI
Akihiro Maehigashi¹, Yosuke Fukuchi², and Seiji Yamada²,³
Abstract: This study investigated how displaying a robot’s attention heatmap while the robot points at it influences human trust and acceptance of its outputs. We conducted an
experiment using two types of visual tasks. In these tasks,
the participants were required to decide whether to accept
or reject the answers of an AI or robot. The participants
could see the answers with an AI attention heatmap, the
heatmap with AI pointing (displayed as a laser dot cursor),
a robot attention heatmap with robot pointing (pointing at a
certain location on the heatmap displayed on a tablet with
a stick), or no heatmap. The experimental results revealed
that the AI and robot pointing at their attention heatmaps
lowered the participants’ acceptance of their answers when the
heatmaps had low interpretability in a more difficult task. Also,
the robot pointing at the heatmaps showed the possibility of
increasing acceptance of its answer when the heatmaps had
high interpretability in a more difficult task. In addition, the
acceptance of the robot’s answers correlated with emotional
trust in the robot. This study demonstrates that a robot pointing
gesture at its attention heatmap could be used to control human
behaviors and emotional trust in human-robot interactions.
I. INTRODUCTION
In recent years, artificial intelligence (AI) has become pervasive, and it will be applied to even more human activities in our daily lives. Social robots equipped with AI are likewise expected to perform activities on behalf of humans or to assist in human activities.
In human-AI and human-robot interactions, trust is a
fundamental factor in deciding the level of reliance on an
AI or robot [1], [2]. Trust can be defined as “the attitude
that an agent will help achieve an individual’s goals in a
situation characterized by uncertainty and vulnerability” [2].
Proper reliance on AI or robots is achieved by appropriately
calibrating trust to their actual reliability, which leads to
maximizing task performance. This process is called trust
calibration [2], [3].
However, miscalibration can occur in which trust exceeds
or falls short of AI or robot reliability. One of the factors
causing miscalibration is considered to be the black-box
nature of their internal processes, that is, a lack of trans-
parency [4]. Transparency refers to the extent to which the
underlying rules or internal logic of a technology are apparent to humans, and it is critical for trust development in human-
AI and human-robot interactions [5].
This problem can be addressed by providing explanations
on machine processes to help people understand the under-
lying rationale for the machine’s performance [6]. Recently,
explainable AI (XAI) has been developed to make AI un-
derstandable to humans by providing an explanation of its
processes [7].
In this study, we used AI or robot attention heatmaps to ex-
plain their internal processes in visual tasks and investigated
how a pointing gesture made by a social robot on a robot
attention heatmap influences human trust and acceptance of
the robot’s outputs.
II. RELATED WORK
A. Transparency of robots and AI
Regarding human-AI interaction, explaining how a system
works by showing its internal processes is considered to
influence trust in AI [2], [8]. Dzindolet et al. [9] experimen-
tally showed that explaining why a decision-making agent makes mistakes prevented an extreme decrease in trust in and reliance on the agent when the agent made wrong decisions.
Also, Wang and Benbasat [10] experimentally manipulated
the levels of transparency for recommendation agent pro-
cesses by giving explanations about how the agent made its
decision. As a result, they found that the explanations
increased belief in the competence and benevolence of the
agent, which are related to trust in the agent.
Moreover, in human-robot interaction, the anthropomorphic features of social robots allow people to apply social reasoning to the robots [11]. This is because people activate theory of mind, the ability
to infer thoughts, feelings, and beliefs of others [12], when
interacting with a robot. Also, when a robot displays human-
like behaviors, such as nodding, shrugging its shoulders, or
throwing up its hands, it helps people understand the robot’s
internal state, and such behaviors are known to influence trust
in a robot [13].
Recently, various studies have been focusing on XAI. The
structure of the explanation has variations such as decision
trees, rules, classifiers, and saliency maps [14]. The most
well-known approach in XAI for image recognition is Grad-
CAM [15], [16], which analyzes a CNN model and generates
a heatmap showing the important features of an original
image as AI attention. However, there is a problem in that it
is very difficult for humans to interpret how the AI perceives
the image on the basis of a heatmap.
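For concreteness, the sketch below illustrates how a Grad-CAM-style attention heatmap can be computed. It is a minimal example assuming a generic torchvision CNN (resnet18) and its last convolutional block; it is not the specific models used in this study.

```python
# Minimal Grad-CAM-style sketch (PyTorch); the model and target layer are
# illustrative assumptions, not the AI models used in this study.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()   # any CNN classifier
target_layer = model.layer4                    # last convolutional block

feats, grads = {}, {}

def _save_activation(module, inputs, output):
    feats["a"] = output
    # capture the gradient flowing back into this activation
    output.register_hook(lambda g: grads.update(a=g))

target_layer.register_forward_hook(_save_activation)

def grad_cam(image, class_idx=None):
    """Return an H x W attention heatmap in [0, 1] for a 1 x 3 x H x W tensor."""
    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)       # pooled gradients
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam[0, 0].detach()

heatmap = grad_cam(torch.randn(1, 3, 224, 224))               # dummy input
```

Rendering such a map with a red-to-blue colormap over the original image yields heatmaps of the kind shown later in Fig. 1.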
To overcome this limitation of attention-based XAI, novel
studies on XAI have focused on human cognitive processes,
including purpose and interpretation, to develop explanations
of AI processes [17]. However, there are only a few studies
investigating the influence of attention-based XAI on trust
and acceptance of AI outputs. This study investigated how a
heatmap of an AI or social robot’s attention influences trust
in the AI or robot and acceptance of their outputs while they
point at the heatmap.
B. Pointing gestures of robots and AI
Various studies on robot pointing gestures have been done
in HRI thus far. Gulzar et al. [18] proposed a probabilistic
model for pointing and gesture detection accuracy. This
model can make plans for optimal pointing actions by
minimizing the probability of pointing errors. They also
proposed a measure of the accuracy of a pointing gesture
and a method to calibrate the model. The results of their
experiment suggested that the proposed model could reveal
the important properties of a successful pointing behavior.
Also, Wang et al. [19] experimentally investigated human
understanding of a humanoid robot’s pointing gestures. They
prepared a humanoid robot programmed to point to markers
on a screen and conducted experiments in which participants
were asked to identify the target that the robot was pointing
to. As a result, they found that participants could identify the pointing targets and improved their performance on a location detection task.
Moreover, there are studies on using physical robots
as pedagogical agents in educational environments. In one study [20], various humanoid robot expressions, including pointing gestures, were used to emphasize important information on a display, guiding students’ attention to it and promoting their understanding of the information. These robot gestures were experimentally confirmed to emphasize information effectively.
To the best of our knowledge, there are few studies on
how a humanoid robot making pointing gestures influences
trust in the robot. Thus, we believe this paper’s originality could contribute greatly to both HRI and XAI.
Regarding attention-based explanation with a heatmap,
Park et al. developed a justification explanation model (PJ-
X) [21]. They claimed and experimentally verified the effec-
tiveness of PJ-X multi-modal explanations that include both
pointing to visual evidence in heatmap-based explanations
for decisions and providing textual justifications. In their
models, an AI pointing to a heatmap is just highlighting the
hottest region in the map, which is similar to what is done
in our experiments.
III. HYPOTHESES
In this study, we investigated how displaying a robot’s
attention heatmap while the robot points to it influences
human trust and acceptance of its outputs. Particularly,
we compared the influence with that of displaying an AI
attention heatmap with AI pointing, an AI attention heatmap
without AI pointing, and just an original image without a
heatmap.
First, regarding the AI attention heatmap, previous studies
showed that XAI has disadvantages in that people generally
have difficulty interpreting how AI recognizes a target image
only with heatmaps [15], [16]. Therefore, just displaying the
heatmaps of AI attention would not influence trust in AI and
acceptance of its outputs.
Next, in this study, AI pointing is displayed as a laser
dot cursor, which is simple and similar to a traditional
method [21]. The pointing of an AI is considered to capture human attention in the same way as the pointing done by a social robot [19], [20]. However, it is difficult to interpret how an AI recognizes the location it focuses on in an image [15], [16]. Therefore, AI pointing to an AI attention heatmap
would not influence trust in an AI and acceptance of its
outputs.
Moreover, the pointing gesture of a social robot could
capture human attention [19], [20]. Since a social robot has anthropomorphic features, these features might cause people, on the basis of theory of mind, to interpret the robot’s pointing as indicating that the robot has human-like attentional information processing [11]. Therefore, a social robot pointing to heatmaps would increase trust in it and acceptance of its outputs.
The hypotheses are summarized as follows.
H1: Displaying an AI attention heatmap does not influ-
ence human trust in an AI and acceptance of its
outputs.
H2: An AI pointing to a heatmap of its attention does
not influence human trust in it and acceptance of its
outputs.
H3: A robot pointing to a heatmap of its attention
increases human trust in it and acceptance of its
outputs.
IV. EXPERIMENT
A. Materials and experimental task
There were two types of tasks. One was a drowsiness
detection task where the participants were required to answer
whether a human face displayed as an image was awake or
drowsy. The other was an obesity screening task where the
participants were required to answer whether the body shape
displayed was normal or obese. The facial images used were
from a drowsiness dataset [22], and the body-shape images
used were from a body-shape database [23]. In a previous
study, the drowsiness detection task was confirmed to be
more difficult than the obesity screening task [24].
Fig. 1 shows examples of the original human facial and
body-shape images, the heatmaps, and the heatmaps with AI
and robot pointing. In the heatmaps, the color red indicates
a higher attention focus, and blue indicates a lower one.
These heatmaps were taken from a previous study in which heatmaps with high and low interpretability, as rated by human participants, were generated by two different AI models for each task [24]. The AI’s accuracy rate for screening or detecting each image was also obtained from the AI models when the heatmaps were generated in the previous study, and these accuracy rates were reflected in this experiment.
Also, the locations to which the AI and robot pointed were calculated on the basis of the RGB values of each heatmap. In particular, we searched for the pixel location where the AI attention was most focused, i.e., the pixel with the highest value of R − (G + B). If multiple pixels shared the same highest value, the highest location on the y-axis was selected in this study. Although some heatmaps had multiple pixels with the same highest value, those pixels were adjacent to each other.
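As a rough illustration of this rule, a minimal NumPy sketch is shown below. The function name and the H x W x 3 array layout are assumptions, and "highest location on the y-axis" is interpreted here as the smallest row index in image coordinates.

```python
import numpy as np

def pointing_target(heatmap_rgb: np.ndarray) -> tuple[int, int]:
    """Return (row, col) of the pixel where the AI attention is most focused.

    heatmap_rgb: H x W x 3 array of the rendered heatmap (e.g., uint8 RGB).
    The score follows the rule described above: R - (G + B).
    Ties are broken by the highest location on the y-axis, interpreted here
    as the smallest row index (an assumption about image coordinates).
    """
    rgb = heatmap_rgb.astype(np.int64)
    score = rgb[:, :, 0] - (rgb[:, :, 1] + rgb[:, :, 2])
    rows, cols = np.where(score == score.max())
    top = int(np.argmin(rows))          # tie-break: topmost pixel
    return int(rows[top]), int(cols[top])

# Example with a dummy heatmap image
row, col = pointing_target(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
```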
Fig. 1. Examples of original human facial and body-shape images, heatmaps, and heatmaps to which AI and robot are pointing. In these example images,
heatmaps for drowsiness detection task have high interpretability, and those for obesity screening task have low interpretability.
Fig. 2. Task procedure in obesity screening task
The pointing of the AI was displayed as a pink laser dot
cursor, and that of the robot was displayed as an image in
which Sota (Vstone Co., Ltd.) pointed at a certain location
in a heatmap displayed on a tablet with a stick. The heatmap
which the robot pointed to was displayed as a still image,
and therefore, the robot did not move.
The task procedure is shown in Fig. 2:
(1) An original image was displayed at the center of the
display as a detection (or screening) problem for 5
seconds.
(2) The AI or robot showed its answer in red without a
heatmap, with a heatmap, or with a heatmap and the AI
or robot pointing to it.
(3) The participant decided to accept or reject the answer
by clicking.
When the AI or robot showed its answer, the original
image and the heatmap with or without the pointing were
simply located side by side. In the control condition, only
the original image was displayed with the AI answer.
B. Method
1) Experimental design and participants: The experiment
had a two-factor between-participants design. The experi-
mental factors were the task (obesity screening and drowsi-
ness detection) and the heatmap (no heatmap, heatmap,
heatmap with AI pointing, and heatmap with robot pointing).
An a priori power analysis with G*Power indicated that at least 179 participants were needed for a medium effect size (f = .25) with power at .80 and alpha at .05 [25]. To account for potential participant exclusions, a total of 250 participants were recruited through a crowdsourcing service provided by Yahoo! Japan. They were randomly assigned to one of the conditions and performed the task. However, participants flagged as inattentive by the attention-check items of the Directed Questions Scale (DQS) [26] were excluded from the analysis. As a result, data of 243 participants (183 male and 60 female, aged 19 to 82, M = 48.42, SD = 11.46) were used for the following analysis: 34, 29, 31, and 32 participants in the drowsiness detection task and 28, 28, 30, and 31 participants in the obesity screening task for the no-heatmap, heatmap, heatmap-with-AI-pointing, and heatmap-with-robot-pointing conditions, respectively.
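The a priori power analysis above can be approximately checked outside G*Power. The sketch below is a minimal reproduction assuming a fixed-effects ANOVA F test with three numerator degrees of freedom (e.g., the four-level heatmap factor or the task × heatmap interaction) and noncentrality λ = f²N; exact values may differ slightly between tools.

```python
# Minimal sketch reproducing the a priori power analysis, assuming an ANOVA
# F test with 3 numerator degrees of freedom and 8 cells (2 x 4 design).
from scipy import stats

def anova_power(n_total, f=0.25, alpha=0.05, df_num=3, n_groups=8):
    """Power of the fixed-effects ANOVA F test for Cohen's f and N participants."""
    df_den = n_total - n_groups
    nc = (f ** 2) * n_total                       # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
    return stats.ncf.sf(f_crit, df_num, df_den, nc)

n = 10                                            # start with df_den > 0
while anova_power(n) < 0.80:                      # smallest N reaching .80 power
    n += 1
print(n, round(anova_power(n), 3))                # close to the 179 reported above
```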
Fig. 3. AI or robot reliability, acceptance rate, and task accuracy rate in four heatmap conditions of each task. HM stands for heatmap.
2) Procedure: The participants first gave informed consent and read the explanation of the task procedure. They were required to get as many correct answers
as possible in the tasks.
Except for the heatmap-with-robot-pointing condition, the
participants were told that they were to perform a task with
an AI. The participants in the heatmap-with-robot-pointing
condition were told that they were to perform a task with
a robot. Also, the heatmap was explained as displaying the AI’s attention in the heatmap and heatmap-with-AI-pointing conditions and as displaying the robot’s attention in the heatmap-with-robot-pointing condition.
After that, the participants started the task. Each participant responded to 30 original images. The same 30 images were used across the four heatmap conditions in each task. The order of the images was randomized for each participant, except that heatmaps with high and low interpretability were constrained to alternate. This constraint had no effect in the no-heatmap condition since no heatmap was displayed.
During the task, after every 10 problems, the participants
answered two types of trust questionnaires to measure cog-
nitive and emotional trust: Cognitive trust is defined as “a trustor’s rational expectations that a trustee will have the necessary attributes to be relied upon,” and emotional trust is defined as “the extent to which one feels secure and comfortable about relying on the trustee” [27].
To measure cognitive trust, the Multi-Dimensional Measure of Trust (MDMT) [28] was used. The MDMT was developed to measure a task partner’s reliability and competence, corresponding to the definition of cognitive trust. The participants rated how much the AI or robot fit each word (reliable, predictable, dependable, consistent, competent, skilled, capable, and meticulous) on an 8-point scale (0: not at all; 7: very). Also, for emotional trust, we asked participants to rate how much the AI or robot fit each word (secure, comfortable, and content) on a 7-point scale (1: strongly disagree; 7: strongly agree) as in the previous study [27].
C. Results
1) Analyses for hypothetical tests: 2 (task) × 4 (heatmap) between-participants ANOVAs were conducted for the dependent variables (Fig. 3). First, reliability, the rate at which the AI or robot accurately answered the problems, was calculated in each condition of each task. This analysis was performed to verify that the reliabilities of the AI and robot were evenly distributed across the conditions in each task. As a result, there was neither a significant interaction effect (F(3,235) = 0.25, p = 0.86, ηp² < 0.01) nor a main effect of the heatmap factor (F(3,235) = 1.67, p = 0.17, ηp² = 0.02). Therefore, the reliabilities of the AI and robot were confirmed to be equivalent across the conditions in each task. Also, there was a significant main effect of the task factor, showing that the reliability was higher in the drowsiness detection task than in the obesity screening task (F(1,235) = 83.44, p < 0.001, ηp² = 0.26).
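A minimal sketch of this type of 2 × 4 between-participants ANOVA with partial eta squared, using statsmodels, is shown below; the data frame, file name, and column names are hypothetical.

```python
# Minimal sketch of a 2 (task) x 4 (heatmap) between-participants ANOVA with
# partial eta squared; the file and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# one row per participant: task, heatmap, and the dependent variable (e.g., reliability)
df = pd.read_csv("participants.csv")

model = ols("reliability ~ C(task) * C(heatmap)", data=df).fit()
table = sm.stats.anova_lm(model, typ=2)           # Type II sums of squares

ss_res = table.loc["Residual", "sum_sq"]
table["eta_p2"] = table["sum_sq"] / (table["sum_sq"] + ss_res)   # partial eta squared
print(table[["df", "F", "PR(>F)", "eta_p2"]])
```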
Next, the acceptance rate, the rate at which the participants accepted the AI or robot answer, was calculated in each condition of each task. As a result, there was neither a significant interaction effect (F(3,235) = 1.22, p = 0.30, ηp² = 0.02) nor a main effect of the heatmap factor (F(3,235) = 1.93, p = 0.13, ηp² = 0.02). Also, there was a significant main effect of the task factor, showing that the acceptance rate was higher in the drowsiness detection task than in the obesity screening task (F(1,235) = 11.54, p < 0.001, ηp² = 0.05). This difference was assumed to arise because the drowsiness detection task is more difficult for humans, and therefore the participants tended to rely on the AI or robot answers, as confirmed in the previous study [24].
Moreover, the task accuracy rate, the rate at which the participants accurately accepted or rejected the AI or robot answer, was calculated in each condition of each task. As a result, there was neither a significant interaction effect (F(3,235) = 0.55, p = 0.66, ηp² = 0.01) nor a main effect of the heatmap factor (F(3,235) = 2.03, p = 0.11, ηp² = 0.03). Also, there was a significant main effect of the task factor, showing that the accuracy rate was higher in the obesity screening task than in the drowsiness detection task (F(1,235) = 363.41, p < 0.001, ηp² = 0.61). This difference was assumed to arise because the obesity screening task is much easier for humans, as confirmed in the previous study [24].
Furthermore, regarding the ratings of MDMT and Emo-
tional trust, there were neither significant interactions nor
main effects.
As a result of the hypothetical tests, H1 (displaying an
AI attention heatmap does not influence human trust in an
AI and acceptance of its outputs) and H2 (an AI pointing
to a heatmap of its attention does not influence human
trust in it and acceptance of its outputs) were supported.
However, H3 (a robot pointing to a heatmap of its attention
increases human trust in it and acceptance of its outputs) was
not supported. On the basis of these results, we performed
additional analyses to reveal the effects of the robot pointing
gestures.
2) Additional analyses: As additional analyses, considering the high and low interpretability of the heatmaps, we performed 2 (task) × 4 (heatmap) × 2 (interpretability) mixed-design ANOVAs for the dependent variables. A statistical power higher than .80 was assured through a post-hoc power analysis with G*Power.
As a result of the analysis of the acceptance rate of the AI or robot answers (Fig. 4), there was a significant two-way interaction effect (F(3,235) = 3.87, p = 0.01, ηp² = 0.02). Also, there was a significant simple interaction between the heatmap and interpretability factors in the drowsiness detection task (F(3,235) = 10.08, p < 0.001, ηp² = 0.07). There was no such interaction in the obesity screening task (F(3,470) = 0.20, p = 0.89, ηp² < 0.01).
Next, subsequent analyses of the interaction in the drowsiness detection task were performed. As a result, there was a significant simple main effect of the heatmap factor, revealing that the acceptance rates differed significantly in the low-interpretability condition (F(3,470) = 7.74, p < 0.001, ηp² = 0.09). Also, there was a marginally significant difference in the high-interpretability condition (F(3,470) = 2.32, p = 0.07, ηp² = 0.01).
Finally, for the low-interpretability condition in the drowsiness detection task, multiple comparisons with Ryan’s method showed that the acceptance rates in the heatmap-with-AI-pointing and heatmap-with-robot-pointing conditions were significantly lower than those in the no-heatmap condition (t(470) = 5.35, p < 0.001, r = 0.70; t(470) = 4.54, p < 0.001, r = 0.64) and the heatmap condition (t(470) = 3.20, p = 0.001, r = 0.51; t(470) = 2.38, p = 0.02, r = 0.40). Also, there were no significant differences in the acceptance rates between the heatmap-with-AI-pointing and heatmap-with-robot-pointing conditions (t(470) = 0.86, p = 0.39, r = 0.04) or between the no-heatmap and heatmap conditions (t(470) = 2.11, p = 0.05, r = 0.10).
Furthermore, although there was only a marginally significant effect in the high-interpretability condition, multiple comparisons with Ryan’s method showed a significant difference in the acceptance rate between the no-heatmap and heatmap-with-robot-pointing conditions (t(470) = 2.44, p = 0.02, r = 0.41). There were no other significant differences in the high-interpretability condition.
In addition, there was also a significant main effect of the interpretability factor, showing that the acceptance rate was higher for the high-interpretability heatmaps than for the low-interpretability heatmaps (F(1,235) = 30.32, p < 0.001, ηp² = 0.01). There was also a significant main effect of the task factor, showing that the acceptance rate was higher in the drowsiness detection task than in the obesity screening task (F(1,235) = 11.68, p < 0.001, ηp² = 0.06). There was no significant main effect of the heatmap factor (F(3,235) = 1.95, p = 0.12, ηp² < 0.01).
Fig. 4. Acceptance rate for high- and low-interpretability heatmaps in four heatmap conditions of each task. HM stands for heatmap.
We also performed the same ANOVAs on the reliability
and the task accuracy rate. However, there were neither sig-
nificant interactions nor main effects related to the heatmap
conditions.
In summary, the additional analyses revealed that, in the drowsiness detection task, the more difficult task, the heatmaps with AI and robot pointing lowered the participants’ acceptance of the answers when heatmaps with low interpretability were displayed. They also suggested that the heatmaps with robot pointing could increase the acceptance rate when heatmaps with high interpretability were displayed.
3) Correlational analyses: We additionally performed correlational analyses between the trust ratings (the ratings of MDMT and emotional trust) and the acceptance rate of the AI or robot answers in each heatmap condition of each task.
Since each participant rated MDMT and emotional trust
three times after every 10 problems, the average acceptance
rate every 10 problems and the trust ratings after every
10 problems were used for the correlational analyses. As
a result, there were 102, 87, 93, and 96 datasets in the drowsiness detection task and 84, 84, 90, and 93 datasets in the obesity screening task for the no-heatmap, heatmap, heatmap-with-AI-pointing, and heatmap-with-robot-pointing conditions, respectively. Statistical power higher than .80 was assured through post-hoc power analyses with G*Power.

TABLE I
Correlational analyses between the ratings of MDMT and emotional trust and the acceptance rates in each condition of each task. * indicates p < 0.05, and ** indicates p < 0.001.

Drowsiness detection / MDMT: no heatmap r = −0.02 (p = 0.83); heatmap r = 0.28 (p < 0.001)**; heatmap with AI pointing r = 0.14 (p = 0.20); heatmap with robot pointing r = 0.16 (p = 0.13)
Drowsiness detection / Emotional trust: no heatmap r = 0.16 (p = 0.13); heatmap r = 0.39 (p < 0.001)**; heatmap with AI pointing r = 0.13 (p = 0.24); heatmap with robot pointing r = 0.30 (p < 0.001)**
Obesity screening / MDMT: no heatmap r = 0.09 (p = 0.36); heatmap r = 0.07 (p = 0.53); heatmap with AI pointing r = −0.15 (p = 0.14); heatmap with robot pointing r = 0.13 (p = 0.19)
Obesity screening / Emotional trust: no heatmap r = 0.18 (p = 0.06); heatmap r = 0.12 (p = 0.27); heatmap with AI pointing r = −0.04 (p = 0.70); heatmap with robot pointing r = 0.21 (p = 0.04)*
The results are shown in Table I. First, there were sig-
nificant correlations between the ratings of emotional trust
and the acceptance rates in the heatmap-with-robot-pointing
condition for both tasks. Also, there were significant correla-
tions between the MDMT and emotional trust ratings and the
acceptance rates in the heatmap condition for the drowsiness
detection task.
In sum, even though the AI and robot pointing had the same effect on the acceptance rates, as shown in the analyses above, emotional trust in the robot correlated with acceptance of the robot’s answers in the heatmap-with-robot-pointing condition, whereas no such relationship was found in the heatmap-with-AI-pointing condition.
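A minimal sketch of these per-condition correlational analyses is shown below; the data frame and column names are hypothetical, with one row per questionnaire block (one participant × one set of 10 problems), as described above.

```python
# Minimal sketch of the per-condition correlational analyses; the file and
# column names are hypothetical. Each row is one questionnaire block.
import pandas as pd
from scipy.stats import pearsonr

blocks = pd.read_csv("trust_blocks.csv")

for (task, cond), g in blocks.groupby(["task", "heatmap_condition"]):
    for rating in ["mdmt", "emotional_trust"]:
        r, p = pearsonr(g[rating], g["acceptance_rate"])
        print(f"{task} / {cond} / {rating}: r = {r:.2f}, p = {p:.3f}, n = {len(g)}")
```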
V. DISCUSSION
A. Summary of results
This study investigated how a social robot pointing to
a robot attention heatmap influences human trust in the
robot and acceptance of its outputs. As a result, we found that displaying an AI attention heatmap with or without AI pointing did not influence human trust in the AI or acceptance of its outputs, and robot pointing likewise did not influence human trust in the robot or acceptance of its outputs.
However, additional analyses that considered heatmap in-
terpretability revealed that heatmaps with AI and robot point-
ing lowered the participants’ acceptance of their answers
when the heatmaps had low interpretability in the drowsi-
ness detection task, a more difficult task. Also, heatmaps
with robot pointing had the possibility of increasing the
participants’ acceptance of their answers when the heatmaps
had high interpretability in the drowsiness detection task. In
addition, the acceptance of the robot’s answers correlated with emotional trust in the robot.
B. Effects of robot pointing gestures
Regarding the effects of AI and robot pointing, as in the previous studies [19], [20], both types of pointing were considered to capture the participants’ attention well. However, in this study, the AI and robot pointing influenced the participants’ decisions to accept the AI and robot answers only for the low-interpretability heatmaps in the drowsiness detection task.
The previous study showed that the obesity screening
task is much easier than the drowsiness detection task,
and the participants of that study did not rely on the AI
attention heatmap in the obesity screening task since the
reliability of the AI was easily identified [24]. Therefore,
as in the previous study, the participants in this study were
also considered to be influenced by the heatmaps only in the
drowsiness detection task, the more difficult task.
Also, a previous study showed that people generally tend to overestimate AI errors; even when people perceive errors made by an AI to be small, they still drastically decrease their acceptance of the AI’s answers [29]. Therefore, pointing at a low-interpretability heatmap may have been perceived as indicating a possible AI or robot error, and the participants in this study might have drastically decreased their acceptance of the AI or robot answers.
In addition, we also showed a possibility that heatmaps with robot pointing might increase the acceptance rate when heatmaps with high interpretability are displayed. The anthropomorphic features of a social robot activate theory of mind [11], [13], and therefore, a robot making a pointing gesture at high-interpretability heatmaps could trigger people to assume that the robot has human-like attentional information processing and to accept the robot’s outputs as reliable information.
Finally, there was a correlational relationship between
emotional trust in the robot and acceptance of the robot’s
answers when the robot was seen pointing, but there was
no such correlational relationship when the AI was pointing.
This difference is also considered to have occurred because
of the effect of the anthropomorphic features of the social
robot. These features can cause people to have positive
emotions towards a robot [13]. Therefore, in various situations, emotional trust in a robot and human use of the robot might be strongly connected.
C. Application
These results imply that a robot pointing to its attention heatmap could be used to trigger people to increase or decrease their acceptance of the robot’s outputs and their emotional trust in the robot. In particular, when people tend to over-rely on robot outputs, as in complacency (low vigilance towards possible system failures) [30], a robot pointing to an attention heatmap with low interpretability would lead to decreased acceptance of its answers and decreased emotional trust. In addition, when people tend to under-rely on robot outputs, as in algorithm aversion (a tendency to discount algorithmic decisions) [31], [32], a robot pointing to an attention heatmap with high interpretability could increase acceptance of its answers and emotional trust.
D. Limitations
In this study, the robot pointing gesture was displayed as
a still image, and therefore, the robot did not move. The
previous studies showed that a physical pointing gesture of
a social robot could capture human attention [19], [20]. Also,
in a face-to-face situation, people feel a physically present
robot to be more likable, helpful, enjoyable, trustworthy,
and credible [33], [34] and become more compliant with
a physically present robot than with a robot displayed on
a screen [35]. Therefore, a robot pointing gesture to a heatmap made with physical movement, especially in a face-to-face situation, is expected to influence trust in a robot and acceptance of its outputs even more strongly.
VI. CONCLUSION
This study investigated how a social robot pointing to
a heatmap influences human trust and acceptance of its
outputs. As a result, heatmaps with AI and robot pointing
lowered participants’ acceptance of their answers when the
heatmaps had low interpretability in a more difficult task.
Also, the robot pointing at the heatmaps showed the pos-
sibility of increasing acceptance of its answer when the
heatmaps had high interpretability in a more difficult task.
In addition, the acceptance of the robot’s answers correlated
with emotional trust in the robot.
The robot pointing at the low-interpretability heatmaps was likely perceived as the robot displaying a possible error, so the participants might have drastically decreased their trust in and acceptance of the robot’s answers. In contrast, the robot pointing at the high-interpretability heatmaps likely activated theory of mind, causing people to assume that the robot has human-like attentional processes and thus to accept its answers more readily. Overall, a robot pointing at its attention heatmap could be used to control human behaviors and emotional trust in human-robot interactions.
ACKNOWLEDGMENT
This work was partially supported by JST, CREST (JPMJCR21D4), Japan.
REFERENCES
[1] A. L. Baker, E. K. Phillips, D. Ullman, and J. R. Keebler, “Toward
an understanding of trust repair in human-robot interaction: Current
research and future directions,” ACM Transactions on Interactive
Intelligent Systems, vol. 8, no. 4, pp. 1–30, 2018. [Online]. Available:
https://doi.org/10.1145/3181671
[2] J. Lee and K. See, “Trust in automation: Designing for appropriate
reliance,” Human Factors, vol. 46, no. 1, pp. 50–80, 2004. [Online].
Available: https://doi.org/10.1518/hfes.46.1.50 30392
[3] K. Okamura and S. Yamada, “Empirical evaluations of framework for adaptive trust calibration in human-AI cooperation,” IEEE Access, pp. 1–1, 2020.
[4] E. Glikson and A. W. Woolley, “Human trust in artificial intelligence: Review of empirical research,” Academy of Management Annals, vol. 14, no. 2, pp. 627–660, 2020.
[5] K. A. Hoff and M. Bashir, “Trust in automation: Integrating empirical evidence on factors that influence trust,” Human Factors, vol. 57, no. 3, pp. 407–434, 2015.
[6] U. Kayande, A. D. Bruyn, G. L. Lilien, A. Rangaswamy, and G. H. van Bruggen, “How incorporating feedback mechanisms in a DSS affects DSS evaluations,” Information Systems Research, vol. 20, no. 4, pp. 527–546, 2009.
[7] D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, and G. Z. Yang,
“XAI–Explainable artificial intelligence,” Science Robotics, vol. 4,
no. 37, 2019.
[8] W. Pieters, “Explanation and trust: What to tell the user in security and AI?” Ethics and Information Technology, vol. 13, no. 1, pp. 53–64, 2011.
[9] M. T. Dzindolet, S. A. Peterson, R. A. Pomranky, and L. Pierce,
“The role of trust in automation reliance,” International Journal
of Human-Computer Studies, vol. 58, no. 6, pp. 697–718, 2003.
[Online]. Available: https://doi.org/10.1016/S1071-5819(03)00038-7
[10] W. Wang and I. Benbasat, “Recommendation agents for electronic commerce: Effects of explanation facilities on trusting beliefs,” Journal of Management Information Systems, vol. 23, no. 4, pp. 217–246, 2007.
[11] W. Mou, M. Ruocco, D. Zanatto, and A. Cangelosi, “When would
you trust a robot? a study on trust and theory of mind in human-
robot interactions,” in Proceedings of the 29th IEEE International
Conference on Robot and Human Interactive Communication, ser. RO-
MAN ’20, 2020.
[12] A. Leslie, “Pretense and representation: The origins of “theory of
mind”,” Psychological Review, vol. 94, no. 4, pp. 412–426, 1987.
[13] E. J. Carter, M. N. Mistry, G. P. K. Carr, B. A. Kelly, and J. K. Hodgins, “Playing catch with robots: Incorporating social gestures into physical interactions,” in Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication, ser. RO-MAN ’14, 2014, pp. 231–236.
[14] D. Gunning, E. Vorm, J. Y. Wang, and M. Turek, “DARPA’s explainable AI (XAI) program: A retrospective,” Applied AI Letters, vol. 2, no. 4, 2021.
[15] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and
D. Batra, “Grad-CAM: Visual explanations from deep networks via
gradient-based localization,” in 2017 IEEE International Conference
on Computer Vision (ICCV2017), 2017.
[16] A. Chattopadhay, A. Sarkar, P. Howlader, and V. N. Balasubramanian, “Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018.
[17] A. R. Akula, K. Wang, C. Liu, S. Saba-Sadiya, H. Lu, S. Todorovic, J. Chai, and S.-C. Zhu, “CX-ToM: Counterfactual explanations with theory-of-mind for enhancing human trust in image recognition models,” iScience, vol. 25, no. 1, p. 103581, 2022.
[18] K. Gulzar and V. Kyrki, “See what I mean - probabilistic optimization of robot pointing gestures,” in 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids). IEEE, 2015.
[19] X. Wang, M.-A. Williams, P. Gardenfors, J. Vitale, S. Abidi, B. John-
ston, B. Kuipers, and A. Huang, “Directing human attention with
pointing,” in The 23rd IEEE International Symposium on Robot and
Human Interactive Communication. IEEE, 2014.
[20] T. Ishino, M. Goto, and A. Kashihara, “A robot for reconstructing presentation behavior in lecture,” in Proceedings of the 6th International Conference on Human-Agent Interaction. ACM.
[21] D. H. Park, L. A. Hendricks, Z. Akata, A. Rohrbach, B. Schiele, T. Darrell, and M. Rohrbach, “Multimodal explanations: Justifying decisions and pointing to the evidence,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE.
[22] R. Ghoddoosian, M. Galib, and V. Athitsos, “A realistic dataset and baseline temporal model for early drowsiness detection,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, pp. 178–187.
[23] J. M. Moussally, L. Rochat, A. Posada, and M. Van der Linden, “A database of body-only computer-generated pictures of women for body-image studies: Development and preliminary validation,” Behavior Research Methods, vol. 49, no. 1, pp. 172–183, 2016.
[24] A. Maehigashi, Y. Fukuchi, and S. Yamada, “Modeling reliance on XAI indicating its purpose and attention,” arXiv:2302.08067, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2302.08067
[25] F. Faul, E. Erdfelder, A.-G. Lang, and A. Buchner, “G*power 3: A
flexible statistical power analysis program for the social, behavioral,
and biomedical sciences,” Behavior Research Methods, vol. 39, no. 2,
pp. 175–191, 2007.
[26] M. R. Maniaci and R. D. Rogge, “Caring about carelessness: Participant inattention and its effects on research,” Journal of Research in Personality, vol. 48, pp. 61–83, 2014.
[27] S. Y. X. Komiak and I. Benbasat, “The effects of personalization and familiarity on trust and adoption of recommendation agents,” MIS Quarterly, vol. 30, no. 4, p. 941, 2006.
[28] D. Ullman and B. F. Malle, “Measuring gains and losses in human-robot trust: Evidence for differentiable components of trust,” in Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction, ser. HRI ’19, Daegu, Republic of Korea, March 11–14, 2019, pp. 618–619.
[29] M. T. Dzindolet, L. G. Pierce, H. P. Beck, and L. A. Dawe, “The
perceived utility of human and automated aids in a visual detection
task,” Human Factors, vol. 44, no. 1, pp. 79–94, 2002.
[30] R. Parasuraman and D. Manzey, “Complacency and bias in human use
of automation: An attentional integration,” Human Factors, vol. 52,
no. 3, pp. 381–410, 2010.
[31] B. J. Dietvorst, J. P. Simmons, and C. Massey, “Algorithm aversion: People erroneously avoid algorithms after seeing them err,” Journal of Experimental Psychology: General, vol. 144, no. 1, pp. 114–126, 2015.
[32] H. Mahmud, A. N. Islam, S. I. Ahmed, and K. Smolander, “What influences algorithmic decision-making? A systematic literature review on algorithm aversion,” Technological Forecasting and Social Change, vol. 175, p. 121390, 2021.
[33] A. Powers, S. Kiesler, S. Fussell, and C. Torrey, “Comparing a computer agent with a humanoid robot,” in Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, ser. HRI ’07, Arlington, Virginia, USA, March 10–12. New York, NY: Association for Computing Machinery, 2007, pp. 145–152. [Online]. Available: https://doi.org/10.1145/1228716.1228736
[34] S. Kiesler, A. Powers, S. Fussell, and C. Torrey, “Anthropomorphic interactions with a robot and robot-like agent,” Social Cognition, vol. 26, no. 2, pp. 169–181, 2008. [Online]. Available: https://doi.org/10.1521/soco.2008.26.2.169
[35] W. A. Bainbridge, J. W. Hart, E. S. Kim, and B. Scassellati, “The benefits of interactions with physically present robots over video-displayed agents,” International Journal of Social Robotics, vol. 3, no. 1, pp. 41–52, 2011. [Online]. Available: https://doi.org/10.1007/s12369-010-0082-7