User Decision Guidance with Selective Explanation Presentation
from Explainable-AI
Yosuke Fukuchi1 and Seiji Yamada2,3
Abstract— This paper addresses the challenge of selecting
explanations for XAI (Explainable AI)-based Intelligent Deci-
sion Support Systems (IDSSs). IDSSs have shown promise in
improving user decisions through XAI-generated explanations
along with AI predictions, and the development of XAI made
it possible to generate a variety of such explanations. However,
how IDSSs should select explanations to enhance user decision-
making remains an open question. This paper proposes X-
Selector, a method for selectively presenting XAI explanations.
It enables IDSSs to strategically guide users to an AI-suggested
decision by predicting the impact of different combinations of
explanations on a user’s decision and selecting the combination
that is expected to minimize the discrepancy between an AI
suggestion and a user decision. We compared the efficacy of
X-Selector with two naive strategies (all possible explanations
and explanations only for the most likely prediction) and two
baselines (no explanation and no AI support) in virtual stock-
trading support scenarios. The results suggest the potential of
X-Selector to guide users to AI-suggested decisions and improve
task performance under the condition of a high AI accuracy.
I. INTRODUCTION
Intelligent Decision Support Systems (IDSSs) [1], empow-
ered by Artificial Intelligence (AI), have the potential to help
users make better decisions by introducing explainability into
their support. An increasing number of methods have been
proposed for achieving explainable AIs (XAIs) [2], and the
development of large language models (LLMs) has also made
it possible to generate various post-hoc explanations that
justify AI predictions. Previous studies have integrated such
XAI methods into IDSSs and shown their effectiveness in
presenting explanations along with AI predictions in diverse
applications [3], [4].
Now that IDSSs can have a variety of explanation can-
didates, a new question arises as to which explanations an
IDSS should provide in dynamic interaction. Explanation is
a complex cognitive process [5]. Although XAI explanations
can potentially guide users to make better decisions, there is
also a risk of having negative effects on explainees’ deci-
sions. Various causes including explanation uninterpretabil-
ity [6], [7], information overload [8], [9], and contextual
inaccuracy [10] can affect users and thus the performance
of decision-making. A subtle difference in the nuance of
a linguistic explanation can also have a different impact
and sometimes mislead user decisions when influenced by
the context, the status of the task, and the cognitive and
1Faculty of Systems Design, Tokyo Metropolitan University, Tokyo, Japan. fukuchi@tmu.ac.jp. 2Digital Content and Media Sciences Research Division, National Institute of Informatics, Tokyo, Japan. 3The Graduate University for Advanced Studies, SOKENDAI, Tokyo, Japan. This work was supported in part by JST CREST Grant Number JPMJCR21D4 and JSPS KAKENHI Grant Number JP24K20846.
psychological status of the users. Conversely, we can expect
that IDSSs can be greatly enhanced if they can strategically
select explanations that are likely to lead users to better
decisions while taking the situation into account.
To address the question of how IDSSs can select ex-
planations to improve user decisions, this paper proposes
X-Selector, a method for dynamically selecting which ex-
planations to provide along with an AI prediction. The
main characteristic of X-Selector is that it predicts how
explanations affect user decision-making for each trial and
attempts to guide users to an AI-suggested decision referring
to the prediction results. The design of guiding users with ex-
planation selection is inspired by libertarian paternalism [11],
the idea of influencing one’s choices to better ones while
embracing the autonomy of their decision-making.
This paper also reports user experiments that simulated
stock trading with an XAI-based IDSS. In a preliminary ex-
periment, we compared two naive but common strategies—
ALL (providing all possible explanations) and ARGMAX
(providing only explanations for the AI’s most probable
prediction)—against baseline scenarios providing no expla-
nations or no decision support. The results suggest that the
ARGMAX strategy works better with high AI accuracy,
and ALL is more effective when AI accuracy is lower,
indicating that the strategy for selecting explanations affects
user performance. In the second experiment, we compared
the results of explanations selected by X-Selector with
ARGMAX and ALL. The results indicate the potential of X-
Selector’s selective explanations to more strongly lead users
to suggested decisions and to outperform ARGMAX when
AI accuracy is high.
II. BACKGROUND
A. XAIs for deep learning models
While various methods such as Fuzzy Logic and Evo-
lutionary Computing have been introduced to IDSSs, this
paper targets IDSSs with Deep Learning (DL) models. IDSSs
driven by DL models are capable of dealing with high-
dimensional data such as visual images and are actively
studied in diverse fields [12]–[14]. Due to their blackbox
nature, explainability for DL models is also an area of active
research, and this can potentially offer benefits for IDSSs.
There are various forms of explanations depending on the
nature of the target AI. Common explanations for visual
information processing AIs include presenting saliency maps.
The class activation map (CAM) is a widely used method for
visualizing a saliency map of convolutional neural network
(CNN) layers [15]. It identifies the regions of an input image
that contribute most to the model classifying the image into a particular class.
Language is also a common modality of XAIs, and free-text explanations are rapidly becoming available thanks to advances in LLMs [16]. LLMs can generate post-hoc
explanations for AI predictions. Here, post-hoc means that
the explanations are generated after the AI’s decision-making
process has occurred, as opposed to intrinsic methods that generate explanations as an integral part of that process [2].
B. Human-XAI interaction
The theme of this study involves how to facilitate users’
appropriate use of AI. Avoiding human over/under-reliance
on an AI is a fundamental problem of human-AI interac-
tion [17]. Here, over-reliance is a state in which a human
overestimates the capability of an AI and blindly follows its
decision, whereas under-reliance is a state in which a human
misuses an AI even though it can perform well.
Although explanation is believed to generally help people
appropriately use AI by providing transparency in AI predic-
tions, previous studies suggest that XAI explanations do not
always work positively [18]. Maehigashi et al. demonstrated
that presenting AI saliency maps has different effects on
user trust in an AI depending on the task difficulty and
the interpretability of the saliency map [6]. Herm revealed
that the type of XAI explanation strongly influences users’
cognitive load, task performance, and task time [9]. Panigutti
et al. conducted a user study with an ontology-based clinical
IDSS, and they found that the users were influenced by
the explanations despite the low perceived explanation qual-
ity [19]. These results suggest potential risks of triggering
under-reliance with explanations or, conversely, leading to
users blindly following explanations from an IDSS even if
the conclusion drawn from the explanations is incorrect.
This study aims to computationally predict how explana-
tions affect user decisions in order to avoid misleading users
and encourage them to make better decisions by selecting
explanations. Work by Wiegreffe et al. [16] shares a similar
concept with this study. They propose a method of evaluat-
ing explanations generated by LLMs by predicting human
ratings of their acceptability. This approach is pivotal in
understanding how users perceive AI-generated explanations.
However, our study diverges by focusing on the behavioral
impacts of these explanations on human decision-making.
We are particularly interested in how these explanations can
alter the decisions made by users receiving an IDSS's support, rather than just their perceptions of the explanations. Another relevant study, Pred-RC [20], [21], aims to predict the effect of explanations of AI performance so that users can avoid over/under-reliance. It dynamically predicts a user's binary judgment of whether to assign a task to the AI and selects explanations that guide the user to a better assignment.
X-Selector takes a further step: it predicts concrete decisions while taking the effects of explanations into account and proactively influences them to improve the performance of each decision.
III. X-SELECTOR
A. Overview
This paper proposes X-Selector, a method for dynamically
selecting explanations for AI predictions. The idea of X-
Selector is that it predicts user decisions under possible
combinations of explanations and chooses the best one that is
predicted to minimize the discrepancy between the decision
that the user is likely to make and the AI-suggested one.
B. Algorithm
The main components of X-Selector are UserModel and π. UserModel is a model of a user who makes decision du. X-Selector uses it for user decision prediction:

$$\mathrm{UserModel}(\mathbf{c}, \mathbf{x}, d) = P(d_u = d \mid \mathbf{c}, \mathbf{x}). \tag{1}$$

The output of UserModel is represented as a probability distribution of du conditioned on c and x, where x ∈ X is a combination of explanations to be presented to the user, and c represents all the other contextual information including AI predictions, task status, and user status. In this paper, we developed a dataset of (c, x, du) and prepared a machine learning model that was trained with the dataset for implementing this.
In addition, X-Selector has a policy π, which considers a decision dAI based on c. This inference is done in parallel with user decision-making:

$$\pi(\mathbf{c}, d) = P(d_{AI} = d \mid \mathbf{c}). \tag{2}$$
X-Selector aims to minimize the discrepancy between du and dAI by comparing the effect of each x on du. The selected combination x̂ is calculated as:

$$\hat{\mathbf{x}} = \operatorname*{argmin}_{\mathbf{x}} \left| \mathbb{E}_{d \sim \mathrm{UserModel}(\mathbf{c}, \mathbf{x}, d)}[d] - \mathbb{E}_{d \sim \pi(\mathbf{c}, d)}[d] \right|. \tag{3}$$

To calculate this equation, X-Selector simulates how x will change du using UserModel and aims to choose the best combination, i.e., the one that guides du to dAI the most.
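As a minimal sketch of this selection rule (Eq. 3), assuming hypothetical callables user_model_expect and policy_expect that return the expected decisions under UserModel and π (all names are illustrative, not the released implementation):

```python
# Minimal sketch of the selection rule in Eq. (3).
# `user_model_expect(c, x)` is assumed to return E_{d~UserModel(c,x,.)}[d],
# and `policy_expect(c)` to return E_{d~pi(c,.)}[d]; both are hypothetical.
from itertools import product
from typing import Callable, Sequence, Tuple


def select_explanations(context,
                        candidates: Sequence,
                        user_model_expect: Callable,
                        policy_expect: Callable) -> Tuple:
    d_ai = policy_expect(context)                        # AI-suggested decision
    best_x: Tuple = ()
    best_gap = float("inf")
    # Enumerate every subset of the candidate explanations (2^N combinations).
    for flags in product([0, 1], repeat=len(candidates)):
        x = tuple(e for e, f in zip(candidates, flags) if f)
        gap = abs(user_model_expect(context, x) - d_ai)  # |E[d_u] - E[d_AI]|
        if gap < best_gap:
            best_x, best_gap = x, gap
    return best_x
```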
C. Implementation
1) Task: We implemented X-Selector in a stock trading
simulator in which users get support from an XAI-based
IDSS. Figure 1 shows screenshots of the simulator. The
simulation was conducted on a website. Participants were
virtually given three million JPY and traded stocks for 60
days with a stock price chart, AI prediction of the future
stock price, and explanations for the prediction.
In the simulation, participants checked the opening price
and a price chart for each day and decided whether to buy
stocks with the funds they had, sell stocks they had, or
hold their position. In accordance with Japan’s general stock
trading system, participants could trade stocks in units of
100 shares. Participants were asked to show their decision
twice a day to clarify the influence of the explanations. They
were first asked to decide an initial order d′, that is, the amount to trade, with only chart information and without the
support of the IDSS. Then, the IDSS showed a bar graph that
indicated the output of a stock price prediction model and its explanations.

[Fig. 1: Screenshots of trading simulator. (a) Chart. (b) Examples of StockAI's prediction and its explanations.]

We did not explicitly show dAI to enhance
the autonomy of users’ decision-making, which is inspired
by libertarian paternalism [11], the idea of affecting behavior
while respecting freedom of choice as well. However, we can
easily extend X-Selector to a setting in which dAI is given
to users by including it with c (when you always show dAI) or x (when you want to selectively show dAI). Finally, they
input their final order d. After this, the simulator immediately transitioned to the next day. The positions carried over from
the final day were converted into cash on the basis of the
average stock price over the next five days to calculate the
participants’ total performance.
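As a small illustration of this conversion (a hypothetical helper, following the task description of 100-share units and the five-day average price):

```python
import numpy as np


def final_assets(cash: float, shares: int, next5_prices: np.ndarray) -> float:
    """Convert positions carried over from the final day into cash using the
    average stock price over the next five days, then add the remaining cash."""
    assert shares % 100 == 0  # trades are made in units of 100 shares
    return cash + shares * float(next5_prices.mean())
```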
2) StockAI: In the task, an IDSS provides a prediction of
a stock price prediction model (StockAI) as user support.
StockAI is a machine-learning model that is designed to
predict the average stock price in the next five business days,
and we used its prediction as the target of the explanation
provided to users. StockAI predicts future stock prices on
the basis of an image of a candlestick chart. Although using
a candlestick chart as an input does not necessarily lead
to better performance than modern approaches proposed in
the latest studies [22], we chose it because the saliency maps generated from such a model are easier to understand as explanations of AI predictions. Note that the aim of this research is not to build a high-performance prediction model but to investigate the interaction between a human and an AI whose performance is not necessarily perfect.
[Fig. 2: Structure of UserModel. The inputs i, p, δ, d′, x_map, and x_text are embedded, concatenated, and mapped by linear layers to P(d ≠ d′) and d − d′.]
For the implementation of StockAI, we used ResNet-18 [23], a deep-learning visual information processing model, implemented with the PyTorch library1. StockAI is trained in a supervised manner; it classifies the ratio of the future stock price to the opening price of the day into three classes: BULL (over +2%), NEUTRAL (from -2% to +2%), and BEAR (under -2%). The prediction results are presented as a bar graph of the probability distribution over the classes, which is hereafter denoted as p. For the training, we collected the historical
stock data (from 2018/5/18 to 2023/5/16) of companies that
are included in the Japanese stock index Nikkei225. We split
the data by stock code, with three-quarters of the data as the
training dataset and the remainder as the test. The accuracy
with which the model was able to predict the correct class
among the three classes was 0.474, and the accuracy for
binary classification, or the matching rate of the sign of the
expected value of the model’s prediction and that of actual
fluctuations, was 0.63 for the test dataset.
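For concreteness, the labeling rule and classifier head described above could look like the following sketch (hypothetical helper names; the thresholds follow the ±2% definition in the text, and this is not the authors' code):

```python
import numpy as np
from torchvision.models import resnet18

# StockAI backbone: ResNet-18 with a 3-class head (BULL / NEUTRAL / BEAR).
stock_ai = resnet18(num_classes=3)

BULL, NEUTRAL, BEAR = 0, 1, 2


def make_label(open_price: float, next5_prices: np.ndarray) -> int:
    """Label by the ratio of the 5-day-ahead average price to the day's open."""
    ratio = next5_prices.mean() / open_price - 1.0
    if ratio > 0.02:       # over +2%
        return BULL
    if ratio < -0.02:      # under -2%
        return BEAR
    return NEUTRAL         # between -2% and +2%
```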
3) Explanations: We prepared two types of explanations:
saliency maps and free-texts. We applied CAM-based meth-
ods available in the pytorch-grad-cam package2 to StockAI
and adopted Score-CAM [24] because it most clearly visual-
izes saliency maps of StockAI. Because CAM-based methods
can generate a saliency map for each prediction class, three
maps were acquired for each prediction. Let x_map be the set of the acquired maps.
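Per-class saliency maps can be obtained roughly as follows, assuming the pytorch-grad-cam package mentioned above; the target layer and preprocessing are illustrative assumptions:

```python
import torch
from pytorch_grad_cam import ScoreCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget


def class_saliency_maps(model, chart: torch.Tensor):
    """Return one Score-CAM map per prediction class for a (1, C, H, W) chart image."""
    target_layers = [model.layer4[-1]]   # last ResNet-18 block; an illustrative choice
    cam = ScoreCAM(model=model, target_layers=target_layers)
    maps = []
    for cls in range(3):                 # BULL, NEUTRAL, BEAR
        grayscale = cam(input_tensor=chart, targets=[ClassifierOutputTarget(cls)])
        maps.append(grayscale[0])        # (H, W) saliency for this class
    return maps
```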
In addition, we created a set of free-text explanations
based on the GPT-4V model in the OpenAI API [25], which
allows images as input. We input a chart with a prompt
that asked GPT-4V to generate two explanation sentences
that justify each prediction class (BULL, NEUTRAL, and
BEAR). Therefore, we acquired six sentences in total for
each chart. Let us denote the set of them by x_text.
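A hedged sketch of this generation step with the OpenAI Python SDK is shown below; the model name, prompt wording, and response handling are illustrative assumptions rather than the prompt actually used in the study:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def explain_chart(chart_png_path: str) -> str:
    """Ask a GPT-4V-class model for two justification sentences per class."""
    with open(chart_png_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    prompt = ("For the attached candlestick chart, write two short sentences "
              "justifying each of the following predictions of the 5-day-ahead "
              "average price: BULL (>+2%), NEUTRAL (-2% to +2%), BEAR (<-2%).")
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # illustrative model name
        messages=[{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
        max_tokens=600,
    )
    return resp.choices[0].message.content  # six sentences, to be split per class
```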
As a result, three saliency maps and six free-text explanations were available for each trading day, and X-Selector considered 2^9 = 512 combinations of the selected explanations (x̂ ⊆ x_map ∪ x_text).
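The 2^9 = 512 candidate combinations can be enumerated as binary flags over the nine available explanations, for example:

```python
from itertools import product

# Nine explanation slots: one saliency map per class, two sentences per class.
explanations = ([("map", cls) for cls in ("BULL", "NEUTRAL", "BEAR")] +
                [("text", cls, i) for cls in ("BULL", "NEUTRAL", "BEAR")
                 for i in (1, 2)])
assert len(explanations) == 9

combinations = [tuple(e for e, flag in zip(explanations, flags) if flag)
                for flags in product([0, 1], repeat=len(explanations))]
assert len(combinations) == 2 ** 9 == 512
```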
4) Models: We implemented UserModel with a deep
learning model (Figure 2). The input of UserModel is a
tuple (c, x). c includes four variables: date i, StockAI's prediction p, total rate δ, and initial order d′. i is a categorical variable that embeds the context of the day such as the stock price. p is a three-dimensional vector that corresponds to the values in the bar graph (Fig. 1b). δ is the percentage increase or decrease of the user's total asset from the initial amount. i and the other variables are encoded into 2048-dimensional vectors (h_i, h_p, h_r, h_d′) with the Embedding and Linear modules implemented in PyTorch, respectively.

1 https://pytorch.org/
2 https://github.com/frgfm/torch-cam
Let us denote x_map and x_text as sets {(x, cls, flag)}, where x is the raw data of an explanation, cls ∈ {BULL, NEUTRAL, BEAR}, and flag = 1 if x is to be presented and 0 when hidden. x_map and x_text are also encoded into 2048-dimensional vectors (h_map, h_text):

$$h_{map} = \sum_{(x, cls, flag) \in \mathbf{x}_{map}} flag \cdot \left( \mathrm{CNN}(x) \odot \mathrm{ClsEmbedding}(cls) \right),$$
$$h_{text} = \sum_{(x, cls, flag) \in \mathbf{x}_{text}} flag \cdot \left( \mathrm{TextEncoder}(x) \odot \mathrm{ClsEmbedding}(cls) \right),$$

where ⊙ denotes an element-wise product. CNN is a three-layer CNN model. For TextEncoder, we used the E5 (embeddings from bidirectional encoder representations) model [26] with pretrained parameters3.

3 https://huggingface.co/intfloat/multilingual-e5-large
All embedding vectors (h_i, h_p, h_r, h_d′, h_map, h_text) are concatenated and input to a three-layer linear model. To extract the influence of explanations, the model was trained to predict not d but the difference d − d′. In our initial trial, UserModel always predicted d − d′ to be nearly 0 due to the distributional bias, so we added an auxiliary task of predicting whether d = d′ and trained the model to predict d − d′ only when d ≠ d′. The expected value of du in Equation 3 is P(d ≠ d′) · (d − d′) + d′.
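A compact PyTorch sketch of this UserModel is given below. The elements stated in the text (2048-dimensional encodings, a three-layer head, the two outputs P(d ≠ d′) and d − d′) are followed; the CNN kernel sizes, the E5 embedding dimension, and all module names are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn as nn


class UserModelSketch(nn.Module):
    """Predicts P(d != d') and d - d' from context c and explanation set x."""

    def __init__(self, n_days: int, text_dim: int = 1024, dim: int = 2048):
        super().__init__()
        self.day_embed = nn.Embedding(n_days, dim)   # date i (categorical)
        self.p_enc = nn.Linear(3, dim)               # StockAI prediction p
        self.r_enc = nn.Linear(1, dim)               # total rate delta
        self.d0_enc = nn.Linear(1, dim)              # initial order d'
        self.cls_embed = nn.Embedding(3, dim)        # BULL / NEUTRAL / BEAR
        self.map_cnn = nn.Sequential(                # 3-layer CNN over a saliency map
            nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))
        self.text_proj = nn.Linear(text_dim, dim)    # on top of precomputed E5 embeddings
        self.head = nn.Sequential(                   # three-layer linear head
            nn.Linear(6 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, 2))                       # [logit of P(d != d'), d - d']

    def forward(self, day, p, rate, d0, maps, texts):
        # maps:  list of (saliency map (1, H, W), class LongTensor (1,), flag float)
        # texts: list of (E5 embedding (text_dim,), class LongTensor (1,), flag float)
        dim = self.cls_embed.embedding_dim
        h_map = sum((f * self.map_cnn(x.unsqueeze(0)) * self.cls_embed(c)
                     for x, c, f in maps), start=torch.zeros(1, dim))
        h_text = sum((f * self.text_proj(e.unsqueeze(0)) * self.cls_embed(c)
                      for e, c, f in texts), start=torch.zeros(1, dim))
        h = torch.cat([self.day_embed(day), self.p_enc(p), self.r_enc(rate),
                       self.d0_enc(d0), h_map, h_text], dim=-1)
        change_logit, diff = self.head(h).unbind(-1)
        p_change = torch.sigmoid(change_logit)         # P(d != d')
        expected_d = p_change * diff + d0.squeeze(-1)  # E[d_u] = P(d != d')*(d - d') + d'
        return p_change, diff, expected_d
```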
π was acquired with the deep deterministic policy gradient, a deep reinforcement learning method [27]. We simply trained π to decide d to maximize assets on the basis of p for the training dataset. The reward for the policy is calculated
as the difference in total assets between the current day and
the previous day.
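The reward signal described here reduces to the day-over-day change in total assets, as in this small sketch (hypothetical helper names; the RL training loop itself is omitted):

```python
def total_assets(cash: float, shares: int, price: float) -> float:
    return cash + shares * price


def daily_reward(prev_cash: float, prev_shares: int, prev_price: float,
                 cash: float, shares: int, price: float) -> float:
    # Reward for pi: difference in total assets between the current and previous day.
    return (total_assets(cash, shares, price)
            - total_assets(prev_cash, prev_shares, prev_price))
```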
IV. EXPERIMENTS
A. Preliminary experiment
1) Procedure: We conducted a preliminary experiment
to investigate the performance of users who were provided
explanations with two naive strategies (ALL and ARGMAX).
ALL shows all nine explanations available for each day, and ARGMAX selects explanations only for StockAI's most probable prediction class. To assess the contribution of the explanations, we also prepared two baselines: ONLY PRED shows p but does not provide any explanations, and in PLAIN, participants received no support from the IDSS and acted on their own.
For simulation, we chose a Japanese general trading com-
pany (code: 2768) from the test dataset on the basis of the
common stock price range (1,000 - 3,000 JPY) and its high
volatility compared with the other Nikkei225 companies.
Because we had anticipated that the accuracy of StockAI
would affect the result, we prepared two scenarios: high-
accuracy and low-accuracy. We calculated the moving aver-
age of the accuracy of StockAI with a window size of 60
and chose two sections for them. The accuracy of StockAI for high-accuracy was 0.750, which was the highest, and that for low-accuracy was 0.333, the chance level of three-class classification.

TABLE I: Sample sizes of preliminary experiment
                ALL   ARGMAX   ONLY_PRED   PLAIN
High-accuracy    39     40        34        41
Low-accuracy     31     34        34        38
We recruited participants through Lancers4, a Japanese crowdsourcing platform, with compensation of 220 JPY, and obtained 336 participants. The participants were first provided pertinent information, and 325 consented to participate. We gave them instructions on the task and basic explanations of stock charts and the price prediction AI. We instructed the participants to increase the given three million JPY as much as possible by trading with the IDSS's support. To motivate them, we told them that additional rewards would be given to the top performers. We did not notify them of the amount of the additional rewards or the number of participants who would receive them. We asked six questions to check their comprehension of the task. The 34 participants who failed to answer correctly were excluded from the task. After familiarization with the trading simulator, the participants traded for 60 virtual days successively. In total, 242 participants completed the task (152 males, 88 females, and 2 did not answer; aged 14-77, M = 42.8, SD = 10.1).
Table I gives details on the sample sizes.
2) Result: Figure 3 shows the changes in the partici-
pants’ performance. A conspicuous result is the underperfor-
mance of PLAIN, particularly in the high-accuracy scenario.
ONLY PRED performed well for high-accuracy but could not outperform PLAIN for low-accuracy. This suggests that presenting p alone contributes to improving performance only when p is sufficiently accurate.
ALL and ARGMAX showed different results between the
scenarios. For high-accuracy, ARGMAX outperformed ALL.
ALL slightly underperformed the ONLY PRED baseline as
well. This suggests that ARGMAX explanations successfully
guided users to follow the prediction of StockAI while
ALL toned down the guidance, which worked negatively
in this scenario. On the other hand, ALL outperformed
ARGMAX and the baselines for low-accuracy. Interestingly,
ARGMAX also outperformed the baselines, which suggests
that explanations successfully provide users with insights
into situations and AI accuracy and can contribute to better
decision-making. ALL worked positively for low-accuracy by providing multiple perspectives.
B. Experiment with X-Selector
1) Procedure: To evaluate X-Selector, we conducted a
simulation with its selected explanations. To train User-
Model, we used the data of the preliminary experiment and
additional data acquired in another experiment in which
explanations were randomly selected. We added the data
to broaden the variety of explanation combinations in the
dataset.

4 https://lancers.jp/

[Fig. 3: Comparisons of baseline total assets (asset vs. day; conditions: ALL, ARGMAX, ONLY_PRED, PLAIN). Error bands represent standard errors. (a) High-accuracy. (b) Low-accuracy.]

[Fig. 4: Results for high-accuracy scenario. (a) Distribution of correlation coefficient between du and dAI for each user. (b) User performance (asset vs. day; conditions: X-Selector, ALL, ARGMAX).]

The numbers of the additional participants were
54 and 45 for high- and low-accuracy, respectively. We
conducted a 4-fold cross validation for UserModel, and the
correlation coefficient between the model’s predictions and
the ground truths was 0.429 on average (SD = 0.056).
We obtained 97 participants. Finally, 39 and 35 participants completed the task for high-accuracy and low-accuracy, respectively (46 males, 26 females, and 2 did not answer; aged 23-64, M = 39.6, SD = 10.0).
To analyze the results, we compared the correlation coefficient between dAI and du for each participant as a measure of whether X-Selector successfully guided users to dAI, in addition to comparing user performance as in the preliminary experiment.
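The per-participant agreement measure used here is simply the Pearson correlation between the daily du and dAI series, e.g.:

```python
import numpy as np


def guidance_score(d_user: np.ndarray, d_ai: np.ndarray) -> float:
    """Pearson correlation between a participant's daily orders and the AI-suggested ones."""
    return float(np.corrcoef(d_user, d_ai)[0, 1])
```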
2) Result: Figure 4a shows the correlation coefficient
distribution between dAI and du in the high-accuracy con-
dition. The results for ALL and ARGMAX are also shown
for comparison. Notably, while the peaks for ALL and
ARGMAX are centered around zero, X-Selector shifted this
peak to the right, indicating a stronger correlation between du
and dAI for a greater number of participants. This means that
X-Selector effectively guides users to dAI not by coercion but by presenting explanations selectively.
Figure 4b illustrates the user performance. X-Selector
generally outperformed ALL and ARGMAX, meaning that
X-Selector enabled users to trade better with selective ex-
planations. In more detail, X-Selector first underperformed
ARGMAX, but the ranking reversed on day 16. The gap narrowed once near day 39 but broadened again toward the end.
A possible reason for X-Selector’s better performance is
that it can predict which combination of explanations guides
participants to sell or buy shares more. For example, the
stock price around day 16 dropped steeply, so the IDSS
needed to guide participants to reverse their position. Here,
whereas ARGMAX showed explanations for BEAR, X-
Selector showed explanations for NEUTRAL as well as
BEAR, which may have helped users sell their shares more.
Similarly, X-Selector also attempted to guide users to buy
a moderate amount when dAI was positive but not high by,
for example, showing only a saliency map for BULL and
no text explanations. Another reason is that X-Selector can
overcome the ambiguity in the interpretation of p. p reflects the momentum of the stock price in the high-accuracy scenario and should provide some insight for trading, but it was up to the participants how to use it to actually decide their orders. π sometimes suggested that they buy shares even though NEUTRAL or BEAR was the most likely class in p. Thus, we can say that p was poorly calibrated, but by referring to π, X-Selector can avoid misleading participants and instead lead them to more promising decisions.
On the other hand, X-Selector underperformed ARGMAX
until day 16. The stock price was in an uptrend until day
14, and ARGMAX continuously presented explanations for
BULL for 12 days in a row, which may have strongly guided participants to buy stocks and led to large profits. In our implementation, UserModel considers only the explanations of the day and does not take the history of previously provided explanations into account, which is a target for future work.
X-Selector could not improve user performance in the
low-accuracy scenario (Figure 5b). Overall, the result was
similar to ARGMAX and underperformed ALL. We further
focused on the correlation coefficient between du and dAI.
Figure 5a shows that, contrary to the high-accuracy scenario,
X-Selector did not increase the score. The different results
between the high- and low-accuracy scenarios indicate the
possibility that participants actively assessed the reliability
of the AI and autonomously decided whether to follow X-Selector's guidance.

[Fig. 5: Results for low-accuracy scenario. (a) Distribution of correlation coefficient between du and dAI. (b) User performance (asset vs. day; conditions: X-Selector, ALL, ARGMAX).]

This itself highlights a positive aspect of
introducing libertarian paternalism to human-AI interaction
in that users can potentially avoid AI failure depending on
its reliability. However, this did not result in improving their
performance in this scenario. The lack of correlation between
the score and the final asset amounts in the X-Selector
condition (r = 0.048) suggests that merely disregarding the AI's guidance does not guarantee performance improvement. A future direction for this problem is to develop a mechanism that controls the strength of AI guidance and provides explanations in a more neutral way depending on AI accuracy.
V. CONCLUSION
This paper investigated the question of how IDSSs can
select explanations, and we proposed X-Selector, which is
a method for dynamically selecting which explanations to
provide along with an AI prediction. In X-Selector, User-
Model predicts the effect of presenting explanations on a
user decision for each possible combination to show. Then,
it selects the best combination that minimizes the difference
between the predicted user decision and the AI’s suggestion.
We applied X-Selector to a stock trading simulation with the
support of an XAI-based IDSS. The results indicated that X-Selector can select explanations that effectively guide users to suggested decisions and improve performance when AI accuracy is high, and they also revealed a new challenge for X-Selector in low-accuracy cases.
REFERENCES
[1] G. Phillips-Wren, Intelligent Decision Support Systems, 02 2013, pp.
25–44.
[2] A. Adadi and M. Berrada, “Peeking inside the black-box: A survey
on explainable artificial intelligence (xai),” IEEE Access, vol. 6, pp.
52 138–52 160, 2018.
[3] M. H. Lee and C. J. Chew, “Understanding the effect of counterfactual
explanations on trust and reliance on ai for human-ai collaborative
clinical decision making,” Proc. ACM Hum.-Comput. Interact., vol. 7,
no. CSCW2, oct 2023.
[4] D. P. Panagoulias, E. Sarmas, V. Marinakis, M. Virvou, G. A.
Tsihrintzis, and H. Doukas, “Intelligent decision support for energy
management: A methodology for tailored explainability of artificial
intelligence analytics,” Electronics, vol. 12, no. 21, 2023.
[5] T. Miller, “Explanation in artificial intelligence: Insights from the
social sciences,” Artificial Intelligence, vol. 267, pp. 1–38, 2019.
[6] A. Maehigashi, Y. Fukuchi, and S. Yamada, “Modeling reliance on xai
indicating its purpose and attention,” in Proc. Annu. Meet. of CogSci,
vol. 45, 2023, pp. 1929–1936.
[7] ——, “Empirical investigation of how robot’s pointing gesture influ-
ences trust in and acceptance of heatmap-based xai,” in 2023 32nd
IEEE Intl. Conf. RO-MAN, 2023, pp. 2134–2139.
[8] A. N. Ferguson, M. Franklin, and D. Lagnado, “Explanations that
backfire: Explainable artificial intelligence can cause information
overload,” in Proc. Annu. Meet. of CogSci, vol. 44, no. 44, 2022.
[9] L.-V. Herm, “Impact of explainable ai on cognitive load: Insights from
an empirical study,” in 31st Euro. Conf. Info. Syst., 2023, 269.
[10] U. Ehsan, P. Tambwekar, L. Chan, B. Harrison, and M. O. Riedl,
“Automated rationale generation: A technique for explainable ai and
its effects on human perceptions,” in Proc. 24th Int. Conf. IUI, 2019,
p. 263–274.
[11] C. R. Sunstein, Why Nudge?: The Politics of Libertarian Paternalism.
Yale University Press, 2014.
[12] M. Kraus and S. Feuerriegel, “Decision support from financial dis-
closures with deep neural networks and transfer learning,” Decision
Support Systems, vol. 104, pp. 38–48, 2017.
[13] A. Chernov, M. Butakova, and A. Kostyukov, “Intelligent decision
support for power grids using deep learning on small datasets,” in
2020 2nd Intl. Conf. SUMMA, 2020, pp. 958–962.
[14] C.-Y. Hung, C.-H. Lin, T.-H. Lan, G.-S. Peng, and C.-C. Lee,
“Development of an intelligent decision support system for ischemic
stroke risk assessment in a population-based electronic health record
database,” PLOS ONE, vol. 14, no. 3, pp. 1–16, 03 2019.
[15] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learn-
ing deep features for discriminative localization,” in Proc. IEEE Conf.
CVPR, June 2016.
[16] S. Wiegreffe, J. Hessel, S. Swayamdipta, M. Riedl, and Y. Choi, “Re-
framing human-AI collaboration for generating free-text explanations,”
in Proc. of the 2022 Conf. of NAACL. ACL, Jul. 2022, pp. 632–658.
[17] R. Parasuraman and V. Riley, “Humans and automation: Use, misuse,
disuse, abuse,” Human Factors, vol. 39, no. 2, pp. 230–253, 1997.
[18] H. Vasconcelos, M. Jörke, M. Grunde-McLaughlin, T. Gerstenberg, M. S. Bernstein, and R. Krishna, “Explanations can reduce overreliance on ai systems during decision-making,” Proc. ACM Hum.-Comput. Interact., vol. 7, no. CSCW1, apr 2023. [Online]. Available: https://doi.org/10.1145/3579605
[19] C. Panigutti, A. Beretta, F. Giannotti, and D. Pedreschi, “Understand-
ing the impact of explanations on advice-taking: A user study for
ai-based clinical decision support systems,” in Proc. of 2022 CHI.
ACM, 2022.
[20] Y. Fukuchi and S. Yamada, “Selectively providing reliance calibration
cues with reliance prediction,” in Proc. Annu. Meet. of CogSci, vol. 45,
2023, pp. 1579–1586.
[21] ——, “Dynamic selection of reliance calibration cues with ai reliance
model,” IEEE Access, vol. 11, pp. 138 870–138 881, 2023.
[22] J.-F. Chen, W.-L. Chen, C.-P. Huang, S.-H. Huang, and A.-P. Chen,
“Financial time-series data analysis using deep convolutional neural
networks,” in 2016 7th Intl. Conf. on CCBD, 2016, pp. 87–92.
[23] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Conf. CVPR, 2016, pp. 770–778.
[24] H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel,
and X. Hu, “Score-cam: Score-weighted visual explanations for convo-
lutional neural networks,” in Proc. IEEE/CVF Conf. CVPR Workshops,
June 2020.
[25] OpenAI, “GPT-4 technical report,” 2023.
[26] L. Wang, N. Yang, X. Huang, B. Jiao, L. Yang, D. Jiang, R. Majumder,
and F. Wei, “Text embeddings by weakly-supervised contrastive pre-
training,” 2022.
[27] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep q-
learning with model-based acceleration,” in Proc. 33rd Intl. Conf. on
Machine Learning, vol. 48, 20–22 Jun 2016, pp. 2829–2838.