Twenty-Ninth European Conference on Information Systems (ECIS 2021), [Marrakesh, Morocco|A Virtual AIS Conference].
EXPLAINABLE AI FOR TAILORED ELECTRICITY CONSUMPTION FEEDBACK - AN EXPERIMENTAL EVALUATION OF VISUALIZATIONS
Research Paper
Jacqueline Wastensteiner, University of Bamberg, Bamberg, Germany, jacqueline.wastensteiner@web.de
Tobias M. Weiss, University of Bamberg, Bamberg, Germany, tobias.m.weiss@gmx.de
Felix Haag, University of Bamberg, Bamberg, Germany, felix.haag@uni-bamberg.de
Konstantin Hopf, University of Bamberg, Bamberg, Germany, konstantin.hopf@uni-bamberg.de
Abstract
Machine learning (ML) methods can effectively analyse data, recognize patterns in them, and make
high-quality predictions. Good predictions usually come along with “black-box” models that are unable
to present the detected patterns in a human-readable way. Technical developments recently led to
eXplainable Artificial Intelligence (XAI) techniques that aim to open such black-boxes and enable
humans to gain new insights from detected patterns. We investigated the application of XAI in an area
where specific insights can have a significant effect on consumer behaviour, namely electricity use.
Knowing that specific feedback on individuals’ electricity consumption triggers resource conservation,
we created five visualizations with ML and XAI methods from electricity consumption time series for
highly personalized feedback, considering existing domain-specific design knowledge. Our
experimental evaluation with 152 participants showed that humans can assimilate the pattern displayed
by XAI visualizations, but such visualizations should follow known visualization patterns to be well-
understood by users.
Keywords: eXplainable Artificial Intelligence (XAI), Visualizations, Energy conservation, Machine
learning, Feedback.
1 Motivation
Many outstanding applications of machine learning (ML), a core technology of artificial intelligence
(AI), documented in the literature focus on its superiority in making predictions about unseen data
or future events. Cancer detection from radiological images (McKinney et al., 2020) and fraud detection
(Abbasi et al., 2012) are often-cited examples of tasks in which ML reaches human levels or partly
outperforms humans. Such applications relate to the use of AI to automate tasks in a wide range of
industries (Coombs et al., 2020). We owe the performance of these AI applications to ML models that
are becoming increasingly complex and difficult for humans to understand. Such ML models are often
black boxes whose predictive power comes at the price of low interpretability (Dourish, 2016; Faraj et al., 2018).
Their counterparts are transparent models, which have lower capabilities to generalize from data (Barredo
Arrieta et al., 2020). Motivated by this tension, a recent field of research in the area of ML has put forth
eXplainable AI (XAI) approaches that make complex black-box models interpretable to humans,
without lowering their predictive power (Miller, 2019). These approaches are promising, not only for
applications where it is necessary to make algorithmic judgement interpretable to humans (e.g., for legal
or ethical decisions), but also for applications where AI is employed to provide more insights to
humans (uncovering patterns in data, not only making predictions), enabling more informed human
decisions. Therefore, XAI supports the use of AI to augment work (Grønsund and Aanestad, 2020;
Raisch and Krakowski, 2020) instead of replacing humans (Frey and Osborne, 2017).
Current XAI approaches, however, are criticized for (i) focusing too heavily on technical aspects or data
perspectives of developers and (ii) including few aspects of social sciences and human-computer
interaction (Abdul et al., 2018; Miller, 2019). Existing XAI studies have also often focused on content,
less on the interface design of explanations (Cheng et al., 2019). Literature also points to a lack of XAI
user studies (Adadi and Berrada, 2018; Nourani et al., 2019). Similarly, recent calls from information
systems research motivate empirical studies on the application of AI in organizations, not only to
automate but to augment human labour (Coombs et al., 2020; Lyytinen et al., 2020; Rai et al., 2019).
Time series are common data structures that ML and XAI methods can process. Time is an important
dimension of data analysis, and time series data is becoming more prevalent as digitization increases the
proliferation of sensors and smart devices, which capture more data with timestamps. An area that could
particularly benefit from uncovering and visualizing hidden patterns in time series data is residential
energy consumption. Behavioural research demonstrates that specific feedback on consumers’ energy
consumption—tailored to individuals—leads to sustainable behaviour and, thus, can trigger significant
energy savings (Brülisauer et al., 2020; Tiefenbeck et al., 2016). Deployed on a large scale, behavioural
feedback interventions can play an essential role in lowering the energy demand, thus reducing the
human carbon footprint. Although increasing amounts of data are available in the residential energy
context (e.g., because of smart metering infrastructures), helpful behavioural recommendations are hard
to extract from the data on a large scale. Advanced data processing and modelling techniques are
therefore warranted to make undesired human behaviour salient and to guide people towards better
action. We believe that XAI can help in this regard and selected this case study for our
research project. The context of this study is also well suited to put forth XAI visualisations, because
plenty of time series data is available that contains complex patterns, which may not be easy for
humans to recognize but can be detected by ML. Thus, we examine the following research question:
How well do XAI visualizations of electricity consumption time series data, created based
on design knowledge from XAI and feedback research, perform in terms of
comprehensibility of humans and user preferences?
We created five XAI visualizations based on the current technological state as well as design knowledge
from the XAI and feedback literature. In a user experiment with 152 participants, we evaluated these
visualizations in isolation, using reading and memorization tasks, and in comparison, using a conjoint
experiment. Our results show that XAI can provide insights into electricity consumption time series data
that can be assimilated by humans. We also found that standard XAI visualizations should be adjusted
to foster comprehensibility by humans. These results underline the need for further investigating XAI-
based human-AI interfaces and tailored consumption feedback, as we outline in our discussion.
This paper proceeds with a review of the recent literature around XAI and automated feedback on
residential electricity use. Thereafter, we describe our research approach, our case selection, and the
design and implementation of XAI visualizations. Section 5 describes our experimental evaluation and
our findings. We finish this paper with a discussion and formulate implications for future research.
2 Related work
The discourse on XAI technology takes place primarily in the field of computer science, where it has
led to advances in the technological basis. Nevertheless, it would benefit from social science research
(Miller, 2019) and business perspectives (Satell and Sutton, 2019). This is a type of contribution that
lies at the core of the information systems research tradition, because this field pursues a sociotechnical
perspective (Sarker et al., 2019). Lyytinen et al. (2020) and Rai et al. (2019) underline the need for such
research to better understand the successful integration of AI in workplaces.
2.1 XAI in information systems research
So far, information systems research has conceptualized the possibilities of XAI to enable personalized
explanations of ML models (Schneider and Handali, 2019) and the compliance with recommendations
that stem from AI (Kühl et al., 2019). Wanner et al. (2020) provide a literature review and outline a plan
for a user study to investigate the willingness of users to give up some accuracy of model predictions in
favour of better explanations. Our work adds to this (so far conceptual) research an empirical
investigation on the application of AI in the context of energy feedback. Thereby, we draw on the
literature from XAI and feedback on energy consumption. We briefly summarize both areas below.
2.2 XAI technology
XAI is a very active field of research and technological development. This becomes visible in several
comprehensive literature review articles on that topic, which provide taxonomies of current XAI
approaches (e.g., Adadi and Berrada, 2018; Anjomshoae et al., 2019; Barredo Arrieta et al., 2020).
Explanatory methods can be classified according to several criteria, namely their compatibility (model-
specific vs. model-agnostic), the degree of interpretability (local vs. global), and whether ML models
are directly interpretable (intrinsic) or require methods that analyse ML models after training (post-hoc).
We concentrate on model-agnostic methods, which are mostly applied post-hoc and can be plugged into
any ML model, which makes them independent of a particular class of ML algorithms. Within this
group of model-agnostic methods, we focus on feature attribution methods that estimate the impact of
features on predictions on a local level (i.e., the impact of each feature on each individual predicted
instance). In the case of time-series prediction models, this capability makes it possible to estimate the
contribution of individual time periods to a given outcome. In this category, we focus on two methods that
recent works (Slack et al., 2020) perceive as very relevant:
- Deep Shapley Additive exPlanations (SHAP) was introduced by Lundberg and Lee (2017) and
uses concepts from game theory for the predictor variable importance estimation.
- Local Interpretable Model-Agnostic Explanations (LIME) introduces variations in the dataset
(i.e., perturbation) and estimates how these affect the predictions of the black-box model by
using a human-interpretable model, like linear regression (Ribeiro et al., 2016).
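To make the game-theoretic idea behind SHAP more tangible, the following minimal Python sketch computes exact Shapley values for a toy model with three inputs. It is an illustration under simplifying assumptions, not the Deep SHAP algorithm itself: the names shapley_values and toy_model, the all-zero baseline, and the toy weights are hypothetical. The exhaustive enumeration of coalitions is only feasible for a handful of features, whereas SHAP relies on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value (an assumption
    of this sketch). All coalitions are enumerated, so the cost grows as 2^n.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        masked = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(masked)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight of a coalition with `size` members
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy "black box": the evening load only matters together with the noon load
def toy_model(v):
    morning, noon, evening = v
    return 0.2 * morning + 0.5 * noon + 0.3 * noon * evening


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for x and the prediction for the baseline. In this toy case, the interaction between noon and evening load is split equally between the two inputs.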
Research has carried out comparisons of these methods (Alvarez Melis and Jaakkola, 2018; Schlegel et
al., 2019) using datasets from several domains. We have built upon these procedures and evaluated both
XAI methods in our study.
Literature from the area of human-computer interaction points to the importance of user studies to
evaluate XAI visualizations (Abdul et al., 2018) and suggests assessing these visualizations with
respective metrics on comprehensibility and user preferences. Current XAI visualizations are, for
example, criticized for being complex, targeting primarily ML experts, and neglecting a user perspective that
would foster the understanding of the visualizations (Abdul et al., 2020; Kaur et al., 2020). Mohseni et
al. (2020) develop a framework with design guidelines and evaluation methods to support the iterative
design and evaluation loop of XAI visualizations.
2.3 Automated consumer feedback on electricity consumption
Feedback has received much attention in research and practice, because such behavioural interventions
help humans to overcome biases in their decision-making and thus have the power to change human
behaviour for the better (Allcott and Mullainathan, 2010). Feedback can lead to pro-ecological behaviour
(Klöckner, 2013) and can reduce energy use in the residential sector (Fischer, 2008; Karjalainen, 2011;
Lu et al., 2016; Weiss et al., 2016) at comparably low cost (Benartzi et al., 2017). Behavioural research
demonstrates that specific feedback to consumers, tailored to individuals, can lead to significant energy
savings (Brülisauer et al., 2020; Tiefenbeck et al., 2016).
A major obstacle in realizing such tailored feedback in practice lies in missing data when generating
personalized messages or visualizations at scale (Hopf, 2019, p. 147; Tiefenbeck, 2017). Collecting
such data in surveys or with energy audits is costly. To overcome this problem, research has analysed
electricity consumption time series data of private households with ML to extract the necessary
information. One literature branch, the topic of non-intrusive appliance load monitoring, analyses
consumption data of high frequencies (usually more than one measurement per minute) with the goal of
detecting single appliances (Hart, 1992; Zeifman and Roth, 2011), but such fine-grained data is not
available in many households. Another branch of literature develops ML methods to detect more general
characteristics of residential households (Albert and Rajagopal, 2013; Beckel et al., 2014;
Hopf, 2019; Hopf et al., 2018; Weigert et al., 2020). These approaches are better suited to support
feedback campaigns that reach many energy consumers. Results show, for example, that households with a
single occupant can be identified with up to 81% accuracy, the type of the cooking facility with up to
87%, and certain heating systems with up to 85%. Models with high predictive performance in these
works belong to the category of black-box models. Although extracted information about household
characteristics is helpful to make feedback more specific, energy experts still must formulate energy
saving recommendations based on predicted data. XAI has a high potential to overcome this drawback.
3 Research approach
Our study developed and evaluated XAI visualizations. Based on these artefacts, our objective was to
generalize experiences that contribute to the current debate on how to create effective XAI
visualizations. Our research approach followed the guidelines of design science research in information
systems (Hevner et al., 2004; Peffers et al., 2007). More precisely, we adopted Iivari’s (2015) second
strategy to conduct design science research, which solves a specific problem (i.e., tailored feedback based
on XAI) by building concrete IT artefacts in a specific context. From that we distil knowledge to address
a class of problems (i.e., human-understandable visualizations of patterns in time-series data).
Our design and evaluation efforts draw on two research areas, each of which brings substantial literature:
We combined a technical perspective (i.e., XAI) and a domain perspective (i.e., feedback on electricity
consumption) while pursuing our research, as we illustrate in Figure 1. We describe the first step (case
selection and problem definition) and the second step (requirement elicitation and definition of design
features) in this section, and the technical implementation and its experimental evaluation in the following sections.
Figure 1. Research approach.
3.1 Case selection and problem definition
ML applications require a sufficient amount of training data that consists, in the case of predictive
models, of several predictor variables and ground truth data on the variable that should be predicted.
Earlier works that analysed electricity consumption time series data (15-min or 30-min smart meter data)
with ML for predicting household characteristics to support consumption feedback used datasets from
North America (Albert and Rajagopal, 2013), Ireland (Beckel et al., 2014; Wang et al., 2018) and
Switzerland (Hopf, 2019; Hopf et al., 2018). The largest dataset, which is also publicly available, stems
from a smart meter trial from the Commission for Energy Regulation (2011) in Ireland and covers
30-minute smart meter electricity consumption data over 76 weeks (July 2009 – December 2010) and survey
data for 4,232 households. The dataset also contains information on household characteristics (“ground
truth data”). We selected this dataset for our study because it was the largest available dataset.
We reviewed the survey data and selected those variables on household characteristics that (i) are related
to energy-intense activities, (ii) could potentially help to develop XAI electricity consumption feedback,
and (iii) could be detected with a comparably good predictive performance in earlier household
characteristics prediction studies. We thus selected: Electric cooking (yes, no), Presence at home during
typical days (yes, no), and Electric water heating (yes, no). For each of the household characteristics,
we trained ML models that predicted the respective variable. We applied XAI to visualize the times of
electricity use that the ML algorithm detected as relevant, to generate informative visualizations for
electricity consumption feedback. Details on the implementations and performance results follow in
Section 4.
3.2 Design of XAI visualizations for feedback on electricity consumption
We conducted a comprehensive literature review in which we identified 8 requirement categories for
XAI visualizations and 17 requirement categories for electricity consumption feedback (details on this
review and the detailed list of design requirements and features are listed in the Appendix). Based on
this design knowledge, we developed five basic XAI visualizations. Each visualization can fulfil the
design requirements to a certain degree. Figure 2 shows an example of each type of visualization.
Figure 2. XAI visualizations evaluated in our experiment: (a) SHAP diagram, (b) bar diagram, (c) line diagram, (d) polar diagram, (e) generated text.
The first diagram is a standard visualization of the SHAP approach. We included this to represent a
state-of-the-art visualization of XAI. Then, we adopted four illustrations that follow recommendations
of the energy feedback literature. These include a line diagram and a bar diagram, which are the most
frequent visualizations of electricity consumption feedback (Herrmann et al., 2018). Both tie in with the
natural analogy of reading electricity consumption data from left to right along a timeline. We also considered a
polar diagram that links to a clock analogy where 24 hours of consumption data are displayed in a circle.
Although users seem to perceive the line and bar diagram more positively and understand them better
than the polar diagram (Flora and Banerjee, 2014), we wanted to evaluate to what extent the additional
information from XAI on a clock analogy is understood by users. All diagrams contained a highlight in
blue that indicated the period which was particularly relevant for the ML model decision to classify the
household as the respective class (e.g., electric cooking). Finally, we considered a basic text description
of the most relevant information from the ML models as a form of non-visualization.
4 Technical implementation
Our technical implementation¹ that generated the electricity feedback artefacts consisted of two steps,
as Figure 3 illustrates. In Step A, we created an ML prediction model that was trained to predict an energy
consumption related variable for each household. We are not primarily interested in the predictions of
this model, but rather in the patterns that this model detects in the electricity consumption data. The analyses
we conducted served to verify that our implementation follows the current state of the art in ML modelling. Step
B then applied XAI methods to extract and visualize times of electricity use that the ML model found
relevant. We compared the two XAI methods and selected the more suitable one. Given the focus of this
paper and the limited space available, the description of our technical implementation concentrates on the
aspects that are essential to understand the generated feedback artefacts.
Figure 3. Overview of the technical implementation and the comparisons of the ML algorithms (A)
and the XAI methods (B).
4.1 ML model implementation and comparison
We considered three ML algorithms for the time series classification task. First, Random Forest
(Breiman, 2001), an ensemble learner that combines multiple uncorrelated decision trees and obtains
well-performing predictions in many real-world applications (Fernández-Delgado et al., 2014). Second, a
convolutional neural network (CNN), an approach from the field of deep learning. Previous studies
found that CNN and Random Forest could detect household characteristics from electricity consumption
smart meter data with good performance (Hopf, 2019; Wang et al., 2018). Third, the InceptionTime
classifier (Ismail Fawaz et al., 2020, 2019), which combines an ensemble of five CNNs in that it
parallelizes the convolutional layers. Ismail Fawaz et al. (2019) demonstrate that their approach achieves
higher stability and prediction accuracy on time series data than other state-of-the-art classifiers. As
input, CNN and InceptionTime took each week of electricity consumption time series data together with
labels for the respective household. Both algorithms can directly process image representations of time
series data. For Random Forest, we followed earlier studies and extracted 93 predictor variables from the
time series to reduce the dimensionality (Beckel et al., 2014; Hopf et al., 2018).
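The two input representations described above can be sketched in a few lines of Python. With 30-minute readings, one week corresponds to 7 x 48 = 336 values. The sketch assumes that a series starts at midnight on the first day of a week. The function names and the three summary features are hypothetical examples and do not reproduce the 93 predictor variables used in the study.

```python
from statistics import mean

READINGS_PER_DAY = 48                      # 30-minute smart meter intervals
READINGS_PER_WEEK = 7 * READINGS_PER_DAY   # 336 values per weekly sample


def split_into_weeks(series):
    """Cut a household's consumption series into complete, non-overlapping weeks."""
    n_weeks = len(series) // READINGS_PER_WEEK
    return [series[w * READINGS_PER_WEEK:(w + 1) * READINGS_PER_WEEK]
            for w in range(n_weeks)]


def example_features(week):
    """A few illustrative summary features for one week (hypothetical selection)."""
    days = [week[d * READINGS_PER_DAY:(d + 1) * READINGS_PER_DAY] for d in range(7)]
    # Slots 34..39 cover 17:00-20:00 if slot 0 starts at midnight
    evening = [v for day in days for v in day[34:40]]
    return {
        "mean_load": mean(week),
        "max_load": max(week),
        "evening_share": sum(evening) / sum(week),
    }


# Synthetic, constant series covering two full weeks plus a few leftover readings
series = [0.1] * (2 * READINGS_PER_WEEK + 10)
weeks = split_into_weeks(series)
features = example_features(weeks[0])
print(len(weeks), features)
```

Weekly windows of this kind can be passed directly to a sequence classifier, whereas the feature dictionary corresponds to the lower-dimensional input of a classical learner such as Random Forest.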
We compared the three ML algorithms regarding their predictive performance for the three selected
dependent variables and list the results together with statistics on the original data in Table 1. As
performance metrics, we used accuracy (ACC), which is the percentage of correctly classified
observations in the test sample, and the area under the receiver operating characteristic curve (AUC).
Both metrics are well-known for ML model evaluation (Hastie et al., 2009). Whereas ACC is easy to
interpret, its values are biased by the class distributions. Therefore, ACC results of different variables
cannot be compared. AUC can be used as an unbiased estimate of the predictive performance (Fawcett,
2006). For the performance evaluation, we followed good practices of ML evaluation and applied 10-fold
cross-validation (Hastie et al., 2009) with a random allocation of the samples to the ten folds.
¹ The source code of our implementation is available at https://gitlab.rz.uni-bamberg.de/eesys-public/household-classification-explainable-ai for further use.
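The difference between the two metrics can be illustrated with a short Python sketch. It uses hypothetical labels whose class balance roughly matches that of the variable electric water heating (about 81% positive) and a trivial classifier that always predicts the positive class. The helper names are assumptions for illustration. AUC is computed via its interpretation as the probability that a random positive observation receives a higher score than a random negative one.

```python
import random


def accuracy(y_true, y_pred):
    """Share of correctly classified observations."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ten_fold_indices(n_samples, seed=0):
    """Randomly allocate sample indices to ten folds of (almost) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::10] for k in range(10)]


# Hypothetical labels: 806 positive and 194 negative observations
y_true = [1] * 806 + [0] * 194

# A trivial classifier that always predicts the positive class with the same score
always_positive = [1] * len(y_true)
constant_scores = [1.0] * len(y_true)

print(accuracy(y_true, always_positive))  # high ACC although nothing was learned
print(auc(y_true, constant_scores))       # AUC remains at chance level

folds = ten_fold_indices(len(y_true))
print([len(f) for f in folds])
```

The trivial classifier reaches an ACC equal to the share of the positive class, while its AUC stays at 0.5. This is why ACC values should be read against the relative frequency of the positive class.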
| Variable | Sample size: households | Sample size: num. weeks | Relative freq. positive class | InceptionTime ACC | InceptionTime AUC | CNN ACC | CNN AUC | ACC | AUC |
|---|---|---|---|---|---|---|---|---|---|
| Electric cooking | 4,232 | 114,455 | 69.91% | 0.72 | 0.67 | 0.72 | 0.61 | 0.70 | 0.69 |
| Presence at home during the day | 1,310 | 27,949 | 56.95% | 0.78 | 0.73 | 0.76 | 0.66 | 0.77 | 0.74 |
| Electric water heating | 4,232 | 138,044 | 80.63% | 0.63 | 0.62 | 0.60 | 0.59 | 0.62 | 0.63 |
Table 1. Statistics and predictive performance of the ML algorithms for the three considered
dependent variables.
We took a conservative modelling approach and varied the standard parameters of the algorithms
only slightly to avoid a poor configuration by chance². More extensive optimization of hyper-parameters
can certainly improve our results. Nevertheless, the performance results are in line with those of earlier
studies, which used the same data set for the predictions of the cooking facility and achieved ACC
between 0.69 and 0.71 with different non-deep-learning ML algorithms (Beckel et al., 2014), and
between 0.739 and 0.766 (Wang et al., 2018), using CNN-based approaches. Wang et al. (2018), for
example, used hyper-parameter tuning to optimize their performance.
4.2 Implementation and comparison of the XAI methods
To extract human-comprehensible visualizations from InceptionTime (the best performing prediction
model in our analysis), we applied the XAI approaches SHAP and LIME. Both methods estimate the
importance of certain predictor variables on the level of individual observations. In our case, each
approach estimated which time span was particularly (ir)relevant to classify a household as electric
cooking or not electric cooking. We compared both methods according to their faithfulness and stability,
as suggested by Alvarez Melis and Jaakkola (2018), and describe the evaluation procedures below.
Faithfulness: “Interpretability methods should … generate meaningful explanations … [even in the case
of] local perturbations of the input … adding minimal [amount of] noise to the input” (Alvarez Melis
and Jaakkola, 2018, p. 7). To operationalize this criterion, we adopted Schlegel et al.’s (2019) approach
and modified the time series input data by blurring values of predictor variables that were identified by
the XAI methods as most relevant for the model³. When the predictor variables are truly relevant for
the prediction, the outcome should change considerably with such a data modification. We measured
the relative number of predictions that changed after modifying the data of 50 randomly chosen
households (see Table 2).
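The faithfulness check can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the study: the function names, the toy classifier, and the choice of top_k are hypothetical. The replacement rule mirrors a value around the household's average consumption, which is one reading of the replacement by the deviation from the average in the opposite direction.

```python
from statistics import mean


def mirror_around_mean(series, positions):
    """Replace the selected values by their reflection around the series mean."""
    avg = mean(series)
    perturbed = list(series)
    for i in positions:
        perturbed[i] = 2 * avg - series[i]
    return perturbed


def faithfulness(predict, samples, attributions, top_k):
    """Share of samples whose predicted class changes after perturbing
    the top_k time steps with the largest absolute attribution."""
    changed = 0
    for series, attr in zip(samples, attributions):
        top = sorted(range(len(series)), key=lambda i: abs(attr[i]), reverse=True)[:top_k]
        if predict(mirror_around_mean(series, top)) != predict(series):
            changed += 1
    return changed / len(samples)


# Hypothetical toy classifier: class 1 if the third time step shows a peak
def toy_predict(series):
    return 1 if series[2] > 1.3 else 0


samples = [[1.0, 1.0, 1.6, 1.0], [1.0, 1.0, 1.0, 1.0]]

# Attributions that point at the time step the classifier actually uses
faithful_attr = [[0, 0, 1, 0], [0, 0, 1, 0]]
# Attributions that point at an irrelevant time step
unfaithful_attr = [[1, 0, 0, 0], [1, 0, 0, 0]]

score_faithful = faithfulness(toy_predict, samples, faithful_attr, top_k=1)
score_unfaithful = faithfulness(toy_predict, samples, unfaithful_attr, top_k=1)
print(score_faithful, score_unfaithful)
```

A higher share of changed predictions indicates that the explainer marked time steps on which the model actually relies.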
Stability: This criterion measured the ability of an XAI method to determine similar predictor variables
for similar classifications. In doing so, we exploited the temporal structure of the time series data and the
presence of daily routines of humans, such as cooking at the same times of the day. We measured how often
the same time of the day was considered important for the model on different days. We
computed this frequency using a random selection of 50 households and weeks (see Table 2).
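One possible way to compute such a stability score is sketched below in Python. The operationalization is an assumption made for illustration: for each day of a week, the time slots with the highest attribution are flagged, and the score is the share of flagged slots whose time of day is also flagged on at least one other day. The function name and the parameter top_k are hypothetical.

```python
from collections import Counter

READINGS_PER_DAY = 48  # 30-minute intervals


def stability(attributions_week, top_k=1):
    """Share of flagged time slots whose time of day is flagged on at least two days."""
    n_days = len(attributions_week) // READINGS_PER_DAY
    flagged = []
    for d in range(n_days):
        day = attributions_week[d * READINGS_PER_DAY:(d + 1) * READINGS_PER_DAY]
        top = sorted(range(READINGS_PER_DAY), key=lambda s: day[s], reverse=True)[:top_k]
        flagged.extend(top)
    counts = Counter(flagged)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(flagged)


# Synthetic attributions: slot 36 (18:00) is most relevant on four of seven days
peak_slots = [36, 36, 36, 36, 10, 20, 30]
attributions_week = []
for slot in peak_slots:
    day = [0.0] * READINGS_PER_DAY
    day[slot] = 1.0
    attributions_week.extend(day)

score = stability(attributions_week)
print(score)
```

In this synthetic week, four of the seven flagged slots share the same time of day, so the score is 4/7.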
Based on the empirical analyses, we finally selected the variable electric cooking with the InceptionTime
predictor and the SHAP explainer for our further study. The reasons were that these models showed a
comparably high predictive performance, and that the variable provided the most reasonable feedback
(advice to energy users based on electric cooking allowed for more actionable insights than based on
other variables). Furthermore, the stability and faithfulness for this variable were comparably high.
² Our instance of InceptionTime used the parameters: Max. kernel size: 40, Depth: 6, Num. kernels: 32, Batch size: 64, Use
Bottleneck: true, Use residual: true. For numeric parameters, we tested three alternatives, for the binary parameters both values.
We selected the best performing setting based on AUC on a 20% sample of the data. Calculations for InceptionTime ran on
Python Keras 2.2.4. For Random Forest, ntree=100 was used. The computations ran on Python using scikit-learn 0.19.1.
³ The replacement was the deviation of the consumption measurement from the average consumption of the household in the
opposite direction, as a zero consumption or negative consumption could be recognized by the ML algorithm as a special case.
| | Faithfulness (relative number of predicted changes): SHAP | Faithfulness: LIME | Stability (relative number of non-unique time stamps): SHAP | Stability: LIME |
|---|---|---|---|---|
| Presence | 0.24 | 0.22 | 0.32 | 0.28 |
| Water heating | 0.38 | 0.26 | 0.35 | 0.31 |
| Cooking | 0.3 | 0.24 | 0.42 | 0.35 |
Table 2. Comparison of the two XAI methods LIME and SHAP for the three selected household
characteristics regarding the criteria faithfulness and stability.
5 Experimental evaluation of the five visualizations
We conducted an experimental evaluation of the five obtained visualizations. The experiment was
carried out as an online survey and had two phases: The first phase focused on the isolated evaluation
of each visualization. We collected subjective (self-reported) and behavioural measures to evaluate the
visualizations. In the second phase of the experiment, we used a choice-based conjoint analysis to measure
user preferences on the visualizations. Before the experiments started, we asked for sociodemographic
variables. In total, the online experiment took 17:16 minutes on average (standard deviation: 13:30
minutes).
5.1 Sample description
We promoted the survey among students of our institution and used several online channels to attract
participants outside of the university context. Our sample is balanced regarding gender (51.32%
female, 48.68% male, 0% diverse / not given), but it has a bias towards younger participants with higher
education (82.9% are younger than 35; in the German population, only 36.7% are in this age category),
likely because many participants were students from the university. However, the share of participants
who were employed (not marginally employed) is 44.1%, which is similar to the share of employed
citizens and civil servants in Germany, which is 45.0% (DESTATIS, 2020, p. 39). Participants lived more
frequently in rented homes (69.1%) than the population (48.9%), according to Eurostat (2020), but the
number of people living in the households was similar to the German average (r(3) = .96, p = .011).
5.2 First phase: Isolated evaluation of the visualizations
The first phase of the experiment evaluated the comprehensibility of the five electricity feedback
visualizations (see Figure 2). We first describe the experimental setup and then analyse the results.
5.2.1 Experimental setup
We carried out four reading and memory tasks with the participants, each time with one randomly
selected visualization out of the five that we generated. Each participant saw each visualization only
once. Reading and memory tasks are common for evaluating XAI visualizations (Abdul et al., 2020).
We instructed participants to study (and memorize) the energy feedback illustration and informed them
that the illustration would not be shown when answering subsequent questions. In total, we collected
eight variables in the first part of the experiment (see Table 3).
The memorization task measured their objective understanding (Abdul et al., 2020; Cheng et al., 2019).
For that, we asked them to rate three statements regarding the visualization as correct or incorrect. The
statements had comparable length (Yan and Tourangeau, 2008) and were randomly selected from three
preformulated sets of statements to avoid memory effects in the series of tasks. Each of the three
statement sets consisted of two correct and two incorrect statements. The sets covered the topics (1)
electricity consumption at specific times, (2) the prediction made by machine learning, and (3) the model
explanation. Participants could also select an “I don’t know” alternative for each statement. After the
memorization task, participants indicated their mental effort for completing the task (Paas et al., 2008)
on a seven-point Likert scale. Finally, they indicated their subjective understanding of the visualization
(Cheng et al., 2019) in terms of a German school grade from “1” (best) to “6” (insufficient). All
survey instruments used can be requested from the authors. In addition to the self-reported data, we measured
the reading and completion time of the tasks to collect objective behavioural data. In the online system,
backward navigation was disabled, i.e., once participants had seen the visualization and opened
the page with the follow-up questions, they could no longer return to the visualization. In this way, we ensured
that participants had to answer the questions from memory.
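The per-task variables described above can be derived roughly as follows. This is a minimal sketch, not the authors' survey code: the function names and the use of `None` for an “I don’t know” answer are assumptions made for illustration. It counts correct and “I don’t know” answers for the three statements of one task and takes the natural logarithm of a duration in seconds, as described for the time variables in Table 3.

```python
import math


def score_memory_task(truth, answers):
    """Score one memorization task.

    truth   -- list of booleans, True if the shown statement is correct
    answers -- list of True/False ratings, or None for "I don't know"
    Returns (number of right answers, number of "I don't know" answers).
    """
    if len(truth) != len(answers):
        raise ValueError("one answer per statement is required")
    right = sum(1 for t, a in zip(truth, answers) if a is not None and a == t)
    dont_know = sum(1 for a in answers if a is None)
    return right, dont_know


def log_seconds(seconds):
    """Natural logarithm of a duration in seconds (reduces positive skew)."""
    return math.log(seconds)


# Hypothetical participant: first statement rated correctly,
# second answered "I don't know", third rated incorrectly.
right, dont_know = score_memory_task([True, False, True], [True, None, False])
print(right, dont_know)
```

With three statements per task, both counts fall in the range [0, 3], matching the value ranges of MemTaskRight and MemTaskDontKnow.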
5.2.2 Statistical analysis and results
We analysed the results of the first experiment with an ordinary least squares linear regression. For each
evaluation metric that we collected during the first phase of our experiment, we estimated one model
(see results in Table 4). The models follow the specification
= + +  +
 + . Y was the dependent variable (see Table 3, variables 1-6) that we collected in one of
the four memorization experiments that each participant completed.  is a categorical variable
related to the visualization that was displayed to the participant. We used a dummy-encoding to represent
| No. | Variable | Description | Values | Mean (Std. dev.) or frequency |
|---|---|---|---|---|
| 1 | ReadingTime | The time (in seconds) each participant spent on reading the visualization. We compute the natural logarithm from the measurements to reduce the positive skew of the empirical distribution. | | 3.47 (0.69) |
| 2 | AnswerTime | The time (in seconds) each participant spent on answering the questions for the visualization. We also computed the natural logarithm. | | 3.13 (0.63) |
| 3 | MemTaskRight | Number of correct answers in the memorization task. | [0,3] | 2.05 (0.89) |
| 4 | MemTaskDontKnow | Number of “I don’t know” answers in the memorization task. | [0,3] | |
| 5 | MentalEffort | Self-reported mental effort while completing the recall experiments, on a seven-point Likert scale. | [1,7] | 3.67 (1.38) |
| 6 | SchoolGrade | Self-reported school grade that participants gave for their result on the recall tasks. | [1,6] | 3.91 (1.19) |
| 7 | Age | The age reported by survey participants. | | 30.2 (10.6) |
| 8 | Education | This binary variable states whether the study participant gained a general qualification for university entrance in Germany or a lower education degree. | high school diploma; other | 0.875 (n=133); 0.125 (n=19) |

Table 3. Overview of variables raised in the experiment and used in the statistical analysis.
| Variable | ReadingTime | AnswerTime | MemTaskRight | MemTaskDontKnow | MentalEffort | SchoolGrade |
|---|---|---|---|---|---|---|
| (Intercept) | 3.51*** (0.15) | 2.80*** (0.14) | 1.72*** (0.18) | 0.60*** (0.16) | 3.31*** (0.26) | 3.37*** (0.25) |
| VisualLine | 0.07 (0.09) | 0.08 (0.08) | 0.34** (0.12) | -0.26* (0.10) | 0.44* (0.17) | -0.37* (0.15) |
| VisualBar | -0.02 (0.09) | 0.08 (0.08) | 0.20 (0.12) | -0.13 (0.11) | 0.32 (0.18) | -0.20 (0.16) |
| VisualPolar | 0.18 (0.10) | 0.11 (0.09) | 0.06 (0.13) | -0.05 (0.12) | 0.29 (0.19) | -0.04 (0.17) |
| VisualText | -0.14 (0.08) | 0.15* (0.07) | 0.19 (0.11) | -0.25* (0.10) | 0.54*** (0.16) | -0.36* (0.14) |
| Age | 0.01* (0.00) | 0.02*** (0.00) | 0.01* (0.00) | -0.00 (0.00) | 0.00 (0.01) | 0.00 (0.00) |
| EduHIGH | -0.27** (0.10) | -0.26** (0.09) | -0.05 (0.11) | 0.05 (0.10) | 0.00 (0.15) | -0.21 (0.15) |
| R^2 | 0.06 | 0.11 | 0.03 | 0.02 | 0.02 | 0.02 |
| Adj. R^2 | 0.05 | 0.10 | 0.02 | 0.01 | 0.01 | 0.01 |
| Num. obs. | 608 | 608 | 608 | 608 | 608 | 608 |

Asterisks indicate statistical significance (*** < 0.001; ** < 0.01; * < 0.05), standard errors are in parentheses
Table 4. Statistical evaluation of the first experimental phase.
the five different visualizations and chose SHAP as the reference level, given that it is the state-of-the-art
visualization of the chosen XAI approach. $\mathit{Age}$ is a numeric variable of the participants’ age in
years and $\mathit{Edu}$ is a dummy variable with the value 1 for a high school diploma and 0 for a lower degree.
We used robust standard errors (White, 1980; Zeileis, 2004) and checked for homoscedasticity of the
errors $\varepsilon$.
In general, the reading and answer times differ little between the visualizations.
On the other metrics, the SHAP illustration does not perform well. All other visualizations lead
to higher task performance (more right answers and fewer “I don’t know” responses).
For the line chart and the text display, the differences are statistically significant. This performance is
confirmed by the subjective ratings with the school grade (lower numbers mean better results).
Interestingly, the reported mental effort for the SHAP illustration was lower than for the others, likely
because of more “I don’t know” answers.
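To illustrate how the dummy-encoded coefficients are read, the following sketch evaluates the linear predictor of the MemTaskRight model with the coefficients printed in Table 4. It is an illustration only: the function names are made up, the example participant (30 years old, high school diploma) is hypothetical, and the printed coefficients are rounded to two decimals, so the results are approximate.

```python
# Coefficients of the MemTaskRight model as printed in Table 4.
# SHAP is the reference level and therefore has no coefficient of its own.
COEF = {
    "Intercept": 1.72,
    "VisualLine": 0.34,
    "VisualBar": 0.20,
    "VisualPolar": 0.06,
    "VisualText": 0.19,
    "Age": 0.01,
    "EduHIGH": -0.05,
}
VISUALS = ("SHAP", "Line", "Bar", "Polar", "Text")


def dummy_encode(visual):
    """Dummy-encode the visualization with SHAP as the reference level."""
    if visual not in VISUALS:
        raise ValueError(f"unknown visualization: {visual}")
    return {f"Visual{v}": int(v == visual) for v in VISUALS if v != "SHAP"}


def predict_mem_task_right(visual, age, edu_high):
    """Expected number of right answers according to the linear model."""
    x = {"Intercept": 1, "Age": age, "EduHIGH": int(edu_high)}
    x.update(dummy_encode(visual))
    return sum(COEF[name] * value for name, value in x.items())


# Hypothetical participant: 30 years old, high school diploma.
shap = predict_mem_task_right("SHAP", 30, True)
line = predict_mem_task_right("Line", 30, True)
print(round(shap, 2), round(line, 2), round(line - shap, 2))
```

For otherwise identical participants, the difference between the line chart and the SHAP illustration equals the VisualLine coefficient, which is what the dummy encoding with SHAP as reference level expresses.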
5.3 Second phase: Choice-based conjoint
The conjoint experiment allowed us to estimate user preferences regarding the visualizations. The
method originates from marketing research and is increasingly used in information systems research,
particularly to evaluate the design of information systems, as Naous & Legner (2017) found in their
literature review. We follow Naous & Legner’s (2017) framework of conducting conjoint experiments
in that we conducted a choice-based conjoint (CBC) in which the study participants had to choose
between two alternative feedback elements.
5.3.1 Experimental setup
Our main interest in this experiment phase was to find out which visualization the study participants
preferred. Following recommendations for conducting conjoint experiments (Backhaus et al., 2015;
Naous and Legner, 2017), we tried to make the choice options more realistic and at the same time
implement further design requirements in the field of energy feedback. Specifically, we varied the
visualizations with an additional explanatory text, energy saving tips, and a chatbot frame. The energy
saving tip was included because earlier literature on energy feedback underlined the relevance of such
feedback devices. We considered two variants of tips (Vasseur et al., 2019): a curtailment tip (CMT)
that suggests a repetitive, habitual change to reduce the household’s electricity consumption, and an
efficiency tip (ET) that recommends lowering the household’s electricity demand by making a one-time
investment. In total, the presented choices varied in four stimuli (visualization type,
existence of explanatory text, type of energy saving tip, and chatbot). We used a full profile approach
in which all possible combinations of the stimuli (5 × 2 × 2 × 2 = 40 variants in total) were considered.
From all possible combinations, five choice sets were created for each study participant. Each choice
set contained two randomly drawn variants together with a none-option as a third choice. The none-option
makes the choice experiment more realistic (Vermeulen et al., 2008), because forced choice situations
are avoided (Backhaus et al., 2015, p. 181).
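The design described above can be sketched in a few lines. This is an illustrative sketch under assumptions, not the authors' implementation: the level labels, the dictionary layout, and the use of `None` for the none-option are chosen for illustration, and the paper does not state whether variants could repeat across a participant's choice sets (here each set is drawn independently).

```python
import itertools
import random

# Stimuli levels (labels are illustrative).
VISUALS = ["Line", "Bar", "Polar", "SHAP", "Text"]
EXPLANATORY_TEXT = [True, False]
TIPS = ["CMT", "ET"]  # curtailment tip, efficiency tip
CHATBOT = [True, False]


def full_profile():
    """All combinations of the four stimuli (full profile approach)."""
    return [
        {"visual": v, "text": t, "tip": tip, "chatbot": c}
        for v, t, tip, c in itertools.product(VISUALS, EXPLANATORY_TEXT, TIPS, CHATBOT)
    ]


def draw_choice_sets(profiles, n_sets=5, rng=None):
    """Draw choice sets of two distinct variants plus a none-option (None)."""
    rng = rng or random.Random()
    choice_sets = []
    for _ in range(n_sets):
        first, second = rng.sample(profiles, 2)  # two different variants
        choice_sets.append([first, second, None])
    return choice_sets


profiles = full_profile()
print(len(profiles))  # number of variants in the full profile
for choice_set in draw_choice_sets(profiles, rng=random.Random(0)):
    print(choice_set)
```

Passing a seeded `random.Random` instance makes the drawn choice sets reproducible, which is convenient when checking the design.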
5.3.2 Statistical analysis and results
To evaluate the conjoint experiment, we estimated a logistic regression with the maximum-likelihood method
to model the choices. The model has the specification $P(Y = 1) = \frac{1}{1 + \exp(-\eta)}$ with the linear
predictor
$\eta = \beta_0 + \beta_1 \mathit{Visual} + \beta_2 \mathit{Text} + \beta_3 \mathit{Chatbot} + \beta_4 \mathit{Tip} + \beta_5 \mathit{None} + \varepsilon$.
$Y$ is the dependent variable that indicates whether an option was selected. $\mathit{Visual}$ is a categorical
variable with the visualization that was displayed to the participant. The existence of an explanatory text
was a separate characteristic and is represented by the variable $\mathit{Text}$. Further characteristics are whether the
visualization was embedded in a chatbot environment ($\mathit{Chatbot}$) and which type of energy saving tip was
displayed ($\mathit{Tip}$). As usual in conjoint analyses, we model the stimuli variables with effect-encoding. Only
the variable ($\mathit{None}$) that specifies the none-option is encoded as a dummy (Vermeulen et al., 2008).
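As a rough illustration of how such a model turns a choice option into a selection probability, the sketch below combines effect-encoded stimuli with the logistic function, using the log-odds estimates printed in Table 5. Several points are assumptions made for this example and are not stated in the paper: two-level stimuli are coded +1 for the level named in Table 5 and -1 for the other level, no separate intercept is used because Table 5 lists none, and the none-option is evaluated with its dummy estimate alone. All function names are hypothetical.

```python
import math

# Log-odds estimates as printed in Table 5 (the SHAP reference level is left out).
VISUAL = {"No": -0.42, "Line": 0.97, "Bar": 0.99, "Polar": -0.32}
TEXT_NO, CHATBOT_NO, TIP_CM, NONE = -0.10, 0.03, 0.18, -1.66


def effect_code(is_named_level):
    """Assumed effect coding: +1 for the level named in Table 5, -1 otherwise."""
    return 1 if is_named_level else -1


def linear_predictor(visual, has_text, has_chatbot, tip):
    """Linear predictor of one choice option (assumption: no intercept)."""
    return (
        VISUAL[visual]
        + TEXT_NO * effect_code(not has_text)
        + CHATBOT_NO * effect_code(not has_chatbot)
        + TIP_CM * effect_code(tip == "CMT")
    )


def choice_probability(eta):
    """Logistic function: P(Y = 1) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# Line chart with explanatory text, without chatbot, with a curtailment tip.
eta_line = linear_predictor("Line", has_text=True, has_chatbot=False, tip="CMT")
p_line = choice_probability(eta_line)
p_none = choice_probability(NONE)
print(round(eta_line, 2), round(p_line, 3), round(p_none, 3))
```

The point of the sketch is the mapping from the linear predictor to a probability; the exact coding of the design matrix in the study may differ.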
Table 5 shows the estimated model (the column Estimate contains the log odds) together with
the odds ratios. Users preferred the line and the bar visualization, which have odds ratios of
2.64 and 2.70, respectively. Because the stimuli are effect-encoded, the odds of selecting these
visualizations are 2.64 (2.70) times the average across all visualizations. The estimate for the SHAP
illustration, which is the reference category in the regression analysis, must be computed as the
negative sum of the other visualization estimates. It has an odds ratio of only
0.29, so participants strongly prefer the line or bar visualization over the SHAP illustration.
| Variable | Estimate | Std. Error | z-value | p-value | Sig. | Odds ratio |
|---|---|---|---|---|---|---|
| VisualNo | -0.42 | (0.11) | -3.862 | <0.001 | *** | 0.66 |
| VisualLine | 0.97 | (0.12) | 8.242 | <0.001 | *** | 2.64 |
| VisualBar | 0.99 | (0.11) | 8.772 | <0.001 | *** | 2.70 |
| VisualPolar | -0.32 | (0.11) | -2.943 | 0.003 | ** | 0.73 |
| VisualSHAP | | | | | | 0.29 |
| Text_No | -0.10 | (0.06) | -1.752 | 0.080 | | 0.91 |
| Chatbot_No | 0.03 | (0.06) | 0.550 | 0.582 | | 1.03 |
| Tip_CM | 0.18 | (0.06) | 3.179 | 0.015 | ** | 1.20 |
| None | -1.66 | (0.10) | -16.630 | <0.001 | *** | 0.19 |

| Model statistic | Value |
|---|---|
| AIC | 2498.82 |
| BIC | 2544.56 |
| Log Likelihood | -1241.41 |
| Deviance | 2482.82 |
| Num. obs. | 2247 |

Asterisks indicate statistical significance (*** < 0.001; ** < 0.01; * < 0.05), standard errors are in parentheses
Table 5. Logistic regression results of the conjoint analysis.
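The derived quantities in Table 5 can be re-computed from the printed estimates. The following sketch is a plausibility check, not the authors' analysis script. It exponentiates log odds to obtain odds ratios, derives the effect-coded SHAP estimate as the negative sum of the other visualization estimates, and recomputes AIC and BIC from the log likelihood. The parameter count of eight (the eight estimated rows of Table 5) is an assumption, and because the printed estimates are rounded, results can differ from the table in the last digit.

```python
import math

# Log-odds estimates as printed in Table 5.
ESTIMATES = {
    "VisualNo": -0.42,
    "VisualLine": 0.97,
    "VisualBar": 0.99,
    "VisualPolar": -0.32,
    "Text_No": -0.10,
    "Chatbot_No": 0.03,
    "Tip_CM": 0.18,
    "None": -1.66,
}


def odds_ratio(log_odds):
    """Odds ratio belonging to a log-odds estimate."""
    return math.exp(log_odds)


def reference_estimate(estimates, prefix="Visual"):
    """Effect coding: the omitted level is the negative sum of the others."""
    return -sum(v for k, v in estimates.items() if k.startswith(prefix))


def information_criteria(log_likelihood, n_params, n_obs):
    """AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL."""
    deviance = -2.0 * log_likelihood
    return deviance + 2 * n_params, deviance + n_params * math.log(n_obs)


shap = reference_estimate(ESTIMATES)
aic, bic = information_criteria(-1241.41, n_params=len(ESTIMATES), n_obs=2247)
print(round(odds_ratio(ESTIMATES["VisualLine"]), 2))
print(round(shap, 2), round(odds_ratio(shap), 2))
print(round(aic, 2), round(bic, 2))
```

With eight parameters, both information criteria agree with the values printed in Table 5 to two decimals, which supports the assumed parameter count.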
6 Discussion and Research Implications
Our experimental evaluation led to two findings that we outline in Table 6. This section discusses them,
names limitations, and formulates the implications as well as future research needs for the field of
electricity consumption feedback and human-AI interfaces.
| Finding | Implication for consumption feedback | Implication for human-AI interfaces |
|---|---|---|
| XAI technology can be used to develop tailored electricity consumption feedback for end-users | - New class of feedback elements based on XAI that display novel patterns in the data; - Feedback can be more tailored to individuals | - XAI can be a support to realize augmented reality, where humans are supported by ML; - Novel visualizations highlight patterns in time series data |
| The SHAP visualization has not performed well in comparison to others (especially the line diagram) | - Integrate XAI elements into existing feedback elements | - (Re-)align the design of human-AI interfaces with known standards (e.g., time series visualization) |

Table 6. Overview of findings and implications from our study.
6.1 Summary of the major findings
Our experiment provides two important findings: First, our study demonstrated that XAI technology can
help to develop tailored electricity consumption feedback. Our experiments showed that users can
assimilate novel insights from time series data with them. The artifacts created in this study realize many
requirements of XAI or feedback visualizations from earlier research. Our study further demonstrated
that XAI-based electricity consumption feedback can constitute a new class of feedback, which can also
be transferred to other domains (e.g., heating, anticipatory driving). Second, the SHAP diagram, a state-of-the-art
visualization in XAI, did not perform well compared to the other tested visualizations. The
line visualization, in particular, performed better in both phases of the experiment. We suppose that this
is due to two reasons: a) given the natural analogy of depicting time series data on a timeline from left
to right, this illustration might be easier for humans to comprehend; b) it leverages already known
elements, which can help non-expert users (i.e., those without prior domain knowledge) to make sense of unfamiliar
visualizations (Lee et al., 2016). The text generated by XAI was better comprehended by
participants but was less preferred in the conjoint experiment. From the second finding, we conclude that
results of XAI should be integrated into visualizations that follow known standards to foster receptivity
by humans.
6.2 Limitations
Our study is one of the early investigations of XAI applications in information systems research and, to
the best of our knowledge, the first application of state-of-the-art XAI technology in the area of
residential electricity consumption to develop tailored feedback for consumers. Given this novelty and
the broad scope of the study, we identified four limitations. First, a common problem in XAI evaluation
is that ground truth for the explanations obtained by ML is missing. Gathering such data would be
expensive, but it would significantly help to improve the approaches. Second, we could not clearly
identify whether the variance in performance of the visualizations results from the visualization itself or
from the fact that the detected patterns are not fully clear to the user. We had to make
simplifications in our experiment: for example, we could not capture all potential user preferences (e.g.,
colour preferences, aesthetic design), and, due to the already long online survey, we did not
control for graph literacy, which is recommended by Abdul et al. (2020). In addition, we did not
control for energy literacy, although prior knowledge may have an impact on how users make sense of the
presented visualizations (Herrmann et al., 2018; Quintal et al., 2016). Third, the energy consumption related
statements that we could generate from the available dataset (i.e., electric cooking) had only limited
relevance in practice, because the cooking type and activities related to cooking are behaviours that are hard to
change. Future research could collect data on human activities that have a more actionable
impact on consumption behaviour (e.g., standby consumption of appliances or old devices). Fourth,
the experimental evaluation only considers SHAP-based XAI visualizations. We focused on SHAP
because it performed best in terms of stability and faithfulness for the case at hand. Nevertheless, future
research could involve additional visualizations based on other XAI methods such as LIME.
6.3 Future research
Considering the two major findings and the four limitations of our study, we identify the following five
areas for future research.
First, the patterns that are detected by ML and visualized by XAI should be validated with respect to
their meaningfulness, separately from the visualization. Future studies could investigate this with
ground truth data, for example, by collecting data on the true pattern of electric cooking in our case (e.g.,
with interviews, household surveys, or energy audits). Studies could also approach this using synthetic
data where the patterns are known upfront, as for example Tonekaboni et al. (2020) did.
Second, the visualization variants should be evaluated independently of the meaningfulness of the detected
patterns. Here, our experimental setting can be replicated using visualizations of patterns that are known
to be correct. This can reduce variance in the collected variables and should follow Abdul et al. (2020).
Third, the efficacy of electricity consumption feedback should be validated in field trials that measure
the true conservation of resources. Future studies can, for example, use earlier studies from electricity
(Allcott, 2011) and water consumption feedback (Tiefenbeck et al., 2016) as a blueprint to evaluate the
novel XAI-based feedback visualizations.
Fourth, several other methods for time series data processing and XAI exist; they are steadily improved,
and novel ones are being suggested. Our research design can be extended with alternative technical approaches.
Fifth, further research could focus on feedback elements that show what type of activity contributes how
much to the overall electricity consumption. The new XAI explanations could be embedded in
interactive energy feedback displays that already depict the main energy consuming appliances in
specific time-of-use frames (Costanza et al., 2012).
7 Conclusion
Our study evaluated five visualizations generated by current ML and XAI methods to give consumers
feedback on their electricity consumption. We selected residential electricity consumption as our study
context because reducing energy demand is a societal challenge. Yet, the energy consumption context
is also an interesting study site from an information systems perspective, because extensive time series
data is available, which contains complex patterns that may not be easy to recognize by humans. Given
the recent calls for empirical research to get a better understanding of how to integrate AI in human
workplaces (Lyytinen et al., 2020; Rai et al., 2019), and the importance of AI technology to support
humans by augmenting reality rather than replacing humans by AI (Raisch and Krakowski, 2020), our
study demonstrated the power of XAI methods in human-AI interface design and highlighted areas of
further research and development.
8 Appendix: Requirements and design features for XAI-based
feedback on electricity consumption
We reviewed the related research fields to identify design requirements for XAI-based feedback on
electricity consumption. As a starting point, we chose five XAI literature review articles (Abdul et al.,
2018; Adadi and Berrada, 2018; Anjomshoae et al., 2019; Miller, 2019; Mohseni et al., 2020) and
articles that summarize general feedback research (Cianci et al., 2010; Mumm and Mutlu, 2011; van
Duijvenvoorde et al., 2008), pro-ecological behaviour (Klöckner, 2013), energy consumption feedback
(Lu et al., 2016), and electricity feedback (Benartzi et al., 2017; Fischer, 2008; Karjalainen, 2011; Weiss
et al., 2016). With these articles, we conducted a forward and backward search (we reviewed all
references in the review articles and all citations of them in Google Scholar) in order to complete our
picture on the topics. In total, we reviewed the metadata of 869 referenced articles in the XAI papers
(376 in the electricity feedback papers) and 1,124 articles that cited these papers (2,870 for feedback).
In this review, we selected papers that contained requirements for the design of novel feedback elements.
In the end, we found ten further articles for XAI visualizations and 16 further articles for feedback
regarding electricity use, in addition to those that we used as the starting point of our review. We list the
identified requirements and their realization in the visualizations in Table 7.
Columns: Category | Requirement named in literature | XAI visualizations (LD, BD, PD, SHD, TX) | Tips (CMT, ET) | Chatbot
1. Requirements for XAI visualizations
1.1. Information content
- Why / why not explanations
- Display of few representative instances for why / why not explanations
- Details about causal relations (selective explanations)
- Input variable information (value and relevance of the variables)
- No display of accuracy information
- No possibilities to modify the model in the case of high accuracy
1.2. User interface
- Combination of text and image elements
- Adequate degree of user interaction
2. Requirements for electricity consumption feedback
2.1. Information content
- Data source (actually measured el. consumption)
- Unit of measurement (kWh cost)
- Relation to time of usage
- Granularity related to activities
- Historical comparison with previous time units
- Descriptive, normative comparison with the average
- Individualized energy saving tips
2.2. Multimodal feedback
- Combination of feedback types
2.3. Specific formats
- Bar diagram for historical comparison
- Bar diagram for normative comparison
- Grading scale for injunctive normative comparison
2.4. Colour usage
- Use of diagram colours that are preferred by users (e.g., traffic light indicators)
- Colours that activate associations (e.g., red, green)
- Colours without associations (e.g., black, white)
- Text colours (black text on white background)
2.5. Interaction design
- Evaluative feedback
- Interaction with virtual agent
LD: line diagram, BD: bar diagram, PD: polar diagram, SHD: SHAP diagram, TX: text, CMT: curtailment tip, ET: efficiency tip
Table 7. Requirements for XAI and electricity consumption feedback; symbols indicate to what extent our
visualizations realize them (realized, partly realized, not realized).
References
Abbasi, A., Albrecht, C., Vance, A., Hansen, J., 2012. Metafraud: A Meta-Learning Framework for
Detecting Financial Fraud. MIS Q. 36, 1293-A12.
Abdul, A., Vermeulen, J., Wang, D., Lim, B.Y., Kankanhalli, M., 2018. Trends and Trajectories for
Explainable, Accountable and Intelligible Systems: An HCI Research Agenda, in: Proceedings
of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18. Association
for Computing Machinery, New York, NY, USA, pp. 1–18.
https://doi.org/10.1145/3173574.3174156
Abdul, A., von der Weth, C., Kankanhalli, M., Lim, B.Y., 2020. COGAM: Measuring and Moderating
Cognitive Load in Machine Learning Model Explanations, in: Proceedings of the 2020 CHI
Conference on Human Factors in Computing Systems, CHI ’20. Association for Computing
Machinery, New York, NY, USA, pp. 1–14. https://doi.org/10.1145/3313831.3376615
Adadi, A., Berrada, M., 2018. Peeking Inside the Black-Box: A Survey on Explainable Artificial
Intelligence (XAI). IEEE Access 6, 52138–52160.
https://doi.org/10.1109/ACCESS.2018.2870052
Albert, A., Rajagopal, R., 2013. Smart Meter Driven Segmentation: What Your Consumption Says
About You. IEEE Trans. Power Syst. 28, 4019–4030.
Allcott, H., 2011. Social norms and energy conservation. J. Public Econ., Special Issue: The Role of
Firms in Tax Systems 95, 1082–1095. https://doi.org/10.1016/j.jpubeco.2011.03.003
Allcott, H., Mullainathan, S., 2010. Behavior and energy policy. Science 327, 1204–1205.
Alvarez Melis, D., Jaakkola, T., 2018. Towards Robust Interpretability with Self-Explaining Neural
Networks, in: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett,
R. (Eds.), Advances in Neural Information Processing Systems 31. Curran Associates, Inc., pp.
7775–7784.
Anjomshoae, S., Najjar, A., Calvaresi, D., Främling, K., 2019. Explainable Agents and Robots: Results
from a Systematic Literature Review, in: Proceedings of the 18th International Conference on
Autonomous Agents and MultiAgent Systems (AAMAS) 2019. Montreal, Canada, p. 13.
Backhaus, K., Erichson, B., Weiber, R., 2015. Auswahlbasierte Conjoint-Analyse, in: Backhaus, K.,
Erichson, B., Weiber, R. (Eds.), Fortgeschrittene Multivariate Analysemethoden: Eine
anwendungsorientierte Einführung. Springer, Berlin, Heidelberg, pp. 175–292.
https://doi.org/10.1007/978-3-662-46087-0_5
Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S.,
Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., Herrera, F., 2020. Explainable Artificial
Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI.
Inf. Fusion 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012
Beckel, C., Sadamori, L., Staake, T., Santini, S., 2014. Revealing household characteristics from smart
meter data. Energy 78, 397–410.
Benartzi, S., Beshears, J., Milkman, K.L., Sunstein, C.R., Thaler, R.H., Shankar, M., Tucker-Ray, W.,
Congdon, W.J., Galing, S., 2017. Should Governments Invest More in Nudging? Psychol. Sci.
28, 1041–1055. https://doi.org/10.1177/0956797617702501
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Brülisauer, M., Goette, L., Jiang, Z., Schmitz, J., Schubert, R., 2020. Appliance-specific feedback and
social comparisons: Evidence from a field experiment on energy conservation. Energy Policy
145, 111742. https://doi.org/10.1016/j.enpol.2020.111742
Cheng, H.-F., Wang, R., Zhang, Z., O’Connell, F., Gray, T., Harper, F.M., Zhu, H., 2019. Explaining
Decision-Making Algorithms through UI: Strategies to Help Non-Expert Stakeholders, in:
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI ’19.
Presented at the 2019 CHI Conference, ACM Press, Glasgow, Scotland, UK, pp. 1–12.
https://doi.org/10.1145/3290605.3300789
Cianci, A.M., Schaubroeck, J.M., McGill, G.A., 2010. Achievement Goals, Feedback, and Task
Performance. Hum. Perform. 23, 131–154. https://doi.org/10.1080/08959281003621687
Commission for Energy Regulation, 2011. Electricity Smart Metering Customer Behaviour Trials
(CBT) Findings Report (Information Paper No. CER11080a).
Coombs, C., Hislop, D., Taneva, S.K., Barnard, S., 2020. The strategic impacts of Intelligent
Automation for knowledge and service work: An interdisciplinary review. J. Strateg. Inf. Syst.
101600. https://doi.org/10.1016/j.jsis.2020.101600
Costanza, E., Ramchurn, S.D., Jennings, N.R., 2012. Understanding domestic energy consumption
through interactive visualisation: a field study, in: Proceedings of the 2012 ACM Conference
on Ubiquitous Computing, UbiComp ’12. Association for Computing Machinery, New York,
NY, USA, pp. 216–225. https://doi.org/10.1145/2370216.2370251
DESTATIS, 2020. Bevölkerung und Erwerbstätigkeit - Haushalte und Familien Ergebnisse des
Mikrozensus (No. 2010300197004), Fachserie 1, Reihe 3. German Federal Statistical Office.
Dourish, P., 2016. Algorithms and their others: Algorithmic culture in context. Big Data Soc. 3,
205395171666512. https://doi.org/10.1177/2053951716665128
Eurostat, 2020. Distribution of the population by housing ownership, household type and income group
- EU-SILC survey (Statistical data), Income and living conditions (ilc). Eurostat, the statistical
office of the European Union, Brussels, Belgium.
Faraj, S., Pachidi, S., Sayegh, K., 2018. Working and organizing in the age of the learning algorithm.
Inf. Organ. 28, 62–70. https://doi.org/10.1016/j.infoandorg.2018.02.005
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874.
Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D., 2014. Do we need hundreds of classifiers
to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181.
Fischer, C., 2008. Feedback on household electricity consumption: a tool for saving energy? Energy
Effic. 1, 79–104.
Flora, J.A., Banerjee, B., 2014. Energy Graph Feedback: Attention, Cognition and Behavior Intentions,
in: Marcus, A. (Ed.), Design, User Experience, and Usability. User Experience Design for
Everyday Life Applications and Services, Lecture Notes in Computer Science. Springer
International Publishing, Cham, pp. 520–529. https://doi.org/10.1007/978-3-319-07635-5_50
Frey, C.B., Osborne, M.A., 2017. The future of employment: How susceptible are jobs to
computerisation? Technol. Forecast. Soc. Change 114, 254–280.
https://doi.org/10.1016/j.techfore.2016.08.019
Grønsund, T., Aanestad, M., 2020. Augmenting the algorithm: Emerging human-in-the-loop work
configurations. J. Strateg. Inf. Syst. 101614. https://doi.org/10.1016/j.jsis.2020.101614
Hart, G.W., 1992. Nonintrusive appliance load monitoring. Proc. IEEE 80, 1870–1891.
https://doi.org/10.1109/5.192069
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning, Springer Series in
Statistics. Springer, New York, NY.
Herrmann, M.R., Brumby, D.P., Oreszczyn, T., 2018. Watts your usage? A field study of householders’
literacy for residential electricity data. Energy Effic. 11, 1703–1719.
https://doi.org/10.1007/s12053-017-9555-y
Hevner, A.R., March, S.T., Park, T., Ram, S., 2004. Design Science in Information Systems Research.
MIS Q. 28, 75–105.
Hopf, K., 2019. Predictive Analytics for Energy Efficiency and Energy Retailing, 1st ed, Contributions
of the Faculty Information Systems and Applied Computer Sciences of the Otto-Friedrich-
University Bamberg. University of Bamberg, Bamberg.
Hopf, K., Sodenkamp, M., Staake, T., 2018. Enhancing energy efficiency in the residential sector with
smart meter data analytics. Electron. Mark. 28. https://doi.org/10.1007/s12525-018-0290-9
Iivari, J., 2015. Distinguishing and contrasting two strategies for design science research. Eur. J. Inf.
Syst. 24, 107–115. https://doi.org/10.1057/ejis.2013.35
Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A., 2019. Deep learning for time
series classification: a review. Data Min. Knowl. Discov. 33, 917–963.
https://doi.org/10.1007/s10618-019-00619-1
Ismail Fawaz, H., Lucas, B., Forestier, G., Pelletier, C., Schmidt, D.F., Weber, J., Webb, G.I.,
Idoumghar, L., Muller, P.-A., Petitjean, F., 2020. InceptionTime: Finding AlexNet for time
series classification. Data Min. Knowl. Discov. 34, 1936–1962. https://doi.org/10.1007/s10618-020-00710-y
Karjalainen, S., 2011. Consumer preferences for feedback on household electricity consumption. Energy
Build. 43, 458–467. https://doi.org/10.1016/j.enbuild.2010.10.010
Kaur, H., Nori, H., Jenkins, S., Caruana, R., Wallach, H., Wortman Vaughan, J., 2020. Interpreting
Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine
Learning, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing
Systems. Presented at the CHI ’20: CHI Conference on Human Factors in Computing Systems,
ACM, Honolulu HI USA, pp. 1–14. https://doi.org/10.1145/3313831.3376219
Klöckner, C.A., 2013. A comprehensive model of the psychology of environmental behaviour—A meta-
analysis. Glob. Environ. Change 23, 1028–1038.
https://doi.org/10.1016/j.gloenvcha.2013.05.014
Kühl, N., Lobana, J., Meske, C., 2019. Do you comply with AI? Personalized explanations of learning
algorithms and their impact on employees’ compliance behavior, in: ICIS 2019 Paper-a-Thon.
Presented at the 40th International Conference on Information Systems (ICIS), AIS electronic
library, Munich, Germany.
Lee, S., Kim, S., Hung, Y., Lam, H., Kang, Y., Yi, J.S., 2016. How do People Make Sense of Unfamiliar
Visualizations?: A Grounded Model of Novice’s Information Visualization Sensemaking. IEEE
Trans. Vis. Comput. Graph. 22, 499–508. https://doi.org/10.1109/TVCG.2015.2467195
Lu, S., Ham, J., Midden, C., 2016. The influence of color association strength and consistency on ease
of processing of ambient lighting feedback. J. Environ. Psychol. 47, 204–212.
https://doi.org/10.1016/j.jenvp.2016.06.005
Lundberg, S.M., Lee, S.-I., 2017. A unified approach to interpreting model predictions, in: Proceedings
of the 31st International Conference on Neural Information Processing Systems, NIPS’17.
Curran Associates Inc., Red Hook, NY, USA, pp. 4768–4777.
Lyytinen, K., Nickerson, J.V., King, J.L., 2020. Metahuman systems = humans + machines that learn.
J. Inf. Technol. 0268396220915917. https://doi.org/10.1177/0268396220915917
McKinney, S.M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., Back, T., Chesus,
M., Corrado, G.C., Darzi, A., Etemadi, M., Garcia-Vicente, F., Gilbert, F.J., Halling-Brown,
M., Hassabis, D., Jansen, S., Karthikesalingam, A., Kelly, C.J., King, D., Ledsam, J.R.,
Melnick, D., Mostofi, H., Peng, L., Reicher, J.J., Romera-Paredes, B., Sidebottom, R.,
Suleyman, M., Tse, D., Young, K.C., Fauw, J.D., Shetty, S., 2020. International evaluation of
an AI system for breast cancer screening. Nature 577, 89–94. https://doi.org/10.1038/s41586-019-1799-6
Miller, T., 2019. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell.
267, 1–38. https://doi.org/10.1016/j.artint.2018.07.007
Mohseni, S., Zarei, N., Ragan, E.D., 2020. A Multidisciplinary Survey and Framework for Design and
Evaluation of Explainable AI Systems. arXiv:1811.11839 [cs].
Mumm, J., Mutlu, B., 2011. Designing motivational agents: The role of praise, social comparison, and
embodiment in computer feedback. Comput. Hum. Behav. 27, 1643–1650.
https://doi.org/10.1016/j.chb.2011.02.002
Naous, D., Legner, C., 2017. Leveraging Market Research Techniques in IS: A Review of Conjoint
Analysis in IS Research, in: ICIS 2017 Proceedings. Presented at the 38th International
Conference on Information Systems (ICIS), AIS electronic library, Seoul, South Korea.
Nourani, M., Kabir, S., Mohseni, S., Ragan, E.D., 2019. The Effects of Meaningful and Meaningless
Explanations on Trust and Perceived System Accuracy in Intelligent Systems, in: Proceedings
of the 33rd AAAI Conference on Artificial Intelligence. Presented at the 33rd AAAI
Conference on Artificial Intelligence, AAAI Press, Palo Alto, pp. 97–105.
Paas, F., Ayres, P., Pachman, M., 2008. Assessment of cognitive load in multimedia learning, in:
Recent Innovations in Educational Technology that Facilitate Student Learning. Information
Age Publishing Inc., Charlotte, NC, pp. 11–35.
Peffers, K., Tuunanen, T., Rothenberger, M.A., Chatterjee, S., 2007. A Design Science Research
Methodology for Information Systems Research. J. Manag. Inf. Syst. 24, 45–77.
https://doi.org/10.2753/MIS0742-1222240302
Quintal, F., Jorge, C., Nisi, V., Nunes, N., 2016. Watt-I-See: A Tangible Visualization of Energy, in:
Proceedings of the International Working Conference on Advanced Visual Interfaces, AVI ’16.
Association for Computing Machinery, New York, NY, USA, pp. 120–127.
https://doi.org/10.1145/2909132.2909270
Rai, A., Constantinides, P., Sarker, S., 2019. Editor’s Comments: Next-Generation Digital Platforms:
Toward Human–AI Hybrids. Manag. Inf. Syst. Q. 43, iii–ix.
Raisch, S., Krakowski, S., 2020. Artificial Intelligence and Management: The Automation-
Augmentation Paradox. Acad. Manage. Rev. https://doi.org/10.5465/2018.0072
Ribeiro, M.T., Singh, S., Guestrin, C., 2016. “Why Should I Trust You?”: Explaining the Predictions of
Any Classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD ’16. Association for Computing Machinery, New
York, NY, USA, pp. 1135–1144. https://doi.org/10.1145/2939672.2939778
Sarker, S., Chatterjee, S., Xiao, X., Elbanna, A., 2019. The Sociotechnical Axis of Cohesion for the IS
Discipline: Its Historical Legacy and its Continued Relevance. Manag. Inf. Syst. Q. 43, 695–
719.
Satell, G., Sutton, J., 2019. We Need AI That Is Explainable, Auditable, and Transparent. Harv. Bus.
Rev. Digit. Artic. 2–5.
Schlegel, U., Arnout, H., El-Assady, M., Oelke, D., Keim, D.A., 2019. Towards A Rigorous Evaluation
Of XAI Methods On Time Series, in: 2019 IEEE/CVF International Conference on Computer
Vision Workshop (ICCVW). Presented at the 2019 IEEE/CVF International Conference on
Computer Vision Workshop (ICCVW), pp. 4197–4201.
https://doi.org/10.1109/ICCVW.2019.00516
Schneider, J., Handali, J., 2019. Personalized Explanation for Machine Learning: A Conceptualization,
in: ECIS 2019 Research Papers. Presented at the 27th European Conference on Information
Systems (ECIS), AIS electronic library, Stockholm & Uppsala, Sweden.
Slack, D., Hilgard, S., Jia, E., Singh, S., Lakkaraju, H., 2020. Fooling LIME and SHAP: Adversarial
Attacks on Post hoc Explanation Methods, in: Proceedings of the AAAI/ACM Conference on
AI, Ethics, and Society, AIES ’20. Association for Computing Machinery, New York, NY,
USA, pp. 180–186. https://doi.org/10.1145/3375627.3375830
Tiefenbeck, V., 2017. Bring behaviour into the digital transformation. Nat. Energy 2, 17085.
https://doi.org/10.1038/nenergy.2017.85
Tiefenbeck, V., Goette, L., Degen, K., Tasic, V., Fleisch, E., Lalive, R., Staake, T., 2016. Overcoming
Salience Bias: How Real-Time Feedback Fosters Resource Conservation. Manag. Sci.
https://doi.org/10.1287/mnsc.2016.2646
Tonekaboni, S., Joshi, S., Campbell, K., Duvenaud, D., Goldenberg, A., 2020. What went wrong and
when? Instance-wise Feature Importance for Time-series Models, in: Proceedings of the 34th
Conference on Neural Information Processing Systems (NeurIPS 2020). Presented at the 34th
Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.
van Duijvenvoorde, A.C.K., Zanolie, K., Rombouts, S.A.R.B., Raijmakers, M.E.J., Crone, E.A., 2008.
Evaluating the Negative or Valuing the Positive? Neural Mechanisms Supporting Feedback-
Based Learning across Development. J. Neurosci. 28, 9495–9503.
https://doi.org/10.1523/JNEUROSCI.1485-08.2008
Vasseur, V., Marique, A.-F., Udalov, V., 2019. A Conceptual Framework to Understand Households’
Energy Consumption. Energies 12, 4250. https://doi.org/10.3390/en12224250
Vermeulen, B., Goos, P., Vandebroek, M., 2008. Models and optimal designs for conjoint choice
experiments including a no-choice option. Int. J. Res. Mark. 25, 94–103.
https://doi.org/10.1016/j.ijresmar.2007.12.004
Wang, Y., Chen, Q., Gan, D., Yang, J., Kirschen, D.S., Kang, C., 2018. Deep Learning-Based Socio-
demographic Information Identification from Smart Meter Data. IEEE Trans. Smart Grid PP,
1–1. https://doi.org/10.1109/TSG.2018.2805723
Wanner, J., Herm, L.-V., Janiesch, C., 2020. How Much Is the Black Box? The Value of Explainability
in Machine Learning Models, in: ECIS 2020 Research-in-Progress Papers. Presented at the 28th
European Conference on Information Systems (ECIS), AIS electronic library.
Weigert, A., Hopf, K., Weinig, N., Staake, T., 2020. Detection of heat pumps from smart meter and
open data. Energy Inform. 3, 21. https://doi.org/10.1186/s42162-020-00124-6
Weiss, T., Diesing, M., Krause, M., Heinrich, K., Hilbert, A., 2016. Effective Visualizations of Energy
Consumption in a Feedback System: A Conjoint Measurement Study, in: Abramowicz, W.,
Alt, R., Franczyk, B. (Eds.), Business Information Systems, Lecture Notes in Business
Information Processing. Springer International Publishing, Cham, pp. 55–66.
https://doi.org/10.1007/978-3-319-39426-8_5
White, H., 1980. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test
for Heteroskedasticity. Econometrica 48, 817–838.
Yan, T., Tourangeau, R., 2008. Fast times and easy questions: the effects of age, experience and question
complexity on web survey response times. Appl. Cogn. Psychol. 22, 51–68.
https://doi.org/10.1002/acp.1331
Zeifman, M., Roth, K., 2011. Nonintrusive appliance load monitoring: Review and outlook. IEEE Trans.
Consum. Electron. 57, 76–84. https://doi.org/10.1109/TCE.2011.5735484
Zeileis, A., 2004. Econometric Computing with HC and HAC Covariance Matrix Estimators. J. Stat.
Softw. 11, 1–17. https://doi.org/10.18637/jss.v011.i10
... As LIME can give the contradict or support value of each input feature for a prediction sample, it is valuable to explain the prediction of classification problems. Wastensteiner et al. used LIME to interpret ML-based time-series classification for building energy consumption and analyzed the stability and reliability of the interpretation [124] . Madhikermi et al. trained ANN and SVM for AHU fault diagnosis, and six samples were randomly selected to demonstrate the interpretability of LIME [96] . ...
... DNN models were used for prediction in each stage, and LIME was employed to interpret the model output [131] . Besides, LIME was also used for other building management-related applications such as distributed PV power prediction [102] , electricity demand prediction [124] , and indoor CO 2 concentration prediction [122] . ...
... Santos et al. adopted XGBoost to detect fraud electricity consumption in the market, and SHAP was used to build interpretations for fraud activities afterward [119] . Additionally, SHAP can be used to interpret time-series classification for building energy consumption [124] . ...
Article
Full-text available
Machine learning has been widely adopted for improving building energy efficiency and flexibility in the past decade owing to the ever-increasing availability of massive building operational data. However, it is challenging for end-users to understand and trust machine learning models because of their black-box nature. To this end, the interpretability of machine learning models has attracted increasing attention in recent studies because it helps users understand the decisions made by these models. This article reviews previous studies that adopted interpretable machine learning techniques for building energy management to analyze how model interpretability is improved. First, the studies are categorized according to the application stages of interpretable machine learning techniques: ante-hoc and post-hoc approaches. Then, the studies are analyzed in detail according to specific techniques with critical comparisons. Through the review, we find that the broad application of interpretable machine learning in building energy management faces the following significant challenges: (1) different terminologies are used to describe model interpretability which could cause confusion, (2) performance of interpretable ML in different tasks is difficult to compare, and (3) current prevalent techniques such as SHAP and LIME can only provide limited interpretability. Finally, we discuss the future R&D needs for improving the interpretability of black-box models that could be significant to accelerate the application of machine learning for building energy management.
... Once a decision has been made on the content of the transparency cue, how aspects move into focus [20]. Regarding the format of presentation, various studies underlined that users preferred visualizations over text-based cues (e.g., [24,25]). As an example, Wastensteiner et al. [25] compared energy consumption feedback in the form of a bar or line diagram to that in the form of sentences. ...
... Regarding the format of presentation, various studies underlined that users preferred visualizations over text-based cues (e.g., [24,25]). As an example, Wastensteiner et al. [25] compared energy consumption feedback in the form of a bar or line diagram to that in the form of sentences. Contrary to the users' preference ratings, the authors also found that the text explanation was better understood, and Silva et al. [11] even showed that various presentation formats (e.g., text, decision trees, probabilities) did not affect trust perceptions or performances differently when using a decision-making system. ...
Conference Paper
Full-text available
As technologies become more complex, the question of how transparent they should be for users and how transparency cues should be designed comes to the fore. Transparency refers to the extent to which users learn, for example, how the technology works or arrives at certain results. The increased interest in this topic also stems from legal changes such as the debate about a European AI regulation, which demands transparent AI systems and thus necessitates solutions for an optimal design of transparency cues. The paper discusses examples and risks of lacking transparency and approaches and the state of knowledge for improving the user experience by technology-based transparency cues. Finally, we present an outlook on the promising directions for design guidelines and next steps of research.
... For post-hoc interpretable techniques, Local Interpretable Modelagnostic Explanations (LIME) [46] and Shapley Additive exPlanations (SHAP) [47] are the most commonly-used model-agnostic tools to interpret individual predictions. Wastensteiner et al. [48] applied LIME to interpret the classification results of time-series building energy consumption data and evaluated the reliability of the interpretation. Zdravkovic et al. [49] employed LIME to generate the feature importances for the local forecasts for the district heating demand. ...
... 2) A lack of interpretation of the prediction results achieved by ML techniques from the perspective of the building load characteristics. The prior studies on interpretable ML techniques improved the interpretability by improving the structure of the individual models during the model training process [44,45] or measuring the feature importance after training [48][49][50][51][52]. Few studies have tried to explain the prediction performance of ML techniques based on the characteristics of building load profiles. ...
Article
Full-text available
Data-driven forecasting techniques have been widely used for building load forecasting due to their accuracy and wide availability of operational data. Recent advances have been underpinned by the increased capability of machine learning (ML) algorithms; however, most studies only tested ML techniques on a single or a small number of buildings over short periods, lacking reliable tests. Moreover, few studies focused on the effects of characteristics of building load profiles on forecast accuracy, lacking the interpretation of ML-based prediction results. In this study, we investigate the impacts of building load dispersion level on its best load forecasting accuracy, which is obtained by comparing the forecasting performances of 11 prediction models over 9 weeks for 56 British non-domestic buildings. We find that conventional shallow ML models still outperform the increasingly popular deep learning models for time-series load forecasting, and ensemble learning can help improve forecast accuracy by integrating diverse individual models. We demonstrate that each building’s best forecasting performance is largely influenced by the load dispersion level. In practice, the proposed dispersion metrics are recommended to quantify load dispersion levels before model development. For a building with a low dispersion level, the simple persistence model has satisfactory performance and could be directly used for design, control, and fault diagnosis of building energy systems for energy efficiency and energy flexibility.
... They showed the advantages in terms of the interpretability, compactness, and robustness of using LIME instead of sensitivity models for the evaluation of energy systems. Wastensteiner et al. [166] applied LIME to interpret ML-based time-series classification models for the assessment of buildings' energy consumption, then analysed the stability and reliability of the interpretations. Tsoka et al. [167] developed an eXplainable Neural Network (XNN) to evaluate Italians EPCs. ...
Article
Full-text available
Machine learning (ML) algorithms are now part of everyday life, as many technological devices use these algorithms. The spectrum of uses is wide, but it is evident that ML represents a revolution that may change almost every human activity. However, as for all innovations, it comes with challenges. One of the most critical of these challenges is providing users with an understanding of how models’ output is related to input data. This is called “interpretability”, and it is focused on explaining what feature influences a model’s output. Some algorithms have a simple and easy-to-understand relationship between input and output, while other models are “black boxes” that return an output without giving the user information as to what influenced it. The lack of this knowledge creates a truthfulness issue when the output is inspected by a human, especially when the operator is not a data scientist. The Building and Construction sector is starting to face this innovation, and its scientific community is working to define best practices and models. This work is intended for developing a deep analysis to determine how interpretable ML models could be among the most promising future technologies for the energy management in built environments.
... Further limitations concern the implementation of technical aspects: Our XAI component relies only on default SHAP force plots, which other research has found difficult to interpret (Wastensteiner et al., 2021), potentially leading to limited value for decision-making. Thus, more task-specific visualizations for price estimations (e.g., including the value of feature attributions as monetary units) and other XAI visualization methods, such as LIME (Ribeiro et al., 2016), could be implemented to improve the decision support. ...
Conference Paper
Full-text available
Information systems (IS) are frequently designed to leverage the negative effect of anchoring bias to influence individuals' decision-making (e.g., by manipulating purchase decisions). Recent advances in Artificial Intelligence (AI) and the explanations of its decisions through explainable AI (XAI) have opened new opportunities for mitigating biased decisions. So far, the potential of these technological advances to overcome anchoring bias remains widely unclear. To this end, we conducted two online experiments with a total of N=390 participants in the context of purchase decisions to examine the impact of AI and XAI-based decision support on anchoring bias. Our results show that AI alone and its combination with XAI help to mitigate the negative effect of anchoring bias. Ultimately, our findings have implications for the design of AI and XAI-based decision support and IS to overcome cognitive biases.
... However, developing so-called post-hoc explainability methods that possess high fidelity and end-user friendliness is still an ongoing research effort. 58 • Push intrinsically transparent models to the max-As noted above, a complex deep learning model typically outperforms simple additive and linear models when fed with the same raw data. This is due to the ability of deep neural networks to automatically discover and represent relevant features and model non-linearities and interactions. ...
Article
Full-text available
Recent years have brought major technological breakthroughs in artificial intelligence (AI), and firms are expected to invest nearly $98 B in 2023. However, many AI projects never leave the pilot phase, and many companies have difficulties extracting value from their AI initiatives. To explain this contradiction, this article reports on a study of 55 projects implementing AI in organizations. It shows that organizational challenges in implementing AI projects are a result of a paradoxical tension created by two different perspectives on data science work: craft and mechanical work. Executives, managers, and data scientists should actively manage this tension to enable and sustain value creation through AI.
... Our XAI-based feedback is inspired by SRL theory in the form that many of our ML features should stimulate and guide cognitive learning strategies (e.g., watch a specific video to repeat its content) and all phases of the SRL process (e.g., progress control in the performance phase). This may also hold for XAIbased feedback within other domains such as resource conservation (Wastensteiner et al. 2021), where theory from environmental psychology and human-computer interaction can guide the design of such artifacts. ...
Conference Paper
Due to the advent of digital learning environments and the freedom they offer for learners, new challenges arise for students' self-regulated learning. To overcome these challenges, the provision of feedback has led to excellent results, such as less procrastination and improved academic performance. Yet, current feedback artifacts neglect learners’ heterogeneity when it comes to prescriptive feedback that should meet personal characteristics and self-regulated learning skills. In this paper, we derive requirements from self-regulated learning theory for a feedback artifact that takes learners’ heterogeneity into account. Based on these requirements, we design, instantiate, and evaluate an Explainable AI-based approach. The results demonstrate that our artifact is able to detect promising patterns in data on learners' behaviors and characteristics. Moreover, our evaluation suggests that learners perceive our feedback as valuable. Ultimately, our study informs Information Systems research in the design of future Explainable AI-based feedback artifacts that seek to address learners' heterogeneity.
... Concept drift resulting from environmental changes, such as pandemic-induced lock-downs, drastically impacts the energy consumption patterns necessitating online ML (García-Martín et al., 2019). Explaining these predictions yields a greater understanding of an individual's energy use and enables prescriptive modeling for further energy-saving measures (Wastensteiner et al., 2021). For black-box ML methods, so-called post-hoc XAI methods seek to explain single predictions or entire models in terms of the contribution of specific features (Adadi & Berrada, 2018). ...
Article
Full-text available
Explainable artificial intelligence has mainly focused on static learning scenarios so far. We are interested in dynamic scenarios where data is sampled progressively, and learning is done in an incremental rather than a batch mode. We seek efficient incremental algorithms for computing feature importance (FI). Permutation feature importance (PFI) is a well-established model-agnostic measure to obtain global FI based on feature marginalization of absent features. We propose an efficient, model-agnostic algorithm called iPFI to estimate this measure incrementally and under dynamic modeling conditions including concept drift. We prove theoretical guarantees on the approximation quality in terms of expectation and variance. To validate our theoretical findings and the efficacy of our approaches in incremental scenarios dealing with streaming data rather than traditional batch settings, we conduct multiple experimental studies on benchmark data with and without concept drift.
... Among these articles, twelve build their research upon existing methodologies while the remaining 23 do not organize their research according to a standard process (see Table 1). [12] x Bohanec, Robnik-Šikonja, & Borštnar (2017a) [43] x x Bohanec, Robnik-Šikonja, & Borštnar (2017b) [44] x x Ming, Qu, & Bertini (2018) [45] x Ventura, Cerquitelli, & Giacalone (2018) [46] x Colace et al. (2019) [47] x Eitle & Buxmann (2019) [48] x Yeganejou, Dick, & Miller (2019) [49] x Alvanpour, Das, Robinson, Nasraoui, & Popa (2020) [50] x Bramhall, Horn, Tieu, & Lohia (2020) [51] x Dolk, Kridel, Dineen, & Castillo (2020) [52] x Harl, Weinzierl, Stierle, & Matzner (2020) [53] x Kim, Srinivasan, & Ram (2020) [54] x Mehdiyev & Fettke (2020) [55] x Yadam, Moharir, & Srivastava (2020) [56] x Zhang, Du, & Zhang (2020) [57] Asmussen, Jørgensen, & Møller (2021) [58] x x Barfar, Padmanabhan, & Hevner (2021) [59] x Garriga, Aarns, Tsigkanos, Tamburri, & Heuvel (2021) [60] x Han et al. (2021) [61] x Herm, Wanner, Seubert, & Janiesch (2021) [62] x Johnson, Albizri, Harfouche, & Tutun (2021) [63] x Liu, Du, Hong, & Fan (2021) [64] x Mehdiyev & Fettke (2021) [65] x Mombini et al. (2021) [66] x Pereira et al. (2021) [67] x Velichety & Ram (2021) [68] x Wang et al. (2021) [69] x Wastensteiner, Michael Weiss, Haag, & Hopf (2021) [70] x Zhou, Wang, Ren, & Chen (2021) [71] x Bodendorf, Xie, Merkl, & Franke (2022) [72] x Johnson, Albizri, Harfouche, & Fosso-Wamba (2022) [73] x Kowalczyk, Röder, Dürr, & Thiesse (2022) [74] x Tao, Zhou, & Hickey (2022) [75] x Yang, Yuan, & Lau (2022) [76] x ...
Preprint
Full-text available
Prediction-oriented machine learning is becoming increasingly valuable to organizations, as it may drive applications in crucial business areas. However, decision-makers from companies across various industries are still largely reluctant to employ applications based on modern machine learning algorithms. We ascribe this issue to the widely held view on advanced machine learning algorithms as "black boxes" whose complexity does not allow for uncovering the factors that drive the output of a corresponding system. To contribute to overcome this adoption barrier, we argue that research in information systems should devote more attention to the design of prototypical prediction-oriented machine learning applications (i.e., artifacts) whose predictions can be explained to human decision-makers. However, despite the recent emergence of a variety of tools that facilitate the development of such artifacts, there has so far been little research on their development. We attribute this research gap to the lack of methodological guidance to support the creation of these artifacts. For this reason, we develop a methodology which unifies methodological knowledge from design science research and predictive analytics with state-of-the-art approaches to explainable artificial intelligence. Moreover, we showcase the methodology using the example of price prediction in the sharing economy (i.e., on Airbnb).
Article
Many applications are driven by Machine Learning (ML) today. While complex ML models lead to an accurate prediction, their inner decision-making is obfuscated. However, especially for high-stakes decisions, interpretability and explainability of the model are necessary. Therefore, we develop a holistic interpretability and explainability framework (HIEF) to objectively describe and evaluate an intelligent system’s explainable AI (XAI) capacities. This guides data scientists to create more transparent models. To evaluate our framework, we analyse 50 real estate appraisal papers to ensure the robustness of HIEF. Additionally, we identify six typical types of intelligent systems, so-called archetypes, which range from explanatory to predictive, and demonstrate how researchers can use the framework to identify blind-spot topics in their domain. Finally, regarding comprehensiveness, we used a random sample of six intelligent systems and conducted an applicability check to provide external validity.
Article
Full-text available
With cloud and mobile computing, information systems (IS) evolve towards mass-market services. While user involvement is critical for IS success, the IS discipline lacks methods that allow integrating the "voice of the customer" in the case of mass-market services with individual and dispersed users. Conjoint analysis (CA), from marketing research, allows for understanding user preferences and measures user trade-offs for multiple product features simultaneously. While CA has gained popularity in the IS domain, the existing studies have mostly been one-time efforts and no cumulative research patterns have been observed. We argue that CA could have a significant impact on IS research (and practice) if it were fully developed and adopted as a method in IS. From reviewing 70 CA studies published between 1999 and 2019 in the IS field, we find that CA can be leveraged in the initial conceptualization, iterative design and evaluation of IS and their business models. We critically assess the methodological choices along the CA procedure to provide recommendations and guidance on "how" to leverage CA techniques in future IS research. We then synthesize our findings into a "Framework for Conjoint Analysis Studies in IS" that outlines "where" CA can be applied along the IS lifecycle.
Article
Full-text available
Heat pumps embody solutions that heat or cool buildings effectively and sustainably, with zero emissions at the place of installation. As they pose significant load on the power grid, knowledge on their existence is crucial for grid operators, e.g., to forecast load and to plan grid operation. Further details, like the thermal reservoir (ground or air source) or the age of a heat pump installation renders energy-related services possible that utility companies can offer in the future (e.g., detecting wrongly calibrated installations, household energy efficiency checks). This study investigates the prediction of heat pump installations, their thermal reservoir and age. For this, we obtained a dataset with 397 households in Switzerland, all equipped with smart meters, collected ground truth data on installed heat pumps and enriched this data with weather data and geographical information. Our investigation replicates the state of the art in the area of heat pump detection and goes beyond it, as we obtain three major findings: First, machine learning can detect the existence of heat pumps with an AUC performance metric of 0.82, their heat reservoir with an AUC of 0.86, and their age with an AUC of 0.73. Second, heat pump existence can be better detected using data during the heating period than during summer. Third the number of training samples to detect the existence of heat pumps must not be necessarily large in terms of the number of training instances and observation period.
Article
Full-text available
This paper brings deep learning at the forefront of research into time series classification (TSC). TSC is the area of machine learning tasked with the categorization (or labelling) of time series. The last few decades of work in this area have led to significant progress in the accuracy of classifiers, with the state of the art now represented by the HIVE-COTE algorithm. While extremely accurate, HIVE-COTE cannot be applied to many real-world datasets because of its high training time complexity in O(N2·T4) for a dataset with N time series of length T. For example, it takes HIVE-COTE more than 8 days to learn from a small dataset with N=1500 time series of short length T=46. Meanwhile deep learning has received enormous attention because of its high accuracy and scalability. Recent approaches to deep learning for TSC have been scalable, but less accurate than HIVE-COTE. We introduce InceptionTime—an ensemble of deep Convolutional Neural Network models, inspired by the Inception-v4 architecture. Our experiments show that InceptionTime is on par with HIVE-COTE in terms of accuracy while being much more scalable: not only can it learn from 1500 time series in one hour but it can also learn from 8M time series in 13 h, a quantity of data that is fully out of reach of HIVE-COTE.
Article
Full-text available
How do configurations of humans and algorithms evolve as firms adopt artificial intelligence (AI) capabilities, and what are the implications for work and organization? We explored these questions through a two-year long case study of an organization in the international maritime trade that introduced automated algorithmic support for data analysis and prediction work. Drawing on a human–machine configuration perspective, we found that humans and the algorithm were configured and reconfigured in multiple ways over time as the organization dealt with the introduction of algorithmic analysis. In contrast to replacing human work, the emergent configurations required new roles and redistribution of extant expertise to augment and improve the accuracy of the algorithm. Our analysis suggests that the new configuration resembled a human-in-the-loop pattern, comprised of both the augmentation work of auditing (i.e. the generation of a ground truth and assessment of the algorithmic output against this) as well as the work of altering the algorithm and the data acquisition architecture. Our research points to the strategic importance of a human-in-the-loop pattern for organizational reflexivity to ensure that the performance of the algorithm meets the organization’s requirements and changes in the environment.
Article
Full-text available
A significant recent technological development concerns the automation of knowledge and service work as a result of advances in Artificial Intelligence (AI) and its sub-fields. We use the term Intelligent Automation to describe this phenomenon. This development presents organisations with a new strategic opportunity to increase business value. However, academic research contributions that examine these developments are spread across a wide range of scholarly disciplines resulting in a lack of consensus regarding key findings and implications. We conduct the first interdisciplinary literature review that systematically characterises the intellectual state and development of Intelligent Automation technologies in the knowledge and service sectors. Based on this review, we provide three significant contributions. First, we conceptualise Intelligent Automation and its associated technologies. Second, we provide a business value-based model of Intelligent Automation for knowledge and service work and identify twelve research gaps that hinder a complete understanding of the business value realisation process. Third, we provide a research agenda to address these gaps.
Article
The need for interpretable and accountable intelligent systems grows along with the prevalence of artificial intelligence (AI) applications used in everyday life. Explainable AI (XAI) systems are intended to self-explain the reasoning behind system decisions and predictions. Researchers from different disciplines work together to define, design, and evaluate explainable systems. However, scholars from different disciplines focus on different objectives and fairly independent topics of XAI research, which poses challenges for identifying appropriate design and evaluation methodology and consolidating knowledge across efforts. To this end, this article presents a survey and framework intended to share knowledge and experiences of XAI design and evaluation methods across multiple disciplines. Aiming to support diverse design goals and evaluation methods in XAI research, after a thorough review of XAI-related papers in the fields of machine learning, visualization, and human-computer interaction, we present a categorization of XAI design goals and evaluation methods. Our categorization presents the mapping between design goals for different XAI user groups and their evaluation methods. From our findings, we develop a framework with step-by-step design guidelines paired with evaluation methods to close the iterative design and evaluation cycles in multidisciplinary XAI teams. Further, we provide summarized ready-to-use tables of evaluation methods and recommendations for different goals in XAI research.
Article
The provision of feedback about individual electricity consumption is a widely used approach to promote pro-environmental behavior. This form of feedback typically invokes social comparisons by informing households about their aggregate electricity consumption relative to others. While previous research has shown that aggregate consumption feedback translates into significant energy savings, the potential for further reductions may remain untapped because households lack knowledge about their appliance energy consumption patterns. In this paper, we present evidence from a field experiment, where we provide residents with feedback about their electricity consumption, specific to a high-energy use appliance (i.e. air-conditioner). We provide the relevant social norm information by varying the reference group of each resident. We find that our appliance-specific feedback is a powerful tool to curb electricity consumption. Residents significantly reduce their average air-conditioning usage by 17% in our treatment groups. Notwithstanding, our effects are not driven by comparative feedback with respect to different reference groups. We interpret this as encouraging evidence to promote the use of appliance-specific feedback to realize energy savings.
Article
Metahuman systems are new, emergent, sociotechnical systems where machines that learn join human learning and create original systemic capabilities. Metahuman systems will change many facets of the way we think about organizations and work. They will push information systems research in new directions that may involve a revision of the field’s research goals, methods and theorizing. Information systems researchers can look beyond the capabilities and constraints of human learning toward hybrid human/machine learning systems that exhibit major differences in scale, scope and speed. We review how these changes influence organization design and goals. We identify four organizational level generic functions critical to organize metahuman systems properly: delegating, monitoring, cultivating, and reflecting. We show how each function raises new research questions for the field. We conclude by noting that improved understanding of metahuman systems will primarily come from learning-by-doing as information systems scholars try out new forms of hybrid learning in multiple settings to generate novel, generalizable, impactful designs. Such trials will result in improved understanding of metahuman systems. This need for large-scale experimentation will push many scholars out from their comfort zone, because it calls for the revitalization of action research programs that informed the first wave of socio-technical research at the dawn of automating work systems.