Cognitive Dissonance can Increase Consumption:
On the effects of Inaccurate Explanations on User
Behavior in Video-On-Demand Platforms
Marcel Hauck 1*, Sven Pagel 2 and Franz Rothlauf 1
1*Johannes Gutenberg University Mainz, Germany.
2Mainz University of Applied Sciences, Germany.
*Corresponding author(s). E-mail(s): mhauck@uni-mainz.de;
Contributing authors: sven.pagel@hs-mainz.de; rothlauf@uni-mainz.de
Abstract
On most video-on-demand platforms, recommender systems help users find rele-
vant content. Many recommender systems can provide explanations (e.g., short
texts) on why they make recommendations for an individual user. Existing lit-
erature assumes that explanations are accurate, meaning that they align with
the recommended content. Thus, the effect of inaccurate explanations that do
not fit the displayed content is unclear. In this large-scale, real-world random-
ized controlled trial, users of a major Public Service Media provider received
recommendations either without explanations (baseline), with accurate expla-
nations (control), or with some inaccurate explanations (treatment). For users
who experience only a low percentage of inaccurate explanations, usage (click-
through rate) is similar to the case where all explanations are accurate; for users
who experience a higher percentage of inaccurate explanations, usage signifi-
cantly increases (up to +7.8%). For the extreme case, where most explanations
are inaccurate, usage significantly decreases (down to −15.8%). The increase
in usage (click-through rate) when experiencing some inaccurate explanations
can be explained by Festinger’s theory of cognitive dissonance. It postulates
that users confronted with inconsistent cognitions try to remove dissonant cog-
nitions by investigating the reason for a dissonant recommendation through
clicking on it. As expected, usage decreases if most of the shown explanations are
inaccurate. The findings demonstrate the benefits of online user studies, which
allow for deeper insights into the interplay between recommender systems and
human users. Furthermore, results that may at first seem counter-intuitive
to the recommender system community can be explained by drawing on
well-established theories of human behavior.
Keywords: recommender systems, cognitive dissonance, inaccurate explanations,
usage behavior, randomized controlled trial
1 Introduction
To help users find relevant content while avoiding “information overload”, compa-
nies use recommender systems (RS) to automatically filter their content in areas such
as video-on-demand (VOD), e-commerce, or news (Ricci et al.,2015). Popular video
platforms have thousands of videos available in their catalogs, all of which could poten-
tially be presented to the users. For example, the largest German Public Service Media
(PSM) providers ARD and ZDF together have more than 250,000 available videos
(ARD,2021); Netflix offers only around 6,000 videos (Comparitech,2023). RS filter
this catalog of content to automatically create recommendations for each individual
user (Goldberg et al.,1992). To make these automatic filtering decisions comprehen-
sible, companies such as YouTube or Instagram display (local) explanations of the
recommendations on the user’s front end.
Explanations in RS can help users to decide about the relevance of recommen-
dations (Tintarev and Masthoff,2015;Millecamp et al.,2022) and therefore have an
influence on usage (Herlocker et al.,2000;Tintarev and Masthoff,2012;Zhang et al.,
2014;Zhao et al.,2019;Millecamp et al.,2019;Tran et al.,2021;Guesmi et al.,
2021;Xian et al.,2021;Balog et al.,2023). It is well known that explanations can
significantly increase usage (Click-Through Rate (CTR)). Although there are studies
that find that unexpectedness, novelty or diversity of recommendations can lead to
increased user satisfaction (Matt et al.,2014;Kunaver and Poˇzrl,2017;Song et al.,
2019;Castells et al.,2022), all previous studies on explanations assumed that the
shown explanations are accurate and fit well to the corresponding recommendations.
The goal of this paper is to study the effect of inaccurate explanations on usage
behavior. Based on Festinger’s human behavioral theory of cognitive dissonance (Fes-
tinger,1957), we assume that providing some inaccurate explanations can lead to
increased usage. The theory postulates that dissonant cognitions (in our case a
mismatch between correct recommendations and inaccurate explanations) push the
user to reduce this dissonance. As a result, users try to remove dissonant cognitions
by taking a closer look at given recommendations (with a click), adding new cog-
nitions that are consonant with already existing ones (Festinger,1957). We expect
that this effect leads to an increased usage (CTR) of recommendations if the per-
centage of inaccurate explanations is low but still noticeable. For the extreme case,
where most explanations are inaccurate, users are expected to be unsatisfied with the
recommender, leading to a reduced usage (Castells et al.,2022).
To validate our expectation, we perform a controlled real-world experiment with
a large VOD platform using a state-of-the-art hybrid RS with three groups: First,
a baseline group that does not see any explanations. Second, a control group which
sees accurate explanations as used in other studies before. Third, a treatment group
that sees inaccurate explanations. While keeping the recommendations from the RS
unchanged, we randomize features (e.g., a category) in preference-based explanations
(e.g., “you like sports”) leading to inaccurate ones (e.g., “you like news”). Static,
non-personalized explanations that belong to recommendations from two other RS
techniques (Collaborative Filtering and Statistical List) remain untouched.
Results show that inaccurate explanations lead to a significant overall increase in
usage in comparison to the baseline where users do not see any explanations. In partic-
ular, we find that inaccurate explanations significantly increase the CTR by +7.3%
in comparison to showing no explanations (baseline). Focusing on the percentage of
personalized, preference-based explanations, we find that a low number (<25%) of
inaccurate explanations does not lead to a significant change of CTR (p > 0.05). In
contrast, providing more inaccurate explanations (between 25% and 75% of all shown
explanations) leads to a significantly higher CTR than in the baseline (+7.8%) and
also than the control group (+5.1%) with accurate explanations (p ≤ 0.05). The
extreme case, where almost all given explanations are inaccurate (>75%), leads to a
significantly lower CTR (−15.8%) than the control group with accurate explanations.
We observe these differences between the groups not only in the early sessions of a user
in the experiment but also in later user sessions.
Thus, providing some inaccurate explanations leads to more user clicks and has no
negative effect on viewing time. The mismatch between accurate recommendation and
inaccurate explanation triggers a cognitive dissonance, which leads to the significantly
increased usage. However, this mechanism only works if the percentage of inaccurate
explanations is not too high, as in the extreme case users do not trust the recommender
anymore, which leads to a decreased usage (CTR) of recommendations (Komiak and
Benbasat,2006;Pu and Chen,2007;Shani and Gunawardana,2011).
2 Related work
We summarize related work on RS and explanations in RS. Furthermore, we dis-
cuss Festinger’s theory of cognitive dissonance and how it applies to inaccurate
explanations in RS.
2.1 Recommender Systems
RS can create value for users as well as for the organization that provides the recom-
mendations (Jannach and Zanker,2022). User value can be generated, for example,
by reducing the need for users to search for appropriate content or by enabling them
to make effective decisions. For the recommendation providers, the value of a RS can
be measured in CTR or customer retention. Adomavicius et al. (2013) found evidence
that RS can significantly influence users’ interaction with the system. For example,
RS accounted for 60% of all video clicks on YouTube’s home page in 2010 (Davidson
et al.,2010) or about 80% of the consumption time on Netflix in 2015 (Gomez-Uribe
and Hunt,2015). These values depend on the quality of recommendations, which
largely depends on the RS techniques used.
RS usually generate either personalized or non-personalized recommendations
(Ricci et al.,2022). Personalized recommendations use either implicit (e.g., clicks or
video playbacks) or explicit (e.g., like buttons) feedback to create individual recom-
mendations for each user (Ricci et al.,2022). Personalized recommendations aim at
finding users with similar usage profiles. Based on these profiles, items are recommended
that are new to the current user but known to users with a similar usage profile. Such
recommenders do not require content metadata and can be used even if the metadata
is of poor quality. An example of a personalized RS is Naive Bayesian Classification
(Pronk et al.,2007) which builds a preference model from features (e.g., categories)
for each user. Non-personalized recommenders create recommendations based on prop-
erties of content (e.g., movies). Relevant properties are, for example, popularity or
usage of content. Such methods do not need information about the users (e.g., their
preferences) or about the content (e.g., metadata). They can be particularly useful
when little usage information is available or the quality of the metadata is low.
RS techniques can be combined to form a hybrid recommender. Hybrid recom-
mendations are built using hybridization methods like “weighting” or “mixed” (Burke,
2002). “Weighting” uses different RS techniques to provide a single recommendation;
“mixed” presents recommendations from different recommenders in one recommen-
dation list (Burke,2002). Hybrid approaches can help to mitigate cold start problems
(Schafer et al.,2007).
2.2 Explanations in Recommender Systems
RS are often black boxes, giving no information about why particular recommenda-
tions were given (Tintarev and Masthoff,2007). Explanations can open that black
box to the users by providing a reason for displayed recommendations. They can help
to improve the attitude of users towards the system (e.g., perceived satisfaction) and
usage behavior (e.g., clicks). An explanation can be a description that helps users
to understand the recommended item and enable them to decide about its relevance
(Tintarev and Masthoff,2015). Explanations can be generated automatically or man-
ually, e.g., by crowdsourcing (Balog and Radlinski,2020;Chang et al.,2016). They
can be used for every recommended item (item-level) or as an explanation for a list of
recommendations (group-level). Depending on the type of RS, explanations can also
either be personalized or non-personalized.
Providing explanations can increase CTR and usage. For example, personalized
item-level explanations with natural language increased the CTR of song recommen-
dations in a conversational RS by 8.2% (Zhao et al.,2019). Zhang et al. (2014) found
that personalized item-level explanations using features (e.g.,
product characteristics) increased the CTR of an e-commerce website by up to 34%. Expla-
nations can not only increase CTR but also financial metrics. For example, Xian et al. (2021)
found that explanations can increase conversion (+0.080%) and revenue (+0.105%)
in an e-commerce website.
A high CTR does not necessarily mean that the content is also relevant to the
users, as the metric can be influenced, e.g., by popular content (Zheng et al.,2010;
Garcin et al.,2014). Optimizing only for CTR can even have a long-term negative
impact, as consumers could feel misled when seeing only popular, non-personalized
content or even only “clickbait headlines” (Jannach and Zanker,2022).
2.3 Explanation does not Match Recommendation: The Case
of Cognitive Dissonance
Usually, explanations shown to a user are accurate and fit well to the corresponding
recommendations (e.g., (Zhang et al.,2014;Zhao et al.,2019;Xian et al.,2021)). But
what if some explanations are inaccurate and do not fit well to the corresponding
recommendations? To the best of our knowledge, there is no work studying whether
and how the accuracy of the shown explanation has an effect on usage behavior.
Showing explanations that do not match the recommendations leads to cognitive
dissonance as users do not recognize a relationship between explanation and recom-
mendation. Festinger’s theory of cognitive dissonance describes the causes and effects
of cognitive dissonance (Festinger,1957;Ross et al.,2010). The theory postulates that
related cognitive pairs (such as elements of knowledge) can be consonant or dissonant
(Harmon-Jones,2019). The presence of cognitive dissonance creates internal pressure
in users to reduce it (Festinger,1957), as they want to avoid this state (Surendren
and Bhuvaneswari,2014). One of the proposed mechanisms to reduce the dissonance
is to add new cognitive elements that are “consonant with already existing cognition”
(Festinger,1957).
Studies based on the theory of cognitive dissonance focus on the effects of dissonant
information as well as on people’s decisions in dissonant scenarios (Harmon-Jones,
2019). When people are confronted with information that contradicts their beliefs,
they experience cognitive dissonance. If the dissonance is not resolved, it can lead
to misunderstanding or misinterpretation of information (Festinger et al.,1956). In
situations of cognitive dissonance, people even justify their decisions afterwards
by either removing negative or adding positive aspects (“Free-Choice Paradigm”)
(Brehm,1956) or by justifying their efforts when engaging in an unpleasant activity
(“Effort-Justification Paradigm”) (Aronson and Mills,1959).
In RS, recommendations that are inconsistent with users’ preferences can lead
to dissonance (Schwind et al.,2011). Furthermore, the cognitive pair “recommenda-
tion” and “explanation” can be in a dissonant relation if an explanation is inaccurate
although the recommendation itself is appropriate (Figl et al.,2019). This can, for
example, be the case when a descriptive text (e.g., “you like news”) does not fit
the recommended item (e.g., a sports clip). In such a situation of cognitive disso-
nance, users could want to resolve the subconscious pressure created by the dissonance
between the shown recommendation (based on user preference) and an inaccurate (not
fitting) explanation by, e.g., clicking on the recommendation. With such a click, users
add the recommended clip as an additional cognitive element that fits the mental
model of their preferences, thereby resolving the cognitive dissonance.
Of course, cognitive dissonance can only arise when a user has some trust in a
system and still believes that the system is working correctly. In the extreme case
where almost all explanations are inaccurate, i.e., they do not match the displayed
recommendations, users have no cognitive dissonance that they want to resolve, but
simply do not trust the system anymore. This naturally leads to lower system usage.
In summary, there is evidence that explanations for RS lead to higher usage
(measured as CTR) than an RS without explanations. Furthermore, inaccurate explanations can
trigger cognitive dissonance that can, according to existing theory, lead to higher sys-
tem usage and an increase in CTR. The extreme case where almost only inaccurate
explanations are displayed should lead to a decrease in CTR.
3 Experimental design
We conducted a real-world between-subjects randomized controlled trial together with
a large Public Service Media (PSM) provider located in Germany with more than 100
million monthly user sessions and over 100,000 available videos. The PSM organizes
its content on pages (e.g., home page or thematic page) in lists. Each list contains a
set of clips with thumbnails and describing texts. Most of the offered content is hand-
selected by editors; these items are therefore the same for every user and remain static until
updated by one of the editors. Users that are logged into the system see an additional
list of recommendations on the home page of the VOD platform.
3.1 Used Recommender System Techniques
We briefly describe the RS techniques used to create the recommendations and
explanations provided by the recommender system. The recommendations as well as the
explanations are provided by a state-of-the-art RS based on a user’s implicit usage
data (e.g., click and view history). Each (logged-in) user sees either four (on mobile
devices) or six recommendations (on desktop devices). The recommendations were cre-
ated using either Naive Bayesian Classification, Collaborative Filtering, or Statistical
List.
3.1.1 Naive Bayesian Classification
The Naive Bayesian Classification approach builds an individual preference model for
each user which describes which content features (e.g., categories such as “sports”)
are preferred by a user. It leverages the available implicit feedback (e.g., clicks and
playbacks). For each preference feature, it calculates a score that indicates to what
extent the user likes or even dislikes the particular feature. The more content of a
certain feature has been clicked on or viewed by a user, the higher the score. If content
is clicked on but playback is stopped again after a short time, the corresponding
features are rated negatively. To build user features, the recommender uses content
metadata such as category, title, synopsis (a brief, concise summary of a story or plot), and keywords.
The available metadata is validated before it is used as features by the RS.
All content has either one category or none, drawn from a limited set of 25 possible
ones (e.g., “sports” or “news”). Naive Bayesian Classification generates personalized
feature-based recommendations and (accurate) explanations based on the user-specific
preference model. The explanation on why a recommendation is shown to a user can
therefore directly name individual features. Although both positive (e.g., “you like
sports”) and negative features (e.g., “you don’t like news”) are possible, the PSM
decided to use only positive features in the explanation texts to maintain a high level
of comprehensibility.
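To make this concrete, the following minimal Python sketch shows how such a feature-based preference model could be maintained from implicit feedback. The 10% playback threshold, the unit weights, and all names are illustrative assumptions, not the PSM's implementation.

```python
from collections import defaultdict

def update_preference_model(scores: dict, event: dict) -> None:
    """Update a user's feature scores in place for one interaction event.

    `event` is assumed to carry the clip's metadata features (e.g., category,
    keywords) and, for playbacks, the watched fraction of the clip length.
    """
    watched = event.get("watched_fraction", 0.0)
    # A playback that is stopped shortly after the click counts as negative feedback.
    signal = 1.0 if watched >= 0.1 else -1.0
    for feature in event["features"]:              # e.g., ["sports", "football"]
        scores[feature] += signal

def top_features(scores: dict, n: int = 2) -> list:
    """Return the n highest-scoring (liked) features, e.g., for explanation texts."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Usage example with two events: a long playback and an aborted one.
user_scores = defaultdict(float)
update_preference_model(user_scores, {"features": ["sports"], "watched_fraction": 0.8})
update_preference_model(user_scores, {"features": ["news"], "watched_fraction": 0.02})
print(top_features(user_scores, n=1))   # ['sports'] -- the liked feature
```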
3.1.2 Collaborative Filtering
Unlike Naive Bayesian Classification, Collaborative Filtering does not require user
data (e.g., preferences) or content data (e.g., metadata) (Koren et al.,2022) but relies
on information about which user has consumed or interacted with which content.
Based on implicit (e.g., usage) or explicit data (e.g., ratings), Collaborative Filter-
ing determines similar users and recommends new content that similar users watched
before. The PSM uses static explanation texts that do not contain references to
previous usage or other users (see Table 1).
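The underlying idea can be illustrated with a minimal user-based collaborative filter on an implicit interaction matrix; the toy data and the simple co-watch similarity below are illustrative assumptions, not the platform's implementation.

```python
import numpy as np

# Toy implicit-feedback matrix: rows = users, columns = items, 1.0 = watched.
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

def recommend(user: int, k: int = 2) -> list:
    """Recommend items watched by similar users but not yet by `user`."""
    sims = interactions @ interactions[user]      # co-watch counts with every user
    sims[user] = 0.0                              # ignore self-similarity
    scores = sims @ interactions                  # items weighted by user similarity
    scores[interactions[user] == 1] = -np.inf     # exclude already-watched items
    return list(np.argsort(scores)[::-1][:k])

print(recommend(0))   # [2, 3]: item 2 is watched by the most similar user (user 1)
```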
3.1.3 Statistical List
The Statistical List technique creates non-personalized recommendations based on
popular content of the last seven days. Although the recommendations are the same
for all users, this technique adds diversity to the recommendations shown to a particular user. It also
helps to solve cold start problems for users who have not consumed any or only very little
content so far. The explanation texts created in the process are very simple, as they just
state that the recommended content is popular on the VOD platform right now (see
Table 1).
3.1.4 Mixed Hybrid Recommendation
The recommendations provided by either Naive Bayesian Classification, Collaborative
Filtering, or Statistical List are combined in a mixed hybrid recommendation list. Only
registered and logged-in users see such non-editorial, personalized recommendations.
The percentage of shown recommendations from the three different recommenders
depends on the viewing history of a user. New users see mainly recommendations from
the Statistical List, whereas experienced users see mainly recommendations created
by Naive Bayesian Classification.
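A possible shape of this "mixed" hybridization step is sketched below. The playback thresholds and per-source quotas are illustrative assumptions; the provider's actual blending rule is not public. The design choice mirrors Figure 1: the more a user has watched, the larger the share of preference-based recommendations in the slate.

```python
def mixed_slate(preference_recs, collaborative_recs, popular_recs,
                n_playbacks: int, slots: int = 6) -> list:
    """Return a slate of `slots` items mixed from the three RS techniques."""
    if n_playbacks < 5:            # new user: mostly popularity-based items
        quotas = (1, 1, 4)
    elif n_playbacks < 50:
        quotas = (3, 1, 2)
    else:                          # experienced user: mostly preference-based items
        quotas = (4, 1, 1)

    slate = []
    sources = (preference_recs, collaborative_recs, popular_recs)
    for source, quota in zip(sources, quotas):
        slate += [item for item in source if item not in slate][:quota]
    # Backfill with popular items if a source had too few candidates.
    slate += [item for item in popular_recs if item not in slate]
    return slate[:slots]

print(mixed_slate(["p1", "p2", "p3", "p4"], ["c1"], ["s1", "s2", "s3", "s4"],
                  n_playbacks=120))
# ['p1', 'p2', 'p3', 'p4', 'c1', 's1']
```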
Fig. 1 Average percentage of shown recommendations for different recommender techniques over the previous cumulative playbacks (measured by rank; rank 1 indicates the highest previous cumulative playbacks; low rank = experienced user, high rank = new user). Legend: Statistical List (Ø = 1.84), Preference-based (Ø = 3.02), Collaborative Filtering (Ø = 1.14).
Table 1 Representative examples of used text explanations. Personal features (words in italics) were replaced according to user-specific preferences. In the treatment group, only personal features from the Naive Bayesian Classification were randomized to obtain inaccurate explanations.

| RS technique | Data basis | Data | Exemplary explanation |
|---|---|---|---|
| Naive Bayesian Classification | Usage & content | Feature | Our recommendation because you are interested in “<personal feature A>” and “<personal feature B>” |
| Collaborative Filtering | Usage | User | This has been watched by others with similar interests. |
| Statistical List | Usage | Item | This is popular right now. |
Figure 1 plots the average number of shown recommendations from different rec-
ommender techniques over their rank of the previous cumulative number of playbacks
of the corresponding user. A lower rank indicates that the corresponding user has
viewed more content before when seeing the particular recommendations. With higher
previous cumulative number of playbacks (lower rank), a larger percentage of shown
recommendations comes from Naive Bayesian Classification. The plotted numbers are
based on a retrospective analysis assuming six recommendations per page for all users
of the VOD platform who have logged in within the last two years. This represents
the population of all possible users who could have participated in the study.
Table 1 lists examples of the shown textual explanations for each of the three RS
techniques. The messages were visually designed according to the corporate design
of the PSM and displayed individually next to a recommendation (only for logged-in
users). To achieve a high level of efficiency and effectiveness, the system created direct
feature explanations with a limited level of detail, as recommended by Guesmi et al.
(2021).
3.2 Treatment and Control Groups
All registered users of the platform were randomly assigned to one of three groups:
•Baseline group without explanations: All users in this group saw the website as
usual without any explanations.
•Control group with accurate explanations: All users in this group saw accurate
explanation texts next to each non-editorial recommendation. The explanations
were automatically created by the RS and fit the corresponding recommen-
dation.
•Treatment group with inaccurate explanations: All users in this group also saw
explanation texts next to the shown non-editorial recommendations. For all expla-
nations suggested by the Naive Bayesian Classification, the accurate explanation
features (e.g., “sports”) were replaced by a feature explanation randomly chosen
from 83 pre-defined categories (e.g., “news”); a sketch of this randomization follows
after this list. We ensured that a feature was never replaced with itself (e.g.,
“sports” was never changed to “sports”). Explanations from Collaborative Filtering
and Statistical List were not modified since they had non-personalized texts.
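The treatment manipulation can be sketched as follows; the category list and the explanation template are placeholders standing in for the 83 pre-defined categories and the texts in Table 1, not the production code.

```python
import random

CATEGORIES = ["sports", "news", "comedy", "documentary", "music"]  # placeholder list

def make_inaccurate_explanation(true_features: list, rng: random.Random) -> str:
    """Replace each true feature with a randomly drawn, different category."""
    randomized = []
    for feature in true_features:
        # Never replace a feature with itself (e.g., "sports" with "sports").
        candidates = [c for c in CATEGORIES if c != feature]
        randomized.append(rng.choice(candidates))
    return 'Our recommendation because you are interested in "{}"'.format(
        '" and "'.join(randomized))

rng = random.Random(42)
print(make_inaccurate_explanation(["sports", "news"], rng))
```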
To prevent possible biases in CTR due to unevenly distributed display of popular
content across test groups, the shown recommendations were correct in all three groups
(Zheng et al.,2010). To prevent possible Hawthorne effects (Landsberger,1958) and
to reduce cognitive biases (Arnott,2006), users did not get information about the
purpose of the study. In order to ensure a high quality when presenting explanations,
only users using one of the four major browsers (Google Chrome, Mozilla Firefox,
Apple Safari, and Microsoft Edge) were included in the study. During the study,
the VOD website got two updates (bug fixes and new features).
Before the experiment, we estimated the lower bound of required users with a
power analysis using G*Power 3.1.9.6 (Faul et al.,2007). Assuming a very small
effect size (f = .05), an α error probability of .05, and a power level of .9 (1 − β), the
minimum sample size was 5,067 users.
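An equivalent a-priori calculation can be approximated in Python with statsmodels (a sketch for readers without G*Power; small rounding differences against G*Power are expected):

```python
from statsmodels.stats.power import FTestAnovaPower

# A-priori power analysis for a one-way design with three groups,
# Cohen's f = .05, alpha = .05, power = .90.
n_total = FTestAnovaPower().solve_power(
    effect_size=0.05, alpha=0.05, power=0.9, k_groups=3)
print(round(n_total))   # on the order of 5,000 users in total, in line with the
                        # minimum sample size of 5,067 reported above
```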
3.3 Descriptive Statistics
For our real-world study, we collected click data for five weeks in June and July 2021.
Table 2 shows the descriptive statistics. To ensure high data quality, we removed
users that were assigned to multiple groups (e.g., because they used multiple devices). In
total, we had 51,191 users with 241,725 sessions and 1,390,097 item views. Results
from omnibus one-way analysis of variance (ANOVA) tests showed that there were
no significant differences in sessions per user (F = .012; p = .988) and item views per
session (F = 1.088; p = .337) between the three groups.
Table 2 Descriptive statistics for behavioral data

| Group | Users | Sessions | Sessions per user | Item views | Item views per session |
|---|---|---|---|---|---|
| Baseline: No explanations | 17,163 | 81,125 | 4.7 | 471,739 | 5.8 |
| Control: Accurate explanations | 17,035 | 80,477 | 4.7 | 460,601 | 5.7 |
| Treatment: Inaccurate explanations | 16,993 | 80,123 | 4.7 | 457,757 | 5.7 |
4 Results and discussion
First, we study the effects of explanations on CTR. Second, we address effects on playback
time and playback percentage (of the total clip length) of clicked recommendations.
Third, we investigate how usage changes over the study period.
4.1 Effects of Displaying Explanations on CTR
We investigate the effect of showing explanations on CTR. CTR is defined as the
number of clicks over the number of views, where one view is the display of one
single item (thumbnail and describing texts). CTR is a widespread, industry-relevant
measure of user behavior (Davidson et al.,2010;Gomez-Uribe and Hunt,2015;
Jannach and Zanker,2022).
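On an event log with one row per displayed item, the CTR computation reduces to a simple aggregation, as the toy sketch below illustrates (column names and values are assumptions for illustration):

```python
import pandas as pd

# One row per displayed recommendation item; `clicked` marks whether it was clicked.
events = pd.DataFrame({
    "group":   ["baseline", "baseline", "treatment", "treatment", "treatment"],
    "clicked": [0, 1, 0, 1, 1],
})

# CTR = number of clicks / number of item views, aggregated per test group.
ctr = events.groupby("group")["clicked"].mean()
print(ctr)   # baseline 0.50, treatment ~0.67 in this toy example
```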
Levene’s test for homogeneity of variances based on the median (Ramsey and Schafer,
2013) showed that variances are not equal across the three test groups (F = 6.443;
p = .002). Thus, we used a robust omnibus one-way Welch ANOVA (Field,2009) to test
whether group means are equal. We find significant differences between the groups
(F = 6.381; p = .002).
Given that the omnibus result is significant, we conducted pairwise post hoc Welch
t-tests with Bonferroni correction to test for significant differences between all groups
(Field,2009). Table 3 shows the results. We find that the CTR is significantly different
for the treatment group with inaccurate explanations in comparison to the baseline
group without explanations (p ≤ 0.05).
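The test procedure can be reproduced with standard tooling, sketched below on synthetic per-user CTR values (the real data are not public). The omnibus Welch ANOVA is not part of SciPy and can be obtained, e.g., from pingouin's welch_anova; only Levene's test and the pairwise Welch t-tests with Bonferroni correction are shown here.

```python
import numpy as np
from scipy import stats

# Synthetic per-user CTR values; group means/SDs are loosely chosen for illustration.
rng = np.random.default_rng(0)
baseline  = rng.normal(0.0150, 0.02, 1000)
control   = rng.normal(0.0156, 0.02, 1000)
treatment = rng.normal(0.0161, 0.03, 1000)

# Levene's test based on the median checks homogeneity of variances.
print(stats.levene(baseline, control, treatment, center="median"))

# Pairwise Welch t-tests (unequal variances) with Bonferroni correction.
pairs = [("baseline vs control", baseline, control),
         ("baseline vs treatment", baseline, treatment),
         ("control vs treatment", control, treatment)]
for name, a, b in pairs:
    res = stats.ttest_ind(a, b, equal_var=False)
    p_bonferroni = min(res.pvalue * len(pairs), 1.0)
    print(f"{name}: t = {res.statistic:.2f}, corrected p = {p_bonferroni:.3f}")
```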
Table 3 Pairwise post hoc Welch t-tests with Bonferroni correction for CTR in test groups

| Group | Group | Mean Diff. | Std. Error | p |
|---|---|---|---|---|
| Baseline: No explanations | Control: Accurate explanations | −.0653 | .0305 | .097 |
| Baseline: No explanations | Treatment: Inaccurate explanations | −.1089 | .0306 | .001 |
| Control: Accurate explanations | Treatment: Inaccurate explanations | −.0436 | .0306 | .463 |
For each group, Table 4 shows the mean CTR. The last two columns show the rel-
ative mean differences. Showing accurate explanations does not significantly increase
CTR in comparison to the baseline group showing no explanations (p > 0.05). In con-
trast, users in the treatment group seeing inaccurate explanations have a significantly
higher CTR (1.61%) than users in the baseline group (+7.3%). The CTR of the treat-
ment group is not significantly higher than the CTR of users in the control group with
accurate explanations (+2.8%). In summary, showing inaccurate explanations leads to
the highest CTR.
Table 4 CTR for the three groups. Significance cutoffs are: *0.1, **0.05, ***0.01

| Group | Mean CTR | Diff. to Baseline | Diff. to Control |
|---|---|---|---|
| Baseline: No explanations | 1.50% | | |
| Control: Accurate explanations | 1.56% | +4.3%* | |
| Treatment: Inaccurate explanations | 1.61% | +7.3%*** | +2.8% |
4.2 Effects of Percentage of Displayed Recommendations with
Inaccurate Explanations on CTR
We study whether the percentage of recommendations with inaccurate explanations
that are seen by a user affects the CTR of a user. We assume that if a user sees only
a very low percentage of inaccurate explanations, then there is no effect on CTR as
the user does not notice the inaccurate explanations and, thus, does not experience
cognitive dissonance. With a higher percentage of shown inaccurate explanations,
users should experience cognitive dissonance triggering user efforts to reduce this
induced dissonance. For the extreme case when almost all explanations that are shown
to a particular user are inaccurate, we expect a negative impact on usage as users do
not trust the recommender anymore.
In our study design, only recommendations generated from the Naive Bayesian
Classification (feature-based) can be either accurate (control group) or inaccurate
(treatment group). Consequently, we split all users into three groups depending on the
percentage of recommendations that come from Naive Bayesian Classification. Group
LOW contains all users who see on average less than 25% of recommendations created by
Naive Bayesian Classification. For these users, the majority of recommendations came
from the Collaborative Filtering and the Statistical List, which have static explana-
tions (see Table 1). Thus, LOW users in the treatment group saw only a small number
of inaccurate explanations as the percentage of recommendations coming from Naive
Bayesian Classification was low. Second, the group MEDIUM contains all users where
the average percentage of seen recommendations coming from Naive Bayesian Classi-
fication is in the range between 25% and 75%. Third, group HIGH contains all users,
where more than 75% of the shown recommendations on average are from Naive
Bayesian Classification. This means, for example, for the users in the treatment group
that more than 75% of the recommendations have inaccurate explanations.
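Expressed as code, the split is a simple binning of each user's average share of preference-based recommendations (column names are illustrative; the handling of the exact 25% and 75% boundaries is an assumption):

```python
import pandas as pd

# Each row is one user with the average share of displayed recommendations that
# came from the Naive Bayesian Classification during the study.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "share_preference_based": [0.10, 0.40, 0.60, 0.90],
})
users["exposure_group"] = pd.cut(
    users["share_preference_based"],
    bins=[-0.01, 0.25, 0.75, 1.0],
    labels=["LOW", "MEDIUM", "HIGH"],
)
print(users)   # users 1 -> LOW, 2 and 3 -> MEDIUM, 4 -> HIGH
```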
For the group LOW, Levene’s test for homogeneity of variances based on the median
(Ramsey and Schafer,2013) finds that variances in this case are equal across the three
test groups (F = 2.548, p = .078). Therefore, we used an omnibus one-way ANOVA
(Field,2009), which finds no significant differences in means between the three different
groups (F = 2.548, p = .078). Table 5 presents the results of additional pairwise post
hoc Welch t-tests with Bonferroni correction. We see no significant differences between
the groups (all p-values > .05).
Table 5 CTR for group LOW with less than 25% feature-based recommendations coming from Naive Bayesian Classification. Significance cutoffs are: *0.1, **0.05, ***0.01

| Group | Mean CTR | Diff. to Baseline | Diff. to Control |
|---|---|---|---|
| Baseline: No explanations | 1.36% | | |
| Control: Accurate explanations | 1.56% | +14.7%* | |
| Treatment: Inaccurate explanations | 1.46% | +7.4% | −6.4% |
For the group MEDIUM, Levene’s test for homogeneity finds that variances are not
equal across the three test groups (F = 6.873, p = .001). A robust omnibus one-way
Welch ANOVA finds significant differences between the test groups (F = 6.6354, p = .001).
Table 6 presents the results of additional pairwise post hoc Welch t-tests with
Bonferroni correction. There are significant differences between the treatment and
control group (+5.1%) as well as between the baseline group and treatment group
(+7.8%).
Table 6 CTR for group MEDIUM with between 25% and 75% feature-based recommendations coming from Naive Bayesian Classification. Significance cutoffs are: *0.1, **0.05, ***0.01

| Group | Mean CTR | Diff. to Baseline | Diff. to Control |
|---|---|---|---|
| Baseline: No explanations | 1.53% | | |
| Control: Accurate explanations | 1.57% | +2.65% | |
| Treatment: Inaccurate explanations | 1.65% | +7.8%*** | +5.1%** |
For the third group HIGH, Levene’s test finds that variances are equal across the
three test groups (F = 2.855, p = .058). An omnibus one-way ANOVA finds no
significant differences between all groups (F = 2.855; p = .058). Table 7 presents the
results of additional pairwise post hoc Welch t-tests with Bonferroni correction. For
the treatment group, the CTR is significantly lower than in the control group. Thus,
showing a high percentage of inaccurate explanations reduces CTR.
Table 7 CTR for group HIGH with more than 75% feature-based recommendations. Significance cutoffs are: *0.1, **0.05, ***0.01

| Group | Mean CTR | Diff. to Baseline | Diff. to Control |
|---|---|---|---|
| Baseline: No explanations | 1.43% | | |
| Control: Accurate explanations | 1.58% | +10.5% | |
| Treatment: Inaccurate explanations | 1.33% | −7.0% | −15.8%** |
Finally, we study how CTR depends on the percentage of seen recommendations
coming from Naive Bayesian Classification. As a robustness check, we split all users
not only into the three groups LOW, MEDIUM, and HIGH, but into five groups depending on
the percentage of seen recommendations coming from Naive Bayesian Classification.
For our three different test groups (baseline, treatment, and control), Figure 2 plots
the average CTR over the percentage of displayed personalized recommendations. We
observe similar results as when splitting all users into the three groups LOW, MEDIUM, and
HIGH. There is no effect on CTR when users experience a low percentage of inaccurate
explanations for recommendations. For users who experience a higher percentage of
inaccurate explanations, usage significantly increases when showing inaccurate expla-
nations in comparison to showing accurate explanations. For the extreme case, where
most explanations are inaccurate, usage significantly decreases.
Fig. 2 Average CTR per percentage of personalization in the three test groups (Baseline: No explanations; Control: Accurate explanations; Treatment: Inaccurate explanations).
4.3 Effects of Explanations on Playback Time
We study whether displaying explanations has an impact on playback behavior. For
each clicked recommendation, we record the playback time in minutes as well as the
viewed percentage of the clip length. Table 8 shows mean values of playback time for
the different test groups. We tested whether the mean values of the test groups are the
same. Both the one-way ANOVA for playback time in minutes (F = .079; p = .925)
and the one for the percentage of the clip length (F = .377; p = .687) are not significant. Thus,
playback time is not significantly different between the different groups.
Table 8 Playback time for test groups. Significance cutoffs are: *0.1, **0.05, ***0.01

| Group | Playback time (in minutes) | Diff. to Baseline | Playback time (in percentage of the clip length) | Diff. to Baseline |
|---|---|---|---|---|
| Baseline: No explanations | 26m 24s | | 36.2% | |
| Control: Accurate explanations | 26m 31s | +0.5% | 36.2% | +0.1% |
| Treatment: Inaccurate explanations | 26m 37s | +0.9% | 36.7% | +1.5% |
We also tested whether there are differences in playback time for the three different
groups LOW, MEDIUM, and HIGH (see subsection 4.2). The one-way Welch ANOVA for
all subgroups is not significant (p > 0.05) for playback (time in minutes as well
as percentage of the clip length). Pairwise post hoc Welch t-tests with Bonferroni
correction confirmed that playback times are not different between the test groups (detailed
results are not shown here).
4.4 Novelty Effect of Inaccurate Explanations
The novelty effect describes the observation that users usually react most strongly when
confronted with new or unfamiliar situations (Li et al.,2020;Li,2021). Once a user
is used to the situation, the effect disappears.
We want to study the novelty effect of showing both accurate and inaccurate
explanations. To do so, we analyzed the sessions in the recorded click
behavior of each user. A single session contains all related actions of a user (e.g., opening
a page, clicking on an element or watching a clip). A session ends if no event occurs
for 30 minutes.
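A minimal sessionization following this 30-minute rule could look as follows (column names and timestamps are illustrative):

```python
import pandas as pd

# Toy event log: one row per user action, ordered per user by timestamp.
events = pd.DataFrame({
    "user_id":   [1, 1, 1, 2],
    "timestamp": pd.to_datetime(["2021-06-01 10:00", "2021-06-01 10:20",
                                 "2021-06-01 11:10", "2021-06-01 10:05"]),
}).sort_values(["user_id", "timestamp"])

# A new session starts whenever more than 30 minutes pass between two
# consecutive events of the same user (or at the user's first event).
gap = events.groupby("user_id")["timestamp"].diff()
new_session = gap.isna() | (gap > pd.Timedelta(minutes=30))
events["session_id"] = new_session.cumsum()
print(events)   # user 1 gets two sessions (50-minute gap), user 2 one session
```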
Most of the users in the study had been registered on the platform before the
study began (in some cases several years earlier). During the five-week experiment, users had a
median of five individual sessions. The minimum number of sessions of a user is one,
the maximum number of sessions for a user is 52 (the last percent of extreme outliers
removed). 70% of the users had fewer than 10 sessions, and 90% of the users had fewer than
21 sessions within the five weeks.
For the three different groups, Figure 3 plots the average CTR per session over the
number i of the user session. Each value is averaged over all clicks of a user in session i.
We use a moving average over two sessions to better identify the trend of each group.
Furthermore, Figure 3 plots the number of users (averaged over the three groups as the
decrease of users over i is very similar in the three groups) with at least i user sessions.
As expected, the number of users strongly decreases with a higher number of sessions.
The plot shows a slight decrease of CTR over the number of sessions. This means
that users with a higher number of sessions in this five-week period click on slightly
fewer recommendations in later sessions. This trend can be observed not only for the
treatment and control groups, but also for the baseline group (unchanged VOD platform).
Such changes in CTR are common on the platform mainly due to seasonal effects. One
should be aware that the values for the average CTR become less representative with
a higher number of sessions, as the number of users with a higher number of sessions
strongly decreases. For example, in each of the three groups around 10,000 users have
two or more sessions; however, only around 2,000 users have more than 10 sessions.
As a more detailed analysis, we study how CTR depends on the number of sessions.
For the three different test groups, we present results for the average CTR observed
in the first seven user sessions (Table 9), for the average CTR observed in user
sessions 8 to 14 (Table 10), and for the average CTR observed in sessions 15 to
21 (Table 11). In analogy to subsection 4.1, we find a significantly higher CTR when
showing inaccurate explanations in comparison to the baseline in the first seven user
sessions. Furthermore, we always observe a higher CTR for the treatment group in
comparison to the baseline as well as the control group. However, for a higher number of
sessions these differences are not significant, which is mainly due to the low number of
users with a higher number of sessions (e.g., only around 3,000 users have more than
7 sessions).
Table 9 Average CTR in the first seven sessions. Significance cutoffs are: *0.1, **0.05, ***0.01

| Group | Mean CTR | Diff. to Baseline | Diff. to Control |
|---|---|---|---|
| Baseline: No explanations | 1.68% | | |
| Control: Accurate explanations | 1.74% | +3.6% | |
| Treatment: Inaccurate explanations | 1.78% | +6.0%** | +2.3% |
We also evaluated the data regarding a possible novelty effect in playbacks.
Figure 4 plots the average playback times of clicked recommendations (in seconds)
over the number i of a session. Visual inspection as well as statistical tests (not shown
here) find no changes in playback times over the number of sessions.
Fig. 3 Average CTR over session number for the three test groups (Baseline: No explanations, Ø 1.56%, n = 17,163; Control: Accurate explanations, Ø 1.62%, n = 17,035; Treatment: Inaccurate explanations, Ø 1.67%, n = 16,993), together with the average number of users in each test group over session number. Notes: moving average over two sessions; 95% confidence interval shown for the baseline.
Table 10 Average CTR in sessions 8 to 14. Significance cutoffs are: *0.1, **0.05, ***0.01

| Group | Mean CTR | Diff. to Baseline | Diff. to Control |
|---|---|---|---|
| Baseline: No explanations | 1.32% | | |
| Control: Accurate explanations | 1.39% | +4.8% | |
| Treatment: Inaccurate explanations | 1.46% | +10.1% | +5.1% |
Table 11 Average CTR in sessions 15 to 21. Significance cutoffs are: *0.1, **0.05, ***0.01

| Group | Mean CTR | Diff. to Baseline | Diff. to Control |
|---|---|---|---|
| Baseline: No explanations | 1.15% | | |
| Control: Accurate explanations | 1.14% | −1.0% | |
| Treatment: Inaccurate explanations | 1.26% | +9.0% | +10.1% |
In summary, we observe no novelty effect on either CTR or playback time. We
observe a slight seasonal reduction of CTR with a higher number of sessions, which
appears in the baseline group as well as in the control and treatment groups. For a
higher number of sessions, there are no unexpected changes in the differences in CTR
between the different groups. As expected, differences between the groups are no longer
significant for a higher number of sessions, as the number of users with a
higher number of sessions becomes small.
Fig. 4 Average playback time per session over session number (Baseline: No explanations, Ø 00:26:31, n = 3,698; Control: Accurate explanations, Ø 00:26:32, n = 3,815; Treatment: Inaccurate explanations, Ø 00:26:37, n = 3,797). Notes: moving average over two sessions; 95% confidence interval shown for the baseline.
5 Conclusions and Future Research
This paper studies the effect of explanations in recommender systems on usage behav-
ior in a large real-world randomized controlled trial on a German video-on-demand
platform. We motivated our study by Festinger’s theory of cognitive dissonance, which
states that users that experience a dissonant cognition initiate actions to resolve this
dissonance. For the case of recommender systems, inaccurate explanations that do
not fit to the corresponding recommendation cause a cognitive dissonance for the
user, which is resolved by clicking on the corresponding recommendation. As a result,
providing some inaccurate explanations that do not match the corresponding
recommendation should increase usage. Of course, this mechanism can only work if
the percentage of inaccurate explanations is not too high while still being noticeable
by the user. In an extreme case, where almost all explanations are inaccurate, users
would not trust the recommender anymore leading to a reduced usage.
In our randomized controlled trial, the Public Service Media provider used a hybrid
recommender to provide recommendations on video clips to the user. We assigned all
logged-in users during our experimental period (five weeks) to one of three groups:
a baseline group, where the VOD platform was kept untouched (no explanations
shown), a control group with an (accurate) explanation for each recommendation, and
a treatment group, where some explanations did not fit to the corresponding (correct)
recommendation and were therefore inaccurate.
For an overall sample of more than 50,000 users, we find that providing inaccu-
rate explanations leads to a significant overall usage increase (CTR) by +7.3% in
comparison to the baseline where users do not see explanations. Furthermore, the play-
back behavior (measured in viewing time and percentage of the clip length) remains
unchanged. When grouping the users according to the percentage of displayed inac-
curate explanations, we find no significant differences between baseline, control, and
treatment group for users that see only a low percentage of inaccurate explanations
(<25% of all displayed recommendations). In contrast, for users where on average
between 25% and 75% of the recommendations have inaccurate explanations, the
CTR is significantly higher than in the baseline (+7.8%) and in the control group
(+5.1%). For the extreme case where most recommendations have inaccurate expla-
nations (>75%), the usage in the treatment group is significantly lower (−15.8%)
than in the control group.
The findings demonstrate the benefits of online user studies, which can give us
deeper insights into the interplay between RS and human users. The findings should
motivate researchers to draw more often on existing theories on human behavior and
use them for the design and analysis of RS. We find that RS that create confusion for the
user can cause cognitive dissonance and thereby lead to increased usage. This can
be well explained by existing theories on human behavior, although the results are at
first counter-intuitive for the classical RS community.
We see some directions for future research. Subsequent studies should look more
deeply into the mechanisms that lead to increased usage. Proper instruments may be
controlled user studies where users’ perception of the RS is measured with special-
ized questionnaires during interaction with the system. Second, in our study design
we used minimalistic text explanations without the option to see more detailed expla-
nations. There should be additional research to examine the effects of more detailed
(e.g., graphical) explanations.
Statements and Declarations
•Funding: This work was supported by the European Regional Development Fund,
co-financed by the Ministry of Science and Health of Rhineland-Palatinate with
grant 84003544.
•Competing Interests — Employment: Marcel Hauck reports a relationship
with the Public Service Media where the study was conducted that includes:
Employment after the study finished. There were no influences of the Public Service
Media on the study.
•Data: The data that support the findings of this study are not openly available
due to reasons of sensitivity and are available from the corresponding author upon
reasonable request.
•Compliance with Ethical Standards: The research involved human partici-
pants, all of whom gave informed consent to randomized trials during account
creation.
•Author Contributions Statement
– Marcel Hauck: Conceptualization, Methodology, Software, Formal analysis, Inves-
tigation, Resources, Data Curation, Writing — Original Draft, Visualization,
Project administration
– Sven Pagel: Conceptualization, Writing — Review & Editing, Supervision,
Project administration, Funding acquisition
– Franz Rothlauf: Conceptualization, Methodology, Validation, Writing — Review
& Editing, Supervision
References
Tintarev, N., Masthoff, J.: A Survey of Explanations in Recommender Systems. In:
2007 IEEE 23rd International Conference on Data Engineering Workshop, pp.
801–810. IEEE, Istanbul, Turkey (2007). https://doi.org/10.1109/ICDEW.2007.4401070
Zhang, Y., Lai, G., Zhang, M., Zhang, Y., Liu, Y., Ma, S.: Explicit factor mod-
els for explainable recommendation based on phrase-level sentiment analysis. In:
Proceedings of the 37th International ACM SIGIR Conference on Research & Devel-
opment in Information Retrieval - SIGIR ’14, pp. 83–92. ACM Press, Gold Coast,
Queensland, Australia (2014). https://doi.org/10.1145/2600428.2609579
Millecamp, M., Conati, C., Verbert, K.: “Knowing me, knowing you”: Personalized
explanations for a music recommender system. User Modeling and User-Adapted
Interaction (2022) https://doi.org/10.1007/s11257-021-09304-9
Ricci, F., Rokach, L., Shapira, B.: Recommender Systems: Introduction and Chal-
lenges. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems
Handbook, pp. 1–34. Springer US, Boston, MA (2015). https://doi.org/10.1007/978-1-4899-7637-6_1
ARD: ARD und ZDF bauen ihre Mediatheken zu einem gemeinsamen
Streaming-Netzwerk aus. https://www.ard.de/die-ard/wie-sie-uns-erreichen/ard-
pressemeldungen/2021/06-21-ARD-und-ZDF-bauen-Mediatheken-zu-Streaming-
Netzwerk-aus100/ (2021)
Comparitech: Netflix Data 2023: Cost per Title by Country & Plan.
https://www.comparitech.com/blog/vpn-privacy/countries-netflix-cost/ (2023)
Goldberg, D., Nichols, D., Oki, B.M., Terry, D.: Using collaborative filtering to weave
an information tapestry. Communications of the ACM 35(12), 61–70 (1992) https://doi.org/10.1145/138859.138867
Tintarev, N., Masthoff, J.: Explaining Recommendations: Design and Evaluation. In:
Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 353–
382. Springer US, Boston, MA (2015). https://doi.org/10.1007/978-1-4899-7637-6_10
Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recom-
mendations. In: Proceedings of the 2000 ACM Conference on Computer Supported
Cooperative Work - CSCW ’00, pp. 241–250. ACM Press, Philadelphia, Pennsylva-
nia, United States (2000). https://doi.org/10.1145/358916.358995
Tintarev, N., Masthoff, J.: Evaluating the effectiveness of explanations for recom-
mender systems: Methodological issues and empirical studies on the impact of
personalization. User Modeling and User-Adapted Interaction 22(4-5), 399–439
(2012) https://doi.org/10.1007/s11257-011-9117-5
Zhao, G., Fu, H., Song, R., Sakai, T., Chen, Z., Xie, X., Qian, X.: Personalized
Reason Generation for Explainable Song Recommendation. ACM Transactions on
Intelligent Systems and Technology 10(4), 1–21 (2019) https://doi.org/10.1145/3337967
Millecamp, M., Naveed, S., Verbert, K., Ziegler, J.: To explain or not to explain: The
effects of personal characteristics when explaining feature-based recommendations
in different domains. In: Proceedings of the 6th Joint Workshop on Interfaces and
Human Decision Making for Recommender Systems (2019)
Tran, T.N.T., Le, V.M., Atas, M., Felfernig, A., Stettinger, M., Popescu, A.: Do
Users Appreciate Explanations of Recommendations? An Analysis in the Movie
Domain. In: Fifteenth ACM Conference on Recommender Systems, pp. 645–650.
ACM, Amsterdam Netherlands (2021). https://doi.org/10.1145/3460231.3478859
Guesmi, M., Chatti, M.A., Vorgerd, L., Joarder, S., Ain, Q.U., Ngo, T., Zumor,
S., Sun, Y., Ji, F., Muslim, A.: Input or Output: Effects of Explanation Focus
on the Perception of Explainable Recommendation with Varying Level of Details.
In: IntRS’21: Joint Workshop on Interfaces and Human Decision Making for
Recommender Systems, p. 18 (2021)
Xian, Y., Zhao, T., Li, J., Chan, J., Kan, A., Ma, J., Dong, X.L., Faloutsos, C.,
Karypis, G., Muthukrishnan, S., Zhang, Y.: EX3: Explainable Attribute-aware
Item-set Recommendations. In: Fifteenth ACM Conference on Recommender Sys-
tems, pp. 484–494. ACM, Amsterdam Netherlands (2021). https://doi.org/10.1145/3460231.3474240
Balog, K., Radlinski, F., Petrov, A.: Measuring the Impact of Explanation Bias:
A Study of Natural Language Justifications for Recommender Systems. In: CHI
Conference on Human Factors in Computing Systems (CHI EA ’23) (2023)
Matt, C., Hess, T., Benlian, A., Weiß, C.: Escaping from the Filter Bubble? The Effects
of Novelty and Serendipity on Users’ Evaluations of Online Recommendations. In:
Thirty Fifth International Conference on Information Systems, Auckland, p. 19
(2014)
Kunaver, M., Poˇzrl, T.: Diversity in recommender systems – A survey. Knowledge-
Based Systems 123, 154–162 (2017) https://doi.org/10.1016/j.knosys.2017.02.009
Song, Y., Sahoo, N., Ofek, E.: When and How to Diversify—A Multicategory Util-
ity Model for Personalized Content Recommendation. Management Science (2019)
https://doi.org/10.1287/mnsc.2018.3127
Castells, P., Hurley, N., Vargas, S.: Novelty and Diversity in Recommender Sys-
tems. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Hand-
book, pp. 603–646. Springer US, New York, NY (2022). https://doi.org/10.1007/978-1-0716-2197-4_16
Festinger, L.: A Theory of Cognitive Dissonance. A Theory of Cognitive Dissonance.,
p. 291. Stanford University Press, Stanford, CA (1957)
Komiak, Benbasat: The Effects of Personalization and Familiarity on Trust and
Adoption of Recommendation Agents. MIS Quarterly 30(4), 941 (2006) https://doi.org/10.2307/25148760
Pu, P., Chen, L.: Trust-inspiring explanation interfaces for recommender systems.
Knowledge-Based Systems 20(6), 542–556 (2007) https://doi.org/10.1016/j.knosys.2007.04.004
Shani, G., Gunawardana, A.: Evaluating Recommendation Systems. In: Ricci,
F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Hand-
book, pp. 257–297. Springer US, Boston, MA (2011). https://doi.org/10.1007/978-0-387-85820-3_8
Jannach, D., Zanker, M.: Value and Impact of Recommender Systems. In: Ricci,
F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 519–546.
Springer US, New York, NY (2022). https://doi.org/10.1007/978-1-0716-2197-4_14
Adomavicius, G., Bockstedt, J.C., Curley, S.P., Zhang, J.: Do Recommender Sys-
tems Manipulate Consumer Preferences? A Study of Anchoring Effects. Information
Systems Research 24(4), 956–975 (2013) https://doi.org/10.1287/isre.2013.0497
Davidson, J., Livingston, B., Sampath, D., Liebald, B., Liu, J., Nandy, P., Van Vleet,
T., Gargi, U., Gupta, S., He, Y., Lambert, M.: The YouTube video recommendation
system. In: Proceedings of the Fourth ACM Conference on Recommender Systems -
RecSys ’10, p. 293. ACM Press, Barcelona, Spain (2010). https://doi.org/10.1145/1864708.1864770
Gomez-Uribe, C.A., Hunt, N.: The Netflix Recommender System: Algorithms, Busi-
ness Value, and Innovation. ACM Transactions on Management Information
Systems 6(4), 1–19 (2015) https://doi.org/10.1145/2843948
Ricci, F., Rokach, L., Shapira, B.: Recommender Systems: Techniques, Applications,
and Challenges. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems
Handbook, pp. 1–35. Springer US, New York, NY (2022). https://doi.org/10.1007/978-1-0716-2197-4_1
Pronk, V., Verhaegh, W., Proidl, A., Tiemann, M.: Incorporating user control into
recommender systems based on naive bayesian classification. In: Proceedings of the
2007 ACM Conference on Recommender Systems - RecSys ’07, p. 73. ACM Press,
Minneapolis, MN, USA (2007). https://doi.org/10.1145/1297231.1297244
Burke, R.: Hybrid Recommender Systems: Survey and Experiments. User Modeling
and User-Adapted Interaction 12(4), 331–370 (2002) https://doi.org/10.1023/A:1021240730564
Schafer, J.B., Frankowski, D., Herlocker, J., Sen, S.: Collaborative Filtering Recom-
mender Systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive
Web vol. 4321, pp. 291–324. Springer Berlin Heidelberg, Berlin, Heidelberg (2007).
https://doi.org/10.1007/978-3-540-72079-9_9
Balog, K., Radlinski, F.: Measuring Recommendation Explanation Quality: The Con-
flicting Goals of Explanations. In: Proceedings of the 43rd International ACM
SIGIR Conference on Research and Development in Information Retrieval, pp.
329–338. ACM, Virtual Event China (2020). https://doi.org/10.1145/3397271.3401032
Chang, S., Harper, F.M., Terveen, L.G.: Crowd-Based Personalized Natural Language
Explanations for Recommendations. In: Proceedings of the 10th ACM Confer-
ence on Recommender Systems - RecSys ’16, pp. 175–182. ACM Press, Boston,
Massachusetts, USA (2016). https://doi.org/10.1145/2959100.2959153
Zheng, H., Wang, D., Zhang, Q., Li, H., Yang, T.: Do clicks measure recommendation
relevancy?: An empirical user study. In: Proceedings of the Fourth ACM Confer-
ence on Recommender Systems - RecSys ’10, p. 249. ACM Press, Barcelona, Spain
(2010). https://doi.org/10.1145/1864708.1864759
Garcin, F., Faltings, B., Donatsch, O., Alazzawi, A., Bruttin, C., Huber, A.: Offline
and online evaluation of news recommender systems at swissinfo.ch. In: Proceedings
of the 8th ACM Conference on Recommender Systems - RecSys ’14, pp. 169–176.
ACM Press, Foster City, Silicon Valley, California, USA (2014). https://doi.org/10.1145/2645710.2645745
Ross, L., Lepper, M., Ward, A.: History of Social Psychology: Insights, Challenges, and
Contributions to Theory and Application. In: Fiske, S.T., Gilbert, D.T., Lindzey,
G. (eds.) Handbook of Social Psychology, p. 001001. John Wiley & Sons, Inc.,
Hoboken, NJ, USA (2010). https://doi.org/10.1002/9780470561119.socpsy001001
Harmon-Jones, E. (ed.): Cognitive Dissonance: Reexamining a Pivotal Theory in Psy-
chology, Second edition edn. American Psychological Association, Washington, DC
(2019)
Surendren, D., Bhuvaneswari, V.: A Framework for Analysis of Purchase Dissonance
in Recommender System Using Association Rule Mining. In: 2014 International
Conference on Intelligent Computing Applications, pp. 153–157. IEEE, Coimbatore,
India (2014). https://doi.org/10.1109/ICICA.2014.41
Festinger, L., Riecken, H.W., Schachter, S.: When Prophecy Fails. University of
Minnesota Press, Minneapolis (1956). https://doi.org/10.1037/10030-000
Brehm, J.W.: Postdecision changes in the desirability of alternatives. The Journal of
Abnormal and Social Psychology 52(3), 384–389 (1956) https://doi.org/10.1037/h0041006
Aronson, E., Mills, J.: The effect of severity of initiation on liking for a group. The
Journal of Abnormal and Social Psychology 59(2), 177–181 (1959) https://doi.org/10.1037/h0047195
Schwind, C., Buder, J., Hesse, F.W.: I will do it, but i don’t like it: User reactions
to preference-inconsistent recommendations. In: Proceedings of the SIGCHI Con-
ference on Human Factors in Computing Systems, pp. 349–352. ACM, Vancouver
BC Canada (2011). https://doi.org/10.1145/1978942.1978992
Figl, K., Kießling, S., Rank, C., Vakulenko, S.: Fake News Flags, Cognitive Dissonance,
and the Believability of Social Media Posts. In: ICIS 2019 Proceedings, M¨unchen,
Germany (2019)
Koren, Y., Rendle, S., Bell, R.: Advances in Collaborative Filtering. In: Ricci, F.,
Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 91–142.
Springer US, New York, NY (2022). https://doi.org/10.1007/978-1-0716-2197-4_3
Landsberger, H.A.: Hawthorne Revisited. Management and the Worker, Its Critics
and Developments in Human Relations in Industry. Cornell Studies in Industrial
and Labor Relations, vol. IX. Cornell University, Ithaca, N.Y. (1958)
Arnott, D.: Cognitive biases and decision support systems development: A design
science approach. Information Systems Journal 16(1), 55–78 (2006) https://doi.org/10.1111/j.1365-2575.2006.00208.x
Faul, F., Erdfelder, E., Lang, A.-G., Buchner, A.: G*Power 3: A flexible statistical
power analysis program for the social, behavioral, and biomedical sciences. Behavior
Research Methods 39(2), 175–191 (2007) https://doi.org/10.3758/BF03193146
Ramsey, F.L., Schafer, D.W.: The Statistical Sleuth: A Course in Methods of Data
Analysis, 3rd ed edn. Brooks/Cole, Cengage Learning, Australia ; Boston (2013)
Field, A.P.: Discovering Statistics Using SPSS: (And Sex, Drugs and Rock ’n’ Roll),
3rd ed edn. SAGE Publications, Los Angeles (2009)
Li, P., Que, M., Jiang, Z., Hu, Y., Tuzhilin, A.: PURS: Personalized Unexpected
Recommender System for Improving User Satisfaction. In: Fourteenth ACM Con-
ference on Recommender Systems, pp. 279–288. ACM, Virtual Event Brazil (2020).
https://doi.org/10.1145/3383313.3412238
Li, P.: Leveraging Multi-Faceted User Preferences for Improving Click-Through Rate
Predictions. In: Fifteenth ACM Conference on Recommender Systems, pp. 864–868.
ACM, Amsterdam Netherlands (2021). https://doi.org/10.1145/3460231.3473899