Breaching Interactive Storytelling’s Implicit Agreement:
A Content Analysis of Façade User Behaviors
Christian Roth and Ivar Vermeulen
VU University, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
roth@spieleforschung.de, i.e.vermeulen@vu.nl
Abstract. Using both manual and automated content analysis, we analyzed 100 collected stage plays from 50 users of the IS system Façade, coding the extent to which users stayed “in character”. Comparing this measure between first and second exposure to Façade revealed that users stayed significantly less in character during the second exposure. Further, relating staying in character to a set of independently collected user experience measures, we found that staying in character negatively influenced users’ affective responses. The results confirm the notion that the more Façade users keep to their assigned role, the more easily they become dissatisfied with the system’s performance. As a result, users start exploring the system by acting “out of character”.
Keywords: Façade, User Behavior, User Experience, Content Analysis.
1 Introduction
Satisfaction of user expectations is a key element in successful entertainment media [1]. Users have expectations about how a product will fulfill their entertainment needs, e.g., provide arousal, flow, suspense, and/or agency. In turn, media creators have expectations about how their product should be used in order to be entertaining. For example, to experience suspense, movie viewers should not fast-forward to find out who “dunnit”; to experience flow, players of an action game should play at a level appropriate to their skills. Based on these reciprocal expectations, creators and users form an implicit agreement that entertainment media, if used in a proper way, will be entertaining.
This implicit agreement holds for new types of media as well. However, new types of media may not be able to fulfill it immediately. The latter seems to be the case with many Interactive Storytelling (IS) prototypes. IS users expect to have an impact on story progress and outcomes, and thus to receive meaningful feedback regarding “global agency” [2]. However, many IS prototypes are proofs of concept focusing mainly on technical challenges, and they offer such feedback only tentatively. Research on the IS application Façade, for example, showed that the system often did not understand user intentions and failed to give meaningful responses [3]. Another study found that Façade users were initially goal oriented but, upon realizing their limited control over story progression, increasingly tested
the system’s boundaries by acting against social conventions [4]. A study on the
IS prototype EmoEmma showed low ratings on key user experience dimensions
including overall satisfaction [2]. The current paper focuses on what happens
when IS applications do not deliver the entertaining experience users expect.
IS applications attempt to offer entertainment by involving users in an interactive narrative, usually by assigning them a role in that narrative. Building on experiences with commercial video games, users expect that keeping to this role (that is, staying “in character”) will result in meaningful system responses and an immersive IS experience. Ironically, however, the more users behave like a natural character in the story, the richer and more complex their inputs will be, and the more challenging it becomes for the system to provide meaningful feedback. So, the harder users try to “entertain” the system, the less likely they are to receive an entertaining experience in return. As a consequence, users will likely become dissatisfied and adapt their strategy, trying out more unconventional behavior. When they do so, their expectations of the system’s performance are lowered, so that the system’s performance compares more favorably to those expectations; “out of character” users should therefore be more satisfied and entertained. Specifically, we expect: (1) the more IS users behave according to their prescribed role (i.e., stay “in character”), the less meaningful the system’s responses will be; (2) the more IS users stay in character, the less positive their experiences will be; and (3) IS users are less likely to stay “in character” during a second play session.
2 Method
To test these expectations, we observed 50 users (17 males, 33 females; average age M = 19.8 years, SD = 1.73 years) of the IS application Façade [5]. Participants had a moderate degree of computer game literacy (M = 1.78, SD = .71 on a scale from 1 to 3). Ten participants had been exposed to Façade in a previous study. Upon arrival, participants were seated in one of six cabins equipped with modern PCs, screens, and peripherals to “test a new prototype system”. An instruction sheet informed participants about their role (an old friend) and their task (to help the characters with their relationship issues). After interacting with Façade for an average of 18 minutes, stage plays were saved and participants filled out an online questionnaire (on average 12 minutes). Participants were then asked to interact with Façade again, using the same instructions (on average 18 minutes). Stage plays were saved and participants filled out the same questionnaire (on average 10 minutes). The recorded stage plays comprised around 29,000 lines of dialogue.
To measure staying in character, we conducted manual and automated content analyses. Stage plays were put in random order, and participant and session numbers were blinded to prevent bias in the manual coding. Each individual user utterance was rated on a scale from 1 to 5. One point (not staying in character) was given for highly inappropriate behavior, for instance repeated kissing or flirting, extreme rudeness, insulting, and repeatedly ignoring the computer characters’ intentions. Five points (completely staying in character) were given for highly engaged, appropriate, and on-topic interactions.
Ratings per utterance were averaged in order to obtain an “in character” score
per user, per session.
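To illustrate this aggregation step, the following Python sketch averages per-utterance ratings into one “in character” score per user and session. The file and column names are hypothetical placeholders for a long-format coding sheet, not the actual materials used in the study.

    import pandas as pd

    # Hypothetical long-format coding sheet: one row per coded utterance,
    # with columns participant, session, and rating (1-5).
    ratings = pd.read_csv("coded_utterances.csv")

    # Average the per-utterance ratings to obtain one "in character"
    # score per participant and session.
    in_character = (ratings
                    .groupby(["participant", "session"])["rating"]
                    .mean()
                    .rename("in_character_score")
                    .reset_index())

    print(in_character.head())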
The automated content-analysis measurement of staying in character used the text analysis software Linguistic Inquiry and Word Count (LIWC) [6]. Syntax-based indicators of staying in character were the use of question and exclamation marks (indicating natural language use) and of words of six or more letters (indicating complex language use). The number of verbal interactions and the number of words per user per session were recorded as indicators of user effort. Content-based indicators were the use of articles (indicating conventional language use), pronouns (indicating social references), positive (e.g., nice, happy) and negative affect words (e.g., hurt, ugly), words associated with assent (e.g., agree, okay), social processes (e.g., friend, family), cognitive processes (e.g., think, know), and perceptual processes (e.g., see, hear, feel); the latter two categories indicate reflection. For all indicator categories, relative prevalence (the percentage of words used) was calculated. As a negative indicator of staying in character, the percentage of swear words (e.g., damn, fuck) was analyzed.
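LIWC computes these prevalence scores from its own validated dictionaries; as a rough illustration of the underlying idea only, the sketch below calculates the percentage of words in an utterance that match a small, made-up word list. The category names and example words are placeholders, not the actual LIWC dictionaries.

    import re

    # Toy category dictionaries for illustration only; LIWC's actual
    # dictionaries are far larger and validated.
    CATEGORIES = {
        "assent": {"agree", "okay", "ok", "yes"},
        "positive_affect": {"nice", "happy", "love", "good"},
        "swear": {"damn"},
    }

    def relative_prevalence(text):
        """Return the percentage of words falling into each category."""
        words = re.findall(r"[a-z']+", text.lower())
        if not words:
            return {name: 0.0 for name in CATEGORIES}
        return {
            name: 100.0 * sum(w in vocab for w in words) / len(words)
            for name, vocab in CATEGORIES.items()
        }

    print(relative_prevalence("Okay, I agree - Grace seems really happy tonight!"))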
Building on prior work by the authors [7, 8, 9], 14 user experiences were measured on 1-to-5 Likert scales with three items each (unless otherwise noted): flow (4 items, α1 = .72; α2 = .73), perceived system usability (α1 = .85; α2 = .83), presence (α1 = .76; α2 = .87), curiosity (α1 = .76; α2 = .84), aesthetic pleasantness (α1 = .83; α2 = .86), pride (α1 = .89; α2 = .84), positive affect (α1 = .82; α2 = .87), and negative affect (α1 = .57; α2 = .70). Two-item scales: satisfaction (r1 = .57; r2 = .46), character believability (r1 = .35; r2 = .37), effectance (r1 = .82; r2 = .61), suspense (r1 = .47; r2 = .57), enjoyment (r1 = .84; r2 = .85), and role adoption/identification (r1 = .37; r2 = .66).
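For readers unfamiliar with the reliability coefficients reported above, the following sketch shows how Cronbach’s alpha can be computed for one multi-item scale from a respondents-by-items matrix of Likert ratings; the ratings shown are invented for illustration.

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for a respondents x items matrix."""
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1)
        total_variance = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Made-up ratings of a three-item scale (rows: respondents, columns: items).
    flow_items = np.array([[4, 5, 4],
                           [2, 3, 2],
                           [5, 5, 4],
                           [3, 3, 3],
                           [4, 4, 5]])
    print(round(cronbach_alpha(flow_items), 2))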
Meaningful system responses were manually coded, per utterance, on a 1 to
5 scale. One point (non-meaningful system response) was given for completely
wrong responses or ignoring user input. Five points (highly meaningful system
response) were given for highly fitting, meaningful responses to user input. Scores
per utterance were aggregated by averaging them per user, per session.
3 Results
To check whether the manual coding of staying in character is reflected in the automated content analysis, we conducted a correlation analysis. Results for the first session show that the more users stayed in character, the more elaborate their verbal interactions were: a higher word count (r = .494, p < .001) and more six-letter words (r = .325, p = .023). Staying in character also correlated with the percentage of pronouns (r = .695, p < .001), social words (r = .566, p < .001), positive emotion words (r = .304, p = .034), and words reflecting cognitive (r = .662, p < .001) and perception processes (r = .461, p < .001). Surprisingly, given these results, we found no significant correlations between staying in character and the LIWC categories for the second session.
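As a sketch of this step, assuming the per-participant “in character” scores and LIWC percentages for the first session have been merged into one table with hypothetical column names, Pearson correlations can be computed as follows.

    import pandas as pd
    from scipy.stats import pearsonr

    # Hypothetical merged table: one row per participant for session 1, with
    # the manual "in character" score and selected LIWC percentages.
    session1 = pd.read_csv("session1_scores.csv")

    for indicator in ["word_count", "six_letter_words", "pronouns",
                      "social_words", "positive_emotion",
                      "cognitive_processes", "perception_processes"]:
        r, p = pearsonr(session1["in_character_score"], session1[indicator])
        print(f"{indicator}: r = {r:.3f}, p = {p:.3f}")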
For our “ironic” prediction of a negative correlation between staying in character and meaningful system responses, we only found evidence in the second
session: users who stayed more in character received significantly less meaningful responses (r = -.379; p = .006). Further analysis revealed correlations between linguistic indicators of staying in character and meaningful system responses: for the first session, more verbal interactions correlated with less meaningful system responses (r = -.282; p = .047). This result was also found for the second session (r = -.319; p = .025). In addition, the more words users used, the less meaningfully the system responded (r = -.319, p = .026). This seems to confirm our prediction: the more users try to “entertain” the system by providing elaborate input, the less meaningful the system’s responses.
To test our prediction that users stay less in character after the first session, we conducted a within-subject t-test. Results confirm our prediction: users stayed significantly less in character during the second session (M = 4.09, SD = .68 vs. M = 4.52, SD = .36); t(50) = 4.28, p = .011.
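A minimal sketch of this comparison, assuming a hypothetical wide-format file with one per-participant mean “in character” score per session:

    import pandas as pd
    from scipy.stats import ttest_rel

    # Hypothetical wide table: one row per participant, columns session1 and
    # session2 holding the mean "in character" score for each session.
    scores = pd.read_csv("in_character_by_session.csv")

    t_stat, p_value = ttest_rel(scores["session1"], scores["session2"])
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")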
Comparing linguistic indicators of staying in character between the first and second session revealed that participants used significantly more articles in the first session (M = 7.52, SD = 1.90 vs. M = 2.69, SD = 1.75); t(50) = 3.12, p = .007. This could indicate that in the second session, participants tended to refrain from conventional interpersonal language use. Against our predictions, participants used significantly more exclamation marks in the second session (M = 3.14, SD = 3.60 vs. M = 1.66, SD = 2.75); t(50) = -3.6, p = .001, perhaps signaling anger or frustration with the system’s responses. Also against our predictions, the second session showed a higher percentage of words in the assent category (M = 5.22, SD = .48 vs. M = 3.69, SD = 3.19); t(50) = -2.55, p = .014, and more positive emotion words (M = 14.1, SD = 5.02 vs. M = 11.66, SD = 5.81); t(50) = -2.11, p = .041. Table 1 provides means and standard deviations per session for all measured variables.
Correlation analysis showed that, for the first session, the more users stayed in character, the higher their self-reported negative affect (r = .346; p = .014) and perceived suspense (r = .286; p = .044). For the second session, we also found that staying in character was associated with more negative affect (r = .344; p = .013). This confirms our prediction that users who attempt to stay in character have less positive user experiences.
We also correlated the linguistic indicators of staying in character with the self-reported user experience measures for both sessions. However, testing 196 possible correlations twice means that significant results will occur through chance capitalization: at α = .05, roughly 2 × 196 × .05 ≈ 20 significant correlations would be expected by chance alone. We observed 16 significant correlations, which is about chance level. Moreover, none of the correlations were consistent across both sessions. Therefore, this analysis did not provide further evidence for our predictions.
4 Discussion
The key finding of the present study is that the more Façade users stayed “in character”, the less meaningful the system responses they received were. In addition, the more users stayed in character, the more negative affect they experienced. Possibly as a result of both, users stayed significantly less in character during the second session.
Table 1. Paired sample t-test comparison (N = 50)

                                    1st session       2nd session
                                    M        SD       M        SD        p
Rating
  User staying in character         4.52     .36      4.09     .68       .011*
  Meaningful system response        3.86     .55      3.90     .50       .618
Number
  Verbal interactions              40.16   19.21     44.34   18.81       .320
  Nonverbal interactions           11.68   16.66     22.55   18.97       .004*
  User words                      139.0    68.00    125.0    69.00       .357
  Analyzed LIWC dictionary words   95.57    3.15     95.35    2.23       .645
Percentage
  Six-letter words                  6.60    3.14      7.03    2.96       .480
  Pronouns                         22.69    6.42     24.01    3.79       .283
  Articles                          7.52    1.90      2.69    1.75       .007*
  Swear words                        .23     .11       .43     .25       .578
  Assent words                      3.69    3.19      5.22     .48       .014*
  Positive emotion words           11.66    5.81     14.10    5.02       .041*
  Negative emotion words            2.89    1.94      2.44    1.76       .227
  Social processes words           16.89    6.41     17.95    5.31       .283
  Cognitive processes words        13.91    4.10     13.01    4.57       .263
  Perception processes words        2.37    1.76      2.88    1.52       .125
  Question marks                    5.31    4.51      5.27    3.57       .957
  Exclamation marks                 1.66    2.74      3.14    3.60       .001*

Note: [*] significant difference at p < .05.
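A comparison table of this kind can be assembled by looping the same paired t-test over all measured variables; the sketch below assumes a hypothetical wide-format DataFrame with one column per variable and session (the column and variable names are illustrative only).

    import pandas as pd
    from scipy.stats import ttest_rel

    # Hypothetical wide-format data: one row per participant, with columns
    # such as "articles_s1", "articles_s2", "assent_s1", "assent_s2", etc.
    data = pd.read_csv("session_measures_wide.csv")
    variables = ["staying_in_character", "meaningful_response",
                 "articles", "assent", "positive_emotion", "exclamation_marks"]

    rows = []
    for var in variables:
        s1, s2 = data[f"{var}_s1"], data[f"{var}_s2"]
        t, p = ttest_rel(s1, s2)
        rows.append({"variable": var,
                     "M1": s1.mean(), "SD1": s1.std(ddof=1),
                     "M2": s2.mean(), "SD2": s2.std(ddof=1),
                     "p": p})

    print(pd.DataFrame(rows).round(2))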
Meaningful system responses are a key condition in the implicit agreement between users and producers of interactive media. Interestingly, in current IS systems meaningful interaction is achieved more easily if users interact less. Façade expects interactions at specific points in the narrative, and at these moments it often succeeds in giving meaningful feedback; at other moments it often fails to do so. Experienced Façade users may recognize good interaction moments and thus have better user experiences. More naïve users, however, will try to intervene, take over control, or steer the conversation in a different direction; for such users, the system does not seem to give very good results. Users who adapt their verbal behavior by writing less and using simpler language, following the lead of the computer characters, experience a more coherent and meaningful narrative, resulting in higher perceived effectance, system usability, presence, and flow.
Another interesting finding of the current research is that the manually coded measure of staying in character was corroborated by many of its expected linguistic indicators in the first session, but not in the second. It seems that in the second session, users found other ways of staying in character than using
conventional interpersonal language. The increased use of assent-related words and exclamation marks in the second session might be taken to support our theory that some users changed their “acting style”, or at least their interaction behavior, after the first session, in which they learned that the system often does not understand their intentions.
The results of our study are in line with the assumption of an implicit agreement between users and creators of IS applications. Users who behave in character expect the system to respond in a meaningful way. However, the current state of IS technology cannot keep up with these expectations and is therefore likely to yield disappointment. In response, users adapt their in-game behavior and find other ways to be entertained, for example by acting out of character, behaving inappropriately, or testing the system’s boundaries and exploiting its shortcomings. IS creators could benefit from these findings by carefully managing user expectations in order to prevent disappointment in IS’s entertainment qualities.
References

1. Blythe, M.A., Overbeeke, K., Monk, A., Wright, P.C. (eds.): Funology: From Usability to User Enjoyment. Human-Computer Interaction Series. Springer (2005)
2. Klimmt, C., Roth, C., Vermeulen, I., Vorderer, P., Roth, F.S.: Forecasting the Experience of Future Entertainment Technology: “Interactive Storytelling” and Media Enjoyment. Games and Culture 7(3), 187–208 (2012)
3. Mehta, M., Dow, S., Mateas, M., MacIntyre, B.: Evaluating a Conversation-Centered Interactive Drama. In: Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), p. 1. ACM Press, New York (2007)
4. Mitchell, A., McGee, K.: Reading Again for the First Time: A Model of Rereading in Interactive Stories. In: Oyarzun, D., Peinado, F., Young, R.M., Elizalde, A., Méndez, G. (eds.) ICIDS 2012. LNCS, vol. 7648, pp. 202–213. Springer, Heidelberg (2012)
5. Roth, C., Klimmt, C., Vermeulen, I.E., Vorderer, P.: The Experience of Interactive Storytelling: Comparing Fahrenheit with Façade. In: Anacleto, J.C., Fels, S., Graham, N., Kapralos, B., Seif El-Nasr, M., Stanley, K. (eds.) ICEC 2011. LNCS, vol. 6972, pp. 13–21. Springer, Heidelberg (2011)
6. Roth, C., Vermeulen, I., Vorderer, P., Klimmt, C.: Exploring Replay Value: Shifts and Continuities in User Experiences between First and Second Exposure to an Interactive Story. Cyberpsychology, Behavior, and Social Networking 15(7), 378–381 (2012)
7. Roth, C., Vermeulen, I., Vorderer, P., Klimmt, C., Pizzi, D., Lugrin, J.L., Cavazza, M.: Playing In or Out of Character: User Role Differences in the Experience of Interactive Storytelling. Cyberpsychology, Behavior, and Social Networking 15(11), 630–633 (2012)
8. Tausczik, Y.R., Pennebaker, J.W.: The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology 29(1), 24–54 (2010)
9. Vermeulen, I.E., Roth, C., Vorderer, P., Klimmt, C.: Measuring User Responses to Interactive Stories: Towards a Standardized Assessment Tool. In: Aylett, R., Lim, M.Y., Louchart, S., Petta, P., Riedl, M. (eds.) ICIDS 2010. LNCS, vol. 6432, pp. 38–43. Springer, Heidelberg (2010)