AI Feel You: Customer Experience Assessment via Chatbot Interviews
Abstract
Purpose – While customer experience (CE) is recognized as a critical determinant of business
success, both academics and managers are yet to find a means to gain a comprehensive
understanding of CE cost-effectively. We argue that the application of relevant artificial
intelligence (AI) technology could help address this challenge. Employing interactively
prompted narrative storytelling, we investigate the effectiveness of sentiment analysis (SA) on
extracting valuable CE insights from primary qualitative data generated via chatbot interviews.
Design/methodology/approach – Drawing on a granular and semantically clear framework we
developed for studying CE feelings, an AI-augmented chatbot was designed. The chatbot
interviewed a crowdsourced sample of consumers about their recalled service experience
feelings. By combining free-text and closed-ended questions, we were able to compare extracted
sentiment polarities against established measurement scales and empirically validate our novel
approach.
This is an author’s accepted version of the paper. Please cite the accepted version:
Sidaoui, K., Jaakkola, M. and Burton, J. (2020) “AI Feel You: Customer Experience
Assessment via Chatbot Interviews” Journal of Service Management, DOI:
10.1108/JOSM-11-2019-0341
Emerald allows authors to deposit their AAM under the Creative Commons Attribution Non-
commercial International Licence 4.0 (CC BY-NC 4.0). Any reuse is allowed in accordance
with the terms outlined by the licence. To reuse for commercial purposes, permission
should be sought by contacting permissions@emeraldinsight.com.
For the sake of clarity, commercial usage would be considered as, but not limited to:
o Copying or downloading AAMs for further distribution for a fee;
o Any use of the AAM in conjunction with advertising;
o Any use of the AAM for promotional purposes by for-profit organisations;
o Any use that would confer monetary reward, commercial gain or commercial
exploitation.
Should you have any questions about Emerald’s licensing policies, please contact
permissions@emeraldinsight.com.
Findings – We demonstrate that SA can effectively extract CE feelings from primary chatbot
data. Our findings also suggest that further enhancement in accuracy can be achieved via
improvements in the interplay between the chatbot interviewer and SA extraction algorithms.
Research limitations/implications – The proposed customer-centric approach can help service
companies to study and better understand CE feelings in a cost-effective and scalable manner.
The AI-augmented chatbots can also help companies foster immersive and engaging
relationships with customers. Our study focuses on feelings, warranting further research on AI’s
value in studying other CE elements.
Originality/value – The unique inquisitive role of AI-infused chatbots in conducting interviews
and analyzing data in realtime offers considerable potential for studying CE and other subjective
constructs.
Keywords Customer Experience, Customer Feelings, Sentiment Analysis, Chatbot, Artificial
Intelligence, Storytelling.
Article Classification – Research paper
1. Introduction
Understanding customer experience (CE), comprised of experiential elements (e.g.,
cognitive, emotional), is paramount for service organizations aiming to successfully co-create
value with their customers (McColl-Kennedy et al., 2015). As a result, CE has remained an area
of interest for both managers (McIntyre and Virzi, 2018) and researchers (Marketing Science
Institute, 2018) for more than 50 years (Lemon and Verhoef, 2016).
Despite the growth of CE research, the body of literature remains fragmented and
overlapping (Kranzbühler et al., 2018), mainly focusing on the antecedents and outcomes of CE
rather than defining and understanding it as a holistic, complex, and subjective phenomenon
(Kawaf and Tagg, 2017). Additionally, attempts to assess CE using survey approaches tend to
possess an organizational bias. Such approaches focus on aspects deemed to be important by
researchers or managers and not the elements of a context-specific experience perceived as
significant by the customer (Ordenes et al., 2014). Specifically, they fail to recognize that CE is
perceived, and thus should also be measured within the domain of an individual’s overall human
experience (Fisk et al., 2020). To address this, more appropriate approaches to study holistic CE,
spanning the “digital, physical, and social realms” of today’s digital age, are required (Bolton et
al., 2018, p. 777; Holmlund et al., 2020).
Arguably, phenomenological and introspective approaches – involving a process of
narrative inquiry via data collection mechanisms such as storytelling and interviews (Connelly
and Clandinin, 1990) – are better suited for capturing comprehensive understanding of
experiential constructs such as CE (Carù and Cova, 2003; Holbrook and Hirschman, 1982). This
is because, in storytelling, key experiential elements like thoughts, feelings, and behaviors are
exchanged in the form of encompassing narratives, which are useful for studying critical events
(Webster, 2007) and understanding CE in context.
While attempts have already been made to understand CE via storytelling in service
research (e.g., Gentile et al., 2007), these methods largely remain within the confines of
interviews conducted by researchers. Such qualitative approaches constitute a minority of service
research as they are more resource, time, and cost-intensive compared to more efficient but
potentially less informative quantitative methods (Benoit et al., 2017). Finding an approach
that is both highly informative and efficient has until now seemed unattainable. However,
contemporary service firms could now attain the “best of both worlds”.
This could be achieved by employing chatbots and exploiting their ability to engage
customers via storytelling. With growing recognition of their potential in service research
(Kumar et al., 2016), chatbots equipped with artificial intelligence (AI) could automatically
extract CE from narrative conversations with customers – using a sentiment analysis (SA)
algorithm – and hence contribute to CE theory. This could provide a useful starting point for a
cost-effective service excellence strategy for service organizations (Ordenes and Zhang, 2019;
Robinson et al., 2019; Wirtz and Zeithaml, 2018).
The use of AI methods generally, and SA in particular, calls for high semantic clarity (i.e.,
precise mapping to linguistic meanings). As a result, a bottom-up abductive approach was
adopted to develop a granular and semantically clear model for feelings, one of the five elements
of CE: thought, feeling, sensation, activity, and relation (De Keyser et al., 2020; Schmitt, 1999).
There are several reasons why focusing on these elements individually, and on feelings in
particular, could yield advantageous results. Firstly, there is an academic need to address the
challenges in measuring feelings in CE (Richins, 1997) as well as understanding them better
using empirically validated multi-methods (McColl-Kennedy et al., 2015). Feelings relating to
outcomes of service encounters are holistically connected with a person’s overall assessment of
their human experience. Thus, an improved understanding of this CE element would aid
managers to co-create service encounters with their customers to better satisfy their needs (Fisk
et al., 2020). Secondly, focusing on the CE feeling element addresses managerially-driven
research priorities of (i) establishing robust emotional connections between customers and
brands (Marketing Science Institute, 2016) and (ii) improving emotional experiences (Temkin,
2018). Tackling the entirety of the CE elements simultaneously would require multiple methods
tailored to each element and is beyond the scope of this paper.
The objective of this study is to develop and validate a novel and cost-effective approach,
employing the recognized potential of AI in the service context to gain an improved holistic
understanding of CE. Specifically, the study contributes to CE literature in three key ways, via:
(i) the development of a granular and semantically clear CE feeling model (CEFM) that is
employable using AI techniques; (ii) proposing a unique approach utilizing chatbots and SA to
extract CE (feelings) from primary interview data; and (iii) empirically validating the
effectiveness of this approach in extracting CE feelings. A detailed research agenda geared
towards exploiting the potential of the proposed approach to inform and manage CE is outlined.
The following sections present an overview of relevant CE and AI literature, details of
the AI approach used, key findings, and discussion of contributions and implications, concluding
with limitations and a research agenda for future work.
2. Literature Review
2.1 The feeling component of Customer Experience
There is rapid growth in recognizing that “experience is everything”, acknowledging the
importance of CE, and that companies who focus their strategy on improving it could gain a
competitive edge by bridging the experience gap (Clarke and Kinghorn, 2018). This relatively
recent growth in the significance of CE in services has been evolved from the identification of
related concepts including consumption experience (Holbrook and Hirschman, 1982) and the
experience economy (Pine and Gilmore, 1998). Understanding and measuring experience
effectively has been an ultimate goal for philosophers and researchers for some time, but it has
proven difficult in practice.
CE is often defined as a dynamic and holistic, direct or indirect, interaction between a
customer and a firm, involving elements of thoughts, feelings, activities, relations, and
sensations [1] (Lemon and Verhoef, 2016; Schmitt, 1999). The holistic nature of CE [2] renders this
phenomenon challenging to observe and gauge accurately. Consequently, many practitioners tend
to fall back on the use of dated tools such as Net Promoter Score (Reichheld, 2003) and CSAT
(Anderson and Narus, 1984) as proxies (Maklan et al., 2017). To address these shortcomings,
tools such as the EXQ multiple-item scale, which leverages product experience, outcome focus,
moments-of-truth, and peace-of-mind (Klaus and Maklan, 2012), in addition to emerging text
mining approaches (e.g., McColl-Kennedy et al., 2019) have been developed to evaluate CE.
Nevertheless, as the importance and impact of (customer) experiences in our daily lives (Fisk et
al., 2020) keep increasing, further study of this subjective and holistic concept is warranted.
[1] While other elements, such as the economic and lifestyle elements (Gentile et al., 2007; Verleye, 2015),
have also been proposed, the above five elements are the most widely adopted (De Keyser et al., 2020).
[2] Experience has been conceptualized as being “life itself” (Csikszentmihalyi, 1991, p. 192).
This study focuses on CE feelings specifically and responds to the surge of interest in
examining this particular element of experience both in academia and management (Marketing
Science Institute, 2016; McColl-Kennedy et al., 2019). Drawing on a comprehensive review of
CE literature, the experiential element of feelings has been referred to as affect (e.g., Holbrook
and Hirschman, 1982), feelings (e.g., Schmitt, 1999), emotions (e.g., Shaw and Ivens, 2002),
hedonic value (e.g., Klaus and Maklan, 2011), or mood (Bagdare and Jain, 2013). Table 1
summarizes the semantic interpretations associated with the CE feeling element.
Table 1. Consolidated hierarchical interpretations of CE feelings*

Conceptual level | CE feeling term used | Key studies
1 | Affect | Brakus, Schmitt, & Zarantonello (2009); Csikszentmihalyi & Larson (1987)
1 | Feeling | Rajaobelina (2018); Klaus & Maklan (2011); Schmitt (1999)
2 | Emotion | Heinonen (2018); Verhoef et al. (2009); Gentile et al. (2007)
2 | Hedonic value | De Ruyter, Chylinski, Mahr, & Keeling (2017); Verleye (2015); Klaus & Maklan (2011); Otto & Ritchie (1996)
2 | Mood | Bagdare & Jain (2013)

*Key studies (comprehensive review available from authors on request).
Accommodating the interpretations from Table 1 into a semantically precise model
requires understanding how these terms differ in meaning and how they are represented within
the fields of psychology concerned with studying subjective experience. Dividing these terms
into two hierarchical levels (as shown in Table 1) allows for a natural grouping and increased
semantic clarity. Provided that the hierarchical levels can be distinguished based on their
temporal characteristics (Fox, 2018), it is argued here that feelings and affect act as higher-level
umbrella terms (level 1) to the feeling element. Terms such as mood, emotion, and hedonic value
(level 2), on the other hand, represent a more experience-specific embodiment of feelings or
affect (Babin et al., 1994; Beedie et al., 2005; Fox, 2018).
From the CE literature, it is also possible to find terms, such as entertainment (Ali et al.,
2016), leisure (Bagdare and Jain, 2013), safety (Otto and Ritchie, 1996), and joy (Bagdare and
Jain, 2013), which all operate hierarchically below level 2. However, these level 3 concepts
represent specific hedonic values attained within an experience and vary depending on the
context under study (Babin et al., 1994; Holbrook, 2006). Thus, to maintain robust model
generalizability, a CE feeling model (CEFM) that focuses on levels 1 and 2 is proposed.
The CEFM (Figure 1) represents the three sub-elements (at level 2; mood, emotion, and
hedonic value) reflecting their context and temporality characteristics (temporally prolonged,
encounter specific, and context-specific) (Babin et al., 1994; Fox, 2018). Capturing all three sub-
elements simultaneously is vital for a holistic understanding of experiential feelings
(Kranzbühler et al., 2018). For instance, CE might be influenced by a customer engaging with a
service in a negative mood. Similarly, emotions detail (one or more) encounter-specific feelings
(i.e., towards firm resources and activities during the encounter) (Larivière et al., 2017; Ordenes
et al., 2014), whereas hedonic value symbolizes how customers feel after having experienced the
service. Combined, these sub-elements enable CEFM to compartmentalize different experiential
feelings throughout the customer journey (pre, mid, and post) (Lemon and Verhoef, 2016).
Figure 1. Conceptual framework: CE feeling model (CEFM)
2.2 Enabling CE feeling extraction with AI
The purpose of the CEFM is to provide a consolidated, holistic, and sufficiently granular
framework for CE feelings to allow technologies such as AI to map experiential data onto it. AI
is a prominent technological milestone that some claim ushered in the fourth industrial
revolution (Maynard, 2015). AI technology, along with increasing levels of global (big) data
production, generates new opportunities to advance and improve a myriad of research areas
(Boyd and Crawford, 2012; Marr, 2018). Among these are service and technology (Mende et al.,
2019; Wirtz et al., 2018), customer engagement (Kunz et al., 2017), service encounters (Larivière
et al., 2017), sales (Syam and Sharma, 2018), customer relationship management (Berry and
Linoff, 2004), and CE (Kabadayi et al., 2019; Ordenes et al., 2014; Zolkiewski et al., 2017).
Companies must however understand the benefits and deficiencies of alternative
approaches when they attempt to find the most appropriate AI technology. Each technology stack
consists of various techniques, which utilize different AI methods to achieve a specific objective.
For instance, supervised and unsupervised machine (deep) learning methods are useful for
predicting results from either structured (e.g., tabular and transactional) or unstructured (e.g.,
captured images, audio, video, or text) datasets (Brynjolfsson and Mcafee, 2017). AI
technologies involving data mining (e.g., text analysis), for example, add to the body of big data,
which then feeds into machine learning algorithms and ultimately leads to improved prediction
accuracy (Boyd and Crawford, 2012; Ng and Wakenshaw, 2017). In a storytelling environment
based on experiential exchange via narrative, as in this study, an AI technology capable of
extracting insights from unstructured conversations is required. This technology would need to
analyze text via natural language processing to decipher not only feeling-related content but also
context and temporality (Ordenes and Zhang, 2019).
Tackling this task is challenging, specifically for advanced AI methods that rely on
secondary data sources (e.g., social media text) due to the possibility of incomplete data (i.e.,
data which might not be there) and passive data collection (i.e., there is no reciprocal data
exchange). Therefore, primary data collection efforts are deemed necessary.
2.3. Rationale for using chatbot interviewers and sentiment analysis
When companies consider their (primary) data collection techniques, they are faced with
several alternative methods. Each method possesses its unique set of strengths and weaknesses.
Finding an appropriate balance between data quality and cost-effectiveness is key. On the one
hand, traditional (face-to-face) in-depth interviews, represent a more engaging and personal
method in which the interviewer can probe interviewees for clarification or more details
(laddering). This method also provides critical supplementary data, such as facial gestures and
body language (Novick, 2008). Surveys, on the other hand, are generally less costly and can be
deployed faster and with more flexibility (not limited to time or location) while reaching a
broader audience. When implemented using technology (e.g., online), surveys can provide
realtime analysis of the results (Benoit et al., 2017).
Chatbots can reap the benefits of both surveys and interviews. They already interact with
customers in various service sectors such as banking and insurance and provide a cost-effective
solution targeting a broad customer base independent of time and location (Riikkinen et al.,
2018). However, by switching the more traditional chatbot role from acting as an information
source (i.e., passively answering questions) to a more inquisitive role (i.e., proactively asking
interview questions), the chatbot starts to resemble an interviewer and assimilates many of the
advantages of widely used interview and survey methods. Thus, chatbot interviews have the
potential to become an efficient and widely used AI approach capable of collecting primary data
via conducting storytelling narrative interviews that are well suited to examine subjective
constructs like CE.
In addition, chatbot technology can handle multiformat data (i.e., text, audio/voice),
support automation (e.g., automatic transcription and translation), and could be developed to
seek required information via laddering and probing questions. It can also become more
engaging by adapting its ‘personality’ to the interviewee (i.e., it can assume a customizable
persona based on current or past conversations), featuring eye-, electrodermal-activity-, and
facial-gesture-tracking technologies widely considered to be useful for studying emotions (De Keyser
et al., 2019; Ng and Wakenshaw, 2017).
A comparison of key CE data collection approaches (traditional interviews, surveys, and
chatbot interviews), including their potential advantages, is depicted in Table 2. From this, the
potential of chatbot interviews as an effective and efficient data collection approach of CE can be
observed.
Table 2. Comparison of key CE data collection approaches

Advantages | Traditional interviews | Surveys | Chatbot interviews
Rich data | X | | X
Personal/empathetic | X | | X
Engaging | X | | X
Laddering and probing questions | X | | A
Body language observation | X | | A
Low cost | | X | X
Broad reach/scalability | | X | X
Fast deployment/speed | | X | X
Flexible availability | | X | X
Realtime analysis | | X | X
Multiformat conversation availability | | | X
Automation | | | X
Adaptable personality | | | A

Note: “A” denotes further development potential via augmentation
Extant literature recognizes that data mining – a set of tools and techniques utilizing
statistics, artificial intelligence, and machine learning methods to discover meaningful
patterns and rules in a dataset – has much potential for analyzing the data collected during a
chatbot interview (Berry and Linoff, 2004, p. 7; Rygielski et al., 2002). Since chatbots can
leverage data mining techniques to extract meaning from text (Feldman and Sanger, 2006;
Ordenes and Zhang, 2019), they are thus technically capable of extracting CE feelings from
collected text responses. This ability further adds to the value of our proposed approach in
utilizing chatbots for data collection purposes.
More specifically, SA, a text mining method used to determine the overall attitudes,
opinions, and emotions within text (Humphreys and Wang, 2018), could be used to map
experiential interview feeling data to the CEFM. For instance, a question relating to a
participant’s mood could be analyzed using SA, resulting in a polarity score for that statement
(i.e., positive, negative, and neutral). A typical SA algorithm would compare sets of terms from
the input text against a sentiment lexicon such as WordNet, then calculate the distance between
these terms and provide a polarity score as an output (Feldman, 2013). Thus, SA has the potential
to attribute sentiment scores to experiential interview data, enabling companies to gain useful
quantitative CE insights from natural language.
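As a simple illustration of the lexicon-lookup principle described above, the following toy scorer averages lexicon valences into a single polarity score. The tiny lexicon and all names are illustrative stand-ins, not WordNet or the actual VADER pipeline:

```python
# Toy lexicon-based sentiment scorer illustrating the principle described
# above: look up each token in a sentiment lexicon and aggregate the
# matched valences into a polarity score in [-1, 1].
# The lexicon below is a tiny illustrative stand-in, not WordNet or VADER.

TOY_LEXICON = {
    "great": 0.8, "helpful": 0.6, "friendly": 0.5,
    "slow": -0.4, "rude": -0.7, "terrible": -0.9,
}

def polarity(text: str) -> float:
    """Average the lexicon valences of matched tokens; 0.0 if none match."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    hits = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    return round(sum(hits) / len(hits), 3) if hits else 0.0

print(polarity("The staff were friendly and helpful!"))  # 0.55
print(polarity("Terrible, slow service."))               # -0.65
```

Real SA algorithms such as VADER additionally handle negation, intensifiers, and punctuation cues, which a plain lexicon average cannot capture.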
3. Method
In an attempt to improve the understanding of CE, this research adopts a bottom-up
abductive approach using technologies that collect and evaluate experiential CE element data to
build up a general and context-specific notion of the CE feeling concept (Brodie et al., 2017).
Developing a one-size-fits-all technological solution to the entirety of CE is not desirable, as
studying different elements calls for different methodological strategies (e.g., SA works well for
studying feelings but less so for studying sensations, where electrodermal activity measures and
others may be more suitable). Focusing on a single CE element is also meaningful due to the
fragmented literature on CE and its elements (Kranzbühler et al., 2018; Palmer, 2010); high
levels of semantic clarity are thus required for the effective implementation (e.g., extraction of
experiential data) of the proposed approach.
Methodologically, this study consists of four inter-connected phases: data collection and
preprocessing, scale validation, sentiment analysis, and the comparison between scale and
sentiment scores. The phases are elaborated below.
3.1 Phase 1: Data collection and preprocessing
In Phase 1, data was collected via a crowdsourcing platform, followed by data
preparation, cleaning, and filtering. Crowdsourcing platforms provide a satisfactory means of
recruiting a random pool of participants with whom studies of attitudes and behaviors can be
conducted (Hulland and Miller, 2018). The platform employed in this study is Prolific Academic,
which draws on a more transparent and diverse population of consumers more naïve to everyday
experimental research tasks compared to platforms such as Amazon’s MTurk (Palan and Schitter,
2018; Peer et al., 2017).
Using Prolific Academic, 200 participants were recruited to interact with a chatbot named
Marvino. Marvino was designed to ask interview-like questions about a recent service
experience with a firm of the participants’ choice. The chatbot interviewer first asked the
participants to provide a descriptive, free-text answer to an inquiry, which was immediately
followed by closed-ended survey-type question(s) in the same (chatbot) conversation so that an
understanding of mood, emotion, and hedonic value was gained for comparison purposes. For
example, questions on a participant’s (recalled) mood would start by asking for a free-form
textual response, followed by questions (items) adopted from an established and validated
measurement scale (Table A II). Between each question, Marvino utilized conversational
acknowledgments (e.g., “Thank you”, “I see”) to promote a more natural storytelling experience.
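The question flow described above can be sketched as follows. The paper does not publish Marvino's implementation; the function, question wording, and scripted replies here are illustrative assumptions:

```python
# Minimal sketch of the interview flow described above: a free-text
# question, a conversational acknowledgment, then closed-ended 9-point
# scale items. Marvino's actual implementation is not published; all
# wording here is illustrative only.

ACKNOWLEDGMENTS = ["Thank you", "I see"]

def interview_step(free_text_q, scale_items, ask):
    """Run one CEFM question block. `ask` stands in for the chat channel
    (prompt -> participant reply)."""
    responses = {"free_text": ask(free_text_q), "scale": []}
    for i, item in enumerate(scale_items):
        ack = ACKNOWLEDGMENTS[i % len(ACKNOWLEDGMENTS)]
        responses["scale"].append(int(ask(f"{ack}. {item} (1-9)")))
    return responses

# Scripted replies simulate a participant for demonstration.
replies = iter(["I felt relaxed before visiting the store.", "7", "6"])
out = interview_step(
    "How were you feeling before the experience?",
    ["I felt good", "I felt cheerful"],  # illustrative scale items
    lambda prompt: next(replies),
)
print(out)
```

Pairing each free-text answer with scale items in the same conversation is what later allows the extracted sentiment polarities to be validated against the established scales.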
In order to enhance the generalizability of the CE findings, each participant was
randomly allocated one of four personal experience types to recall: positive shopping, negative
shopping, positive vacation, and negative vacation experiences (N = 48, 53, 47, and 45,
respectively). Next, participants were asked how well they recalled the self-chosen
service experience and were filtered out if their answer was less than 5 (i.e., “neutral”) on the 9-
point scale used. This left 193 participants in the final sample where a wide range of
demographics are represented (Table A I).
IBM SPSS Statistics and AMOS 25 (2017) were used to analyze the resultant data. A
missing data analysis revealed that only 2.1% of the data was missing. Little’s missing
completely at random (MCAR) test (Little, 1988) yielded a non-significant result (χ2 = 27.745, df
= 24, p = .271), suggesting that the absent values can be assumed to be missing entirely by
chance [3]. The multiple imputation (Rubin, 1987) approach was used, and a fully conditional
specification method (m = 10) in SPSS (2017) was employed for estimating the missing values.
3.2 Phase 2: Scale validation
In Phase 2, the reliability and validity of the measurement scales were tested from the
data related to mood, emotion, and hedonic value, as recalled by the participants. The procedure
consisted of a maximum likelihood confirmatory factor analysis (CFA) conducted in AMOS 25
(2017). The measurement scales, along with the corresponding measurement items and their
standardized loadings, relating to each feeling sub-element are shown in Table A II.
3.3 Phase 3: Sentiment analysis and model performance metrics
In Phase 3, the effectiveness of the SA algorithm on the extracted CEFM data was
validated. While there are many SA algorithms to consider, the Valence Aware Dictionary and
sEntiment Reasoner (VADER) algorithm was adopted for its efficiency and performance at
handling short and informal social media text (Hutto and Gilbert, 2014; Ribeiro et al., 2016).
Also, because VADER “was created from a generalizable, valence-based, human-curated gold
standard sentiment lexicon”, it exhibits the flexibility needed to handle human conversation
where domain-specificity would be hard to determine (Ribeiro et al., 2016, p. 7).
The performance of this algorithm is evidenced by how well it managed to predict and
score each descriptive question of the CEFM. Specifically, the scores outputted by the algorithm
were compared to their manually coded counterparts. For finer sentiment granularity
(Ordenes et al., 2017), the compound scores (-1, 1) produced by VADER were normalized to
match the scale items in Table A II (i.e., a sentiment score of -1 refers to the scale score of 1,
“extremely negative”; a sentiment score of zero refers to 5, “neutral”; and a sentiment score of 1
refers to 9, “extremely positive”). The statements then underwent a binary classification where
they were split into (i) positive (scores ≥ 5) versus negative (scores < 5) cases, in line with the
compound sentiment score provided by the SA algorithm; and (ii) true versus false, depending on
whether the algorithm correctly predicted the statement sentiment (elaborated below).

[3] Upon further analysis, the missing values appear to be a result of connectivity (i.e., network)
issues between the chatbot and the participants.
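A minimal sketch of this normalization and binary split follows. The linear mapping is inferred from the three stated anchor points; the function names are illustrative:

```python
# Map VADER compound scores in [-1, 1] onto the 9-point scale anchors
# stated above (-1 -> 1 "extremely negative", 0 -> 5 "neutral",
# 1 -> 9 "extremely positive"), then apply the binary positive/negative
# split. The linear form is inferred from those three anchor points.

def normalize(compound: float) -> float:
    """Linear map from [-1, 1] to [1, 9]."""
    return 5.0 + 4.0 * compound

def polarity_class(compound: float) -> str:
    """Binary classification: normalized scores >= 5 positive, < 5 negative."""
    return "positive" if normalize(compound) >= 5.0 else "negative"

print(normalize(-1.0), normalize(0.0), normalize(1.0))  # 1.0 5.0 9.0
print(polarity_class(0.4))   # positive
print(polarity_class(-0.2))  # negative
```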
To achieve this, a manual coding procedure required the coder to first categorize
participant’s statement into one of the five categories (i.e., “extremely negative”, “average
negative”, “neutral”, “average positive”, and “extremely positive”) corresponding to the
respective scale points of 1, 3, 5, 7, and 9. If the score of the SA algorithm fell within two scale
points of the coder’s selected statement category (e.g., if the coder scores the statement as
“average positive” and the sentiment score is 8, which is only 1 point from 7), then the coder
would denote the prediction as true. Otherwise, the prediction was denoted as false. The potential
reasons for false predictions were analyzed further to understand whether an inaccuracy was due
to the SA algorithm or the participant; this and the inter-coder reliability are discussed in the
results section.
Based on the automated (VADER) SA as well as the manual coding, each statement was
categorized into a confusion matrix (Sokolova and Lapalme, 2009), splitting the results into true
positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) depending on
their respective polarity (i.e., positive or negative classification by the algorithm and coders).
This classification allowed for the calculation of key metrics (Table 4) that give valuable
information about the effectiveness of the VADER SA algorithm. For a detailed account on these
measures, refer to Sokolova and Lapalme (2009, p. 430).
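As a sketch, these metrics follow directly from the confusion-matrix counts; applying the standard formulas to the mood counts reported in Table 4 reproduces that row up to rounding. Treating the reported AUC as the mean of recall and specificity (balanced accuracy) is our assumption for a single-threshold classifier:

```python
# Compute the Table 4 performance metrics from confusion-matrix counts
# (TP, TN, FP, FN). Using the mood counts reported in Table 4
# (TP=110, TN=60, FP=4, FN=2) agrees with the reported row
# (.97, .97, .98, .97, .94, .96) up to two-decimal rounding.
# AUC here is the mean of recall and specificity (an assumption).

def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": round((tp + tn) / (tp + tn + fp + fn), 3),
        "precision": round(precision, 3),
        "recall": round(recall, 3),
        "f1": round(2 * precision * recall / (precision + recall), 3),
        "specificity": round(specificity, 3),
        "auc": round((recall + specificity) / 2, 3),
    }

print(metrics(110, 60, 4, 2))
```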
4. Results
4.1 Phase 2: Scale validation results
The reliability and validity analysis, along with the descriptive statistics, of the key
constructs under study are reported in Table 3. The final CFA model fitted the data well (χ2 =
49.51, df = 25, p = .001, RMSEA = .07, SRMR = .02, GFI = .95, CFI = .99, TLI = .99) (Brown,
2015; Hu and Bentler, 1999; Kline, 2016). In support of convergent validity, all relevant
construct reliabilities (CR) were above the recommended level of .70 (Fornell and Larcker,
1981). The discriminant validity of the scales was also satisfactory, as the relevant Maximum
Shared Variance (MSV) indices were shown to be lower than the corresponding average
variance extracted (AVE) indices (Hair et al., 2010), and the square-roots of the AVE appeared to
be greater than the inter-construct correlations (Fornell and Larcker, 1981), as shown in Table 3.
Table 3. Descriptive statistics, correlations, and construct reliability and validity

Construct | Mean (S.D.) | CR | AVE | MSV | 1 | 2 | 3
1. Experience emotion* | 5.29 (2.63) | - | - | - | - | |
2. Experience mood | 5.61 (2.42) | .95 | .81 | .28 | .53 | .90 |
3. Experience hedonic value | 6.01 (2.66) | .98 | .93 | .84 | .92 | .52 | .97

Notes: Square-root of AVE on the diagonal in bold; correlations off-diagonal
* This construct consists of one item only; hence, CR, AVE and MSV are not meaningful
4.2 Phase 3: Sentiment analysis and model performance metrics results
To determine the effectiveness of the VADER algorithm in correctly extracting the
sentiments of the CE feelings in the dataset, a confusion matrix for each of the sub-elements was
constructed. The inter-coder reliability kappa (Fleiss, 1971) of .95, for 60 data-points [4] for three
coders across each of the CE feeling sub-elements, refers to an “almost perfect” agreement
(Landis and Koch, 1977, p. 165).
The performance of the SA algorithm for each CE feeling sub-element is portrayed in
Table 4. The results suggest that VADER is highly effective in extracting the sub-elements and
CE feelings overall. Specifically, high accuracy, precision, recall, F1-score, specificity, and AUC
scores are reported (Sokolova and Lapalme, 2009). The lowest relative scores are reported for
the extraction of hedonic value, whereas the highest effectiveness is evident for mood.
Table 4. The SA algorithm (VADER) performance on CEFM

| Sub-element | | Pos | Neg | Total | Accuracy | Precision | Recall | F1-Score | Specificity | AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| Mood | True | 110 | 60 | 170 | | | | | | |
| | False | 4 | 2 | 6 | | | | | | |
| | Total | 114 | 62 | 176 | .97 | .97 | .98 | .97 | .94 | .96 |
| Emotion | True | 104 | 58 | 162 | | | | | | |
| | False | 5 | 1 | 6 | | | | | | |
| | Total | 109 | 59 | 168 | .96 | .95 | .99 | .97 | .92 | .96 |
| Hedonic value | True | 103 | 52 | 155 | | | | | | |
| | False | 10 | 0 | 10 | | | | | | |
| | Total | 113 | 52 | 165 | .94 | .91 | 1.00 | .95 | .84 | .92 |
| CEFM | True | 317 | 170 | 487 | | | | | | |
| | False | 19 | 3 | 22 | | | | | | |
| | Total | 336 | 173 | 502 | .96 | .94 | .99 | .97 | .89 | .95 |

Notes: The F1 harmonic mean was used as the F-score measure. CEFM refers to the customer
experience feeling model, which consists of mood, emotion, and hedonic value.
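The reported metrics follow from the confusion-matrix counts via their standard definitions. The sketch below recomputes the mood row; note that treating AUC as the mean of recall and specificity (balanced accuracy) is our assumption about the computation, chosen because it reproduces the reported values.

```python
# Recompute the mood row of Table 4 from its confusion-matrix counts
# (row "True" = correct classifications, row "False" = misclassifications).
tp, tn = 110, 60  # correctly extracted positive / negative statements
fp, fn = 4, 2     # wrongly classified as positive / as negative

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                 # a.k.a. sensitivity
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)
auc = (recall + specificity) / 2        # balanced-accuracy reading of AUC

print(" ".join(f"{m:.3f}" for m in
               (accuracy, precision, recall, f1, specificity, auc)))
# -> 0.966 0.965 0.982 0.973 0.938 0.960,
#    i.e., Table 4's values up to two-decimal rounding
```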
⁴ A full coding assignment was not deemed necessary, since VADER is already a validated SA
algorithm (Hutto and Gilbert, 2014).

4.3 Phase 4: Comparison between scale and sentiment scores
While the findings in Phase 3 established that, in principle, the SA algorithm used is
effective for studying CE feelings, the final phase (Phase 4) set out to address two important
follow-up questions: (i) how closely do the extracted sentiment scores from the AI chatbot
interview match the respective scale scores? and (ii) to what extent are the approach's
potential inaccuracies due to the SA algorithm? Addressing these questions provides further
understanding of the performance and future development potential of this approach.
In addressing the first question, the full sample (N = 193) was used to examine the
similarities between the sentiment scores and the average measurement scale scores for each of
the feeling sub-elements. The findings are promising, as the similarity percentages are 86.88%
for mood, 84.66% for emotion, and 84.33% for hedonic value. A series of paired t-tests further
revealed that, at the 95% confidence level, the differences between the sentiment and scale
scores are statistically insignificant for mood (p = .18) and emotion (p = .07), but statistically
significant for hedonic value (p = .03).
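To illustrate how sentiment and scale scores can be placed on a common footing, the sketch below linearly rescales a VADER-style compound score (in [-1, 1]) onto the 9-point scale and expresses absolute differences as a percentage of the scale width (8 points). The linear mapping and the similarity definition are illustrative assumptions rather than the study's exact implementation; the percentage convention, however, matches the thresholds used in the post-hoc analyses (±1.5 and ±1 scale points corresponding to 18.75% and 12.5%).

```python
# Sketch of comparing a sentiment score with a scale score. The linear
# rescaling and the similarity definition are illustrative assumptions.
SCALE_MIN, SCALE_MAX = 1, 9
WIDTH = SCALE_MAX - SCALE_MIN  # = 8

def compound_to_scale(compound: float) -> float:
    """Linearly rescale a compound score in [-1, 1] onto the 1..9 scale."""
    return SCALE_MIN + (compound + 1) / 2 * WIDTH

def similarity_percent(sentiment_score: float, scale_score: float) -> float:
    return 100 * (1 - abs(sentiment_score - scale_score) / WIDTH)

print(compound_to_scale(0.0))              # -> 5.0 (neutral -> midpoint)
print(100 - similarity_percent(5.0, 6.5))  # -> 18.75 (a 1.5-point gap)
print(100 - similarity_percent(5.0, 6.0))  # -> 12.5 (a 1-point gap)
```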
To address the second question, post-hoc analyses with two sub-samples of the full
dataset were conducted. Cases were included in these sub-samples based on the absolute
percentage difference between the sentiment and scale scores, and whether the substantial
differences were considered to be due to the participants (i.e., input error) or the SA algorithm.
Two explicit thresholds (18.75% and 12.5%⁵) for the absolute differences were chosen; the sub-
samples thus consisted of all cases at or above each threshold (e.g., difference ≥ 12.5%), where
errors caused by the SA algorithm were included, but input errors caused by the participants were
not.
⁵ These, respectively, refer to ±1.5 and ±1 scale-point differences on the 9-point scale used.
Therefore, the two sub-samples were manually coded to distinguish between input errors
(i.e., participant-induced errors such as spelling and unclear statements) and errors caused by the
SA algorithm (VADER). The coding scheme, along with code descriptions and illustrative
examples from our dataset, is presented in Table A III. The inter-coder reliability for two
independent coders resulted in a kappa (Fleiss, 1971) of .62 for the 259 data-points across the
CE feeling sub-elements, suggesting "substantial agreement" (Landis and Koch, 1977, p. 165).
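Agreement of this kind can be computed with Fleiss' (1971) kappa, which generalizes inter-rater agreement to multiple coders. The sketch below implements the standard formula; the 3-coder rating matrix is an invented toy example, not the study's coding data.

```python
# Sketch of Fleiss' (1971) kappa for multiple coders and categories.
# ratings[i][j] = number of coders assigning item i to category j.

def fleiss_kappa(ratings):
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    # Mean per-item agreement across coder pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Expected chance agreement from the marginal category proportions.
    p_e = sum(
        (sum(row[j] for row in ratings) / (n_items * n_raters)) ** 2
        for j in range(len(ratings[0]))
    )
    return (p_bar - p_e) / (1 - p_e)

# Invented example: three coders assigning positive/negative codes to six
# statements, unanimous on five of them.
example = [[3, 0], [3, 0], [0, 3], [3, 0], [1, 2], [0, 3]]
print(round(fleiss_kappa(example), 3))  # -> 0.775
```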
The distribution of the coded items for both thresholds (i.e., difference ≥ 18.75% and
≥ 12.5%) is shown in Table 5. The number of cases identified at the 18.75% threshold varies
between 43 and 61, and expectedly increases to between 82 and 97 at the 12.5% threshold.
Evidently, "Statement-Scale Mismatch" (at the 12.5% threshold: 60.98% for mood, 56.10% for
emotion, and 43.30% for hedonic value) represents the most prominent reason for the differences
between the sentiment and scale scores, followed by sentiment algorithm accuracy, which at the
12.5% threshold explains between 19.51% (mood) and 31.96% (hedonic value) of the
discrepancies.
Table 5. Distribution of the coded items

| Sub-elements / Codes | Mood ≥18.75% | Mood ≥12.5% | Emotion ≥18.75% | Emotion ≥12.5% | Hedonic Value ≥18.75% | Hedonic Value ≥12.5% |
|---|---|---|---|---|---|---|
| Sentiment Algorithm Accuracy | 13.95% | 19.51% | 21.05% | 20.73% | 22.95% | 31.96% |
| Statement Unclear | 2.33% | 3.66% | 5.26% | 6.10% | 8.20% | 6.19% |
| Statement Language | 6.98% | 3.66% | 3.51% | 2.44% | 3.28% | 2.06% |
| Statement Irrelevant | 0% | 0% | 8.77% | 8.54% | 16.39% | 10.31% |
| Statement-Scale Mismatch | 65.12% | 60.98% | 56.14% | 56.10% | 44.26% | 43.30% |
| Statement Multiple Context / Time | 11.63% | 12.20% | 5.26% | 6.10% | 4.92% | 6.19% |
| Total items* | 43 | 82 | 57 | 82 | 61 | 97 |

* Denotes how many items out of the full sample (N = 193) were coded at each threshold
A series of paired t-tests, reported in Table 6, concludes the post-hoc analyses. In the two
sub-samples, all such cases where participant-induced errors are the underlying cause for the
difference in sentiment and scale scores are excluded. The results suggest that isolating the effect
of sentiment algorithm accuracy improves the similarity percentages for all feeling sub-elements.
Specifically, using the tighter threshold (12.5%), the similarities improve to 92.58% (Mood, N =
127), 89.90% (emotion, N = 128), and 90.27% (hedonic value, N = 127). The mean differences
also decrease as the thresholds are applied and, more importantly, the hedonic value’s score
differences become statistically insignificant (p = .06). In sum, the findings in Table 6 show that
the two approaches produce closely similar scores, especially when input errors are accounted
for. The potential improvements to the approach proposed in this study are discussed in section
5.3.
Table 6. Similarity percentages and paired difference t-tests for the feeling sub-elements

| CE feeling sub-element | Sample¹ | Similarity %² | Mean difference (S.D.) | Std. Error | t (df) | Sig. (2-tailed) |
|---|---|---|---|---|---|---|
| Experience Mood | Full Sample | 86.88% | .14 (1.43) | .10 | 1.36 (192) | .18 |
| | Threshold 18.75% | 91.25% | .06 (.87) | .07 | .87 (155) | .39 |
| | Threshold 12.5% | 92.58% | .04 (.76) | .07 | .56 (126) | .58 |
| Experience Emotion | Full Sample | 84.66% | .21 (1.61) | .12 | 1.82 (192) | .07 |
| | Threshold 18.75% | 89.23% | .06 (1.16) | .10 | .63 (147) | .53 |
| | Threshold 12.5% | 89.90% | .03 (1.16) | .10 | .26 (127) | .80 |
| Experience Hedonic Value | Full Sample | 84.33% | .24 (1.56) | .11 | 2.16 (192) | .03 |
| | Threshold 18.75% | 89.51% | .23 (.97) | .08 | 2.84 (145) | .01 |
| | Threshold 12.5% | 90.27% | .16 (.94) | .08 | 1.88 (126) | .06 |

Notes: ¹ The "Threshold 18.75%" and "Threshold 12.5%" sub-samples include all cases at or
above the difference threshold where the differences were caused by the SA algorithm (i.e.,
excluding all cases with participants' input errors); ² "Similarity %" refers to the percentage
similarity between the sentiment score and the average scale score
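The t statistics in Table 6 can be recovered from the table's own summary statistics via the standard paired-difference formula, as the sketch below shows for mood on the full sample.

```python
import math

# Recover the paired-difference t statistic for mood on the full sample
# from the summary statistics reported in Table 6:
# t = mean_diff / (sd / sqrt(n)), with df = n - 1.
mean_diff, sd, n = 0.14, 1.43, 193  # df = 192

std_error = sd / math.sqrt(n)
t = mean_diff / std_error

print(round(std_error, 2), round(t, 2))  # -> 0.1 1.36, as in Table 6
```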
5. Discussion
The objective of this study was to develop and validate a novel and cost-effective
approach, drawing on the recognized potential of AI in the service context for gaining an
improved understanding of CE. The empirical findings support the effectiveness of SA chatbot
interviews for eliciting and examining CE feelings, highlighting the promise that the approach
holds in the digital age. The key theoretical and methodological contributions, practical
implications, research agenda, and limitations are highlighted and discussed below.
5.1 Theoretical and methodological contributions
This study contributes to the service management literature in three main aspects. First, a
granular and semantically clear framework (CEFM) for studying CE feelings is developed via a
comprehensive literature review. The CEFM informs CE theory by providing a holistic
understanding of CE feelings, consisting of three complementary sub-elements: mood
(temporally prolonged feelings) (e.g., Beedie et al., 2005), emotion (encounter-specific feelings)
(e.g., Fox, 2018), and hedonic value (context-specific feelings) (e.g., Babin et al., 1994). In line
with the recent service literature, the framework recognizes that feelings need to be treated as a
continuous phenomenon occurring within and outside of the service context, where the
experience of service encounters is linked with broader human experience (Fisk et al., 2020;
Lemon and Verhoef, 2016). The semantic clarity embedded into CEFM is critical, as this makes
it deployable by AI techniques such as text mining (Ordenes et al., 2017). Developing such
semantic clarity within the experience domain of service research also responds to the recent call
(Fisk et al., 2020) for researchers to develop service language that can be used to facilitate
improved wellbeing for everyone. Specifically, the CEFM distinguishes the different feeling sub-
elements and facilitates the understanding of CE feelings holistically.
Second, leveraging the CEFM, a novel approach where a chatbot and SA are employed to
collect primary data and extract valuable CE (feeling) insights is proposed. The approach offers
distinct advantages over alternatives as it addresses notable weaknesses of traditional interviews
(e.g., high costs and lack of scalability) and survey approaches (e.g., lack of engagement, limited
depth of data, poor response rates). Using a chatbot as an interviewer would enable service
companies to collect rich primary data from large samples quickly and cost-effectively. This
fundamentally atypical use of chatbots (i.e., proactive rather than reactive) helps alleviate many
of the potential downsides of traditional interviews since it inherits the efficiencies typically
related to surveys (Benoit et al., 2017). Using an AI-augmented chatbot interviewer to analyze
and (immediately) respond to CE (feelings) can also improve the way CE is managed.
An AI-enriched chatbot interview approach possesses two important characteristics:
immersion and engagement. Compared to a survey, Marvino adds the crucial element of a
reciprocal – even if simple and linear – immersive conversational narrative around experiences.
Some participants highlighted that they felt part of a conversation that helped them express
their "experience story" via the "chat experience" (participant 85). Engagement, on the other
hand, helps establish personal trust and gratitude: “It makes the exchange more personal and
adds a bit of fun/imagination” (participant 193). Naming the chatbot Marvino created an
approachable human-like anthropomorphic persona, further signifying the natural human
tendency to exchange experience via storytelling: “the conversation was very nice and this study
was an interesting experience for me ;-) Goodbye Marvino! Thanks for this conversation ;-)”
(participant 172); “it was comfortable to chat with you, Marvino” (participant 133); and more
playfully, “you’re perfect Marvino don’t let no-one tell you different … you should become a
therapist” (participant 54). The above comments show how valuable natural storytelling
narratives can be; by helping customers relive and surface their experiences through
conversational retelling, more profound holistic inputs and outputs can be achieved with the use
of AI-augmented chatbot interviewers.
Third, the key findings demonstrate that the SA algorithm used in this study (VADER)
was able to extract the sentiments expressed by the participants adequately. It did particularly
well with mood and emotion, struggling slightly more with hedonic value; this might be due to
the latter indirectly portraying an outcome feeling (i.e., how a participant feels as a result of a
service encounter) as opposed to an explicit feeling (i.e., those of mood and emotion). Direct
comparisons between the extracted sentiment scores and the measurement scale scores offered
further validation for the proposed method; the two approaches produce closely similar scores
especially when input errors are accounted for. The post-hoc analyses reveal that it is essential to
distinguish between the SA algorithm being at fault (i.e., when the sentiment scores do not match
the text description) and participant-induced errors (e.g., typos). In sum, this study makes a
convincing case for the promise that the proposed approach holds. Thus, developing AI-enabled
frameworks such as CEFM, coupled with an analytical chatbot implementation, appears effective
for measuring aspects of CE and consequently complements existing methods such as EXQ and
generic text-mining. This is achieved by incorporating further qualitative analysis of the raw
output conversations (Kuppelwieser and Klaus, 2020). The findings also point to the need for
further refinements to this approach; how this could be done in practice is discussed below.
5.2 Practical implications
“I’ve never spoken to a chatbot before! I must say, I found it interesting and futuristic! I love it!”
(Respondent 18)
Each of the key contributions of the study also points to potential managerial
implications. First, adopting and applying the CEFM logic would allow service companies to
establish a deeper understanding of individual CE feelings, serving as a useful guide for any
adjustments to service experience design. Importantly, managers could not only develop an
understanding of how individual service encounters affect customer feelings, but also how their
mood or extrinsic influences affect their hedonic perception of the service overall in the context
of their wider human experience (Fisk et al., 2020).
Second, the use of conversational agents for narrative inquiry has considerable potential
for positive business impact. They serve as cost-effective tools to collect primary data on a large
scale. A fundamental advantage of using chatbot interviewers lies in the manner in which they
direct the conversation and ask participants explicit questions that help unpack complex
phenomena such as CE. Without a chatbot asking for specific sentiments related to the feeling
sub-elements, an analytical AI approach relying on secondary data only would need to be more
complex, use massive training datasets, and assume that the data it requires is correctly labeled
and readily available for analysis. Thus, by using chatbots to collect primary data,
there is less guesswork involved, especially as chatbots can simply ask a question again if they
fail to understand the answer, much like a human interviewer would. Moreover, the approach
facilitates the extraction of feelings that might not be defined in a discrete pre-defined spectrum
(e.g., anger, fear, joy, and surprise), thus complementing such approaches (Madhala et al., 2018).
Thus, by providing chatbots with a positive persona and demonstrated empathetic reciprocal
feelings within the interview experience, these conversational agents become useful tools
enabling companies to foster immersive and engaging relationships with customers, resulting in
improved CE.
Third, the empirically validated novel AI-ready tool, combining chatbot-enabled data
collection and text analysis via SA, offers a cost-effective approach for companies to enhance
their understanding of CE. The customer-centric approach helps service companies to bridge the
current gap in CE measurement enabling a resource-effective analytical implementation.
Moreover, chatbot interviewer technology comes with considerable versatility as it offers
companies the potential to collect and analyze diverse (e.g., in terms of demographics) and
complementary qualitative and quantitative datasets from multiple channels (e.g., live chat and
company database). For example, chatbots could tap into local news outlet services and adapt
their personality to the macro context’s overall sentiment. This could provide a more genuine
storytelling experience (e.g., an airline chatbot could sense an overall negative sentiment due to
weather-related cancellations and could portray more empathetic responses as a result), which in
turn could add to the insights gained during conversation.
5.3 Future research agenda
Drawing on the above discussion, Table 7 outlines a detailed future research agenda,
which consists of research topics as well as managerial and technical concerns for the CEFM, the
chatbot interviewer, and the SA algorithm. Further research topics for each of these are outlined
within the contexts of CE feelings, CE elements, and service management.
Concerning the CEFM, the framework depicted in this study assumes that the sub-
elements are related but static. However, in line with Kranzbühler et al. (2018), it is posited that a
more dynamic structure could enable the measurement of CE feelings even more effectively.
Development of such a dynamic model could, for example, take into consideration the
temporality of feelings (e.g., mood before and after service encounter) and multiple emotions
exhibited during multiple encounters, and thus help paint a more detailed and holistic picture of
the customer’s feelings. How CE feelings and its sub-elements affect other CE elements (e.g.,
CE thoughts) and service management more broadly would also need to be investigated further.
A relevant question calling for further examination pertains to reducing problematic
participant-induced errors (e.g., unclear statement, statement mixes multiple contexts, statement
irrelevant) in the chatbot interviews. A potential avenue to address this in the multiple context
scenario could, for example, be the incorporation of realtime target-based SA, which would
trigger the chatbot to ask for further clarifications to single out the different contexts (Carvalho et
al., 2009). Additionally, future research should examine to what extent and how chatbot
interviews might influence the participant’s experience and the data collected. It would also be
useful to identify other service management research areas that could benefit from the
deployment of a chatbot interviewer, while examining the key managerial and technical
challenges involved in incorporating this approach.
Finally, the preprocessing phase of the SA algorithm, which is often portrayed in data-
mining frameworks such as CRISP-DM, could be further optimized (Azevedo and Santos, 2008).
Although the implementation of this phase differs between solutions, in this particular case,
removing a more extensive array of stop words and non-influential parts-of-speech (Ordenes and
Zhang, 2019), as well as spelling and grammar checking, could precede the SA task. This could
potentially improve CE feeling extraction accuracy, as highlighted in the managerial/technical
SA portion of Table 7.
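Such a preprocessing pass can be sketched minimally as follows. The stop-word list is a small invented sample; a production pipeline would draw on a fuller list and add the part-of-speech filtering and spell checking discussed above.

```python
import re

# Minimal sketch of a preprocessing pass preceding sentiment analysis:
# lowercasing, tokenizing, and removing stop words. The stop-word list
# is an invented sample for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "was", "is", "it"}

def preprocess(text: str) -> str:
    tokens = re.findall(r"[a-z']+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)

print(preprocess("The staff was friendly and the wait was short."))
# -> "staff friendly wait short"
```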
Table 7. Research agenda

| Context | Research Topics | Managerial/Technical Concerns |
|---|---|---|
| **CE Feeling Model (CEFM)** | | |
| CE Feelings | How does temporality affect each of the feeling sub-elements (e.g., how does mood before and after an encounter influence overall CE feelings)? | How could managers score (attribute weights to) different feeling sub-elements? |
| CE Elements | How do the feeling sub-elements influence other CE elements? | How do feeling sub-elements influence CE decisions compared to other CE elements? |
| Service management | How do feeling sub-elements influence different service constructs (e.g., customer engagement, and transformative service research)? | How do feeling sub-elements influence different service management facets (e.g., service design, customer journey)? |
| **Chatbot interviewer** | | |
| CE Feelings | What is the influence of a chatbot interviewer on the level of expression of participant CE feelings? | What are possible methods for discerning whether participants expressed valid answers to the chatbot's questions? |
| CE Elements | How well can a chatbot interviewer address and extract the entire CE element spectrum? | What strategies, incentives, and levels of effort would be required for participants to adequately provide the details of their CEs? |
| Service management | How can different service management research areas benefit from a chatbot interviewer? | How would employment of chatbot interviewers in service management impact costs and resource utilization? |
| **Sentiment analysis** | | |
| CE Feelings | How would different algorithms fare against VADER, and how could SA be improved? | What technical improvements could be made to improve the performance of SA of CE feelings? |
| CE Elements | To what extent is SA an effective method of extracting the remaining CE elements? | What technical SA algorithm modifications would be necessary when evaluating different CE elements? |
| Service management | How can SA be used to further service constructs (e.g., service failure)? | What are the limitations of the current application of SA in different service contexts? How could these limitations be overcome? |
5.4 Limitations
Despite the promising empirical results reported in this study, some limitations naturally
need to be noted. First, the chatbot utilized here was a non-responsive conversational agent. This
forced participants to restart the entire conversation if they wanted to modify a previously
inputted answer, which could have affected the results if several participants chose not to
reinitiate the conversation. Using a more advanced chatbot with the option to
conversationally request to track back and change a previous answer (as one can in natural
discourse) or to clarify a point (as discussed previously) would address this limitation. Second,
the fact that European participants (82.9%) represent an overwhelming majority in the sample,
which is also skewed toward younger participants (69.4% of the sample are below 35 years
of age), could also affect the findings and needs to be considered when generalizing the findings
to a broader population⁶. Third, the study did not account for technology biases the participants
could have exhibited and how these would impact the interaction with the chatbot and the
recollection of experiences. For instance, participants might be biased against human or
conversational agents. Researchers could address this by incorporating questions that assess
potential technology bias.
Fourth, to preserve storytelling/narrative immersion, a simple affect model was utilized to
measure emotions, alleviating the technical and lengthy measurement scales of more complex
variations – such as Mehrabian and Russell's (1974) Pleasure, Arousal, Dominance model or
Plutchik's (1980) emotional model – even if these might have resulted in a slightly more
accurate understanding of the sub-elements of CE. Fifth, as noted, a small number of missing
values appeared as a result of connectivity (i.e., network) issues between the chatbot and
participants; thus, researchers should consider Internet stability in their implementations of this
approach. Lastly, VADER was the only SA algorithm used due to the primary objective of the
study, which was to understand how well SA fares against validated measurement scales for
each of the feeling sub-elements. Testing how other SA algorithms fare against VADER would be
interesting and would shed more light on the effectiveness of the proposed chatbot approach.
This, as well as the other limitations, could be addressed in future work.

⁶ Post-hoc tests revealed that our key findings are statistically indistinguishable for European
(vs. other) participants and for those under 35 years of age (vs. others), lending support for
wider generalizability.
References
Ali, F., Hussain, K. and Omar, R. (2016), “Diagnosing customers experience, emotions and
satisfaction in Malaysian resort hotels”, European Journal of Tourism Research, Vol. 12,
pp. 25–40.
Anderson, J.C. and Narus, J.A. (1984), “A Model of the Distributor’s Perspective of Distributor-
Manufacturer Working Relationships”, Journal of Marketing, Vol. 48 No. 4, pp. 62–74.
Azevedo, A. and Santos, M.F. (2008), “KDD, SEMMA and CRISP-DM: a parallel overview”,
IADIS European Conference on Data Mining 2008, Amsterdam, The Netherlands, July
24-26, 2008. Proceedings, Amsterdam, pp. 182–185.
Babin, B.J., Darden, W.R. and Griffin, M. (1994), “Work and/or Fun: Measuring Hedonic and
Utilitarian Shopping Value”, Journal of Consumer Research, Vol. 20 No. 4, p. 644.
Bagdare, S. and Jain, R. (2013), “Measuring retail customer experience”, International Journal
of Retail & Distribution Management, Vol. 41 No. 10, pp. 790–804.
Batra, R. and Ahtola, O.T. (1991), “Measuring the hedonic and utilitarian sources of consumer
attitudes”, Marketing Letters, Vol. 2 No. 2, pp. 159–170.
Beedie, C., Terry, P. and Lane, A. (2005), “Distinctions between emotion and mood”, Cognition
& Emotion, Vol. 19 No. 6, pp. 847–878.
Benoit, S., Scherschel, K., Ates, Z., Nasr, L. and Kandampully, J. (2017), “Showcasing the
diversity of service research”, Journal of Service Management, Vol. 28 No. 5, pp. 810–
836.
Berry, M.J.A. and Linoff, G.S. (2004), Data Mining Techniques for Marketing, Sales and
Customer Relationship Management, 2nd ed., Wiley Publishing, Inc., Indianapolis,
Indiana.
Bolton, R., McColl-Kennedy, J.R., Cheung, L., Gallan, A., Orsingher, C., Witell, L. and Zaki, M.
(2018), “Customer experience challenges: bringing together digital, physical and social
realms”, Journal of Service Management, Vol. 29 No. 5, pp. 776–808.
Boyd, D. and Crawford, K. (2012), “Critical Questions for Big Data”, Information,
Communication & Society, Vol. 15 No. 5, pp. 662–679.
Brakus, J.J., Schmitt, B.H. and Zarantonello, L. (2009), “Brand Experience: What Is It? How Is
It Measured? Does It Affect Loyalty?”, Journal of Marketing, Vol. 73 No. 3, pp. 52–68.
Brodie, R.J., Nenonen, S., Peters, L.D. and Storbacka, K. (2017), “Theorizing with managers to
bridge the theory-praxis gap: Foundations for a research tradition”, European Journal of
Marketing, Vol. 51 No. 7/8, pp. 1173–1177.
Brown, T.A. (2015), Confirmatory Factor Analysis for Applied Research, Guilford Publications.
Brynjolfsson, E. and Mcafee, A. (2017), “The Business of Artificial Intelligence: What it can and
cannot do for your organization.”, Harvard Business Review Digital Articles, pp. 3–11.
Carù, A. and Cova, B. (2003), “Revisiting Consumption Experience”, Marketing Theory, Vol. 3
No. 2, pp. 267–286.
Carvalho, P., Sarmento, L., Silva, M.J. and de Oliveira, E. (2009), “Clues for detecting irony in
user-generated contents”, Proceeding of the 1st International CIKM Workshop on Topic-
Sentiment Analysis for Mass Opinion - TSA ’09, ACM Press, New York, New York, USA,
p. 53.
Clarke, D. and Kinghorn, R. (2018), Experience Is Everything: Here’s How to Get It Right, PwC,
available at: pwc.com/future-of-cx (accessed 22 August 2019).
Connelly, F.M. and Clandinin, D.J. (1990), “Stories of Experience and Narrative Inquiry”,
Educational Researcher, Vol. 19 No. 5, pp. 2–14.
Csikszentmihalyi, M. (1991), Flow: The Psychology of Optimal Experience, HarperPerennial,
New York.
Csikszentmihalyi, M. and Larson, R. (1987), “Validity and reliability of the Experience-
Sampling Method.”, The Journal of Nervous and Mental Disease, Vol. 175 No. 9, pp.
526–36.
De Keyser, A., Köcher, S., Alkire (née Nasr), L., Verbeeck, C. and Kandampully, J. (2019),
“Frontline Service Technology infusion: conceptual archetypes and future research
directions”, Journal of Service Management, Vol. 30 No. 1, pp. 156–183.
De Keyser, A., Verleye, K., Lemon, K.N., Keiningham, T.L. and Klaus, P. (2020), “Moving the
Customer Experience Field Forward: Introducing the Touchpoints, Context, Qualities
(TCQ) Nomenclature”, Journal of Service Research, p. 109467052092839.
Feldman, R. (2013), “Techniques and applications for sentiment analysis”, Communications of
the ACM, 1 April, Vol. 56 No. 4, pp. 82–89.
Feldman, R. and Sanger, J. (2006), The Text Mining Handbook, Cambridge University Press,
Cambridge, available at:
https://www.cambridge.org/core/product/identifier/9780511546914/type/book.
Fisk, R.P., Alkire (née Nasr), L., Anderson, L., Bowen, D.E., Gruber, T., Ostrom, A.L. and
Patrício, L. (2020), “Elevating the human experience (HX) through service research
collaborations: introducing ServCollab”, Journal of Service Management, available
at:https://doi.org/10.1108/JOSM-10-2019-0325.
Fornell, C. and Larcker, D.F. (1981), “Evaluating Structural Equation Models with Unobservable
Variables and Measurement Error”, Journal of Marketing Research, Vol. 18 No. 1, p. 39.
Fox, E. (2018), “Perspectives from affective science on understanding the nature of emotion”,
Brain and Neuroscience Advances, Vol. 2, p. 239821281881262.
Gentile, C., Spiller, N. and Noci, G. (2007), “How to Sustain the Customer Experience: An
Overview of Experience Components that Co-create Value With the Customer”,
European Management Journal, Vol. 25 No. 5, pp. 395–410.
Heinonen, K. (2018), “Positive and negative valence influencing consumer engagement”,
Journal of Service Theory and Practice, Vol. 28 No. 2, pp. 147–169.
Hilken, T., De Ruyter, K., Chylinski, M., Mahr, D. and Keeling, D.I. (2017), “Augmenting the
eye of the beholder: exploring the strategic potential of augmented reality to enhance
online service experiences”, Journal of the Academy of Marketing Science, Vol. 45 No. 6,
pp. 884–905.
Holbrook, M.B. (2006), “Consumption experience, customer value, and subjective personal
introspection: An illustrative photographic essay”, Journal of Business Research, Vol. 59
No. 6, pp. 714–725.
Holbrook, M.B. and Hirschman, E.C. (1982), “The Experiential Aspects of Consumption:
Consumer Fantasies, Feelings, and Fun”, Journal of Consumer Research, Vol. 9 No. 2, p.
132.
Holmlund, M., Van Vaerenbergh, Y., Ciuchita, R., Ravald, A., Sarantopoulos, P., Ordenes, F.V.
and Zaki, M. (2020), “Customer experience management in the age of big data analytics:
A strategic framework”, Journal of Business Research, p. S0148296320300345.
Hu, L. and Bentler, P.M. (1999), “Cutoff criteria for fit indexes in covariance structure analysis:
Conventional criteria versus new alternatives”, Structural Equation Modeling: A
Multidisciplinary Journal, Vol. 6 No. 1, pp. 1–55.
Hulland, J. and Miller, J. (2018), “‘Keep on Turkin’?”, Journal of the Academy of Marketing
Science, Vol. 46 No. 5, pp. 789–794.
Humphreys, A. and Wang, R.J.-H. (2018), "Automated Text Analysis for Consumer Research",
edited by Fischer, E. and Price, L., Journal of Consumer Research, Vol. 44 No. 6, pp.
1274–1306.
Hutto, C.J. and Gilbert, E. (2014), “Vader: A parsimonious rule-based model for sentiment
analysis of social media text”, Eighth International AAAI Conference on Weblogs and
Social Media.
IBM Corp. (2017), IBM SPSS Statistics for Windows.
Kabadayi, S., Ali, F., Choi, H., Joosten, H. and Lu, C. (2019), “Smart service experience in
hospitality and tourism services: A conceptualization and future research agenda”,
Journal of Service Management, Vol. 30 No. 3, pp. 326–348.
Kawaf, F. and Tagg, S. (2017), “The construction of online shopping experience: A repertory grid
approach”, Computers in Human Behavior, Vol. 72, pp. 222–232.
Klaus, P. and Maklan, S. (2011), “Bridging the gap for destination extreme sports: A model of
sports tourism customer experience”, Journal of Marketing Management, Vol. 27 No. 13–
14, pp. 1341–1365.
Klaus, P. and Maklan, S. (2012), “EXQ: a multiple‐item scale for assessing service experience”,
Journal of Service Management, Vol. 23 No. 1, pp. 5–33.
Kline, R.B. (2016), Principles and Practice of Structural Equation Modeling, 4th edition., The
Guilford Press.
Kranzbühler, A.-M., Kleijnen, M.H.P., Morgan, R.E. and Teerling, M. (2018), “The Multilevel
Nature of Customer Experience Research: An Integrative Review and Research Agenda”,
International Journal of Management Reviews, Vol. 20, pp. 433–456.
Kumar, V., Dixit, A., Javalgi, R.G. and Dass, M. (2016), “Research framework, strategies, and
applications of intelligent agent technologies (IATs) in marketing”, Journal of the
Academy of Marketing Science, Vol. 44 No. 1, pp. 24–45.
Kunz, W., Aksoy, L., Bart, Y., Heinonen, K., Kabadayi, S., Ordenes, F.V., Sigala, M., et al.
(2017), “Customer engagement in a Big Data world”, Journal of Services Marketing, Vol.
31 No. 2, pp. 161–171.
Kuppelwieser, V.G. and Klaus, P. (2020), “Measuring customer experience quality: The EXQ
scale revisited”, Journal of Business Research, p. S0148296320300564.
Larivière, B., Bowen, D., Andreassen, T.W., Kunz, W., Sirianni, N.J., Voss, C., Wünderlich, N.V.,
et al. (2017), “‘Service Encounter 2.0’: An investigation into the roles of technology,
employees and customers”, Journal of Business Research, Vol. 79, pp. 238–246.
Lemon, K.N. and Verhoef, P.C. (2016), “Understanding Customer Experience Throughout the
Customer Journey”, Journal of Marketing, Vol. 80 No. 6, pp. 69–96.
Little, R.J.A. (1988), “A Test of Missing Completely at Random for Multivariate Data with
Missing Values”, Journal of the American Statistical Association, Vol. 83 No. 404, pp.
1198–1202.
Madhala, P., Jussila, J., Aramo-Immonen, H. and Suominen, A. (2018), “Systematic Literature
Review on Customer Emotions in Social Media”, presented at the ECSM 2018 :
Proceedings of the 5th European Conference on Social Media.
Maklan, S., Antonetti, P. and Whitty, S. (2017), “A Better Way to Manage Customer
Experience”, California Management Review, Vol. 59 No. 2, pp. 92–115.
Marketing Science Institute. (2016), MSI Research Priorities 2016-2018, Marketing Science
Institute, Cambridge, MA, p. 21.
Marketing Science Institute. (2018), MSI Research Priorities 2018-2020, Marketing Science
Institute, Cambridge, MA, p. 17.
Marr, B. (2018), “How Much Data Do We Create Every Day? The Mind-Blowing Stats
Everyone Should Read”, Forbes, available at:
https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-
every-day-the-mind-blowing-stats-everyone-should-read/ (accessed 4 May 2019).
Maynard, A.D. (2015), “Navigating the fourth industrial revolution”, Nature Nanotechnology,
Vol. 10 No. 12, pp. 1005–1006.
McColl-Kennedy, J.R., Gustafsson, A., Jaakkola, E., Klaus, P., Radnor, Z.J., Perks, H. and
Friman, M. (2015), “Fresh perspectives on customer experience”, Journal of Services
Marketing, Vol. 29 No. 6/7, pp. 430–435.
McColl-Kennedy, J.R., Zaki, M., Lemon, K.N., Urmetzer, F. and Neely, A. (2019), “Gaining
Customer Experience Insights That Matter”, Journal of Service Research, Vol. 22 No. 1,
pp. 8–26.
McIntyre, E. and Virzi, A.M. (2018), CMO Spend Survey 2018–2019, No. G00361758, Gartner
for Marketers.
Mehrabian, A. and Russell, J.A. (1974), An Approach to Environmental Psychology., the MIT
Press.
Mende, M., Scott, M.L., van Doorn, J., Grewal, D. and Shanks, I. (2019), “Service Robots
Rising: How Humanoid Robots Influence Service Experiences and Elicit Compensatory
Consumer Responses”, Journal of Marketing Research, Vol. 56 No. 4, pp. 535–556.
Ng, I.C.L. and Wakenshaw, S.Y.L. (2017), “The Internet-of-Things: Review and research
directions”, International Journal of Research in Marketing, Vol. 34 No. 1, pp. 3–21.
Novick, G. (2008), “Is there a bias against telephone interviews in qualitative research?”,
Research in Nursing & Health, Vol. 31 No. 4, pp. 391–398.
Ordenes, F. and Zhang, S. (2019), “From words to pixels: text and image mining methods for
service research”, Journal of Service Management, Vol. ahead-of-print No. ahead-of-print,
available at: https://doi.org/10.1108/JOSM-08-2019-0254.
Ordenes, F.V., Ludwig, S., De Ruyter, K., Grewal, D. and Wetzels, M. (2017), “Unveiling what is
written in the stars: Analyzing explicit, implicit, and discourse patterns of sentiment in
social media”, Journal of Consumer Research, Vol. 43 No. 6, pp. 875–894.
Ordenes, F.V., Theodoulidis, B., Burton, J., Gruber, T. and Zaki, M. (2014), “Analyzing
Customer Experience Feedback Using Text Mining”, Journal of Service Research, Vol.
17 No. 3, pp. 278–295.
Otto, J. and Ritchie, B. (1996), “The service experience in tourism”, Tourism Management, Vol.
17 No. 3, pp. 165–174.
Palan, S. and Schitter, C. (2018), “Prolific.ac—A subject pool for online experiments”, Journal
of Behavioral and Experimental Finance, Vol. 17, pp. 22–27.
Palmer, A. (2010), “Customer experience management: a critical review of an emerging idea”,
Journal of Services Marketing, Vol. 24 No. 3, pp. 196–208.
Peer, E., Brandimarte, L., Samat, S. and Acquisti, A. (2017), “Beyond the Turk: Alternative
platforms for crowdsourcing behavioral research”, Journal of Experimental Social
Psychology, Vol. 70, pp. 153–163.
Peterson, R. and Sauber, M. (1983), “Mood Short Form (MSF)”, in Bearden, W.O. and
Netemeyer, R.G. (Eds) (1999), Handbook of Marketing Scales: Multi-Item Measures for
Marketing and Consumer Behavior Research, Sage, Thousand Oaks.
Pine, B.J. and Gilmore, J.H. (1998), “Welcome to the experience economy.”, Harvard Business
Review, Vol. 76 No. 4, pp. 97–105.
Plutchik, R. (1980), “A General Psychoevolutionary Theory Of Emotion”, Theories of Emotion,
Elsevier, pp. 3–33.
Rajaobelina, L. (2018), “The Impact of Customer Experience on Relationship Quality with
Travel Agencies in a Multichannel Environment”, Journal of Travel Research, Vol. 57
No. 2, pp. 206–217.
Reichheld, F.F. (2003), “The One Number You Need to Grow”, Harvard Business Review, Vol.
81 No. 12, pp. 46–54.
Ribeiro, F.N., Araújo, M., Gonçalves, P., André Gonçalves, M. and Benevenuto, F. (2016),
“SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis
methods”, EPJ Data Science, Vol. 5 No. 1, p. 23.
Richins, M.L. (1997), “Measuring Emotions in the Consumption Experience”, Journal of
Consumer Research, Vol. 24 No. 2, pp. 127–146.
Riikkinen, M., Saarijärvi, H., Sarlin, P. and Lähteenmäki, I. (2018), “Using artificial intelligence
to create value in insurance”, International Journal of Bank Marketing, Vol. 36 No. 6, pp.
1145–1168.
Robinson, S., Orsingher, C., Alkire, L., De Keyser, A., Giebelhausen, M., Papamichail, K.N.,
Shams, P., et al. (2019), “Frontline encounters of the AI kind: An evolved service
encounter framework”, Journal of Business Research, in press.
Rubin, D.B. (1987), Multiple Imputation for Nonresponse in Surveys, John Wiley & Sons, New
York.
Rygielski, C., Wang, J.-C. and Yen, D.C. (2002), “Data mining techniques for customer
relationship management”, Technology in Society, Vol. 24 No. 4, pp. 483–502.
Schmitt, B. (1999), “Experiential Marketing”, Journal of Marketing Management, Vol. 15 No. 1–
3, pp. 53–67.
Shaw, C. and Ivens, J. (2002), Building Great Customer Experiences, Interactive Marketing, Vol.
5, Palgrave Macmillan UK, London, available at: https://doi.org/10.1057/9780230554719.
Sokolova, M. and Lapalme, G. (2009), “A systematic analysis of performance measures for
classification tasks”, Information Processing & Management, Vol. 45 No. 4, pp. 427–437.
Syam, N. and Sharma, A. (2018), “Waiting for a sales renaissance in the fourth industrial
revolution: Machine learning and artificial intelligence in sales research and practice”,
Industrial Marketing Management, Vol. 69, pp. 135–146.
Temkin, B. (2018), 2018 Temkin Experience Ratings, U.S., Temkin Group, Waban,
Massachusetts, p. 23.
Verhoef, P.C., Lemon, K.N., Parasuraman, A., Roggeveen, A., Tsiros, M. and Schlesinger, L.A.
(2009), “Customer Experience Creation: Determinants, Dynamics and Management
Strategies”, Journal of Retailing, Vol. 85 No. 1, pp. 31–41.
Verleye, K. (2015), “The co-creation experience from the customer perspective: its measurement
and determinants”, Journal of Service Management, Vol. 26 No. 2, pp. 321–342.
Watson, D. and Tellegen, A. (1985), “Toward a consensual structure of mood.”, Psychological
Bulletin, Vol. 98 No. 2, pp. 219–235.
Webster, L. (2007), Using Narrative Inquiry as a Research Method: An Introduction to Using
Critical Event Narrative Analysis in Research on Learning and Teaching, 1st ed.,
Routledge, available at: https://doi.org/10.4324/9780203946268.
Wirtz, J., Patterson, P.G., Kunz, W.H., Gruber, T., Lu, V.N., Paluch, S. and Martins, A. (2018),
“Brave new world: service robots in the frontline”, Journal of Service Management, Vol.
29 No. 5, pp. 907–931.
Wirtz, J. and Zeithaml, V. (2018), “Cost-effective service excellence”, Journal of the Academy of
Marketing Science, Vol. 46 No. 1, pp. 59–80.
Zolkiewski, J., Story, V., Burton, J., Chan, P., Gomes, A., Hunter-Jones, P., O’Malley, L., et al.
(2017), “Strategic B2B customer experience management: the importance of outcomes-
based measures”, Journal of Services Marketing, Vol. 31 No. 2, pp. 172–184.
Appendices
Table A I. Sample frequencies

Demographic (N = 193)        Frequency    Percent (%)
Age
  18-24                          72           37.3
  25-34                          62           32.1
  35-44                          29           15.0
  45-54                          20           10.4
  55+                            10            5.2
Gender
  Female                         87           45.1
  Male                          101           52.4
  Undisclosed                     5            2.5
Region
  Australia                       1            0.5
  Balkans                         9            4.7
  Europe                        160           82.9
  Middle East                     3            1.6
  North America                  12            6.2
  South America                   1            0.5
  Southeast Asia                  1            0.5
  Undisclosed                     6            3.1
Mobile device
  Yes                            50           25.9
  No                            143           74.1
Student
  Yes                            65           33.7
  No                            123           63.7
  Undisclosed                     5            2.6
Table A II. Measurement items (chatbot questions1) and standardized loadings

Experience mood: Mood Short Form (MSF) (Peterson and Sauber, 1983)
  “You would say your mood in general THAT day was …” (Extremely Negative - Extremely
  Positive), loading .881
  “THAT day you recall feeling …” (Extremely Dull - Extremely Cheerful), loading .847
  “How emotionally comfortable or uncomfortable do you remember feeling THAT day?”
  (Extremely Uncomfortable - Extremely Comfortable), loading .795
  “As per your recollection, how did you feel THAT day?” (Extremely Tense - Extremely
  Calm), loading .729

Experience hedonic value: Hedonic Consumer Attitudes (Batra and Ahtola, 1991)
  “Overall this experience was …” (Extremely Displeasing - Extremely Nice), loading .970
  “In the end you felt the experience was …” (Extremely Unpleasant - Extremely Pleasant),
  loading .944
  “How agreeable or disagreeable would you say the whole service experience was?”
  (Extremely Disagreeable - Extremely Agreeable), loading .896
  “This experience left you feeling …” (Extremely Sad - Extremely Happy), loading .917

Experience emotion: Watson and Tellegen (1985)
  “How positive/negative was this interaction2?” (Extremely Negative - Extremely Positive),
  single item (loading not applicable)

Notes: 1All responses were obtained using 9-point scales; 2“this interaction” refers to a context built by
previous questions inquiring about a memorable service encounter moment, and the service resource
(object or person) (Ordenes et al., 2014)
Table A III. Phase 4 coding scheme

Sentiment Algorithm Accuracy
  Description: The algorithm did not pick up sentiments accurately.
  Examples: “Upset as it was a special occasion” (sentiment score 5.1); “I didn’t feel happy
  because of the meal” (sentiment score 7.3)

Statement Unclear
  Description: The statement was unclear and did not convey the intended sentiments.
  Examples: “Awkward”; “I felt normal”; “more secure”

Statement Language
  Description: Errors with grammar and spelling affecting sentiment words.
  Examples: “Annoid”; “irritability”; “10/10”

Statement Irrelevant
  Description: The participant did not provide (any) sentiments targeted at the sub-element in
  question.
  Examples: “[company x] flight staff”; “Felt like my problem was important”; “She was very
  nice and understanding”

Statement-Scale Mismatch
  Description: A mismatch between the scales answered and the description (e.g., an average
  sentiment but an extreme scale response).
  Examples: “very disappointed and angry” (avg. scale score 5.8); “I was very calm” (avg. scale
  score 9.0)

Statement Multiple Context / Time
  Description: Multiple timelines/contexts in the description.
  Examples: “first I was a bit shocked, then sad, afterword I was angry”; “I have been in a very
  good mood but feel a bit down this evening”
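The “Statement-Scale Mismatch” code in Table A III can be operationalized as a simple automated screening rule ahead of manual coding. The sketch below is hypothetical: the function name, the shared 1-9 range, and the low/high thresholds are illustrative assumptions, not values from the study.

```python
# Hypothetical screening rule for the "Statement-Scale Mismatch" code in
# Table A III: flag cases where the free-text sentiment score and the
# averaged scale response point in clearly opposite directions.
# The low/high thresholds are illustrative assumptions.

def statement_scale_mismatch(sentiment_score: float, avg_scale_score: float,
                             low: float = 4.0, high: float = 6.0) -> bool:
    """Return True when one measure is clearly negative (< low) while the
    other is clearly positive (> high), both on a shared 1-9 range."""
    neg_vs_pos = sentiment_score < low and avg_scale_score > high
    pos_vs_neg = sentiment_score > high and avg_scale_score < low
    return neg_vs_pos or pos_vs_neg

print(statement_scale_mismatch(2.0, 9.0))  # → True (contradictory answers)
print(statement_scale_mismatch(5.0, 5.5))  # → False (consistent mid-range)
```

Under these assumed thresholds, the Table A III example pairing an extreme statement with an average scale score of 5.8 would sit just below the flagging boundary, which is exactly the kind of borderline case the manual Phase 4 coding was designed to catch.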