

Toward a User Experience Tool Selector
for Voice User Interfaces
Doctoral Consortium
Andreas M. Klein
Department of Computer Languages and Systems
University of Seville
Seville, Spain
Voice user interfaces (VUI) are currently a trending topic but the
ability to measure and improve the user experience (UX) is still
missing. We aim to develop a tool selector as a web application
that can provide a suitable tool to measure UX quality of VUIs.
The UX tool selector for VUIs will include a UX measurement
toolbox containing several existing and new VUI assessment
methods. The UX tool selector will provide context-dependent
measurement recommendations without prior extensive research
to evaluate and improve VUIs.
CCS CONCEPTS: • Human-centered computing → Human computer interaction
(HCI) → HCI design and evaluation methods; Accessibility

KEYWORDS: User Experience, Voice User Interfaces, Web Application,
Conversational User Interfaces, Voice Assistants, Evaluation
ACM Reference format:
Andreas M. Klein. 2021. Toward a User Experience Tool Selector for Voice
User Interfaces. In Proceedings of the 18th International Web for All
Conference (W4A '21), April 19–20, 2021, Ljubljana, Slovenia, 2 pages.
Voice user interfaces (VUIs) [1] or voice assistants (VAs) [2]
(general assistants with various services) have developed into a
leading-edge technology with a wide range of applications.
Nowadays, VAs are available in many devices and systems, e.g.,
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the Owner/Author.
W4A '21, April 19–20, 2021, Ljubljana, Slovenia
© 2021 Copyright is held by the owner/author(s).
ACM ISBN 978-1-4503-8212-0/21/04.
Alexa (Amazon), Google Assistant (Google), and Siri (Apple).
Since UX is an essential aspect for evaluating interactive systems,
there are already several approaches to measure UX quality of
VUIs [3]–[5]. However, these methods require extensive
resources and there are no general guidelines to apply UX
assessment tools to VUIs.
Additionally, our pilot study showed users' concerns, and the VAs
revealed technical limitations [6]. In contrast, persons with
disabilities use VUIs very intensively for activities such as web
surfing or games. Hence, we see a demand for UX evaluation that
focuses on the context of use and easy-to-apply measures.
Therefore, we aim to design a UX tool selector for VUIs (see
Figure 1) as a web application that allows one to select a suitable
UX measurement tool to evaluate a VUI test object. The selector
draws on a UX measurement toolbox that contains standard UX
measurement methods for VUI evaluation. The context of use of
the given VUI test object determines the tool selection.
Figure 1: Proposed UX tool selector for VUIs
To the best of our knowledge, this is the first approach to
provide context-dependent UX measurement recommendations
for assessing UX quality of VUIs.
To structure and evaluate the proposed thesis, we apply the
standardized design science research methodology (DSRM) [7].
Following the six DSRM steps, we aim
to answer the research questions (RQs) below:
1. What factors should be considered when selecting UX tools
for VUIs?
2. Which tools are needed to measure UX quality of VUIs?
3. How can the VUI context of use be comprehensively captured?
4. How can the UX tool selector for VUIs be designed?
5. Are the recommendations for measurement relevant?
To answer these research questions, we plan to apply several
methods: systematic literature review (SLR) [8], user study,
structured interview, observation with VUI users from Germany
and Spain, and a case study or Delphi study.
First, we intend to conduct an SLR to search for factors to select UX
tools for VUIs (RQ1) and to determine which tools exist to
evaluate UX quality of VUIs (RQ2). Current tools are, e.g., UEQ+
framework [9], diary study [10], a conditional voice recorder [11],
or psychophysiological approaches as an addition to traditional
UX assessment [5].
We will then conduct an extensive survey and interviews to
capture the context of use (RQ3), enabling, e.g., an extension of
the UEQ+ framework. We are in contact with users with visual
or motor impairments to capture their context of use.
Afterward, we will design the UX tool selector (RQ4) and
evaluate it (RQ5) using distinct methods, e.g., a case study and/or
Delphi study.
A brief literature search revealed that no such UX tool selector for
VUIs is available (RQ1). Applying the tool selector and measuring
with the recommended UX assessment tool will help to meet
users’ needs and increase VUI acceptance. Our pilot study showed
that 49% of technology-based users have no interest in VA use, as
voice interaction is still a challenge due to factors such as speech
intelligibility issues and privacy concerns [6]. VUI improvements
are currently needed to increase UX quality and adoption of such
highly available cutting-edge technology.
In our preliminary literature review we did not find any tool
to measure the complete UX quality of VUIs (RQ2). Therefore, we
extended the UEQ+ modular framework (which is based on the
User Experience Questionnaire) with a new approach for the
flexible evaluation of VUI systems [4, 12]. The UEQ+ contains 20
scales to measure specific UX aspects, which can be combined into
a product-related questionnaire. The newly constructed scales for
voice quality consider the UX aspects of VUIs and fill the voice
interaction gap within UEQ+ [13].
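To illustrate how such modular scales are typically scored, the following is a minimal sketch: each scale mean is computed from its items (7-point answers mapped to −3..+3), and an overall KPI is formed as the importance-weighted average of the scale means, following the published UEQ+ approach. The scale names, item values, and importance weights below are hypothetical.

```python
# Sketch of UEQ+-style scoring: a scale is the mean of its items
# (answers on 1..7 mapped to -3..+3); the KPI is the average of the
# scale means weighted by user-rated importance.

def scale_mean(item_answers):
    """Mean of one scale's items; answers on 1..7 are shifted to -3..+3."""
    return sum(a - 4 for a in item_answers) / len(item_answers)

def ueq_plus_kpi(scales):
    """scales: list of (item_answers, importance) pairs, importance on 1..7."""
    total_importance = sum(imp for _, imp in scales)
    return sum(scale_mean(items) * imp / total_importance
               for items, imp in scales)

# Hypothetical responses for two voice-quality scales:
response_quality = ([6, 5, 6, 5], 7)    # (item answers, importance)
comprehensibility = ([4, 4, 5, 4], 5)
kpi = ueq_plus_kpi([response_quality, comprehensibility])
print(round(kpi, 2))  # -> 0.98
```

Because each scale is scored on the same −3..+3 range, scales constructed for voice quality can be mixed freely with the existing UEQ+ scales in one questionnaire.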
We conducted a pre-test of semi-structured interviews with
visually impaired persons, asking "How do people with disabilities
use VAs?" to identify potential research gaps regarding web
accessibility, the measurement tool, and the context of use (RQ3).
We will evaluate these results, together with further interviews
and observations, to finalize our guidelines. VUIs have
great potential within this user group, as persons with disabilities
can interact with VAs very effectively, they have a high frequency
of use, and they have a wide range of user experiences [14].
The core of this work is to design a tool selector (RQ4) as a web
application that can be used to select suitable measurements from
the toolbox with regard to the context in which the VUI test object
is used. We consider the VUI context of use holistically, from an
overall business goal to a detailed answer to a follow-up question
[1]. The tool selector will provide measurement
recommendations at the push of a button.
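The core of such a selector could be sketched as a rule-based mapping from context-of-use attributes to recommended methods from the toolbox. This is a hypothetical illustration only, assuming a few context attributes and using the tools mentioned above; it is not the final design.

```python
# Hypothetical rule-based core of a UX tool selector: the context of
# use of a VUI test object is mapped to recommended assessment methods.
from dataclasses import dataclass

@dataclass
class Context:
    setting: str                  # e.g., "smart_home", "lab"
    longitudinal: bool            # should use be observed over weeks?
    physiological_data: bool      # are sensors available?
    user_group: str = "general"   # e.g., "visually_impaired"

def recommend_tools(ctx: Context) -> list[str]:
    # A questionnaire-based measure serves as the default baseline.
    tools = ["UEQ+ with voice-quality scales"]
    if ctx.longitudinal:
        tools.append("diary study")
    if ctx.setting == "smart_home":
        tools.append("conditional voice recorder")
    if ctx.physiological_data:
        tools.append("psychophysiological measures")
    if ctx.user_group == "visually_impaired":
        tools.append("structured interview / observation")
    return tools

print(recommend_tools(Context("smart_home", longitudinal=True,
                              physiological_data=False)))
```

In the envisioned web application, such rules would be derived from the SLR and the context-of-use studies rather than hard-coded.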
The next step is to find factors and UX measurement tools for
VUIs with an SLR. Then we will validate the UEQ+
scales for voice quality to be included in the toolbox. Furthermore,
we plan to continue with structured interviews of VUI power
users to capture the contexts of use.
This work was supported by the NICO project (PID2019-
105455GB-C31) from the Ministerio de Ciencia, Innovación y
Universidades (Spanish Government).
[1] M. H. Cohen, J. P. Giangola, and J. Balogh, Voice User Interface Design. Addison-Wesley, 2004.
[2] M. B. Hoy, "Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants," Med. Ref. Serv. Q., vol. 37, no. 1, pp. 81–88, 2018, doi:
[3] A. B. Kocaballi, L. Laranjo, and E. Coiera, "Measuring User Experience in Conversational Interfaces: A Comparison of Six Questionnaires," 2018, doi:
[4] A. M. Klein, A. Hinderks, M. Schrepp, and J. Thomaschewski, "Measuring User Experience Quality of Voice Assistants," CISTI, Jun. 2020, pp. 1–4, doi:
[5] F. Le Pailleur, B. Huang, P.-M. Léger, and S. Sénécal, "A New Approach to Measure User Experience with Voice-Controlled Intelligent Assistants: A Pilot Study," in Human-Computer Interaction. Multimodal and Natural Interaction, 2020, pp. 197–208.
[6] A. M. Klein, A. Hinderks, M. Rauschenberger, and J. Thomaschewski, "Exploring Voice Assistant Risks and Potential with Technology-based Users," in WEBIST '20 – Volume 1: WEBIST, 2020, pp. 147–154, doi:
[7] K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, "A Design Science Research Methodology for Information Systems Research," J. Manag. Inf. Syst., vol. 24, no. 3, pp. 45–77, 2007, doi: 10.2753/MIS0742-1222240302.
[8] B. Kitchenham and S. Charters, "Guidelines for performing Systematic Literature Reviews in Software Engineering," 2007. [Online]. Available:
[9] M. Schrepp and J. Thomaschewski, "Design and Validation of a Framework for the Creation of User Experience Questionnaires," Int. J. Interact. Multimed. Artif. Intell., pp. 88–95, 2019, doi: 10.9781/ijimai.2019.06.006.
[10] J. Lau, B. Zimmerman, and F. Schaub, "Alexa, Are You Listening? Privacy Perceptions, Concerns and Privacy-Seeking Behaviors with Smart Speakers," Proc. ACM Hum.-Comput. Interact., vol. 2, no. CSCW, Nov. 2018, doi:
[11] M. Porcheron, J. E. Fischer, S. Reeves, and S. Sharples, "Voice Interfaces in Everyday Life," CHI, 2018, pp. 1–12, doi: 10.1145/3173574.3174214.
[12] B. Laugwitz, T. Held, and M. Schrepp, "Construction and Evaluation of a User Experience Questionnaire," in HCI and Usability for Education and Work, 2008.
[13] A. M. Klein, A. Hinderks, M. Schrepp, and J. Thomaschewski, "Construction of UEQ+ Scales for Voice Quality," MuC '20, 2020, pp. 1–5, doi:
[14] F. Masina et al., "Investigating the Accessibility of Voice Assistants With Impaired Users: Mixed Methods Study," J. Med. Internet Res., 2020, doi: