Conference Paper

Voice Activated Personal Assistant: Acceptability of Use in the Public Space

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Voice interfaces are becoming a common feature in mobile devices such as tablets and smartphones. Moreover, voice recognition technology is touted to mature and become the default method to control a variety of interfaces, including mobile devices. Thus, it is critical to understand the factors that influence the use of voice activated applications in the public domain. The present study examined how the perceived acceptability of using the Voice-Activated Personal Assistant (VAPA) in smartphones influences its reported use. Participants were U.S. smartphone users recruited from Amazon Mechanical Turk. Results showed that participants preferred using the VAPA in a private location, such as their home, but even in that environment, they were hesitant about using it to input private or personally identifying information in comparison to more general, non-private information. Participants’ perceived social acceptability of using the VAPA to transmit information in different contexts could explain these preferred usage patterns.

No full-text available


... Pradhan and colleagues studied IVA use among individuals with vision impairments and found similar uses and placement, but also examined factors influencing the accessibility of the device [39]. Researchers have also explored IVA use in public places and how the use of voice-controlled devices impacts users [17,27,49]. These studies have contributed a better understanding of IVA use but have also uncovered barriers. ...
... Once adopted, users sometimes assign "human-like" qualities to their device [29] and thereby establish greater expectations for interaction [13,29] and conversation [9] that are not met by the device. In addition, privacy concerns continue to persist after adoption, causing some to alter their use [13,17,26,30]. Identifying barriers to IVA use and adoption helps designers better understand the challenges that negatively impact use. ...
... While we did encounter participants that were apathetic about news of "Alexa listening" [39,51], similar to other studies, privacy emerged as a significant concern and a reason for limiting use [13,17,26,30]. However, some participants' initial interests in using Echo, even for casual tasks or convenience, were sometimes limited by secondary household users due to privacy concerns or differing interaction needs. ...
... such as At home, The car, and In public places [3,104,125]. Three of our participants reported using VUIs with smartphones In public places, which could be stationary (e.g., in a restaurant) or mobile (e.g., while walking around a city). ...
... They also interact with applications for Navigation to a location while walking or driving (i.e., while mobile). Our interviewees from the qualitative study also prefer locations such as At home and Mobile [68], which are typical for VUI interaction [3,104,161]. ...
Thesis
Voice user interfaces (VUIs) such as Amazon Alexa, Apple Siri, and Google Assistant are widely used, readily available, and seamlessly integrated into everyday life. They have become more intelligent due to recent advances in artificial intelligence, which provides new methods of processing contextual information. Despite their widespread use and recent innovations, VUIs face challenges regarding intelligibility, human-like conversation, and privacy. Only a tiny fraction of users perceive VUIs as intelligent and trustworthy as humans. User experience (UX) evaluation is anchored in the human-centered design process. UX is a holistic view of the user’s perception of interaction. The prominent role of UX evaluation methods for designs with graphical user interfaces (GUIs) reflects their dominance in computer-based technology. Furthermore, methods are often tailored to specific measurement contexts. Therefore, the human-computer interaction community requires a flexible and adaptable UX evaluation for VUIs. The core goal of this dissertation is to provide context-dependent UX measurement recommendations for VUIs. We apply the standardized design science research methodology. Our approach is based on the User Experience Questionnaire Plus (UEQ+) framework, which allows flexible assessment. One can select from several UX scales measuring distinct aspects to form a questionnaire. However, the UEQ+ was mainly developed to assess GUI-equipped designs. Thus, we contribute three scales measuring relevant UX aspects for VUIs: Response Behavior, Response Quality, and Comprehensibility. We also offer a conceptual structure of the VUI context of use. By applying this structure, we can select relevant UEQ+ scales and customize the questionnaire to fit any context. This enables recommendations for context-dependent UX assessment for VUIs and provides a new flexible measurement method for better evaluation of voice technology.
... Second, scholars have attempted to conceptualize and define smart technologies by referring to existing devices on the market (e.g., Moorthy and Vu 2014;), thereby disregarding the heterogeneity of smart products (see , and neglecting the speed of their technological development. Yet, as recent examples of smart devices (most notably, Google Glasses) have illustrated (Rauschnabel, Brem, and Ivens 2018), the industry's technological turnover rates and consumers' buying patterns cause individual technologies to disappear from the market almost as quickly as they are introduced. ...
... Thus, the voice-interaction software (e.g., Amazon's Alexa) always requires some type of carrier medium (e.g., a smart speaker like the Amazon Echo). Researchers have found that the carrier medium type and the context in which SVITs are used (e.g., in-home vs. out-of-home) influenced consumers' use patterns of the technology (Moorthy and Vu 2014; Cowan et al. 2017). Therefore, it is necessary to distinguish not only between different kinds of SVITs but also between different technology usage contexts (Ng and Wakenshaw 2017). ...
Thesis
Smart (home) devices, often comprising some degree of artificial intelligence, have recently gained centrality in consumers’ lives. Likewise, marketing research shows growing interest in consumers’ use of smart technology, which has resulted in a plethora of works on the topic. However, extant research projects have tended to either take a prophetic, future-oriented or prematurely specific stance. Hence, a substantial theoretical understanding of consumption experiences with smart technology is as of yet missing. Adopting a consumer behavior and service marketing perspective, this thesis aims to close this research gap. Across four research projects, both conceptual and empirical, this dissertation first delimits and specifies the phenomenon of smart digital consumption, before analyzing the transformative impact of smart devices on consumers’ domestic contexts. Additionally, this thesis investigates how consumers build and maintain trust in their smart devices (in this case, smart voice-interaction technologies), and finally examines the hybrid influence of digital and analog contexts on smart service value generation. The findings of this thesis suggest that if marketing researchers aim to contribute to meaningful knowledge about consumers’ smart technology use and want to generate original research results, they first need to establish a more contextual understanding of smart technologies as such and their impact on consumption experiences. To stimulate scientific progress, this thesis concludes by identifying avenues for future research.
... Additionally, American participants express preferences for the input of data using VA from non-private information over private information. As private information is unwillingly submitted to VAs in public places in the presence of other people, it is perceived as unacceptable [5]. ...
... That technology-based users display deep concerns about privacy issues could be explained by the regularly appearing security news of, e.g., DDoS-Attacks with the Internet of Things [24]. Already in 2014, Americans expressed privacy concerns about using Voice-Activated Personal Assistants (VAPA) in public [5]. A Spanish study revealed the main disadvantages of using VAs: lack of speech intelligibility (27%) and security/privacy (9.5%) [21]. ...
Article
Currently, voice assistants (VAs) are trendy and highly available. The VA adoption rate of internet users differs among European countries and also in the global view. Due to speech intelligibility and privacy concerns, using VAs is challenging. Additionally, user experience (UX) assessment methods and VA improvement possibilities are still missing, but are urgently needed to overcome users’ concerns and increase the adoption rate. Therefore, we conducted an intercultural study of technology-based users from Germany and Spain, expecting that higher improvement potential would outweigh concerns about VAs. We investigated VA use in terms of availability versus actual use, usage patterns, concerns, and improvement proposals. Comparing Germany and Spain, our findings show that nearly the same amount of intensive VA use is found in both technology-based user groups. Despite cultural differences, further results show very similar tendencies, e.g., frequency of use, privacy concerns, and demand for VA improvements.
... A body of work investigated how users interact with conversational agents [6,8,9,15,29,40]. Clark et al. investigated differences between human-to-human and human-to-agent conversations. ...
... Users were often frustrated by the need to combine touch and speech interaction to interact with conversational agents, e.g., selecting a contact to call or unlocking the phone before a query can be entered [9]. Further, users prefer to enter non-private data to conversational agents [15,29] and to use conversational agents in safe, domestic environments [29]. Reported reasons for avoiding speech interaction in public were mainly privacy concerns [29], embarrassment in front of strangers [8,29] and cultural factors [8]. ...
... An online survey [23] showed that participants generally prefer to use a voice assistant in private locations. This holds particularly true for private information, that most users do not like to share aloud. ...
... cp < .001) and group 4 (15.25, CI [6.72, 23.78], p < .001, ...
... While this type of finger identification input has been widely studied in devices such as smartphones [41] and smartwatches [25,39,59], its adaptation to the out-of-sight form factor of earbuds is novel. In addition, compared to alternative interaction techniques, such as verbal interaction, which are impractical in public settings (e.g., multi-party conversation) [18,19] or touch screens that require visual attention and can be distracting (e.g., while walking or driving) [7,21], our technique uses simple finger touches, offering direct, straightforward interaction even when eyes-free and mobile. Our final study shows the practical implications of finger identification input for enhancing earbud use in real-world scenarios and applications. ...
Preprint
Wireless earbuds are an appealing platform for wearable computing on-the-go. However, their small size and out-of-view location mean they support limited different inputs. We propose finger identification input on earbuds as a novel technique to resolve these problems. This technique involves associating touches by different fingers with different responses. To enable it on earbuds, we adapted prior work on smartwatches to develop a wireless earbud featuring a magnetometer that detects fields from a magnetic ring. A first study reveals participants achieve rapid, precise earbud touches with different fingers, even while mobile (time: 0.98s, errors: 5.6%). Furthermore, touching fingers can be accurately classified (96.9%). A second study shows strong performance with a more expressive technique involving multi-finger double-taps (inter-touch time: 0.39s, errors: 2.8%) while maintaining high accuracy (94.7%). We close by exploring and evaluating the design of earbud finger identification applications and demonstrating the feasibility of our system on low-resource devices.
... Overestimating privacy risks is leading users to limit or abandon the use of IVAs (Cobb et al., 2021, pp. 54-75;Easwara Moorthy & Vu, 2014;Pradhan, Mehta, & Findlater, 2018;Tabassum et al., 2019). If privacy risks are underestimated, users are more likely to disregard privacy concerns and disclose more personal data to take advantage of the perceived benefits of IVAs (Kang & Oh, 2023;Ketelaar & Van Balen, 2018). ...
... A recent study reveals that 43% of Americans say they have used a VA (either via their mobile phone or in-home smart speaker) in the past month (Lis 2022); the impact of VA commerce in the United States is projected to reach close to $20B by the end of 2024 (Statista 2023). Because of their "always-on" presence in the home environment, smart speakers offer an unfiltered look into the more intimate needs and preferences of consumers (Hoyer et al. 2020;Moorthy and Vu 2014) and provide a form of personalized interaction that is transforming how consumers make purchase decisions (Hsieh and Lee 2021;Jones 2018). In 2023, more than 47 million VA users made a purchase using a smart speaker (Statista 2023), and experts predict that these VAs will revolutionize the way consumers interact with brands, potentially replacing other technologies for many shopping activities (Gartner 2018;Labecki, Klaus, and Zaichkowsky 2018;Lis 2022). ...
... Additionally, the key technologies for the integration of a voice assistant on various platforms were analyzed. The article "Voice Activated Personal Assistant: Acceptability of Use in the Public Space" notes that voice recognition technology is developing rapidly and will be a feature not only of smart devices but also of cars and household appliances [8]. In addition, the research "Digital Assistant for the Visually Impaired" [9] argues that resources based on speech recognition technology, such as the Google Cloud Speech-to-Text API, can be useful for designing applications that allow visually impaired people to interact with the services available on said platform. ...
Article
Full-text available
Voice recognition technology has gained popularity due to the creation of various applications that allow us to perform tasks using voice on devices such as Amazon Alexa and Siri, among others. Today, applications based on this type of technology have begun to be developed to provide better accessibility for people with visual disabilities. This has been possible through Artificial Intelligence (AI), a branch of Computer Science that has focused on the development of algorithms and systems that mimic human cognitive capacity. Different applications have developed in recent years, including natural language processing, computer vision, robotics, and task automation. In this study, a systematic review of the literature is carried out to analyze development trends, technologies used, and application areas that use voice recognition through digital assistants in order to identify how these technologies can support people with visual disabilities. In addition, a graphic taxonomy is presented to synthesize the information considered. For this review, 56 relevant articles were selected from a total of 4048, following 5 quality criteria.
... Prior research has pointed out the significance of consumer attitudes in forming technology-related adoption behaviour (Chang et al., 2005;Hassanein & Head, 2007;Poushneh, 2021). This notion is favoured in technology adoption literature that heavily depends on theories such as the Technology Acceptance Model (TAM), Use of Technology Model and Value-based Adoption Model (Moorthy & Vu, 2014;Sohn & Kwon, 2020). ...
Article
Full-text available
There is an emerging interest in examining user attitudes towards voice assistants (VAs); however, there is limited research on how user attitudes are formulated in different contexts. Drawing from the stereotype content models, the current study attempts to investigate how users perceive and evaluate voice assistants (VAs) in different contexts (i.e., functional vs. social tasks) based on warmth, competence and trustworthiness. Study 1 (N = 123) employs a within‐subjects design to examine how task type (functional vs. social) affects user perceptions and attitudes towards a VA (i.e., Google Assistant). Study 2 (N = 116) and Study 3 (N = 61) examine the boundary effect of perceived psychological power and ease of use. The findings show that attitude is significantly more positive in functional tasks (vs. social), and this effect is mediated by perceived competence. This indirect effect is also significantly moderated by perceived ease of use. Perceived warmth does not mediate the effect of social tasks on attitude, and trust in VAs is a direct outcome of functional tasks. Taken together, this study contributes to both theory and practice in many ways. Specifically, the findings are the first to demonstrate a direct effect of task type on consumer perceptions and attitudes. Additionally, the findings indicate that user evaluations of VAs are still dominated by user perceptions of the competence of the VAs.
... As the adoption of voice assistants grows, understanding how and why people interact with these AI-enabled systems is becoming increasingly important. Moorthy and Vu (2014) found that users preferred to use the voice assistant tool embedded in their smartphones in private rather than a public space. Reis et al. (2017) examined how the features of stand-alone voice assistant devices helped strengthen social bonds among older adults. ...
Article
Full-text available
From requesting Alexa to set a reminder to asking Google Assistant to make a call, artificial intelligence-enabled voice assistants are quickly melding into our lives. This study aims to understand why users interact with a voice assistant system. Results from an online survey identified four types of motivations underlying the use of voice assistants: entertainment, companionship, dynamic control, and functional utility. Results showed that functional utility and dynamic control were positively related to users’ satisfaction, while companionship and entertainment were not. The effect of social presence on users’ satisfaction was also explored. The moderation analyses showed that social presence not only had a main effect but also played a significant role in increasing satisfaction among the users who perceived low levels of functional utility and dynamic control. This study advances a growing body of human-AI interaction literature by demonstrating the underlying mechanism behind voice assistants’ use. Practical and theoretical implications are also discussed.
... Even though voice assistants are in high demand, they are surrounded by some acceptability and security concerns. A recent study concluded that users often feel uncomfortable and embarrassed while using voice assistants in public [5]. Their concerns mostly stem from the fact that their voice commands are audible and therefore accessible to strangers, which results in hesitation and therefore an alteration in their attitude in a social setting [4]. ...
Article
Full-text available
Although voice recognition systems and techniques have existed for a long time, it is only now, with the advent of voice assistants, that they are rapidly becoming an integral part of our daily lives. A voice assistant is always with you (in your car's navigation system, tablet computer, smart phone, smart TV, smart watch, etc.), ready to help you anytime, anywhere. This digital technology, which is still in its growing phase, is a result of constant evolution in the areas of Artificial Intelligence, Machine Learning and Natural Language Processing. Some of the most successful voice assistants are Google Assistant, Apple's Siri, Amazon's Alexa, Samsung's Bixby and Microsoft's Cortana. Owing to their intelligence, convenience and user-friendliness, assistants are becoming increasingly popular with the next generation. This paper will discuss the underlying working principle of voice assistants, their applications, security and privacy issues and future promises that they hold for us.
... .15. T-SNE visualization of the HCNN outputs when using the data of Volunteer 1 and Volunteer 5 as the testing data. ...
Article
Full-text available
As a natural and convenient interaction modality, voice input has now become indispensable to smart devices (e.g. mobile phones and smart appliances). However, voice input is strongly constrained by surroundings and may raise privacy leakage in public areas. In this paper, we present SoundLip, an end-to-end interaction system enabling users to interact with smart devices via silent voice input. The key insight is to use inaudible acoustic signals to capture the lip movements of users when they issue commands. Previous works have considered lip reading as a naive classification task and thus can only recognize individual words. In contrast, our proposed system enables lip reading at both word and sentence levels, which are more suitable for daily-life use. We exploit the built-in speakers and microphones of smart devices to emit acoustic signals and listen to their reflections, respectively. In order to better abstract representations from multi-frequency and multi-modality acoustic signals, we elaborate a hierarchical convolutional neural network (HCNN) to serve as the front-end as well as recognize individual word commands. Then, for the sentence-level recognition, we exploit a multi-task encoder-decoder network to get around temporal segmentation and output sentences in an end-to-end way. We evaluate SoundLip on 20 individual words and 70 sentences from 12 participants. Our system achieves an accuracy of 91.2% at word-level and a word error rate of 7.1% at sentence-level in both user-independent and environment-independent settings. Given its innovative solution and promising performance, we believe that SoundLip has made a significant contribution to the advancement of silent voice input technology.
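The sentence-level figure in the abstract above is a word error rate (WER). WER is conventionally the word-level edit distance between the recognized sentence and the reference sentence, divided by the number of reference words. The following is a minimal sketch of that general definition, assuming whitespace tokenization. The function name `word_error_rate` is hypothetical, and this is not the SoundLip authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis is i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("turn on the light", "turn off the light"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.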
... Text-based conversational agents have been extensively studied in the domain of digital health interventions [80][81][82][83] and can be considered as a precursor to VCAs [9]. Moreover, voice modality may differ in their appropriateness of app, compared with text modality, depending on the health-related context (eg, public spaces [84,85] and type of user [24][25][26]86,87]). Thus, future research should not only standardize the research in terms of implementation and evaluation measures but also consistently evaluate this technology against what we could consider the gold standard of conversational agents. ...
Article
Full-text available
Objective: This systematic literature review aims to provide a better understanding of the current methods on VCAs delivering interventions for the prevention and management of chronic and mental conditions. Methods: We conducted a systematic literature review using PubMed Medline, EMBASE, PsycINFO, Scopus, and Web of Science databases. We included primary research involving the prevention and/or management of chronic or mental conditions through a VCA and reporting an empirical evaluation of the system in terms of system accuracy and/or in terms of technology acceptance. Two independent reviewers conducted screening and data extraction and measured their agreement with Cohen’s kappa. A narrative approach was applied to synthesize the selected records. Results: Twelve out of 7,170 articles met the inclusion criteria. All studies were non-experimental. The VCAs provided behavioral support (N=5), health monitoring services (N=3), or both (N=4). The interventions were delivered via smartphone (N=5), tablet (N=2), or smart speakers (N=3). In two cases, no device was specified. Three VCAs targeted cancer, while two VCAs each targeted diabetes and heart failure. The other VCAs targeted hearing impairment, asthma, Parkinson's disease, dementia and autism, “intellectual disability”, and depression. The majority of the studies (N=7) assessed technology acceptance but only a minority (N=3) used validated instruments. Half of the studies (N=6) reported either performance measures on speech recognition or on the ability of VCAs to respond to health-related queries. Only a minority of the studies (N=2) reported a behavioral measure or a measure of attitudes towards intervention-related health behavior. Moreover, only a minority of studies (N=4) reported controlling for participants’ previous experience with technology. Finally, risk of bias varied markedly. Conclusions: The heterogeneity in the methods, the limited number of studies identified, and the high risk of bias show that research on VCAs for chronic and mental conditions is still in its infancy. Although results in system accuracy and technology acceptance are encouraging, there still is a need to establish more conclusive evidence on the efficacy of VCAs for the prevention and management of chronic and mental conditions, both in absolute terms and in comparison to standard healthcare.
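The abstract above mentions that two reviewers measured their agreement with Cohen's kappa. As a reminder of what that statistic computes, here is a minimal sketch of the standard two-rater formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohens_kappa` and the include/exclude labels are illustrative assumptions and are not taken from the review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for four records.
reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(reviewer_1, reviewer_2))
```

In this toy case the reviewers agree on three of four records (p_o = 0.75) while chance agreement is 0.5, so kappa is noticeably lower than the raw agreement rate.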
... We explicitly leave aside legal questions, and we assume that the IoT service provider handles any user data as described in its privacy policy. We also do not discuss the responsibility of the users, say, not to use voice assistants in public spaces [40]. In particular, we make the following contributions: ...
Conference Paper
Voice assistants like Amazon Alexa, Google Assistant or Siri are becoming increasingly popular. Such assistants allow for complex interactions with smart Internet-of-Things (IoT) devices that do not have a traditional user interface, such as monitor and keyboard. However, while voice assistants foster the proliferation of numerous convenient services from smart homes to connected cars, they are problematic from the perspective of user privacy. In many cases, IoT devices are permanently listening for keywords in sensitive areas such as living rooms or bedrooms. Once such a word is recognized, voice samples are sent to the voice-assistant provider in the cloud for further analysis. We explore how the users of IoT devices can anonymize the voice recordings sent to the voice-assistant provider. To this end, we identify categories of information sent to the provider, we describe an anonymization approach based on dummy voice commands, and we describe a prototypical anonymization device based on a Raspberry Pi. Our device confirms that it is possible to anonymize some information sent to Alexa with limited inconveniences for the user.
... Second, researchers conceptualize SVITs by discovering their possible range of abilities (e.g., online shopping or controlling other smart devices; Cowan et al. 2017;Li and Lee 2017;Manikonda et al. 2017;Porcheron et al. 2017;Chen and Wang 2018;Knote et al. 2018;Santos et al. 2018). Third, another category of definitions conceptualizes SVITs by relating the concept with consumer technologies available in the market, such as Amazon's Alexa or Apple's Siri (Moorthy and Vu 2014;Kiseleva et al. 2016;Vyturina et al. 2017;Lopatovska et al. 2018). ...
Chapter
Full-text available
AI-based voice assistant (VA) technologies are experiencing unprecedented growth. VAs are available as standalone devices like the Amazon Echo Dot or Google Home and also as extensions such as Google Maps and OK Google. Extant research has mostly focused on device-specific characteristics to explain the adoption of VAs. In this research, we take a different approach and examine the psychological determinants of VA adoption. We look at how factors such as playfulness, escapism, anthropomorphism, and visual appeal of VAs influence the attitudes (hedonic and utilitarian) of consumers. Moreover, we also examine the effects of psychological characteristics of VAs on usage intentions and satisfaction, which lead to favorable word-of-mouth (WOM) behavior that is critical for adoption of a technology. Using a structural equation modeling approach, our results suggest that psychological factors have a significant positive influence on both attitudes. Hedonic attitude further influences satisfaction and utilitarian attitude positively impacts usage and satisfaction, which have a positive association with WOM. Our research offers useful insights to marketers to increase VA adoption and makes contributions to the literature.
... An SA built with custom features and voice-triggered communication tailored to meet user's requirements will significantly augment the user's capabilities to accomplish a task. To this end, a voice-based interaction is hypothesized to be better than other modes of communication in making the task execution quick, efficient, and user-friendly [195,196]. Some of the other technologies, such as personal digital assistants (PDA), mobile applications, and software enabled touch screens, are also used to assist a performer in achieving a task goal. ...
Thesis
With the incorporation of artificial intelligence into 21st-century machines, the collaboration between humans and machines has become quite complex for real-time applications. The role of a synthetic or artificial assistant in everyday tasks such as setting up reminders, managing calendars, and responding to search queries may not pose a significant risk. However, the penetration of such synthetic assistants in virtually every field has opened a path for a new area called Human Machine Teaming (HMT). When it comes to crucial tasks such as patient treatment and care, defense, and industrial production, the use of non-standardized HMT technologies may pose risk to human lives as well as billions of taxpayer dollars. A thorough literature survey revealed that no standardization or benchmarking methods have been established for HMTs. This dissertation hypothesizes that to standardize an HMT, there is an inevitable need to first develop task-tailored intelligent systems, customized HMT simulation methods, and measurement techniques. To address these hypothesized needs, this dissertation presents new design methodologies, simulations, and experiment validations for HMTs. In this dissertation, the conducted research is presented and discussed in five phases with some exclusive objectives. Phase I of the research study begins with an initial state-of-the-art literature survey. This includes analysis of all the available architectures and development methodologies as well as the establishment of a few conceptual basics that are essential for the HMT framework. Furthermore, the survey also discusses the different HMT components and human-machine systems (HMS) simulation methods available in the literature. Finally, the detailed objectives of the research needed to validate the stated hypotheses are discussed. 
In Phase II, all the metrics available to measure HMTs are analyzed with the aim of constructing a matrix of metrics sorted based on different classifications and relationships to HMT, to achieve a final goal of constructing a common set of metrics for HMT benchmarking. The metrics are gathered through a keyword based systemic review from popular scholarly repositories and analyzed using metadata of metrics, such as measurement type, face value, dependency on adjacent metrics, and available standardized measuring methods. From there, they are categorized into different sets and models to measure HMT performance. This meta-analysis resulted in a color-coded chart of HMT metrics that are presented in this phase. More specifically, it is a matrix of metrics sorted based on different classifications and relationships to HMT. Furthermore, a set of common metrics is drawn based on the above study, and the selection criteria established are presented in this phase, which can be repeated for any similar future study. Finally, this phase presents models that can be used to measure different HMT performances through selecting common metrics sets. Phase III discusses the development of intelligent systems that can be used as machines in HMTs. The tailored intelligent system can be called a synthetic agent (SA). This phase deals with SA in detail, particularly examining the backgrounds of SA and the continuous requirements of SA for this research. Furthermore, system design and detailed development of a voice-based synthetic assistant (VBSA) are also presented in this section. The VBSA constitutes a performance model of developed systems. The resultant voice-based synthetic assistant prototype is significant in constructing an HMT and is also effective in measuring an HMT’s different parameters, such as performance and efficiency. Finally, Phase III presents performance and operation analysis of the developed VBSA. 
Phase IV of this research consists of human-in-the-loop (HITL) simulation and human factor user studies of generalized HMT architectures using controlled HMT scenarios, such as an emergency care provider (ECP) treating patients and visual data processing, that represent real-world applications. As part of these HMT simulation studies, the impact of each parameter related to machines and humans versus HMT is presented from the perspective of performance, rules, roles, and operation limitations. This phase also presents statistical analyses of measured performances with respect to participant groups. These statistical analyses are used as evidence to understand HMTs and components of HMT behavior. Furthermore, Phase V presents guidelines for designing future HMTs and performing standardization studies in the pursuit of developing standardization techniques for benchmarking HMTs that can be used in critical situations. This phase concludes by rationally proving hypothesized research methods that include SA development, as metrics can be used to standardize future HMTs. Finally, future work is discussed to provide guidelines for next-generation HMT research.
... Therefore, an SA built with custom features and voice-triggered communication significantly augments a medic's capabilities to accomplish a task. Moreover, a voice-based interaction is hypothesized to be better than other modes of communication in making the treatment process quick, efficient, and user-friendly (Jiang et al., 2015; Moorthy & Vu, 2014). Some of the other technologies, such as a pocket card with a barcode reader, mobile applications, and medical software with touch screens, are also used for information transfer between different medical teams. ...
Preprint
Full-text available
As part of a perennial project, our team is actively engaged in developing new synthetic assistant (SA) technologies to assist in training combat medics and medical first responders. It is critical that medical first responders are well trained to deal with emergencies more effectively. This would require real-time monitoring and feedback for each trainee. Therefore, we introduced a voice-based SA to augment the training process of medical first responders and enhance their performance in the field. The potential benefits of SAs include a reduction in training costs and enhanced monitoring mechanisms. Despite the increased usage of voice-based personal assistants (PAs) in day-to-day life, the associated effects are commonly neglected in human factors studies. Therefore, this paper focuses on performance analysis of the developed voice-based SA in emergency care provider training for a selected emergency treatment scenario. The research discussed in this paper follows design science in developing the proposed technology; we discuss the architecture and development at length and present working results of the voice-based SA. The empirical testing was conducted as user studies on two groups, one trained with conventional methods and the other with the help of the SA, and analyzed using statistical analysis tools. The statistical results demonstrated the amplification in training efficacy and performance of medical responders powered by SA. Furthermore, the paper also discusses the accuracy and time of task execution (t) and concludes with the guidelines for resolving the identified problems.
... According to Bohn et al (2005), in any ambient intelligence system, such as the lamppost infrastructure, it is important to account for personal and social boundaries in addition to utilitarian value; on a related note, Koelle et al (2018) refer to intruding on the social spheres of others. There is evidence that especially infrequent users feel socially embarrassed about using their speech-based assistants with other people present (Cowan et al, 2017; Moorthy & Vu, 2014). Furthermore, the adoption of (and interacting with) IPAs is affected by the transparency of data usage (Cowan et al, 2017) and by how trustworthy the IPA provider is considered by the user (Liao et al, 2019). ...
Conference Paper
Full-text available
Speech interactions are often associated with virtual assistants and smart home devices, designed primarily for private contexts. A less developed domain is speech interfaces in public contexts. In a smart city development project, we explored the potential of distributed conversational speech interfaces in lampposts. Deploying a research-through-design method, we created a lo-fi prototype of the speech interface that test subjects could interact with during experiments in a lab setting. Our first exploratory prototype consisted of a loudspeaker that acted as the interface and preconceived dialogues designed to investigate the boundaries of desirable and acceptable experiences regarding issues such as privacy. Experiencing the interaction with this rudimentary prototype helped people envision potential use cases and reflect on privacy issues: the dialogues revealed subjective limits of what kind of (personal) information people were willing to share with the lamppost. They also elicited thoughts on possible consequences in the social context of citizens.
... Many studies showed privacy concerns about the use of voice assistants and voice search in public spaces [17,18,45]. According to a survey in 2016 [58], 39% of smartphone consumers used voice assistants in the home but only 6% used them in public. ...
Article
Speech input, such as voice assistant and voice message, is an attractive interaction option for mobile users today. However, despite its popularity, there is a use limitation for smartphone speech input: users need to press a button or say a wake word to activate it before use, which is not very convenient. To address it, we match the motion that brings the phone to mouth with the user's intention to use voice input. In this paper, we present ProxiTalk, an interaction technique that allows users to enable smartphone speech input by simply moving it close to their mouths. We study how users use ProxiTalk and systematically investigate the recognition abilities of various data sources (e.g., using a front camera to detect facial features, using two microphones to estimate the distance between phone and mouth). Results show that it is feasible to utilize the smartphone's built-in sensors and instruments to detect ProxiTalk use and classify gestures. An evaluation study shows that users can quickly acquire ProxiTalk and are willing to use it. In conclusion, our work provides the empirical support that ProxiTalk is a practical and promising option to enable smartphone speech input, which coexists with current trigger mechanisms.
... Diao et al. [16] discuss security problems that show how voice assistant components are potential security threats. Moorthy and Vu [38] discuss privacy issues that arise from using voice assistants in public such as being overheard. Indeed, privacy preferences are often nuanced and context dependent. ...
Article
Full-text available
Voice has become a widespread and commercially viable interaction mechanism with the introduction of voice assistants (VAs), such as Amazon’s Alexa, Apple’s Siri, Google Assistant, and Microsoft’s Cortana. Despite their prevalence, we do not have a detailed understanding of how these technologies are used in domestic spaces. To understand how people use VAs, we conducted interviews with 19 users, and analyzed the log files of 82 Amazon Alexa devices, totaling 193,665 commands, and 88 Google Home Devices, totaling 65,499 commands. In our analysis, we identified music, search, and IoT usage as the command categories most used by VA users. We explored how VAs are used in the home, investigated the role of VAs as scaffolding for Internet of Things device control, and characterized emergent issues of privacy for VA users. We conclude with implications for the design of VAs and for future research studies of VAs.
... In addition, as the popularity of VPA devices increases, so does research in the field. Moorthy and Vu (2014) attempted to understand the perceived acceptability of using VPAs in smartphones, and showed that smartphone users prefer to use VPAs in a private location rather than in a public space. Reis et al. (2017) analyzed the features of commercialized VPA devices and made a preliminary proposal of the best product to strengthen elderly people's social bonds. ...
Article
Full-text available
With the development of artificial intelligence technology, the market for virtual personal assistant (VPA) devices is emerging as a new battleground for global information technology companies. This study develops a comprehensive research model, based on perceived value theory, to explain potential customers’ intentions to adopt and use VPA devices. It investigates the relationship between perceived usefulness, perceived enjoyment, and product-related characteristics (i.e., portability, automation, and visual attractiveness). The research model and hypotheses are evaluated through Partial least squares analysis, using 313 survey samples. The results show that perceived usefulness and enjoyment have a significant impact on usage intention. Among the three constructs reflecting software- and hardware-based utilitarian value, content quality has the strongest impact on perceived usefulness. From the perspective of hedonic value, content quality, which is also a utilitarian attribute of VPA devices, and visual attractiveness positively affect perceived enjoyment. This study concludes by discussing implications and offering useful suggestions for academia and practice.
... Therefore, an SA built with custom features and voice-triggered communication significantly augments a medic's capabilities to accomplish a task. Moreover, a voice-based interaction is hypothesized to be better than other modes of communication in making the treatment process quick, efficient, and user-friendly (Jiang et al. 2015; Moorthy and Vu 2014). Some of the other technologies, such as a pocket card with a barcode reader, mobile applications, and medical software with touch screens, are also used for information transfer between different medical teams. ...
Article
Full-text available
As part of a perennial project, our team is actively engaged in developing new synthetic assistant (SA) technologies to assist in training combat medics and medical first responders. It is critical that medical first responders are well trained to deal with emergencies more effectively. This would require real-time monitoring and feedback for each trainee. Therefore, we introduced a voice-based SA to augment the training process of medical first responders and enhance their performance in the field. The potential benefits of SAs include a reduction in training costs and enhanced monitoring mechanisms. Despite the increased usage of voice-based personal assistants (PAs) in day-to-day life, the associated effects are commonly neglected in human factors studies. Therefore, this paper focuses on performance analysis of the developed voice-based SA in emergency care provider training for a selected emergency treatment scenario. The research discussed in this paper follows design science in developing the proposed technology; we discuss the architecture and development at length and present working results of the voice-based SA. The empirical testing was conducted as user studies on two groups, one trained with conventional methods and the other with the help of the SA, and analyzed using statistical analysis tools. The statistical results demonstrated the amplification in training efficacy and performance of medical responders powered by SA. Furthermore, the paper also discusses the accuracy and time of task execution (t) and concludes with the guidelines for resolving the identified problems.
Article
Voice user interface (VUI) systems, such as Alexa, Siri, and Google Assistant, are popular and widely available. Still, challenges such as privacy and the ability to have a dialog remain. In the latter example, the user expects a human‐like conversation, that is, that the VUI understands the dialog and its context. However, this VUI feature of context‐aware interaction is rather error prone. For this reason, we intend to explore the VUI context of use and its impact on interaction, that is, relevant user experience (UX). We see a demand for context‐dependent UX measurement because analyzing the context of use and UX assessment are both critical human‐centered design (HCD) methods. Therefore, we examine the VUI context of use by asking users about how, where, and for what they use VUIs, as well as their UX and improvement proposals. We interviewed people with disabilities who rely on VUIs and people without disabilities who use VUIs for convenience or fun. We identified VUI context‐of‐use categories and factors and explored their impacts on relevant UX qualities. Our result is a matrix containing these elements; thus, it provides an overview of the contextual UX of our target group's VUI interaction. We intend to develop a VUI context‐of‐use conceptual structure in the future based on this matrix, which is needed to create an automated context‐dependent UX measurement recommendation tool for VUIs. This conceptual structure could also be useful for automated UX testing in the context of VUI.
Article
We investigate silent speech as a hands-free selection method in eye-gaze pointing. We first propose a stripped-down image-based model that can recognize a small number of silent commands almost as fast as state-of-the-art speech recognition models. We then compare it with other hands-free selection methods (dwell, speech) in a Fitts' law study. Results revealed that speech and silent speech are comparable in throughput and selection time, but the latter is significantly more accurate than the other methods. A follow-up study revealed that target selection around the center of a display is significantly faster and more accurate, while around the top corners and the bottom are slower and error prone. We then present a method for selecting menu items with eye-gaze and silent speech. A study revealed that it significantly reduces task completion time and error rate.
Chapter
Full-text available
With Covid-19 upending many established norms, this research provides insights on how the pandemic has the potential to dilute decades of advancement in gender equality. To explore this issue, data was collected from female university teachers. It is possible that they may have more chances of experiencing the intensified gendered division of work. The study attempts to explore the impact of change in house help arrangement in Covid-19 on female university teachers. To understand this, firstly, the employment of house helps in pre Covid-19 was compared to house help employment during Covid-19. This difference was recognized as the added workload that resulted from the unemployment of house helps. Then finally, the reallocation of added workload shows how the burden was distributed among family members. Results unveil that women are shouldering the intensified burden brought by the pandemic way more than men. This research from a developing nation context sheds light on how the pre existing gender expectations are leading women to experience the burden of gendered labor more intensely in Covid-19.
Article
Full-text available
Personal virtual assistants (PVAs) based on artificial intelligence are frequently used in private contexts but have yet to find their way into the workplace. Regardless of their potential value for organizations, the relentless implementation of PVAs at the workplace is likely to run into employee resistance. To understand what motivates such resistance, it is necessary to investigate the primary motivators of human behavior, namely emotions. This paper uncovers emotions related to organizational PVA use, primarily focusing on threat emotions. To achieve our goal, we conducted an in-depth qualitative study, collecting data from 45 employees in focus-group discussions and individual interviews. We identified and categorized emotions according to the framework for classifying emotions Beaudry and Pinsonneault (2010) designed. Our results show that loss emotions, such as dissatisfaction and frustration, as well as deterrence emotions, such as fear and worry, constitute valuable cornerstones for the boundaries of organizational PVA use.
Chapter
Intelligent Personal Assistants (IPAs) are increasingly present in people’s daily lives, with natural language interaction allowed by voice assistants improving the user experience. In the television ecosystem, the integration of voice assistants is also enabling the interaction with media devices in a globally more satisfying way. Additionally, the inclusion of proactive behaviours is considered to be one of the critical factors for the improvement of the user experience. To better understand this dimension of voice assistants, the present article analyses the proactivity concept and examples of voice assistants that incorporate proactive behaviours. Voice assistants in the television ecosystem were also analysed, revealing a lack of systems with genuinely proactive behaviour able to assist users in a television context.
Article
This case study seeks to increase understanding of how agency is fostered in human‐AI interaction by providing insight from Uber's development of a conversational voice‐user‐interface (VUI) for its driver application. Additionally, it provides user researchers with insight on how to identify agency's importance early in the product development process and communicate it effectively to product stakeholders. First, the case reviews the literature to provide a firm theoretical basis of agency. It then describes the implementation of a novel in‐car Wizard‐Of‐Oz study and its usefulness in identifying agency as a critical mediator of driver interaction with the VUI before software‐development. Afterward, three factors which impacted driver agency and product usage are discussed – conversational agency, use of the VUI in social contexts and perception of the VUI persona. Finally, the case describes strategies used to convince the engineering and product teams to prioritize features to increase agency. As a result, the findings led to substantive changes to the VUI to increase agency and enhance the user experience.
Article
Natural Language Interfaces allow human-computer interaction through the translation of human intention into devices’ control commands, analyzing the user’s speech or gestures. This novel interaction mode arises from advancements in artificial intelligence, expert systems, speech recognition, semantic web, dialog systems, and natural language processing, bringing the concept of Intelligent Personal Assistant (IPA). There is currently a vast literature on this subject. However, to the best of our knowledge, there is no thorough analysis of the state-of-the-art in the field. In this context, we present in this article a survey of the field, discussing the main trends, critical areas, and challenges of an IPA. Another contribution is the proposition of a taxonomy for IPA classification. The method used to achieve these objectives consisted of a systematic literature review based on the population, intervention, comparison, outcome, and context (PICOC) criteria. As a result, we started from more than 3472 scientific articles published in the last six years, searched on a set of databases chosen to increase the probability of finding highly relevant articles. The review selected the 58 most significant articles, identifying challenges and open questions. We also discuss in the article the current status, usage, security and privacy issues, types, and architectures regarding an IPA. We conclude that usability, security, and privacy directly affect the confidence of the user in adopting an IPA.
Conference Paper
Intelligent Personal Assistants (IPAs) are widely available on devices such as smartphones. However, most people do not use them regularly. Previous research has studied the experiences of frequent IPA users. Using qualitative methods we explore the experience of infrequent users: people who have tried IPAs, but choose not to use them regularly. Unsurprisingly infrequent users share some of the experiences of frequent users, e.g. frustration at limitations on fully hands-free interaction. Significant points of contrast and previously unidentified concerns also emerge. Cultural norms and social embarrassment take on added significance for infrequent users. Humanness of IPAs sparked comparisons with human assistants, juxtaposing their limitations. Most importantly, significant concerns emerged around privacy, monetization, data permanency and transparency. Drawing on these findings we discuss key challenges, including: designing for interruptability; reconsideration of the human metaphor; issues of trust and data ownership. Addressing these challenges may lead to more widespread IPA use.
Article
A review of popular technology adoption models identified several factors that are likely to influence Voice Activated Personal Assistant (VAPA) use in public spaces. To inform design decisions of how to make the private use of the VAPA in public spaces more acceptable from the users’ point of view, an online survey was conducted to investigate the likelihood of usage of the smartphone VAPA such as Apple’s Siri (compared to the usage of smartphone keyboard) as a function of location (private vs. public) and type of information (private vs. nonprivate). Responses from participants showed that users were more cautious of transmitting private than nonprivate information. This effect of type of information was amplified in the social context of public locations and when using conspicuous methods of information input such as the VAPA. Participants also preferred using the VAPA in private locations and showed no preference of location for keyboard entries. Correlations between likelihood of usage of VAPA and the social acceptability ratings were positive and predicted similar patterns of smartphone usage.
Conference Paper
Full-text available
This paper presents a study on privacy and secrecy requirements that users feel while in the presence of other people. They are viewed as issues of a social activity and pertain to the desire that the content of messages or the act of writing or reading them is not perceived by others. We assess the needs for privacy according to the message's themes and acquaintance type with the recipient. We also present and discuss our findings considering user strategies in coping with the required privacy using both a quantitative and a qualitative approach. The study results show clearly the need to consider those requirements in the design of messaging applications for mobile devices. Circa 50% of the messages analyzed required privacy on the act of writing/reading. The reasons are multifaceted and vary according to the addressees and the content type reaching 70% for specific cases. We close the paper, with a proposal of a personal, multimodal and inconspicuous communication framework, which not only allows users to define their vocabulary, but also entry and output methods from a range of different modalities.
Article
Full-text available
Computer systems cannot improve organizational performance if they aren't used. Unfortunately, resistance to end-user systems by managers and professionals is a widespread problem. To better predict, explain, and increase user acceptance, we need to better understand why people accept or reject computers. This research addresses the ability to predict peoples' computer acceptance from a measure of their intentions, and the ability to explain their intentions in terms of their attitudes, subjective norms, perceived usefulness, perceived ease of use, and related variables. In a longitudinal study of 107 users, intentions to use a specific system, measured after a one-hour introduction to the system, were correlated 0.35 with system use 14 weeks later. The intention-usage correlation was 0.63 at the end of this time period. Perceived usefulness strongly influenced peoples' intentions, explaining more than half of the variance in intentions at the end of 14 weeks. Perceived ease of use had a small but significant effect on intentions as well, although this effect subsided over time. Attitudes only partially mediated the effects of these beliefs on intentions. Subjective norms had no effect on intentions. These results suggest the possibility of simple but powerful models of the determinants of user acceptance, with practical value for evaluating systems and guiding managerial interventions aimed at reducing the problem of underutilized computer technology.
Conference Paper
Full-text available
We describe studies of preferences about information sharing aimed at identifying fundamental concerns with privacy and at understanding how people might abstract the details of sharing into higher-level classes of recipients and information that are treated similarly. Thirty people specified what information they are willing to share with whom. Although people vary in their overall level of comfort in sharing, we identified key classes of recipients and information. Such abstractions highlight the promise of developing expressive controls for sharing and privacy.
Conference Paper
Full-text available
Mobile phones are becoming increasingly personalized in terms of the data they store and the types of services they provide. At the same time, field studies have reported that there are a variety of situations in which it is natural for people to share their phones with others. However, most mobile phones support a binary security model that offers all-or-nothing access to the phone. We interviewed 12 smartphone users to explore how security and data privacy concerns affected their willingness to share their mobile phones. The diversity of guest user categorizations and associated security constraints expressed by the participants suggests the need for a security model richer than today's binary model. Author Keywords: Mobile phone sharing, phone privacy, phone security.
Conference Paper
Full-text available
The proliferation of cell phones has led to an ever increasing number of inappropriate interruptions. Context-aware telephony applications, in which callers are provided with context information about the receivers, have been proposed as a solution for this problem. This approach, however, raises many privacy issues that may render it infeasible. In this paper, we report on an in-situ study of user privacy preferences and patterns of sharing different types of context information with different social relations. We found that participants disclosed their context information generously, suggesting that context-aware telephony is not only feasible, but also desirable. Our data shows a distinct sharing pattern across social relations and different types of context information. We discuss the implications of the results for designers of context-aware telephony in particular and context-aware applications in general.
Conference Paper
Full-text available
Technology adoption models specify a pathway of technology acceptance from external variables to beliefs, intentions, adoption and actual usage. Mobile phone adoption has been studied from a variety of perspectives, including sociology, computer-supported cooperative work and human-computer interaction. What is lacking is a model integrating all these factors influencing mobile phone adoption. This paper investigates technology adoption models as a strategy to match mobile phone design to user's technological needs and expectations. Based on the literature study we integrate three existing technology adoption models and then evaluate the proposed model with interviews and a survey. The contribution of this paper is a model for representing the factors that influence mobile phone adoption.
Article
Full-text available
Information technology (IT) acceptance research has yielded many competing models, each with different sets of acceptance determinants. In this paper, we (1) review user acceptance literature and discuss eight prominent models, (2) empirically compare the eight models and their extensions, (3) formulate a unified model that integrates elements across the eight models, and (4) empirically validate the unified model. The eight models reviewed are the theory of reasoned action, the technology acceptance model, the motivational model, the theory of planned behavior, a model combining the technology acceptance model and the theory of planned behavior, the model of PC utilization, the innovation diffusion theory, and the social cognitive theory. Using data from four organizations over a six-month period with three points of measurement, the eight models explained between 17 percent and 53 percent of the variance in user intentions to use information technology. Next, a unified model, called the Unified Theory of Acceptance and Use of Technology (UTAUT), was formulated, with four core determinants of intention and usage, and up to four moderators of key relationships. UTAUT was then tested using the original data and found to outperform the eight individual models (adjusted
Chapter
The purpose of this chapter is to outline a discussion of the first phase of an ethnographic study of mobile phone use on train carriages. The chapter constitutes a brief exploration of some ordinary features of action and interaction as constitutive features of social action and technology use. As such the discussion is designed to be suggestive of how these initial findings can be analysed and developed further. The analytical orientation is informed by an ethnomethodological approach insofar as it is concerned with the rules of mobile phone use “as instructions for seeing” (Harper and Hughes, 1993) and understanding the “character of interpersonal communication” (Heath and Luff, 1993) in public places.
Book
The SAGE Glossary of the Social and Behavioral Sciences provides college and university students with a highly accessible, curriculum-driven reference work, both in print and on-line, defining the major terms needed to achieve fluency in the social and behavioral sciences. Comprehensive and inclusive, its interdisciplinary scope covers such varied fields as anthropology, communication and media studies, criminal justice, economics, education, geography, human services, management, political science, psychology, and sociology. In addition, while not a discipline, methodology is at the core of these fields and thus receives due and equal consideration. At the same time we strive to be comprehensive and broad in scope, we recognize a need to be compact, accessible, and affordable. Thus the work is organized in A-to-Z fashion and kept to a single volume of approximately 600 to 700 pages.
Article
Managing personal information such as to-dos and contacts has become our daily routines, consuming more time than needed. Existing PIM tools require extensive involvement of human users. This becomes a problem in using mobile devices due to their physical constraints. To address the limitations of traditional PIM tools, we propose a model of mobile PIM agent (PIMA) that aims to improve PIM on mobile devices through natural language interface and application integration. We conducted a user study to evaluate PIMA empirically with prototype systems. The results show that mobile PIMA improved perceived usefulness, ease-of-use, and efficiency of PIM on mobile devices, which in turn accounted for positive attitude and intention to use the system. The findings of this study provide suggestions for designing and developing PIM applications on mobile devices.
Article
This paper reports a series of investigations, which aim to test the appropriateness of voice recognition as an interaction method for mobile phone use. First, a KLM model was used in order to compare the speed of using voice recognition against using multi-tap and predictive text (the two most common methods of text entry) to interact with the phone menus and compose a text message. The results showed that speech is faster than the other two methods and that a combination of input methods provides the quickest task completion times. The first experiment used a controlled message creation task to validate the KLM predictions. This experiment also confirmed that the result was not due to a speed/accuracy trade off and that participants preferred to use the combination of input methods rather than a single method for menu interaction and text composition. The second experiment investigated the effect of limited visual feedback (when walking down the road or driving a car for example) on interaction, providing further evidence in support of speech as a useful input method. These experiments not only indicate the usefulness of voice in SMS input but also that users could also be satisfied with voice input in hands-busy, eyes-busy situations.