Setup for dialogue collection.

Source publication
Article
Engagement represents how much a user is interested in and willing to continue the current dialogue. Engagement recognition will provide an important clue for dialogue systems to generate adaptive behaviors for the user. This paper addresses engagement recognition based on multimodal listener behaviors of backchannels, laughing, head nodding, and eye-gaze ...

Context in source publication

Context 1
... dialogue was one-on-one, and the subject and ERICA sat on chairs facing each other. Figure 2 shows a snapshot of the dialogue. The dialogue scenario was as follows. ...

Similar publications

Article
Tree species classification using hyperspectral imagery is a challenging task due to the high spectral similarity between species and large intra-species variability. This paper proposes a solution using the Multiple Instance Adaptive Cosine Estimator (MI-ACE) algorithm. MI-ACE estimates a discriminative target signature to differentiate between a...
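The truncated abstract names MI-ACE but not its formulation. For orientation, below is a minimal numpy sketch of the standard ACE detection statistic that MI-ACE builds on, scoring a pixel against an estimated target signature; the variable names and toy data are illustrative, and the multiple-instance signature-estimation step itself is not shown.

```python
import numpy as np

def ace_statistic(x, s, bg_mean, bg_cov):
    """Standard ACE detection statistic: squared cosine between the
    (whitened) test pixel and the target signature. MI-ACE learns `s`
    from bag-level labels; that estimation step is not shown here."""
    inv = np.linalg.inv(bg_cov)
    xc = x - bg_mean                              # demean the test pixel
    num = (s @ inv @ xc) ** 2
    den = (s @ inv @ s) * (xc @ inv @ xc)
    return num / den

# Toy demo with random "background" spectra (illustrative only).
rng = np.random.default_rng(0)
bg = rng.normal(size=(200, 5))
print(ace_statistic(rng.normal(size=5), rng.normal(size=5),
                    bg.mean(axis=0), np.cov(bg.T)))
```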

Citations

... In the second category, we clustered the studies in which humans and robots are involved in a conversation, e.g., the robot is a storyteller [31], the robot explains some paintings [11], the robot speaks with the users [32][33][34][35], fostering the interaction between them [36]. Another scenario in which the detection of user engagement is quite relevant regards cognitive therapy, where a robotic platform is used to elicit certain behaviors in children with Autism Spectrum Disorder [20,[37][38][39][40]. ...
... Excluding the works that did not specify it (i.e., 9 out of 28), in one case the robot was tested at home [20], in two cases the robot was tested in public places (e.g., a museum [42] and shops [43]), and in six cases the interaction took place at school (i.e., elementary school [24-27, 44, 46, 47], kindergarten [45]). In the remaining works, the interaction occurred in a laboratory setting (i.e., office [33,48] and university environments [29,35,36,41,49]). Based on the duration and the settings chosen for the interactions, the number of involved participants varied widely among the works (maximum: 227 participants; minimum: 2 participants). ...
... Taking the definition from the Pediatric Assessment of Rehabilitation Engagement scale, [18] defines behavioral engagement as a proactive tendency to adapt to the changes and experiences of the interaction, as well as sharing intentions and a desire to improve or change the interaction. It can be simplified as the motivation [58], which mostly encourages action and participation in the task [5]. [A reference-by-component comparison table is omitted here; per its footnote, asterisked entries consider only attention.] ...
Article
The concept of engagement is widely adopted in the human–robot interaction (HRI) field, as a core social phenomenon in the interaction. Despite the wide usage of the term, the meaning of this concept is still characterized by great vagueness. A common approach is to evaluate it through self-reports and observational grids. While the former solution suffers from a time-discrepancy problem, since the perceived engagement is evaluated at the end of the interaction, the latter solution may be affected by the subjectivity of the observers. From the perspective of developing socially intelligent robots that autonomously adapt their behaviors during the interaction, replicating the ability to properly detect engagement represents a challenge in the social robotics community. This systematic review investigates the conceptualization of engagement, starting with the works that attempted to automatically detect it in interactions involving robots and real users (i.e., online surveys are excluded). The goal is to describe the most worthwhile research efforts and to outline the commonly adopted definitions (which define the authors' perspective on the topic) and their connection with the methodology used for the assessment (if any). The research was conducted within two databases (Web of Science and Scopus) between November 2009 and January 2023. A total of 590 articles were found in the initial search. Thanks to an accurate definition of the exclusion criteria, the most relevant papers on automatic engagement detection and assessment in HRI were identified. Finally, 28 papers were fully evaluated and included in this review. The analysis illustrates that the engagement detection task is mostly addressed as a binary or multi-class classification problem, considering user behavioral cues and context-based features extracted from recorded data. One outcome of this review is the identification of current research barriers and future challenges on the topic, which can be clustered into the following fields: engagement components, annotation procedures, engagement features, prediction techniques, and experimental sessions.
... Unfortunately, questionnaires only provide a total rating after the interaction rather than during it, and are difficult to use with children. Other methods are based on video or audio data and measure participants' output behaviors, such as eye gaze, head movements (nodding), verbal utterances, and facial expressions, or a combination of these behaviors [29][30][31]. Eye gaze is especially important because it can indicate where the participant's attention is directed and can be measured automatically with relative ease, making it well suited to real-time engagement measurement in tutoring interactions. ...
Article
In this paper, we examine to what degree children aged 3–4 years engage with a task and with a social robot during a second-language tutoring lesson. We specifically investigated whether children's task engagement and robot engagement were influenced by three different feedback types by the robot: adult-like feedback, peer-like feedback and no feedback. Additionally, we investigated the relation between children's eye-gaze fixations and their task engagement and robot engagement. Fifty-eight Dutch children participated in an English counting task with a social robot and physical blocks. We found that, overall, children in the three conditions showed similar task engagement and robot engagement; however, within each condition, they showed large individual differences. Additionally, regression analyses revealed that there is a relation between children's eye-gaze direction and engagement. Our findings showed that although eye gaze plays a significant role in measuring engagement and can be used to model children's task engagement and robot engagement, it does not account for the full concept, and engagement still comprises more than just eye gaze.
... 'I see.') are less likely to keep the user engaged in the conversation. Besides these linguistic features, we are considering the use of non-linguistic features such as backchannels, laughing, head nodding, and eye-gaze [14]. We trained the engagement recognition model using only linguistic features and confirmed a recognition accuracy of 70.0%. ...
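The snippet does not specify which linguistic features were used, so the following is a minimal hypothetical sketch of engagement classification from utterance text alone, using a bag-of-words stand-in with scikit-learn; the toy utterances and labels are invented for illustration, not the authors' feature set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins: user utterances with engaged (1) / not engaged (0) labels.
utterances = ["that is fascinating, tell me more", "i see",
              "wow, really, how did that happen", "uh huh"]
labels = [1, 0, 1, 0]

# TF-IDF n-grams are only a placeholder for the unspecified linguistic features.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(utterances, labels)
print(model.predict(["interesting, and then what happened"]))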
Article
Many people are now engaged in remote conversations in a wide variety of settings, such as interviewing, counseling, and consulting, but there is a limited number of skilled experts. We propose a novel framework of parallel conversations with semi-autonomous avatars, where one operator collaborates with several remote robots or agents simultaneously. The autonomous dialogue system manages most of the conversation but switches to the human operator when necessary. This framework circumvents the requirement for autonomous systems to be completely perfect. Instead, we need to detect dialogue breakdown or disengagement. We present a prototype of this framework for attentive listening.
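The abstract describes switching to the human operator when breakdown or disengagement is detected, but not the switching rule itself. A minimal hypothetical sketch of one such rule follows: escalate when the disengagement score stays above a threshold for several consecutive windows. The class name, threshold, and patience values are assumptions, not the paper's design.

```python
class HandoverPolicy:
    """Hypothetical switching rule for a semi-autonomous avatar:
    escalate to the human operator when the disengagement score
    exceeds `threshold` for `patience` consecutive windows."""

    def __init__(self, threshold=0.7, patience=3):
        self.threshold = threshold
        self.patience = patience
        self.streak = 0

    def update(self, disengagement_prob):
        self.streak = self.streak + 1 if disengagement_prob > self.threshold else 0
        return self.streak >= self.patience  # True -> request operator

policy = HandoverPolicy()
for p in [0.2, 0.8, 0.9, 0.75]:
    if policy.update(p):
        print("hand over to operator")
```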
... To this end, engagement recognition has been widely studied by investigating multi-modal user behaviors [23]. We proposed an engagement recognition model based on listener behaviors such as backchannels, laughs, head nods, and eye contact [24]. We automatically detected these listener behaviors to implement a real-time engagement recognition system. ...
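The snippet lists the detected listener behaviors but not how they are encoded, so here is a minimal sketch, assuming per-window counts of backchannels, laughs, nods, and eye-contact frames fed to a logistic-regression classifier; the feature layout and toy data are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per analysis window: counts of detected listener behaviors
# [backchannels, laughs, nods, eye-contact frames]; toy data.
X = np.array([[3, 1, 2, 40], [0, 0, 0, 5], [2, 0, 3, 35], [1, 0, 0, 10]])
y = np.array([1, 0, 1, 0])  # 1 = engaged, 0 = not engaged

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[2, 1, 1, 30]])[0, 1])  # P(engaged) for a new window
```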
Preprint
Following the success of spoken dialogue systems (SDS) in smartphone assistants and smart speakers, a number of communicative robots have been developed and commercialized. Compared with conventional SDSs designed as a human-machine interface, interaction with robots is expected to be closer to talking with a human because of their anthropomorphism and physical presence. The goal or task of the dialogue may not be information retrieval, but the conversation itself. In order to realize human-level "long and deep" conversation, we have developed an intelligent conversational android, ERICA. We set up several social interaction tasks for ERICA, including attentive listening, job interviews, and speed dating. To allow for spontaneous, incremental multiple utterances, a robust turn-taking model is implemented based on TRP (transition-relevance place) prediction, and a variety of backchannels are generated based on time-frame-wise prediction instead of IPU-based prediction. We have realized an open-domain attentive listening system with partial repeats and elaborating questions on focus words, as well as assessment responses. It has been evaluated with 40 senior people, who engaged in conversations of 5-7 minutes without a conversation breakdown. It was also compared against a WOZ setting. We have also realized a job interview system with a set of base questions followed by dynamic generation of elaborating questions. It has also been evaluated with student subjects, showing promising results.
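The abstract mentions time-frame-wise backchannel prediction without detailing the decision layer. A minimal hypothetical sketch of such a layer follows: per-frame probabilities are thresholded, with a refractory period so a backchannel fires at most once per interval. The frame rate, threshold, and refractory length are assumptions, not the system's actual parameters.

```python
import numpy as np

def backchannel_frames(frame_probs, threshold=0.5, refractory=20):
    """Turn per-frame backchannel probabilities (e.g., 100 ms frames)
    into trigger frames, suppressing re-triggers within `refractory` frames."""
    triggers, last = [], -refractory
    for t, p in enumerate(frame_probs):
        if p > threshold and t - last >= refractory:
            triggers.append(t)
            last = t
    return triggers

probs = np.random.rand(200)  # stand-in for a frame-wise model's outputs
print(backchannel_frames(probs))
```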
... Papers can be divided into those that consider engagement as a process and those that treat engagement as a state. The state point of view assumes that one is either engaged or not engaged (e.g., Inoue et al., 2018), while the process point of view assumes that there are different processes that unfold during an interaction. Here the action of getting engaged is part of the construct of engagement itself. ...
Article
Engagement is a concept of the utmost importance in human-computer interaction, not only for informing the design and implementation of interfaces, but also for enabling more sophisticated interfaces capable of adapting to users. While the notion of engagement is actively being studied in a diverse set of domains, the term has been used to refer to a number of related, but different, concepts. In fact, it has been referred to across different disciplines under different names and with different connotations in mind. Therefore, it can be quite difficult to understand what engagement means and how one study relates to another. Engagement has been studied not only in human-human but also in human-agent interactions, i.e., interactions with physical robots and embodied virtual agents. In this overview article we focus on the different factors involved in engagement studies, distinguishing especially between studies that address task versus social engagement, involve children or adults, are conducted in a lab, or aim at long-term interaction. We also present models for detecting engagement and for generating multimodal behaviors to show engagement.
... Automatic detection of disengagement and engagement states is a pressing challenge in a wide range of contexts. It is relevant for human-robot and human-agent interaction [10,13], task engagement [16], content engagement [15,18], and conversational engagement [3,19]. ...
... In studies focused on engagement detection during conversation, it is regarded as "the process where two (or more) participants establish, maintain and end their perceived connection" [17,22] and reflects how much the subject is interested in and willing to continue the current dialogue [10]. Nakano and Ishii [17] estimated the degree of engagement during conversation by measuring the user's attentional behavior (paying attention to the shared object and to the interlocutor indicates the listener's engagement). ...
... There are some unimodal [15,27] as well as multimodal approaches [10,16,18,25,28] to engagement and disengagement detection. For example, images [15], speech [27], body posture [20] and gaze direction [17] can be used as a single modality for engagement/disengagement detection. ...
Conference Paper
Engagement/disengagement detection is a challenging task emerging in a range of human-human and human-computer interaction problems. While important, the issue is still far from solved, and a number of studies involving in-the-wild data have been conducted to date. Ambiguity in the definition of engaged/disengaged states makes such data hard to collect, annotate, and analyze. In this paper we describe different approaches to building engagement/disengagement models working with highly imbalanced multimodal data from natural conversations. We set a baseline result of 0.695 (unweighted average recall) by direct classification. Then we try to detect disengagement by means of engagement regression models, as the two have a strong negative correlation. To deal with imbalanced data we apply class weighting and data augmentation techniques (SMOTE and mixup). We experiment with combinations of modalities in order to find the most contributing ones. We use features from both audio (speech) and video (face, body, lips, eyes) channels. We transform the original features using Principal Component Analysis and experiment with several types of modality fusion. Finally, we combine approaches and increase the performance up to 0.715 using four modalities (all channels except face). Audio and lip features appear to be the most contributing ones, which may be tightly connected with speech.
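As a concrete illustration of the techniques the abstract names (modality fusion, PCA, SMOTE, class weighting), here is a minimal sketch of one possible configuration using scikit-learn and imbalanced-learn; the toy features, dimensions, and the choice of early fusion with a logistic-regression classifier are assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
audio = rng.normal(size=(100, 32))        # toy audio features
lips = rng.normal(size=(100, 16))         # toy lip features
y = np.zeros(100, dtype=int)
y[:15] = 1                                # imbalanced: 15% disengaged

X = np.hstack([audio, lips])              # simple early fusion of two modalities
X = PCA(n_components=10).fit_transform(X) # dimensionality reduction, per the paper
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)  # oversample minority class
clf = LogisticRegression(class_weight="balanced").fit(X_res, y_res)
```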
... Therefore, we use not only the integrated labels but also different annotators' labels. In our previous work, we proposed a hierarchical Bayesian model to recognize each annotator's label [30]. In this paper, we propose a neural network to predict the integrated label by considering the different annotators' labels. ...
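The snippet does not give the network architecture, so the following is a minimal sketch of one common way to "consider different annotators' labels": train against the mean of the annotators' labels as a soft target, here with a tiny numpy logistic regression standing in for the neural network. The data and hyperparameters are invented for illustration.

```python
import numpy as np

# Toy data: 4 samples, 3 features; three annotators' binary engagement labels.
X = np.array([[0.9, 0.1, 0.4], [0.2, 0.8, 0.5], [0.7, 0.3, 0.9], [0.1, 0.2, 0.1]])
A = np.array([[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]])  # rows: samples, cols: annotators
y_soft = A.mean(axis=1)  # soft target reflecting annotator (dis)agreement

w, b = np.zeros(X.shape[1]), 0.0
for _ in range(2000):  # gradient descent on cross-entropy with soft targets
    p = 1 / (1 + np.exp(-(X @ w + b)))
    grad = p - y_soft
    w -= 0.1 * X.T @ grad / len(X)
    b -= 0.1 * grad.mean()
print(1 / (1 + np.exp(-(X @ w + b))))  # predicted engagement probabilities
```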
Chapter
Speech technology has made significant advances with the introduction of deep learning and large datasets, enabling automatic speech recognition and synthesis at a practical level. Dialogue systems and conversational AI have also achieved dramatic advances based on the development of large language models. However, the application of these technologies to humanoid robots remains challenging because such robots must operate in real time and in the real world. This chapter reviews the current status and challenges of spoken dialogue technology for communicative robots and virtual agents. Additionally, we present a novel framework for the semi-autonomous cybernetic avatars investigated in this study.