Shizuka Nakamura’s research while affiliated with Kyoto University and other places


Publications (19)


An Attentive Listening System for Autonomous Android ERICA: Comparative Evaluation with Human Attentive Listeners
  • Article

September 2021 · 44 Reads · 3 Citations · Transactions of the Japanese Society for Artificial Intelligence

Koji Inoue · Kenta Yamamoto · [...]
An attentive listening system for the autonomous android ERICA is presented. Our goal is to realize a human-like, natural attentive listener for elderly people. The proposed system generates listener responses: backchannels, repeats, elaborating questions, assessments, and generic responses. The system incorporates speech processing using a microphone array and real-time dialogue processing, including continuous backchannel prediction and turn-taking prediction. In this study, we conducted a dialogue experiment with elderly people, comparing the system against a WOZ setup in which a human operator played the listener role behind the robot. The system achieved comparable scores on basic attentive-listening skills, such as being easy to talk to, listening seriously, focusing on the talk, and listening actively. However, a gap remains between the system and the human (WOZ) operator on high-level attentive-listening skills such as dialogue understanding, showing interest, and empathy toward the user.
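The response repertoire described in the abstract can be illustrated with a minimal sketch. The following Python fragment is a hypothetical rule-based selector over the five listener-response types; the function name, thresholds, and example phrases are invented for illustration, and the actual ERICA system uses trained models for continuous backchannel and turn-taking prediction rather than fixed rules.

```python
import random
from typing import Optional, Tuple

def select_response(user_utterance: str, pause_ms: int,
                    focus_word: Optional[str]) -> Tuple[str, str]:
    """Pick one of the five listener response types for the user's last utterance.
    Types: backchannel, repeat, elaborating_question, assessment, generic_response."""
    if pause_ms < 200:
        # Very short pause: a lightweight backchannel keeps the floor with the user.
        return ("backchannel", "uh-huh")
    if focus_word is not None:
        # A salient word was detected: repeat it, or ask an elaborating question.
        if random.random() < 0.5:
            return ("repeat", focus_word + "...")
        return ("elaborating_question", f"What kind of {focus_word}?")
    if len(user_utterance.split()) > 8:
        # Longer turn with no focus word: offer an assessment.
        return ("assessment", "That sounds wonderful.")
    return ("generic_response", "I see.")

print(select_response("I went hiking near Kyoto last weekend", 450, "hiking"))
```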


End-to-end Modeling for Selection of Utterance Constructional Units via System Internal States

March 2021 · 13 Reads · Lecture Notes in Electrical Engineering

To make conversational agents or robots behave in a human-like way, it is important to model the system's internal states. In this paper, we address a model of the system's favorable impression of its dialogue partner. The favorable impression changes according to the user's dialogue behaviors and in turn affects the system's subsequent dialogue behaviors, specifically its selection of utterance constructional units. For this modeling, we propose a hierarchical structure of logistic regression models. First, from the user's dialogue behaviors, the model estimates the level of the user's favorable impression of the system and the level of the user's interest in the current topic. Then, based on these results, the model predicts the system's favorable impression of the user. Finally, the model selects the utterance constructional units for the next system turn. We train each logistic regression model individually on a small amount of data annotated with favorable impression. Afterward, the entire multi-layer network is fine-tuned on a larger amount of dialogue behavior data. Experimental results show that the proposed method achieves higher accuracy on the selection of utterance constructional units than methods that do not take the system's internal states into account.
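The hierarchical structure lends itself to a compact sketch. Below is an assumed PyTorch rendering of the three-layer stack of logistic regressions; feature dimensions, the number of unit types, and variable names are placeholders, and the paper's two-stage training (per-layer pretraining on a small annotated set, then end-to-end fine-tuning) would be applied on top of this structure.

```python
import torch
import torch.nn as nn

class InternalStateModel(nn.Module):
    def __init__(self, n_behavior_feats: int = 16, n_units: int = 4):
        super().__init__()
        # Layer 1: user's dialogue behaviors -> user's impression / topic interest
        self.user_impression = nn.Linear(n_behavior_feats, 1)
        self.user_interest = nn.Linear(n_behavior_feats, 1)
        # Layer 2: (impression, interest) -> system's favorable impression
        self.system_impression = nn.Linear(2, 1)
        # Layer 3: system's impression -> choice of utterance constructional unit
        self.unit_selector = nn.Linear(1, n_units)

    def forward(self, behavior_feats: torch.Tensor) -> torch.Tensor:
        imp = torch.sigmoid(self.user_impression(behavior_feats))
        interest = torch.sigmoid(self.user_interest(behavior_feats))
        sys_imp = torch.sigmoid(
            self.system_impression(torch.cat([imp, interest], dim=-1)))
        return self.unit_selector(sys_imp)  # logits over unit types

model = InternalStateModel()
logits = model(torch.randn(8, 16))  # a batch of 8 dialogue turns
print(logits.shape)                 # torch.Size([8, 4])
```

Because each sigmoid layer is itself a logistic regression, the layers can first be fitted separately against their own annotations before the whole stack is fine-tuned, mirroring the procedure in the abstract.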


A Job Interview Dialogue System with Autonomous Android ERICA

March 2021 · 51 Reads · 7 Citations · Lecture Notes in Electrical Engineering

We demonstrate a job interview dialogue with the autonomous android ERICA, which plays the role of the interviewer. Conventional job interview dialogue systems ask only pre-defined questions. ERICA's job interview system generates follow-up questions on the fly from the interviewee's responses, using two approaches: selection-based and keyword-based. The first type of question is selected from a pre-defined question set and can be used in many cases. The second type is built around a keyword extracted from the interviewee's response and digs into that response dynamically. These follow-up questions contribute to realizing a natural dialogue, like that of a trained interviewer.
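The two follow-up strategies can be sketched as follows. This is an illustrative toy version only: the keyword list, templates, and fallback policy are invented, and the actual system's keyword extraction and question selection are not reproduced here.

```python
from typing import Optional

# Selection-based: generic follow-ups drawn from a pre-defined set.
PREDEFINED_FOLLOWUPS = [
    "What did you learn from that experience?",
    "How would you apply that experience in this job?",
]

# Keyword-based: salient words worth digging into (hypothetical list).
KEYWORDS = {"internship", "thesis", "club", "volunteering"}

def keyword_followup(response: str) -> Optional[str]:
    """Build a follow-up that digs into a keyword found in the response."""
    for word in response.lower().split():
        if word in KEYWORDS:
            return f"Could you tell me more about the {word}?"
    return None

def next_question(response: str) -> str:
    # Prefer the keyword-based follow-up; otherwise fall back to
    # selection from the pre-defined set.
    return keyword_followup(response) or PREDEFINED_FOLLOWUPS[0]

print(next_question("I led a robotics club at university"))
# -> Could you tell me more about the club?
```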


A Character Expression Model Affecting Spoken Dialogue Behaviors

January 2021 · 41 Reads · 4 Citations · Lecture Notes in Electrical Engineering

We address character (personality) expression for a spoken dialogue system, in order to adapt the system to particular dialogue tasks and social roles. Whereas conventional studies investigated controlling linguistic expressions, we focus on spoken dialogue behaviors as the means of expressing a system's character. Specifically, we investigate behaviors such as utterance amount, backchannel frequency, filler frequency, and switching pause length in order to express three character traits: extroversion, emotional instability, and politeness. In this study, we evaluate the model on a natural spoken dialogue corpus. The results reveal that the model expresses reasonable characters according to the dialogue tasks and participant roles. Furthermore, the model can express different characters among participants given the same role. A subjective experiment demonstrated that subjects could perceive the characters expressed by the model.
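A minimal sketch of such a trait-to-behavior mapping is given below, assuming a simple linear model. The coefficient values here are invented purely for illustration; the paper derives the actual relationships from a spoken dialogue corpus.

```python
import numpy as np

TRAITS = ["extroversion", "emotional_instability", "politeness"]  # inputs in [0, 1]
BEHAVIORS = ["utterance_amount", "backchannel_freq",
             "filler_freq", "switching_pause_len"]

# Rows: behaviors, columns: traits. All coefficients are hypothetical.
W = np.array([
    [ 0.8, -0.1,  0.1],   # extroverts talk more
    [ 0.5,  0.2,  0.3],   # backchannels rise with extroversion/politeness
    [-0.2,  0.6,  0.1],   # emotional instability adds fillers
    [-0.6,  0.1,  0.5],   # polite/introverted speakers pause longer
])
b = np.array([0.5, 0.5, 0.3, 0.4])

def behavior_params(traits: dict) -> dict:
    """Map a target character (trait scores) to dialogue behavior parameters."""
    x = np.array([traits[t] for t in TRAITS])
    y = np.clip(W @ x + b, 0.0, None)  # keep parameters non-negative
    return dict(zip(BEHAVIORS, y.round(2)))

print(behavior_params({"extroversion": 0.9,
                       "emotional_instability": 0.2,
                       "politeness": 0.7}))
```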



A Job Interview Dialogue System That Asks Follow-up Questions: Implementation and Evaluation with an Autonomous Android

September 2020 · 85 Reads · 1 Citation · Transactions of the Japanese Society for Artificial Intelligence

A spoken dialogue system that plays the role of an interviewer for job interviews is presented. In this work, our goal is to implement an automated job interview system that candidates can use as practice before a real interview. Conventional job interview systems ask only pre-defined questions, which makes the dialogue monotonous and far from human-human interviews. We propose follow-up question generation based on the assessment of candidate responses and keyword extraction. This model was integrated into the dialogue system of the autonomous android ERICA to conduct subject experiments. The proposed job interview system was compared with a baseline system that did not generate any follow-up questions and only selected among pre-defined questions. The experimental results show that the proposed system is significantly better in subjective evaluations regarding impressions of job interview practice, the quality of questions, and the presence of the interviewer.





Figure 1. Dialogue scene between the user (left) and the humanoid ERICA (right), which is remotely operated.
Figure 3. Procedure of backchannel generation in the spoken dialogue system.
Figure 4. Lexical forms of assessment backchannels, selected partly from past work on backchannel analysis [26-28] and partly from the Corpus of Spontaneous Japanese (CSJ), a database containing a large collection of Japanese spoken language data for linguistic research. Equivalent English expressions include 'um', 'wow', 'oh my!', 'yeah', 'right', 'really', 'great', 'I see', 'wonderful', and 'that sounds hard'. These assessment backchannels were adopted for expressing reactive emotions. Although the backchannel generation method in the present study uses only linguistic information, methods based on prosodic features can also be used [3,48].
Figure 5. Occurrence frequencies of the listener's emotion categories depending on the speaker's valence values.
Table: Annotated excerpt of the human-robot dialogue dataset.

Expressing reactive emotion based on multimodal emotion recognition for natural conversation in human–robot interaction
  • Article
  • Full-text available

September 2019 · 539 Reads · 38 Citations

Human–human interaction consists of various nonverbal behaviors that are often emotion-related. To establish rapport, it is essential that the listener respond to reactive emotion in a way that makes sense given the speaker's emotional state. However, human–robot interactions generally fail in this regard because most spoken dialogue systems play only a question-answer role. Aiming for natural conversation, we examine an emotion processing module for a spoken dialogue system that consists of a user emotion recognition function and a reactive emotion expression function, to improve human–robot interaction. For the emotion recognition function, we propose a method that combines valence from prosody and sentiment from text by decision-level fusion, which considerably improves performance. Moreover, this method reduces fatal recognition errors, thereby improving the user experience. For the reactive emotion expression function, the system's emotion is divided into an emotion category and an emotion level, which are predicted using the parameters estimated by the recognition function, on the basis of distributions inferred from human–human dialogue data. As a result, the emotion processing module can recognize the user's emotion from his/her speech and express a reactive emotion that matches it. An evaluation with ten participants demonstrated that a system enhanced by this module is effective for conducting natural conversation.
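Decision-level (late) fusion as described here combines the outputs of two independent classifiers rather than their features. A minimal sketch follows; the class inventory, the posterior values, and the equal weighting are stand-ins, not the paper's actual classifiers or fusion weights.

```python
import numpy as np

CLASSES = ["negative", "neutral", "positive"]

def fuse(p_prosody: np.ndarray, p_text: np.ndarray, w: float = 0.5) -> str:
    """Late fusion: weighted average of the two posterior distributions."""
    p = w * p_prosody + (1.0 - w) * p_text
    return CLASSES[int(np.argmax(p))]

# Prosody mildly negative, text clearly positive ("great", "love", ...):
p_prosody = np.array([0.45, 0.35, 0.20])
p_text = np.array([0.05, 0.15, 0.80])
print(fuse(p_prosody, p_text))  # -> positive
```

One advantage of fusing at the decision level, consistent with the abstract's claim about fatal errors, is that a confident prediction from one modality can override a wrong one from the other.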


Citations (15)


... An attentive listening system (Inoue et al. 2020) was developed by integrating the aforementioned modules. Attentive listening is useful for seniors who need to be heard and maintain their communication skills. ...

Reference: Spoken Dialogue Technology for Semi-Autonomous Cybernetic Avatars
An Attentive Listening System with Android ERICA: Comparison of Autonomous and WOZ Interactions
  • Citing Conference Paper
  • January 2020

... In the early research field on human-robot interaction, interactive capabilities were mainly used for information-providing tasks in real environments [6,13,21]. Recently, advances in autonomous conversational technologies such as natural speech synthesis and speech recognition have made it possible to handle more complex conversational tasks [14,23]. Social robots are expected to play a role in helping people solve their problems and mental health [8,15,18]. ...

A Job Interview Dialogue System with Autonomous Android ERICA
  • Citing Chapter
  • March 2021

Lecture Notes in Electrical Engineering

... The synthesis task tries to generate behaviors to express artificial personalities for human-computer interaction. For instance, [8] controlled the amount of utterances, backchannels and fillers, and switching pause length for a robot to express extraversion, emotional instability and politeness. With the recognition and synthesis functions, systems can be designed to have behaviors similar to those of human beings in particular scenarios such as job interview and health care. ...

A Character Expression Model Affecting Spoken Dialogue Behaviors
  • Citing Chapter
  • January 2021

Lecture Notes in Electrical Engineering

... We consider collaborative and competitive interaction game contexts and investigate both PBs and OBs during naturally occurring deception behaviour. Lastly, as social robots have begun to take on different social yet professional roles such as an interviewer [4,30], or a teacher [36], or a therapist [11] or a detective [24], we consider the Human-robot game interaction context and foresee a future where robots detect deception in real-time. The paper investigates the following research questions (RQs): ...

Job Interviewer Android with Elaborate Follow-up Question Generation
  • Citing Conference Paper
  • October 2020

... These systems paved the way for more advanced automated survey methods. Recent research has leveraged the rapid advancements in speech-to-text (STT) models, large language models (LLMs), and text-to-speech (TTS) engines, significantly enhancing the naturalness and adaptability of automated interviewers (Cuevas et al., 2024;Ge et al., 2022;Inoue et al., 2020;Nagasawa et al., 2023;Zeng et al., 2023). Zeng et al., 2023 demonstrated that LLMs could conduct semi-structured interviews, and Wuttke et al., 2024 showed that LLMs were able to conduct conversational interviews, retrieving data comparable to traditional methods with additional scalability. ...

A Job Interview Dialogue System That Asks Follow-up Questions: Implementation and Evaluation with an Autonomous Android
  • Citing Article
  • September 2020

Transactions of the Japanese Society for Artificial Intelligence

... Over the years, numerous scholars have extensively researched the prediction of pause fillers using various methods, which has resulted in more natural and authentic text generation. For instance, Nakanishi R. et al. proposed a method based on analyzing human-robot interaction data and machine learning models to predict the occurrence and appropriate forms of pause fillers, aiming to generate them at the beginning of system utterances in humanoid robot spoken dialog systems, indicating turn-taking or turn-holding intentions [1]. Balagopalan A. et al. compared two common methods for AD detection on a matched dataset, assessing the advantages of domain knowledge and BERT pre-trained transfer models in predicting pauses and interruptions [2]. ...

Generating Fillers Based on Dialog Act Pairs for Smooth Turn-Taking by Humanoid Robot
  • Citing Chapter
  • September 2019

Lecture Notes in Electrical Engineering

... Meanwhile, Li et al. [22] noted that voice-based systems that use the question-answer format have difficulties in creating natural interactions because they do not recognize the emotions and context of the users. Therefore, backchannels can be provided either after or during the utterance of the other person; however, it is important to consider the emotions and context of users when delivering them [23,24,25]. ...

Expressing reactive emotion based on multimodal emotion recognition for natural conversation in human–robot interaction

... For example, [12] examined the influence of backchannel selection on extroversion expression using the virtual agent SAL, which acts as an interlocutor in interaction. [13] analyzed the relationship between personality traits and dialogue behaviors, to control utterance amount, backchannel, filler, and switching pause length for the humanoid robot ERICA to express extroversion, emotional instability, and politeness. However, most research has ignored the fact that impression depends not only on the emitter (system/speaker) but also largely on the perception by the receiver (user/listener). ...

Dialogue Behavior Control Model for Expressing a Character of Humanoid Robots
  • Citing Conference Paper
  • November 2018

... However, the majority of prior work focused on synthesizing monologues but ignored the aspects of spoken interaction which is necessary for emotion and identity. Personality is largely dependent on fillers and backchannels [6,37], whose pragmatic meanings are highly conveyed by prosody [36]. Taking the backchannel "really" as an example, it can be used for expressing interest, surprise, or disappointment, depending on its speaking style. ...

A Dialogue Behavior Control Model for Expressing a Character of Humanoid Robots
  • Citing Article
  • September 2018

Transactions of the Japanese Society for Artificial Intelligence