Lab

Speech Music and Hearing (TMH)

About the lab

Featured research (2)

The large majority of previous work on human-robot conversations in a second language has been performed with a human Wizard of Oz. The reasons are that automatic speech recognition (ASR) of non-native conversational speech is considered unreliable and that the dialogue management task of selecting robot utterances that are adequate at a given turn is complex in social conversations. This study therefore investigates whether robot-led conversation practice in a second language with pairs of adult learners could be managed by an autonomous robot. We first investigate how correct and understandable transcriptions of second language learner utterances are when made by a state-of-the-art speech recogniser. We find both a relatively high word error rate (41%) and that a substantial share (42%) of the utterances are judged by a human reader to be incomprehensible or only partially understandable. We then evaluate how adequate the robot utterance selection is when performed manually based on the speech recognition transcriptions or autonomously using (a) predefined sequences of robot utterances, (b) a general state-of-the-art language model that selects utterances based on learner input or the preceding robot utterance, or (c) a custom-made statistical method that is trained on observations of the wizard’s choices in previous conversations. We show that the human wizard selects adequate or at least acceptable robot utterances in most cases (96%), even though the ASR transcriptions have a high word error rate. Further, the custom-made statistical method performs as well as manual selection of robot utterances based on ASR transcriptions.
We also found that the robot’s interaction strategy, which varied in how much the robot maintained the initiative in the conversation and in whether the conversation focused on the robot or on the learners, had marginal effects on the word error rate and understandability of the transcriptions but larger effects on the adequacy of the utterance selection. Autonomous robot-led conversations may hence work better with some robot interaction strategies than with others.
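For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcription and the recogniser output, divided by the number of reference words. The Python sketch below illustrates that standard definition only; the function name and the example sentences are hypothetical, and it is not the scoring tool used in the study, whose text normalisation choices are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenises on whitespace and applies no
    normalisation (case, punctuation), which real scoring tools often do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical learner utterance and recogniser output.
    reference = "i like to travel by train"
    hypothesis = "i like travel by rain"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

In this made-up example the recogniser drops one word and misrecognises another, giving two errors over six reference words, a word error rate of about 0.33. Because insertions are counted as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.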
Four different interaction styles for the social robot Furhat acting as a host in spoken conversation practice with two simultaneous language learners have been developed, based on interaction styles of human moderators of language cafés. We first investigated, through a survey and recorded sessions of three-party language café style conversations, how the interaction styles of human moderators are influenced by different factors (e.g., the participants’ language level and familiarity). Using this knowledge, four distinct interaction styles were developed for the robot: asking questions of one participant at a time, in sequence (Interviewer); the robot speaking about itself, robots and Sweden or asking quiz questions about Sweden (Narrator); attempting to make the participants talk with each other (Facilitator); and trying to establish a three-party robot–learner–learner interaction with equal participation (Interlocutor). A user study with 32 participants, conversing in pairs with the robot, was carried out to investigate how the post-session ratings of the robot’s behavior along different dimensions (e.g., the robot’s conversational skills and friendliness, the value of practice) are influenced by the robot’s interaction style and participant variables (e.g., level in the target language, gender, origin). The general findings were that Interviewer received the highest mean rating, but that different factors influenced the ratings substantially, indicating that the preferences of individual participants need to be anticipated in order to improve learner satisfaction with the practice. We conclude with a list of recommendations for robot-hosted conversation practice in a second language.

Lab head

Joakim Gustafson
Department
  • Department of Speech, Music and Hearing (TMH)

Members (9)

Jonas Beskow
  • KTH Royal Institute of Technology
Rolf Carlson
  • KTH Royal Institute of Technology
Olov Engwall
  • KTH Royal Institute of Technology
André Pereira
  • KTH Royal Institute of Technology
Daniel Neiberg
  • KTH Royal Institute of Technology
Patrik Jonell
  • KTH Royal Institute of Technology
Dimosthenis Kontogiorgos
  • Universität Potsdam
Agnes Axelsson
  • KTH Royal Institute of Technology
Ronald Cumbal
  • Not confirmed yet
Birger Moëll
  • Not confirmed yet