Stefan Ultes

Mercedes Benz R&D · RD/CUV

Dr.-Ing.

About

95 Publications
13,409 Reads
2,345 Citations
Introduction
I am the Dialogue Research Lead at Mercedes Benz R&D, working on the next generation of the #HeyMercedes voice assistant. Previously, I was a Research Associate at the Spoken Dialogue Systems group at the University of Cambridge. I received my PhD from the Dialogue Systems Group at Ulm University with my dissertation "User-centred Adaptive Spoken Dialogue Modelling". My research focuses on natural and intuitive conversations with intelligent agents, where we expect the system to understand natural input and behave in a natural way. We expect the system to adapt its communication style to the user, taking into account the situation and environment in which the interaction takes place.
Additional affiliations
January 2016 - present
University of Cambridge
Position
  • Research Associate
Description
  • Research Project: Open Domain Statistical Spoken Dialogue Systems
February 2011 - January 2016
Ulm University
Position
  • Research Assistant
Education
February 2011 - November 2015
Ulm University
Field of study
  • Engineering
October 2003 - September 2010
Karlsruhe Institute of Technology
Field of study
  • Computer Science


Publications (95)
Article
With the aim of designing a spoken dialogue system which has the ability to adapt to the user's communication idiosyncrasies, we investigate whether it is possible to carry over insights from the usage of communication styles in human-human interaction to human-computer interaction. In an extensive literature review, it is demonstrated that communi...
Article
Full-text available
This paper introduces a natural language understanding (NLU) framework for argumentative dialogue systems in the information-seeking and opinion building domain. The proposed framework consists of two sub-models, namely an intent classifier and an argument similarity model. The intent classifier model stacks a BiLSTM with an attention mechanism on top of pre-trained BER...
Preprint
This work combines information about the dialogue history encoded by a pre-trained model with a meaning representation of the current system utterance to realize contextual language generation in task-oriented dialogues. We utilize the pre-trained multi-context ConveRT model for context representation in a model trained from scratch and leverage the...
Chapter
Multimodal machine learning (MMML) is a vibrant multi-disciplinary research field which addresses some of the original goals of artificial intelligence by integrating and modelling multiple communicative modalities, including linguistic, acoustic and visual messages. With the goal of better understanding and modelling behaviours of ageing individua...
Article
Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work that is based on reinforcement learning employs an objective measure like task success for modelling the reward signal, we propose to use a reward signal based on user satisfaction. We pro...
Preprint
This paper presents an automatic method to evaluate the naturalness of natural language generation in dialogue systems. While this task was previously rendered through expensive and time-consuming human labor, we present this novel task of automatic naturalness evaluation of generated language. By fine-tuning the BERT model, our proposed naturalnes...
Preprint
One challenge for dialogue agents is to recognize feelings of the conversation partner and respond accordingly. In this work, RoBERTa-GPT2 is proposed for empathetic dialogue generation, where the pre-trained auto-encoding RoBERTa is utilised as encoder and the pre-trained auto-regressive GPT-2 as decoder. With the combination of the pre-trained Ro...
Article
Persuasive argumentation depends on multiple aspects, which include not only the content of the individual arguments, but also the way they are presented. The presentation of arguments is crucial – in particular in the context of dialogical argumentation. However, the effects of different discussion styles on the listener are hard to isolate in hum...
Chapter
In this work, we introduce BEA, an argumentative Dialogue System that assists the user in his or her opinion forming regarding a certain controversial topic. To this end, we establish an opinion model based on weighted bipolar argumentation graphs that allows the system to infer the influence of preferences expressed by the user on all related aspe...
Preprint
Full-text available
This paper introduces a natural language understanding (NLU) framework for argumentative dialogue systems in the information-seeking and opinion building domain. Our approach distinguishes multiple user intents and identifies system arguments the user refers to in his or her natural language utterances. Our model is applicable in an argumentative d...
Chapter
Cultural background has a great influence on people’s behaviour and perception. With the aim of designing a culturally sensitive conversational assistant, we have investigated whether culture-specific parameters may be trained by use of a supervised learning approach. We have used a dialogue management framework based on the concept of prob...
Article
Full-text available
Information about a subjective user opinion towards an argument is crucial for argumentative systems in order to present appropriate content and adapt their behaviour to the individual user. However, requesting explicit feedback regarding the discussed arguments is often impractical and can hinder the interaction. To address this issue, we investig...
Chapter
We present an indoor navigation system that is based on natural spoken interaction. The system navigates the user through the University of Ulm based on scripts, supporting three different routes and varying communication styles for the system descriptions. Furthermore, the system is able to cope with incomplete scripts and inconclusive situations...
Preprint
As conversational agents become integral parts of many aspects of our lives, current approaches are reaching bottlenecks of performance that require increasing amounts of data or increasingly powerful models. It is also becoming clear that such agents are here to stay and accompany us for long periods of time. If we are, therefore, to design agents...
Conference Paper
Research states that persuasion is subjective. Moreover, people use behavioral cues all the time, very often even without noticing and are often not aware of being persuaded by non-rational cues. In order to draw attention to these effects, we want to enable virtual agents to adapt their behavior during interaction to the listener in order to incre...
Preprint
Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work that is based on reinforcement learning employs an objective measure like task success for modelling the reward signal, we use a reward based on user satisfaction estimation. We propose a...
Preprint
In order to take up the challenge of realising user-adaptive system behaviour, we present an extension for the existing OwlSpeak Dialogue Manager which enables the handling of dynamically created dialogue actions. This leads to an increase in flexibility which can be used for adaptation tasks. After the implementation of the modifications and the i...
Preprint
Statistical spoken dialogue systems usually rely on a single- or multi-domain dialogue model that is restricted in its capabilities of modelling complex dialogue structures, e.g., relations. In this work, we propose a novel dialogue model that is centred around entities and is able to model relations as well as multiple entities of the same type. W...
Chapter
Finding a good dialogue policy using reinforcement learning usually relies on objective criteria for modelling the reward signal, e.g., task success. In this contribution, we propose to use user satisfaction instead represented by the metric Interaction Quality (IQ). Comparing the user satisfaction-based reward to the baseline of task success, we s...
Chapter
In the present paper, we conduct a comparative evaluation of a multitude of information-seeking domains, using two well-known but fundamentally different algorithms for policy learning: GP-SARSA and DQN. Our goal is to gain an understanding of how the nature of such domains influences performance. Our results indicate several main domain characteri...
Chapter
In this work, we present the development and evaluation of a social companion and conversational partner for the specific user group of elderly persons. With the aim of designing a user-adaptive system, we respond to the desires of the elderly which have been identified during various interviews and create a companion that talks and listens to the...
Chapter
Collecting a large amount of real human-computer interaction data in various domains is a cornerstone in the development of better data-driven spoken dialog systems. The DialPort project is creating a portal to collect a constant stream of real user conversational data on a variety of topics. In order to keep real users attracted to DialPort, it is...
Chapter
In this paper, we investigate the applicability of soft changes to system behaviour, namely changing the amount of elaborateness and indirectness displayed. To this end, we examine the impact of elaborateness and indirectness on the perception of human-computer communication in a user study. Here, we show that elaborateness and indirectness influen...
Preprint
Cross-domain natural language generation (NLG) is still a difficult task within spoken dialogue modelling. Given a semantic representation provided by the dialogue manager, the language generator should generate sentences that convey desired information. Traditional template-based generators can produce sentences with all necessary information, but...
Conference Paper
This work introduces EVA, a multimodal argumentative Dialogue System that is capable of discussing controversial topics with the user. The interaction is structured as an argument game in which the user and the system select respective moves in order to convince their opponent. EVA's response is presented as a natural language utterance by a virtua...
Preprint
Even though machine learning has become the major scene in the dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple dom...
Preprint
Full-text available
In recent years, we have seen deep learning and distributed representations of words and sentences make impact on a number of natural language processing tasks, such as similarity, entailment and sentiment analysis. Here we introduce a new task: understanding of mental health concepts derived from Cognitive Behavioural Therapy (CBT). We define a me...
Conference Paper
We present a study addressing the questions of how varying communication styles of a spoken user interface are perceived by users and whether there exist global preferences in the communication styles elaborateness and indirectness. A total of 60 participants had two conversations each with Amazon's Alexa where Alexa used varying wordings for its o...
Preprint
Full-text available
This paper presents two ways of dealing with scarce data in semantic decoding using N-Best speech recognition hypotheses. First, we learn features by using a deep learning architecture in which the weights for the unknown and known categories are jointly optimised. Second, an unsupervised method is used for further tuning the weights. Sharing weigh...
Article
Full-text available
Reinforcement learning (RL) is a promising approach to solve dialogue policy optimisation. Traditional RL algorithms, however, fail to scale to large domains due to the curse of dimensionality. We propose a novel Dialogue Management architecture, based on Feudal RL, which decomposes the decision into two steps; a first step where a master policy se...
Chapter
Many different approaches for estimating the Interaction Quality (IQ) of Spoken Dialogue Systems have been investigated. While dialogues clearly have a sequential nature, statistical classification approaches designed for sequential problems do not seem to work better on automatic IQ estimation than static approaches, i.e., regarding each turn as being...
Chapter
Getting a good estimation of the Interaction Quality (IQ) of a spoken dialogue helps to increase the user satisfaction as the dialogue strategy may be adapted accordingly. Therefore, some research has already been conducted in order to automatically estimate the Interaction Quality. This article adds to this by describing how Recurrent Neural Netwo...
Chapter
While approaches to the automatic recognition of human emotion from speech have already achieved reasonable results, a lot of room for improvement still remains. In our research, we select the most essential features by applying a self-adaptive multi-objective genetic algorithm. The proposed approach is evaluated using data from different langua...
Article
Full-text available
Dialogue assistants are rapidly becoming an indispensable daily aid. To avoid the significant effort needed to hand-craft the required dialogue flow, the Dialogue Management (DM) module can be cast as a continuous Markov Decision Process (MDP) and trained through Reinforcement Learning (RL). Several RL models have been investigated over recent year...
Conference Paper
Health care related information can be vital and should be easily accessible. However, immigrants often have difficulty obtaining the relevant information due to language barriers and cultural differences. In the KRISTINA project, we address those difficulties by creating a socially competent multimodal dialogue system that can assist i...
Article
Full-text available
Reinforcement learning is widely used for dialogue policy optimization where the reward function often consists of more than one component, e.g., the dialogue success and the dialogue length. In this work, we propose a structured method for finding a good balance between these components by searching for the optimal reward component weighting. To r...
Article
Full-text available
Deep reinforcement learning (RL) methods have significant potential for dialogue policy optimisation. However, they suffer from poor performance in the early stages of learning. This is especially problematic for on-line learning with real users. Two approaches are introduced to tackle this problem. Firstly, to speed up the learning process, tw...
Article
Full-text available
Human conversation is inherently complex, often spanning many different topics/domains. This makes policy learning for dialogue systems very challenging. Standard flat reinforcement learning methods do not provide an efficient framework for modelling such dialogues. In this paper, we focus on the under-explored problem of multi-domain dialogue mana...
Article
Full-text available
Spoken dialogue systems allow humans to interact with machines using natural speech. As such, they have many benefits. By using speech as the primary communication medium, a computer interface can facilitate swift, human-like acquisition of information. In recent years, speech interfaces have become ever more popular, as is evident from the rise of...
Article
Full-text available
This paper presents a deep learning architecture for the semantic decoder component of a Statistical Spoken Dialogue System. In a slot-filling dialogue, the semantic decoder predicts the dialogue act and a set of slot-value pairs from a set of n-best hypotheses returned by the Automatic Speech Recognition. Most current models for spoken language un...
Article
Full-text available
Spoken dialogue systems allow humans to interact with machines using natural speech. As such, they have many benefits. By using speech as the primary communication medium, a computer interface can facilitate swift, human-like acquisition of information. In recent years, speech interfaces have become ever more popular, as is evident from the rise of...
Article
Full-text available
Recently a variety of LSTM-based conditional language models (LM) have been applied across a range of language generation tasks. In this work we study various model architectures and different ways to represent and aggregate the source information in an end-to-end neural dialogue system framework. A method called snapshot learning is also proposed...
Article
Full-text available
We describe a two-step approach for dialogue management in task-oriented spoken dialogue systems. A unified neural network framework is proposed to enable the system to first learn by supervision from a set of dialogue data and then continuously improve its behaviour via reinforcement learning, all using gradient-based algorithms on one single mode...
Conference Paper
Full-text available
In this paper, we describe the principles and technologies that underpin the development of an adaptive dialogue manager framework, tailored to carrying out human-agent conversations in a natural, robust and flexible manner. Our research focus is twofold. First, the investigation of dialogue strategies that can handle dynamically created user and sy...
Conference Paper
Full-text available
We present work in progress on an intelligent embodied conversation agent in the basic care and healthcare domain. In contrast to most of the existing agents, the presented agent is intended to have the linguistic, cultural, social and emotional competence needed to interact with elderly people and migrants. It is composed of an ontology-based and reasoning-drive...
Conference Paper
Full-text available
The ability to compute an accurate reward function is essential for optimising a dialogue policy via reinforcement learning. In real-world applications, using explicit user feedback as the reward signal is often unreliable and costly to collect. This problem can be mitigated if the user's intent is known in advance or data is available to pre-train...
Article
Full-text available
Teaching machines to accomplish tasks by conversing naturally with humans is challenging. Currently, developing task-oriented dialogue systems requires creating multiple components and typically this involves either a large amount of handcrafting, or acquiring labelled datasets and solving a statistical learning problem for each component. In this...
Article
Full-text available
Many different approaches for estimating the Interaction Quality (IQ) of Spoken Dialogue Systems have been investigated. While dialogues clearly have a sequential nature, statistical classification approaches designed for sequential problems do not seem to work better on automatic IQ estimation than static approaches, i.e., regarding each turn as b...
Chapter
Adaptivity of intelligent environments to their surroundings provided by the ATRACO Spoken Dialogue Manager is only one means of adaptation. Recent work in Spoken Dialogue Systems focuses on the integration of user-centred adaptation means to alter the content, flow and structure of the ongoing dialogue. In this chapter, we introduce a general user...
Book
This book covers key topics in the field of intelligent ambient adaptive systems. It focuses on the results worked out within the framework of the ATRACO (Adaptive and TRusted Ambient eCOlogies) project. The theoretical background, the developed prototypes, and the evaluated results form a fertile ground useful for the broad intelligent environment...