Oliver Joseph Lemon

Oliver Joseph Lemon
Heriot-Watt University · School of Mathematical and Computer Sciences

Ph.D.

About

389
Publications
60,174
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
6,490
Citations
Introduction
Director of the Interaction Lab. Some of our projects: * SPRING (H2020): conversational AI for HRI * MuMMER (H2020): social HRI in retail * Amazon Alexa Prize (2017, 2018) * BABBLE (EPSRC): machine learning for spoken dialogue systems * JAMES: socially appropriate Human-Robot Interaction (FP7) * ABC-POMDP: Belief Compression for POMDP spoken dialogue systems (EPSRC) * SpaceBook: combining geo-spatial data with speech interfaces (FP7) See http://www.macs.hw.ac.uk/InteractionLab for details
Additional affiliations
September 1998 - June 1999
Trinity College Dublin
Position
  • Lecturer
October 2009 - present
Heriot-Watt University
January 2003 - September 2009
University of Edinburgh
Education
September 1992 - December 1995
University of Edinburgh
Field of study
  • AI

Publications

Publications (389)
Article
Full-text available
Artificially intelligent agents equipped with strategic skills that can negotiate during their interactions with other natural or artificial agents are still underdeveloped. This paper describes a successful application of Deep Reinforcement Learning (DRL) for training intelligent agents with strategic conversational skills, in a situated dialogue...
Conference Paper
Full-text available
We present a data-driven approach to learn user-adaptive referring expression generation (REG) policies for spoken dialogue systems. Referring expressions can be difficult to understand in technical domains where users may not know the technical 'jargon' names of the domain entities. In such cases, dialogue systems must be able to model the user's...
Chapter
Full-text available
We present and evaluate a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning, given noisy feedback from the current generation context (e.g. a user and a surface realiser). The model is adaptive and incremental at the turn level, and optimises NLG actions with respect to a data-driven objective...
Article
Full-text available
We propose a method for learning dialogue management policies from a fixed data set. The method addresses the challenges posed by Information State Update (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, resulting in a very large state space and a huge policy space. To address the problem that any fi...
Conference Paper
Full-text available
The Generalized Referring Expression Comprehension (GREC) task extends classic REC by generating image bounding boxes for objects referred to in natural language expressions , which may indicate zero, one, or multiple targets. This generalization enhances the prac-ticality of REC models for diverse real-world applications. However, the presence of...
Conference Paper
Full-text available
Ambiguous Candidate Identification (ACI) in multimodal dialogue is the task of identifying all potential objects that a user's utterance could be referring to in a visual scene, in cases where the reference cannot be uniquely determined. End-to-end models are the dominant approach for this task, but have limited real-world applicability due to unre...
Preprint
Full-text available
This study explores replacing Transformers in Visual Language Models (VLMs) with Mamba, a recent structured state space model (SSM) that demonstrates promising performance in sequence modeling. We test models up to 3B parameters under controlled conditions, showing that Mamba-based VLMs outperforms Transformers-based VLMs in captioning, question an...
Preprint
AI personal assistants deployed via robots or wearables require embodied understanding to collaborate with humans effectively. However, current Vision-Language Models (VLMs) primarily focus on third-person view videos, neglecting the richness of egocentric perceptual experience. To address this gap, we propose three key contributions. First, we int...
Article
Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive...
Article
In recent years, several machine learning models have been proposed. They are trained with a language modelling objective on large-scale text-only data. With such pretraining, they can achieve impressive results on many Natural Language Understanding and Generation tasks. However, many facets of meaning cannot be learned by “listening to the radio”...
Preprint
Drawing on examples from several research projects at the National Robotarium and Alana AI, including SPRING (social robots for elder care), RES-Q+ (spoken dialogue to support stroke patients), and RNIB (assistive visual dialogue for partially-sighted people), I will discuss lessons learned and research directions in developing future generative AI...
Preprint
Full-text available
This paper evaluates the extent to which current Large Language Models (LLMs) can capture task-oriented multi-party conversations (MPCs). We have recorded and transcribed 29 MPCs between patients, their companions, and a social robot in a hospital. We then annotated this corpus for multi-party goal-tracking and intent-slot recognition. People share...
Preprint
Full-text available
We demonstrate an embodied conversational agent that can function as a receptionist and generate a mixture of open and closed-domain dialogue along with facial expressions, by using a large language model (LLM) to develop an engaging conversation. We deployed the system onto a Furhat robot, which is highly expressive and capable of using both verba...
Preprint
Full-text available
SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven to be successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning fro...
Conference Paper
Full-text available
This workshop will explore future work in the area of intelligent, conversational, data-driven health interfaces both from patients' and health care professionals' perspectives. We aim to bring together a diverse set of experts and stakeholders to jointly discuss the opportunities and challenges at the intersection of public health care provisionin...
Article
Full-text available
Research at the Interaction Lab focuses on human-agent communication using conversational Natural Language. The ultimate goal is to create systems where humans and AI agents (including embodied robots) can spontaneously form teams and coordinate shared tasks through the use of Natural Language conversation as a universal communication interface. Th...
Preprint
Guessing games are a prototypical instance of the "learning by interacting" paradigm. This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA). We propose two ways to exploit playing guessing games: 1) a supervised le...
Preprint
Full-text available
We study the problem of integrating syntactic information from constituency trees into a neural model in Frame-semantic parsing sub-tasks, namely Target Identification (TI), FrameIdentification (FI), and Semantic Role Labeling (SRL). We use a Graph Convolutional Network to learn specific representations of constituents, such that each constituent i...
Preprint
Full-text available
In visual guessing games, a Guesser has to identify a target object in a scene by asking questions to an Oracle. An effective strategy for the players is to learn conceptual representations of objects that are both discriminative and expressive enough to ask questions and guess correctly. However, as shown by Suglia et al. (2020), existing models f...
Preprint
Full-text available
In this paper, we propose a minimum set of concepts and signals needed to track the social state during Human-Robot Interaction. We look into the problem of complex continuous interactions in a social context with multiple humans and robots, and discuss the creation of an explainable and tractable representation/model of their social interaction. W...
Preprint
Full-text available
Approaches to Grounded Language Learning typically focus on a single task-based final performance measure that may not depend on desirable properties of the learned hidden representations, such as their ability to predict salient attributes or to generalise to unseen situations. To remedy this, we present GROLLA, an evaluation framework for Grounde...
Preprint
Goal-oriented dialogue systems are now being widely adopted in industry where it is of key importance to maintain a rapid prototyping cycle for new products and domains. Data-driven dialogue system development has to be adapted to meet this requirement --- therefore, reducing the amount of data and annotations necessary for training such systems is...
Preprint
Full-text available
We present a new neural architecture for wide-coverage Natural Language Understanding in Spoken Dialogue Systems. We develop a hierarchical multi-task architecture, which delivers a multi-layer representation of sentence meaning (i.e., Dialogue Acts and Frame-like structures). The architecture is a hierarchy of self-attention mechanisms and BiLSTM...
Preprint
Full-text available
In the EU-funded MuMMER project, we have developed a social robot designed to interact naturally and flexibly with users in public spaces such as a shopping mall. We present the latest version of the robot system developed during the project. This system encompasses audio-visual sensing, social signal processing, conversational interaction, perspec...
Preprint
Full-text available
Smart speakers and robots become ever more prevalent in our daily lives. These agents are able to execute a wide range of tasks and actions and, therefore, need systems to control their execution. Current state-of-the-art such as (deep) reinforcement learning, however, requires vast amounts of data for training which is often hard to come by when i...
Preprint
Full-text available
Learning with minimal data is one of the key challenges in the development of practical, production-ready goal-oriented dialogue systems. In a real-world enterprise setting where dialogue systems are developed rapidly and are expected to work robustly for an ever-growing variety of domains, products, and scenarios, efficient learning from a limited...
Preprint
Full-text available
The overall objective of 'social' dialogue systems is to support engaging, entertaining, and lengthy conversations on a wide variety of topics, including social chit-chat. Apart from raw dialogue data, user-provided ratings are the most common signal used to train such systems to produce engaging responses. In this paper we show that social dialogu...
Conference Paper
Full-text available
In a traditional role-playing game (RPG) conversing with a Non-Playable Character (NPC) typically appears somewhat unrealistic and can break immersion and user engagement. In commercial games, the player usually selects one of several possible predefined conversation options which are displayed as text or labels on the screen, to progress the conve...
Preprint
Full-text available
Spontaneous spoken dialogue is often disfluent, containing pauses, hesitations, self-corrections and false starts. Processing such phenomena is essential in understanding a speaker's intended meaning and controlling the flow of the conversation. Furthermore, this processing needs to be word-by-word incremental to allow further downstream processing...
Article
Full-text available
This paper examines the educational efficacy of a learning environment in which children diagnosed with Autism Spectrum Conditions (ASC) engage in social interactions with an artificially intelligent (AI) virtual agent and where a human practitioner acts in support of the interactions. A multi-site intervention study in schools across the UK was co...
Conference Paper
Full-text available
Enabling a robot to properly interact with users plays a key role in the effective deployment of robotic platforms in domestic environments. Robots must be able to rely on interaction to improve their behaviour and adaptively understand their operational world. Semantic mapping is the task of building a representation of the environment , that can...
Article
Ubiquitous mobile computing offers innovative approaches in the delivery of information that can facilitate free roaming of the city, informing and guiding the tourist as the city unfolds before them. However making frequent visual reference to mobile devices can be distracting, the user having to interact via a small screen thus disrupting the exp...
Article
Full-text available
Open-domain social dialogue is one of the long-standing goals of Artificial Intelligence. This year, the Amazon Alexa Prize challenge was announced for the first time, where real customers get to rate systems developed by leading universities worldwide. The aim of the challenge is to converse "coherently and engagingly with humans on popular topics...
Conference Paper
Full-text available
Working in human populated environments requires fast and robust action selection and execution especially when deliberately trying to interact with humans. This work presents the combination of a high-level planner (ROSPlan) for action sequencing and automatically generated finite state machines (PNP) for execution. Using this combined system we a...
Preprint
We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental, semantic parsing/generation framework - Dynamic Syntax and Type Theory with Records (DS-TTR) - with a set of visual classifiers that are learned throughout the interaction and which groun...
Preprint
We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users and achieve...
Article
Full-text available
Natural, spontaneous dialogue proceeds incrementally on a word-by-word basis; and it contains many sorts of disfluency such as mid-utterance/sentence hesitations, interruptions, and self-corrections. But training data for machine learning approaches to dialogue processing is often either cleaned-up or wholly synthetic in order to avoid such phenome...
Article
Full-text available
We investigate an end-to-end method for automatically inducing task-based dialogue systems from small amounts of unannotated dialogue data. It combines an incremental semantic grammar - Dynamic Syntax and Type Theory with Records (DS-TTR) - with Reinforcement Learning (RL), where language generation and dialogue management are a joint decision prob...
Conference Paper
Full-text available
Most of today's task-based spoken dialogue systems perform poorly if the user goal is not within the system's task domain. On the other hand, chatbots cannot perform tasks involving robot actions but are able to deal with unforeseen user input. To overcome the limitations of each of these separate approaches and be able to exploit their strengths,...
Conference Paper
Full-text available
We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained us- ing Reinforcement Learning (RL), must be able to handle natural conversations with human users, and achie...
Article
Decision-making is often dependent on uncertain data, e.g. data associated with confidence scores or probabilities. This article presents a comparison of different information presentations for uncertain data and, for the first time, measures their effects on human decision-making, in the domain of weather forecast generation. We use a game-based s...
Conference Paper
Full-text available
The ability to quickly adapt to new environments and incorporate new knowledge is of great importance for robots operating in unstructured environments and interacting with non-expert users. This paper reports on our current progress in tackling this problem. We propose the development of a framework for teaching robots to perform tasks using natur...
Article
Full-text available
Recognition of social signals, from human facial expressions or prosody of speech, is a popular research topic in human-robot interaction studies. There is also a long line of research in the spoken dialogue community that investigates user satisfaction in relation to dialogue characteristics. However, very little research relates a combination of...