Article

Natural language generation for social robotics: opportunities and challenges


Abstract

In the increasingly popular and diverse research area of social robotics, the primary goal is to develop robot agents that exhibit socially intelligent behaviour while interacting in a face-to-face context with human partners. An important aspect of face-to-face social conversation is fluent, flexible linguistic interaction; face-to-face dialogue is both the basic form of human communication and the richest and most flexible, combining unrestricted verbal expression with meaningful non-verbal acts such as gestures and facial displays, along with instantaneous, continuous collaboration between the speaker and the listener. In practice, however, most developers of social robots do not use the full possibilities of the unrestricted verbal expression afforded by face-to-face conversation; instead, they generally employ relatively simplistic processes for choosing the words for their robots to say. This contrasts with the work carried out in Natural Language Generation (NLG), the field of computational linguistics devoted to the automated production of high-quality linguistic content: while this research area is also an active one, in general most effort in NLG is focused on producing high-quality written text. This article summarizes the state of the art in the two individual research areas of social robotics and natural language generation. It then discusses the reasons why so few current social robots make use of more sophisticated generation techniques. Finally, an approach is proposed for bringing some aspects of NLG into social robotics, concentrating on techniques and tools that are most appropriate to the needs of socially interactive robots. This article is part of the theme issue ‘From social brains to social robots: applying neurocognitive insights to human–robot interaction’.


... Indeed, social robots need to be able to sense signals from humans, respond adequately, understand and generate natural language, have reasoning capacities, plan actions and execute movements in line with what is required by the specific context or situation. In this part of our theme issue, three contributions cover the areas of research related to technical solutions for HRI: computational architectures [9], classification and prediction of human behaviour and expressions [10], and natural language processing [11]. These three contributions provide examples of challenges that roboticists and artificial intelligence experts need to face in order to design robots endowed with capabilities crucial for social interactions with humans. ...
... Finally, Foster [11] addresses another crucial competence that is required to enable natural HRI: the ability of the technical system to understand spoken natural language and respond appropriately. Foster provides an overview of methods used in HRI for natural language generation. ...
Article
Amidst the fourth industrial revolution, social robots are resolutely moving from fiction to reality. With sophisticated artificial agents becoming ever more ubiquitous in daily life, researchers across different fields are grappling with the questions concerning how humans perceive and interact with these agents and the extent to which the human brain incorporates intelligent machines into our social milieu. This theme issue surveys and discusses the latest findings, current challenges and future directions in neuroscience- and psychology-inspired human–robot interaction (HRI). Critical questions are explored from a transdisciplinary perspective centred around four core topics in HRI: technical solutions for HRI, development and learning for HRI, robots as a tool to study social cognition, and moral and ethical implications of HRI. Integrating findings from diverse but complementary research fields, including social and cognitive neurosciences, psychology, artificial intelligence and robotics, the contributions showcase ways in which research from disciplines spanning biological sciences, social sciences and technology deepen our understanding of the potential and limits of robotic agents in human social life. This article is part of the theme issue ‘From social brains to social robots: applying neurocognitive insights to human–robot interaction’.
... For instance, when a human assembles furniture, and a robot helps to find the correct pieces, the robot should direct its human partner and describe the target objects effectively. Expressions used for describing objects in terms of their distinguishing features are called referring expressions, and referring expression generation is defined as "choosing the words and phrases to express domain objects" [16]. ...
... Generating appropriate referring expressions has the potential to significantly improve human-robot collaboration. It is one of the most studied areas in natural language generation for social robotics because the problem has a relatively straightforward input and output [16]. Early studies on referring expression generation were based primarily on rule-based templates or algorithms [21,35,36,38], while more recent studies have addressed the problem using learning-based methods [12,25,31]. ...
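As a concrete illustration of the rule-based side mentioned in the excerpt above, the following minimal Python sketch follows the spirit of Dale and Reiter's classic Incremental Algorithm for referring expression generation. The attribute preference order and the toy domain are illustrative assumptions, not any cited system's implementation.

def generate_reference(target, distractors, preferred_attrs):
    """Add target attribute values in a fixed preference order, keeping
    each value that rules out at least one remaining distractor, until
    the target is uniquely identified."""
    description = {}
    remaining = list(distractors)
    for attr in preferred_attrs:
        value = target[attr]
        ruled_out = [d for d in remaining if d.get(attr) != value]
        if ruled_out:
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
        if not remaining:
            return description
    return None  # no distinguishing description with these attributes

target = {"type": "bolt", "colour": "red", "size": "small"}
others = [{"type": "bolt", "colour": "green", "size": "small"},
          {"type": "nut", "colour": "red", "size": "small"}]
print(generate_reference(target, others, ["type", "colour", "size"]))
# -> {'type': 'bolt', 'colour': 'red'}, i.e. "the red bolt"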
Preprint
Full-text available
Effective verbal communication is crucial in human-robot collaboration. When a robot helps its human partner to complete a task with verbal instructions, referring expressions are commonly employed during the interaction. Despite many studies on generating referring expressions, crucial open challenges still remain for effective interaction. In this work, we discuss some of these challenges (i.e., using contextual information, taking users' perspectives, and handling misinterpretations in an autonomous manner).
... On the communication level, speech recognition algorithms are used to interpret human intentions or commands through speech and vocal signals. In this regard, natural language processing (NLP) is concerned with understanding the interactions between computers and human languages through tools such as neural network architectures and learning algorithms [16]. ...
Article
Full-text available
Purpose of Review Research in assistive and rehabilitation robotics is a growing, promising, and challenging field that has emerged due to various social and medical needs such as aging populations and neuromuscular and musculoskeletal disorders. Such robots can be used in various day-to-day scenarios or to support motor functionality, training, and rehabilitation. This paper reflects on the human-robot interaction perspective in rehabilitation and assistive robotics and reports on current issues and developments in the field. Recent Findings The survey of the literature reveals that new efforts are being put into utilizing machine learning approaches alongside novel developments in sensing technology, adapting systems to user routines (activities for assistive systems, exercises for rehabilitation devices) to fit each user's needs and maximize their effectiveness. Summary A review of recent research and development efforts on human-robot interaction in assistive and rehabilitation robotics is presented in this paper. First, different subdomains in assistive and rehabilitation robotic research are identified, and accordingly, a survey on the background and trends of such developments is provided.
... In this sub-area the interaction complexity increases with the possibility of physical and emotional human contact, where ethical and safety aspects must be considered. Even with this complexity, several works have explored specific aspects, such as: autism treatment [3], social cues [4], natural language [5], elderly care [6], robot acceptance [7], education [8], expressing emotions [9] and dexterous in-hand manipulation [10]. It is worth noting that a common approach used to reduce the complexity of the experiment involves the use of the Wizard of Oz (WOZ) [11] technique, where a human operator remotely controls the robotic agent, emulating an intelligence that is not yet technically viable [12]. ...
Article
Full-text available
This work contributes to the social robotics area by defining an architecture, called Cognitive Model Development Environment (CMDE), that models the interaction between cognitive and robotic systems. The communication between these systems is formalized with the definition of an ontology, called OntPercept, that models the perception of the environment using the information captured by the sensors present in the robotic system. The formalization offered by the OntPercept ontology simplifies the development, reproduction and comparison of experiments. The validation of the results required the development of two additional components. The first, called Robot House Simulator (RHS), provides an environment where robot and human can interact socially with increasing levels of cognitive processing. The second component is represented by the cognitive system that models the behavior of the robot with the support of artificial intelligence based systems.
... Face and speech recognition and sound location are fraught with difficulties, particularly in a noisy social environment (Deniz et al. 2007). Natural language generation is data intensive and requires sufficient training data (Foster 2019). These and other substantial limitations should be considered in a social context. ...
Article
Full-text available
Socially inspired robotics involves drawing on the observation and study of human social interactions to apply them to the design of sociable robots. As there is increasing expectation that robots may participate in social care and provide some relief for the increasing shortage of human care workers, social interaction with robots becomes of increasing importance. This paper demonstrates the potential of socially inspired robotics through the exploration of a case study of the interaction of a partially sighted social worker with a support worker. This is framed within the capability approach, in which the interaction of a human and a sociable robot is understood as resulting in a collaborative capability that is grounded in the relationship between the human and the robot rather than in the autonomous capabilities of the robot. The implications of applying the case study as an analogy for human–robot interaction are expressed through a discussion of capabilities and social practice and policy. The study is tempered by a discussion of the technical limits of robots and the extensive complexity of the social context in which it is envisaged sociable robots may be employed.
... The participants appreciated that they can maintain a coherent conversation with the robot where it checked on them periodically and they could give feedback about how they were doing. These results suggest that participants accepted the use of a robot coach to deliver mindfulness sessions over time but expect functionalities such as generation of natural language and expressive gestures for interaction from an autonomous platform [11,23]. We also found significant links between participants' personality traits and their perception scores which suggests that person-specific customization should be included in the design of an autonomous coach. ...
Preprint
Social robots are starting to become incorporated into daily lives by assisting in the promotion of physical and mental wellbeing. This paper investigates the use of social robots for delivering mindfulness sessions. We created a teleoperated robotic platform that enables an experienced human coach to conduct the sessions virtually by replicating their upper-body and head pose in real time. The coach is also able to view the world from the robot's perspective and hold a conversation with participants by talking and listening through the robot. We studied how participants interacted with a teleoperated robot mindfulness coach over a period of 5 weeks and compared this with the interactions another group of participants had with a human coach. The mindfulness sessions delivered by both types of coaching evoked positive responses from the participants in all the sessions. We found that the participants rated the interactions with the human coach consistently high in all aspects. However, there was a longitudinal change in the ratings of the interaction with the teleoperated robot for the aspects of motion and conversation. We also found that the participants' personality traits, conscientiousness and neuroticism, influenced their perceptions of the robot coach.
... However, to be believable and pleasant companions, robots should also generate natural language. The reasons why so few current social robots make use of sophisticated generation techniques are discussed by Foster in her paper on "natural language generation for social robotics: opportunities and challenges" [22]. ...
Preprint
Full-text available
Milgram's reality-virtuality continuum applies to interaction in the physical space dimension, going from real to virtual. However, interaction has a social dimension as well, that can go from real to artificial depending on the companion with whom the user interacts. In this paper we present our vision of the Reality-Artificiality bidimensional Continuum (RAC), we identify some challenges in its design and development and we discuss how reliable interactions might be supported inside RAC.
... Referring expressions have long been a consideration in many robotics applications; they are defined as "choosing the words and phrases to express domain objects" [12]. Existing studies have focused on comprehension [13][14][15][16] and generation [8,9,17] of these expressions, and to achieve these goals they have exploited spatial relations [4][5][6][7]. ...
Article
Full-text available
For effective verbal communication in collaborative tasks, robots need to account for the different perspectives of their human partners when referring to objects in a shared space. For example, when a robot helps its partner find correct pieces while assembling furniture, it needs to understand how its collaborator perceives the world and refer to objects accordingly. In this work, we propose a method to endow robots with perspective-taking abilities while spatially referring to objects. To examine the impact of our proposed method, we report the results of a user study showing that when the objects are spatially described from the users’ perspectives, participants take less time to find the referred objects, find the correct objects more often and consider the task easier.
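The perspective-taking idea described in this abstract can be illustrated with a small geometric sketch: before choosing spatial words, the object's position is re-expressed in the user's egocentric frame rather than the robot's. The following Python toy is our own illustration, with invented function names and thresholds, not the authors' implementation.

import math

def to_user_frame(obj_xy, user_xy, user_heading):
    """Express a world-frame position in the user's egocentric frame,
    where +x is ahead of the user and +y is to their left."""
    dx, dy = obj_xy[0] - user_xy[0], obj_xy[1] - user_xy[1]
    cos_h, sin_h = math.cos(user_heading), math.sin(user_heading)
    ahead = cos_h * dx + sin_h * dy
    left = -sin_h * dx + cos_h * dy
    return ahead, left

def spatial_phrase(obj_xy, user_xy, user_heading):
    ahead, left = to_user_frame(obj_xy, user_xy, user_heading)
    side = "on your left" if left > 0 else "on your right"
    front = "in front of you" if ahead > 0 else "behind you"
    return f"the piece {front}, {side}"

# A user at the origin facing along +x; the object at (1, 1) is
# ahead of the user and to their left.
print(spatial_phrase((1.0, 1.0), (0.0, 0.0), 0.0))
# -> "the piece in front of you, on your left"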
... Following the seminal work by Reiter and Dale [23], the most comprehensive survey on DTG to-date has been that by Gatt and Krahmer [28]. Although several articles have taken a close examination of NLG sub-fields such as dialogue systems [29], poetry generation [30], persuasive text generation [31], social robotics [32], or exclusively focus on issues central to NLG such as faithfulness [33] and hallucination [34], a detailed break-down of the last half-decade of innovations has been missing since the last exhaustive body of work. As recent NLG surveys either portray DTG solely on the premise of image and video captioning [35] or provide only a peripheral depiction [36], the need for a close and consolidated examination of developments in neural DTG is more pertinent now than ever. ...
Preprint
Full-text available
The neural boom that has sparked natural language processing (NLP) research through the last decade has similarly led to significant innovations in data-to-text generation (DTG). This survey offers a consolidated view into the neural DTG paradigm with a structured examination of the approaches, benchmark datasets, and evaluation protocols. This survey draws boundaries separating DTG from the rest of the natural language generation (NLG) landscape, encompassing an up-to-date synthesis of the literature, and highlighting the stages of technological adoption from within and outside the greater NLG umbrella. With this holistic view, we highlight promising avenues for DTG research that not only focus on the design of linguistically capable systems but also systems that exhibit fairness and accountability.
... HCRs are envisioned to deliver meaningful benefits through effective interaction with humans to fulfil their expectations [41]. In contrast to automation, which follows pre-programmed "rules" and is limited to specific actions, autonomous robots are required to have a context-guided behavior adaptation capability, which would allow them to have a degree of self-governance [42]. This should enable them to learn and respond actively to situations that were not pre-programmed by the developer. ...
Article
Full-text available
Although progress is being made in affective computing, issues remain in enabling the effective expression of compassionate communication by healthcare robots. Identifying, describing and reconciling these concerns are important in order to provide quality contemporary healthcare for older adults with dementia. The purpose of this case study was to explore the development issues of healthcare robots in expressing compassionate communication for older adults with dementia. An exploratory descriptive case study was conducted with the Pepper robot and older adults with dementia using high-tech digital cameras to document significant communication proceedings that occurred during the activities. Data were collected in December 2020. The application program for an intentional conversation using Pepper was jointly developed by Tanioka's team and the Xing Company, allowing Pepper's words and head movements to be remotely controlled. The analysis of the results revealed four development issues, namely, (1) accurate sensing behavior for "listening" to voices appropriately and accurately interacting with subjects; (2) inefficiency in "listening" and "gaze" activities; (3) fidelity of behavioral responses; and (4) deficiency in natural language processing AI development, i.e., the ability to respond actively to situations that were not pre-programmed by the developer. Conversational engagements between the Pepper robot and patients with dementia illustrated a practical usage of technologies with artificial intelligence and natural language processing. The development issues found in this study require reconciliation in order to enhance the potential for healthcare robot engagement in compassionate communication in the care of older adults with dementia.
... A limited scope would afford the socially-immature ASI an initial foothold to build social intelligence in a particular area of application; but as ASI has more than just language and complex, deeply seated knowledge structures to contend with, ASI will likely need to be able to accurately interpret meaning from combinations of gestures and verbalizations. The development of ASI will require imbuing agents with social interaction abilities that enable the encoding, decoding, perception, and interpretation of a variety of social signals in order to interact successfully: a major goal, and challenge, for ASI research (Foster, 2019; Joo et al., 2019). However, a true socially intelligent agent should be able to engage with a human agent and derive their intentions, beliefs and goals (i.e., the ASI develops an artificial ToM), and use these models to anticipate what explanations or information may be relevant to the given human agent. ...
Article
Full-text available
In this paper, we discuss the development of artificial theory of mind as foundational to an agent's ability to collaborate with human team members. Agents imbued with artificial social intelligence will require various capabilities to gather the social data needed to inform an artificial theory of mind of their human counterparts. We draw from social signals theorizing and discuss a framework to guide consideration of core features of artificial social intelligence. We discuss how human social intelligence, and the development of theory of mind, can contribute to the development of artificial social intelligence by forming a foundation on which to help agents model, interpret and predict the behaviors and mental states of humans to support human-agent interaction. Artificial social intelligence will need the processing capabilities to perceive, interpret, and generate combinations of social cues to operate within a human-agent team. Artificial Theory of Mind affords a structure by which a socially intelligent agent could be imbued with the ability to model their human counterparts and engage in effective human-agent interaction. Further, modeling Artificial Theory of Mind can be used by an ASI to support transparent communication with humans, so that they may better predict future system behavior, thereby improving trust in artificial socially intelligent agents.
... Based on these studies, some simple systems [44,27] for instruction creation were developed using handcrafted templates, i.e., slotting the content into pre-built linguistic structures. More sophisticated systems [17] made use of linguistically motivated rules or full-fledged grammars to better emulate the way people compose instructions and to produce outputs in a more flexible and extensible manner [22]. Recent solutions [15,57,18,23] lean on end-to-end, data-driven techniques without manually crafted templates or rules. ...
Preprint
Since the rise of vision-language navigation (VLN), great progress has been made in instruction following: building a follower to navigate environments under the guidance of instructions. However, far less attention has been paid to the inverse task of instruction generation: learning a speaker to generate grounded descriptions for navigation routes. Existing VLN methods train a speaker independently and often treat it as a data augmentation tool to strengthen the follower while ignoring rich cross-task relations. Here we describe an approach that learns the two tasks simultaneously and exploits their intrinsic correlations to boost the training of each: the follower judges whether the speaker-created instruction explains the original navigation route correctly, and vice versa. Without the need for aligned instruction-path pairs, such a cycle-consistent learning scheme is complementary to task-specific training targets defined on labeled data, and can also be applied over unlabeled paths (sampled without paired instructions). Another agent, called the creator, is added to generate counterfactual environments. It greatly changes current scenes yet leaves unchanged the novel items that are vital for the execution of the original instructions. Thus more informative training scenes are synthesized, and the three agents compose a powerful VLN learning system. Extensive experiments on a standard benchmark show that our approach improves the performance of various follower models and produces accurate navigation instructions.
Conference Paper
Social robots are becoming incorporated in daily human lives, assisting in the promotion of the physical and mental wellbeing of individuals. To investigate the design and use of social robots for delivering mindfulness training, we develop a teleoperation framework that enables an experienced Human Coach (HC) to conduct mindfulness training sessions virtually, by replicating their upper-body and head movements onto the Pepper robot, in real-time. Pepper’s vision is mapped onto a Head-Mounted Display (HMD) worn by the HC and a bidirectional audio pipeline is set up, enabling the HC to communicate with the participants through the robot. To evaluate the participants’ perceptions of the teleoperated Robot Coach (RC), we study the interactions between a group of participants and the RC over 5 weeks and compare these with another group of participants interacting directly with the HC. Growth modelling analysis of this longitudinal data shows that the HC ratings are consistently greater than 4 (on a scale of 1-5) for all aspects, while an increase is witnessed in the RC ratings over the weeks for the Robot Motion and Conversation dimensions. Mindfulness training delivered by both types of coaching evokes positive responses from the participants across all the sessions, with the HC rated significantly higher than the RC on Animacy, Likeability and Perceived Intelligence. Participants’ personality traits such as Conscientiousness and Neuroticism are found to influence their perception of the RC. These findings enable an understanding of the differences between the perceptions of HC and RC delivering mindfulness training, and provide insights towards the development of robot coaches for improving the psychological wellbeing of individuals.
Conference Paper
Face-to-face conversation is the basic-and richest-form of human communication. While modern conversational user interfaces are increasingly able to incorporate more and more features of face-to-face conversation, including unrestricted verbal communication and continuous social coordination among the participants, most systems do not take full advantage of the interaction possibilities provided by multimodal, embodied, non-verbal communication. In this position paper, we discuss how this limitation affects the possible applications of conversational user interfaces, and describe how current research in embodied communication and social robotics has the potential to address this limitation, with possible benefits to both research communities.
Article
Full-text available
Artificial Neural Networks are powerful function approximators capable of modelling solutions to a wide variety of problems, both supervised and unsupervised. As their size and expressivity increases, so too does the variance of the model, yielding a nearly ubiquitous overfitting problem. Although mitigated by a variety of model regularisation methods, the common cure is to seek large amounts of training data, which is not necessarily easily obtained, that sufficiently approximates the data distribution of the domain we wish to test on. In contrast, logic programming methods such as Inductive Logic Programming offer an extremely data-efficient process by which models can be trained to reason on symbolic domains. However, these methods are unable to deal with the variety of domains neural networks can be applied to: they are not robust to noise in or mislabelling of inputs, and perhaps more importantly, cannot be applied to non-symbolic domains where the data is ambiguous, such as operating on raw pixels. In this paper, we propose a Differentiable Inductive Logic framework (∂ILP), which can not only solve tasks which traditional ILP systems are suited for, but shows a robustness to noise and error in the training data which ILP cannot cope with. Furthermore, as it is trained by backpropagation against a likelihood objective, it can be hybridised by connecting it with neural networks over ambiguous data in order to be applied to domains which ILP cannot address, while providing data efficiency and generalisation beyond what neural networks on their own can achieve.
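To give a flavour of the core idea, differentiable logic replaces hard truth values with numbers in [0, 1] so that rule weights can be trained by backpropagation. The toy PyTorch sketch below gestures at this (a product t-norm conjunction, a sigmoid-gated rule weight, and a likelihood-style loss); it falls far short of the actual ∂ILP machinery and all values are invented.

import torch

facts = torch.tensor([0.9, 0.8])                # fuzzy truth of two body atoms
weight = torch.tensor(0.0, requires_grad=True)  # learnable rule weight
optimiser = torch.optim.SGD([weight], lr=0.5)

for _ in range(100):
    # Soft rule application: the head holds to the degree that the
    # sigmoid-gated rule fires and both body atoms hold (product t-norm).
    head = torch.sigmoid(weight) * facts.prod()
    loss = -torch.log(head)                     # head atom should be true
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

print(float(torch.sigmoid(weight)))             # -> close to 1: rule is kept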
Conference Paper
Full-text available
We develop the first system to combine task-based and chatbot-style dialogue in a multimodal system for Human-Robot Interaction. We show that Reinforcement Learning is beneficial for training dialogue management (DM) in such systems, providing a scalable method for training from data and/or simulated users. We first train in simulation, and evaluate the benefits of a combined chat/task policy over systems which can only perform chat or task-based conversation. In a real user evaluation, we then show that a trained combined chat/task multimodal dialogue policy results in longer dialogue interactions than a rule-based approach, suggesting that the learned dialogue policy provides a more engaging mixture of chat and task interaction than a rule-based DM method.
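The idea of training dialogue management against simulated users can be sketched with plain tabular Q-learning. The toy state space, actions and reward scheme below are invented for illustration and are far simpler than the multimodal system described above.

import random
from collections import defaultdict

ACTIONS = ["chat", "request_task_info", "confirm_and_act"]

def simulated_user(state, action):
    """Return (next_state, reward, done). State = (engaged, slots_filled)."""
    engaged, slots = state
    if action == "chat":                       # social chat engages the user once
        return (True, slots), (1 if not engaged else -1), False
    if action == "request_task_info" and slots < 2:
        return (engaged, slots + 1), 0, False  # gather task information
    if action == "confirm_and_act" and slots == 2:
        return (engaged, slots), (20 if engaged else 10), True  # task success
    return (engaged, slots), -1, False         # unhelpful move

Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.95, 0.2
for _ in range(5000):                          # train entirely in simulation
    state, done = (False, 0), False
    while not done:
        action = (random.choice(ACTIONS) if random.random() < epsilon
                  else max(ACTIONS, key=lambda a: Q[state, a]))
        nxt, reward, done = simulated_user(state, action)
        target = reward + (0 if done else gamma * max(Q[nxt, a] for a in ACTIONS))
        Q[state, action] += alpha * (target - Q[state, action])
        state = nxt

# The learned greedy policy mixes one engaging chat turn with task moves.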
Conference Paper
Full-text available
Accurately measuring perceptions of robots has become increasingly important as technological progress permits more frequent and extensive interaction between people and robots. Across four studies, we develop and validate a scale to measure social perception of robots. Drawing from the Godspeed Scale and from the psychological literature on social perception, we develop an 18-item scale (The Robotic Social Attribute Scale; RoSAS) to measure people's judgments of the social attributes of robots. Factor analyses reveal three underlying scale dimensions: warmth, competence, and discomfort. We then validate the RoSAS and show that the discomfort dimension does not reflect a concern with unfamiliarity. Using images of robots that systematically vary in their machineness and gender-typicality, we show that the application of these social attributes to robots varies based on their appearance.
Conference Paper
Full-text available
Research on why people refuse or abandon the use of technology in general, and robots specifically, is still scarce. Consequently, the academic understanding of people’s underlying reasons for non-use remains weak. Thus, vital information about the design of these robots including their acceptance and refusal or abandonment by its users is needed. We placed 70 autonomous robots within people’s homes for a period of six months and collected reasons for refusal and abandonment through questionnaires and interviews. Based on our findings, the challenge for robot designers is to create robots that are enjoyable and easy to use to capture users in the short-term, and functionally-relevant to keep those users in the longer-term. Understanding the thoughts and motives behind non-use may help to identify obstacles for acceptance, and therefore enable developers to better adapt technological designs to the benefit of the users.
Article
Full-text available
Modern robotics applications that involve human-robot interaction require robots to be able to communicate with humans seamlessly and effectively. Natural language provides a flexible and efficient medium through which robots can exchange information with their human partners. Significant advancements have been made in developing robots capable of interpreting free-form instructions, but less attention has been devoted to endowing robots with the ability to generate natural language. We propose a navigational guide model that enables robots to generate natural language instructions that allow humans to navigate a priori unknown environments. We first decide which information to share with the user according to their preferences, using a policy trained from human demonstrations via inverse reinforcement learning. We then "translate" this information into a natural language instruction using a neural sequence-to-sequence model that learns to generate free-form instructions from natural language corpora. We evaluate our method on a benchmark route instruction dataset and achieve a BLEU score of 72.18% when compared to human-generated reference instructions. We additionally conduct navigation experiments with human participants that demonstrate that our method generates instructions that people follow as accurately and easily as those produced by humans.
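The corpus-based evaluation mentioned here (a BLEU score against human-written reference instructions) is straightforward to reproduce in outline. Below is a small sketch using NLTK's BLEU implementation, with invented example sentences and smoothing for short strings; it is illustrative, not the paper's evaluation code.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "turn left at the sofa and stop at the door".split()
generated = "turn left at the couch then stop at the door".split()

# Score a generated instruction against one human reference.
score = sentence_bleu([reference], generated,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.2f}")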
Article
Full-text available
Thanks to the efforts of our community, autonomous robots are becoming capable of ever more complex and impressive feats. There is also an increasing demand for, perhaps even an expectation of, autonomous capabilities from end-users. However, much research into autonomous robots rarely makes it past the stage of a demonstration or experimental system in a controlled environment. If we don't confront the challenges presented by the complexity and dynamics of real end-user environments, we run the risk of our research becoming irrelevant or ignored by the industries who will ultimately drive its uptake. In the STRANDS project we are tackling this challenge head-on. We are creating novel autonomous systems, integrating state-of-the-art research in artificial intelligence and robotics into robust mobile service robots, and deploying these systems for long-term installations in security and care environments. To date, over four deployments, our robots have been operational for a combined duration of 2545 hours (or a little over 106 days), covering 116km while autonomously performing end-user defined tasks. In this article we present an overview of the motivation and approach of the STRANDS project, describe the technology we use to enable long, robust autonomous runs in challenging environments, and describe how our robots are able to use these long runs to improve their own performance through various forms of learning.
Article
Full-text available
Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either generation problem or as a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally we extrapolate future directions in the area of automatic image description generation.
Conference Paper
Full-text available
This paper introduces a research effort to develop and evaluate social robots for second language tutoring in early childhood. The L2TOR project will capitalise on recent observations in which social robots have been shown to have marked benefits over screen-based technologies in education, both in terms of learning outcomes and motivation. As language acquisition benefits from early, personalised and interactive tutoring, current language tutoring delivery is often ill-equipped to deal with this: classroom resources are at present inadequate to offer one-to-one tutoring with (near) native speakers in educational and home contexts. L2TOR will address this by furthering the science and technology of language tutoring robots. This document describes the main research strands and expected outcomes of the project.
Conference Paper
Full-text available
Social robots working in public space often stimulate children's curiosity. However, sometimes children also show abusive behavior toward robots. In our case studies, we observed in many cases that children persistently obstruct the robot's activity. Some actually abused the robot by saying bad things, and at times even kicking or punching the robot. We developed a statistical model of occurrence of children's abuse. Using this model together with a simulator of pedestrian behavior, we enabled the robot to predict the possibility of an abuse situation and escape before it happens. We demonstrated that with the model the robot successfully lowered the occurrence of abuse in a real shopping mall.
Article
Full-text available
The natural language generation (NLG) component of a spoken dialogue system (SDS) usually needs a substantial amount of handcrafting or a well-labeled dataset to be trained on. These limitations add significantly to development costs and make cross-domain, multi-lingual dialogue systems intractable. Moreover, human languages are context-aware. The most natural response should be directly learned from data rather than depending on predefined syntaxes or rules. This paper presents a statistical language generator based on a joint recurrent and convolutional neural network structure which can be trained on dialogue act-utterance pairs without any semantic alignments or predefined grammar trees. Objective metrics suggest that this new model outperforms previous methods under the same experimental conditions. Results of an evaluation by human judges indicate that it produces not only high quality but linguistically varied utterances which are preferred compared to n-gram and rule-based systems.
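A common ingredient of such data-driven generators, delexicalisation, can be shown compactly: slot values are masked with placeholders so the model learns reusable sentence patterns, and the values are restored after generation. A minimal Python sketch follows; the slot names and strings are invented.

def delexicalise(utterance, slots):
    """Replace slot values with placeholders for training."""
    for slot, value in slots.items():
        utterance = utterance.replace(value, f"<{slot}>")
    return utterance

def relexicalise(pattern, slots):
    """Restore slot values in a generated pattern."""
    for slot, value in slots.items():
        pattern = pattern.replace(f"<{slot}>", value)
    return pattern

act = {"name": "Seven Days", "food": "Chinese"}
pattern = delexicalise("Seven Days serves Chinese food.", act)
print(pattern)                     # -> "<name> serves <food> food."
print(relexicalise(pattern, act))  # -> "Seven Days serves Chinese food."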
Article
Full-text available
We present a novel approach to sentence simplification which departs from previous work in two main ways. First, it requires neither hand-written rules nor a training corpus of aligned standard and simplified sentences. Second, sentence splitting operates on deep semantic structure. We show (i) that the unsupervised framework we propose is competitive with four state-of-the-art supervised systems and (ii) that our semantic based approach allows for a principled and effective handling of sentence splitting.
Article
Full-text available
Because face-to-face dialogue is our first and most common form of communication, we can use it as a prototype to evaluate other forms of communication. Three features are fully present only in face-to-face dialogue: unrestricted verbal expression, meaningful non-verbal acts such as gestures and facial displays, and instantaneous collaboration between speaker and listener. In this paper, we explicate these three dimensions and then use them to measure other communication systems: written text, television, and electronic mail. Users of these other systems often spontaneously accommodate to their limitations by inventing dialogue-like features. Finally, we propose that the design of new communication systems could benefit by using face-to-face dialogue as both a standard and a source of solutions.
Conference Paper
Full-text available
We present the design, implementation, and evaluation of a peripheral empathy-evoking robotic conversation companion, Kip1. The robot’s function is to increase people’s awareness of the effect of their behavior towards others, potentially leading to behavior change. Specifically, Kip1 is designed to promote non-aggressive conversation between people. It monitors the conversation’s nonverbal aspects and maintains an emotional model of its reaction to the conversation. If the conversation seems calm, Kip1 responds by a gesture designed to communicate curious interest. If the conversation seems aggressive, Kip1 responds by a gesture designed to communicate fear. We describe the design process of Kip1, guided by the principles of peripheral and evocative design. We detail its hardware and software systems, and a study evaluating the effects of the robot’s autonomous behavior on couples’ conversations. We find support for our design goals. A conversation companion reacting to the conversation led to more gaze attention, but not more verbal distraction, compared to a robot that moves but does not react to the conversation. This suggests that robotic devices could be designed as companions to human-human interaction without compromising the natural communication flow between people. Participants also rated the reacting robot as having significantly more social human character traits and as being significantly more similar to them. This points to the robot’s potential to elicit people’s empathy.
Article
Full-text available
We present and evaluate a novel approach to natural language generation (NLG) in statistical spoken dialogue systems (SDS) using a data-driven statistical optimization framework for incremental information presentation (IP), where there is a trade-off to be solved between presenting "enough" information to the user while keeping the utterances short and understandable. The trained IP model is adaptive to variation from the current generation context (e.g. a user and a non-deterministic sentence planner), and it incrementally adapts the IP policy at the turn level. Reinforcement learning is used to automatically optimize the IP policy with respect to a data-driven objective function. In a case study on presenting restaurant information, we show that an optimized IP strategy trained on Wizard-of-Oz data outperforms a baseline mimicking the wizard behavior in terms of total reward gained. The policy is then also tested with real users, and improves on a conventional hand-coded IP strategy used in a deployed SDS in terms of overall task success. The evaluation found that the trained IP strategy significantly improves dialogue task completion for real users, with up to an 8.2% increase in task success. This methodology also provides new insights into the nature of the IP problem, which has previously been treated as a module following dialogue management with no access to lower-level context features (e.g. from a surface realizer and/or speech synthesizer).
Article
Full-text available
GIVE-2.5 evaluates eight natural language generation (NLG) systems that guide human users through solving a task in a virtual environment. The data is collected via the Internet, and to date, 536 interactions of subjects with one of the NLG systems have been recorded. The systems are compared using both task performance measures and subjective ratings by human users.
Conference Paper
Full-text available
We introduce a humanoid robot bartender that is capable of dealing with multiple customers in a dynamic, multi-party social setting. The robot system incorporates state-of-the-art components for computer vision, linguistic processing, state management, high-level reasoning, and robot control. In a user evaluation, 31 participants interacted with the bartender in a range of social situations. Most customers successfully obtained a drink from the bartender in all scenarios, and the factors that had the greatest impact on subjective satisfaction were task success and dialogue efficiency.
Article
Full-text available
This paper describes SimpleNLG, a realisation engine for English which aims to provide simple and robust interfaces to generate syntactic structures and linearise them. The library is also flexible in allowing the use of mixed (canned and non-canned) representations.
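SimpleNLG itself is a Java library; purely as an illustration of what a realiser interface does (build a syntactic specification, then linearise it), here is a toy Python analogue with deliberately naive morphology. None of this reflects SimpleNLG's actual API.

def realise(subject, verb, obj, tense="present", negated=False):
    """Linearise a tiny clause specification into an English sentence."""
    if negated:
        aux = "did not" if tense == "past" else "does not"
        verb_form = f"{aux} {verb}"
    elif tense == "past":
        verb_form = verb + ("d" if verb.endswith("e") else "ed")
    else:
        verb_form = verb + "s"   # naive third-person singular
    sentence = f"{subject} {verb_form} {obj}."
    return sentence[0].upper() + sentence[1:]

print(realise("the robot", "greet", "the customer", tense="past"))
# -> "The robot greeted the customer."
print(realise("the robot", "serve", "coffee", negated=True))
# -> "The robot does not serve coffee."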
Article
Full-text available
We describe a chart realization algorithm for Combinatory Categorial Grammar (CCG), and show how it can be used to efficiently realize a wide range of coordination phenomena, including argument cluster coordination and gapping. The algorithm incorporates three novel methods for improving the efficiency of chart realization: (i) using rules to chunk the input logical form into sub-problems to be solved independently prior to further combination; (ii) pruning edges from the chart based on the n-gram score of the edge’s string, in comparison to other edges with equivalent categories; and (iii) formulating the search as a best-first anytime algorithm, using n-gram scores to sort the edges on the agenda. The algorithm has been implemented as an extension to the OpenCCG open source CCG parser, and initial performance tests indicate that the realizer is fast enough for practical use in natural language dialogue systems.
Article
Full-text available
This study emphasizes the need for standardized measurement tools for human robot interaction (HRI). If we are to make progress in this field then we must be able to compare the results from different studies. A literature review has been performed on the measurements of five key concepts in HRI: anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety. The results have been distilled into five consistent questionnaires using semantic differential scales. We report reliability and validity indicators based on several empirical studies that used these questionnaires. It is our hope that these questionnaires can be used by robot developers to monitor their progress. Psychologists are invited to further develop the questionnaires by adding new concepts, and to conduct further validations where it appears necessary.
Conference Paper
Full-text available
In this paper, we present evidence that although current models for introduction of robotic companions stress individual encounters, a social community alternative is promising. This argument emerges from an experiment we conducted with a small interactive robot at two local nursing homes. Here we give a brief introduction to the robot and our experience at the homes. We compare the robot used to a semi-robotic toy whose use initially suggested to us the benefits of social community models in the presentation of robotics to the elderly. We find that even where individual encounters are significant, sensitivity to social dimensions improves the benefits of these encounters.
Article
Full-text available
The experience of interacting with a robot has been shown to be very different in comparison to people’s interaction experience with other technologies and artifacts, and often has a strong social or emotional component—a difference that poses potential challenges related to the design and evaluation of HRI. In this paper we explore this difference, and its implications on evaluating HRI. We outline how this difference is due in part to the general complexity of robots’ overall context of interaction, related to their dynamic presence in the real world and their tendency to invoke a sense of agency. We suggest that due to these differences HCI evaluation methods should be applied to HRI with care, and we present a survey of select HCI evaluation techniques from the perspective of the unique challenges of robots. We propose a view on social interaction with robots that we call the holistic interaction experience, and introduce a set of three perspectives for exploring social interaction with robots: visceral factors of interaction, social mechanics, and social structures. We demonstrate how our three perspectives can be used in practice, both as guidelines to discuss and categorize robot interaction, and as a component in the evaluation process. Further, we propose an original heuristic for brainstorming various possibilities of interaction experiences based on a concept we call the interaction experience map.
Article
Full-text available
This article challenges the received wisdom that template-based approaches to the generation of language are necessarily inferior to other approaches as regards their maintainability, linguistic well-foundedness, and quality of output. Some recent NLG systems that call themselves "template-based" will illustrate our claims.
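Template-based generation of the kind this article defends is easy to make concrete: content is slotted into pre-built linguistic structures, with simple variation to avoid repetitiveness. A minimal Python sketch, with invented dialogue acts and templates:

import random

TEMPLATES = {
    "greet": ["Hello! How can I help you?",
              "Hi there, what can I do for you?"],
    "inform_location": ["{shop} is on the {floor} floor, next to {landmark}.",
                        "You will find {shop} on the {floor} floor, by {landmark}."],
}

def generate(act, **slots):
    """Pick a template for the dialogue act and fill in its slots."""
    return random.choice(TEMPLATES[act]).format(**slots)

print(generate("inform_location", shop="The bookshop",
               floor="second", landmark="the lifts"))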
Book
MuMMER (MultiModal Mall Entertainment Robot) is a four-year, EU-funded project with the overall goal of developing a humanoid robot (SoftBank Robotics’ Pepper robot being the primary robot platform) with the social intelligence to interact autonomously and naturally in the dynamic environments of a public shopping mall, providing an engaging and entertaining experience to the general public. Using co-design methods, we will work together with stakeholders including customers, retailers, and business managers to develop truly engaging robot behaviours. Crucially, our robot will exhibit behaviour that is socially appropriate and engaging by combining speech-based interaction with non-verbal communication and human-aware navigation. To support this behaviour, we will develop and integrate new methods from audiovisual scene processing, social-signal processing, high-level action selection, and human-aware robot navigation. Throughout the project, the robot will be regularly deployed in Ideapark, a large public shopping mall in Finland. This position paper describes the MuMMER project: its needs, the objectives, R&D challenges and our approach. It will serve as reference for the robotics community and stakeholders about this ambitious project, demonstrating how a co-design approach can address some of the barriers and help in building follow-up projects.
Conference Paper
Modern robotics applications that involve human-robot interaction require robots to be able to communicate with humans seamlessly and effectively. Natural language provides a flexible and efficient medium through which robots can exchange information with their human partners. Significant advancements have been made in developing robots capable of interpreting free-form instructions, but less attention has been devoted to endowing robots with the ability to generate natural language. We propose a model that enables robots to generate natural language instructions that allow humans to navigate a priori unknown environments. We first decide which information to share with the user according to their preferences, using a policy trained from human demonstrations via inverse reinforcement learning. We then "translate" this information into a natural language instruction using a neural sequence-to-sequence model that learns to generate free-form instructions from natural language corpora. We evaluate our method on a benchmark route instruction dataset and achieve a BLEU score of 72.18% compared to human-generated reference instructions. We additionally conduct navigation experiments with human participants demonstrating that our method generates instructions that people follow as accurately and easily as those produced by humans.
Chapter
In the past 10 years, very few published studies include some kind of extrinsic evaluation of an NLG component in an end-to-end system, be it for phone or mobile-based dialogues or social robotic interaction. This may be attributed to the fact that these types of evaluations are very costly to set up and run for a single component. The question therefore arises whether there is anything to be gained over and above intrinsic quality measures obtained in off-line experiments. In this article, we describe a case study of evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic measures and intrinsic measures. These differences can be used to inform further iterations of component and system development.
Conference Paper
We present a natural language generator based on the sequence-to-sequence approach that can be trained to produce natural language strings as well as deep syntax dependency trees from input dialogue acts, and we use it to directly compare two-step generation with separate sentence planning and surface realization stages to a joint, one-step approach. We were able to train both setups successfully using very little training data. The joint setup offers better performance, surpassing state-of-the-art with regards to n-gram-based scores while providing more relevant outputs.
Conference Paper
We present a framework for automatically learning human user models from joint-action demonstrations that enables a robot to compute a robust policy for a collaborative task with a human. First, the demonstrated action sequences are clustered into different human types using an unsupervised learning algorithm. A reward function is then learned for each type through the employment of an inverse reinforcement learning algorithm. The learned model is then incorporated into a mixed-observability Markov decision process (MOMDP) formulation, wherein the human type is a partially observable variable. With this framework, we can infer online the human type of a new user that was not included in the training set, and can compute a policy for the robot that will be aligned to the preference of this user. In a human subject experiment (n=30), participants agreed more strongly that the robot anticipated their actions when working with a robot incorporating the proposed framework (p
Conference Paper
We demonstrate interaction with a relational agent, embodied as a robot, to provide social support for isolated older adults. Our robot supports multiple activities, including discussing the weather, playing cards and checkers socially, maintaining a calendar, talking about family and friends, discussing nutrition, recording life stories, exercise coaching and making video calls.
Article
A 3 × 2 between-subjects design examined the effect of shaking hands prior to engaging in a single-issue distributive negotiation, where one negotiator performed their role tele-presently through a 'Nao' humanoid robot. An additional third condition of handshaking with feedback examined the effect of augmenting the tele-present handshake with haptic and tactile feedback for the non tele-present and tele-present negotiators respectively. Results showed that the shaking of hands prior to negotiating resulted in increased cooperation between negotiators, reflected by economic outcomes that were more mutually beneficial. Despite the fact that the non tele-present negotiator could not see the real face of their counterpart, tele-presence did not affect the degree to which negotiators considered one another to be trustworthy, nor did it affect the degree to which negotiators self-reported as intentionally misleading one another. Negotiators in the more powerful role of buyer rated their impressions of their counterpart more positively, but only if they themselves conducted their negotiations tele-presently. Results are discussed in terms of their design implications for social tele-presence robotics.
Article
Navigation is a basic skill for autonomous robots. In recent years human–robot interaction has become an important research field that spans all of the robot capabilities, including perception, reasoning, learning, manipulation and navigation. For navigation, the presence of humans requires novel approaches that take into account the constraints of human comfort as well as social rules. Besides these constraints, putting robots among humans opens new interaction possibilities for robots, also for navigation tasks, such as robot guides. This paper provides a survey of existing approaches to human-aware navigation and offers a general classification scheme for the presented methods.
Article
The standard referring-expression generation task involves creating stand-alone descriptions intended solely to distinguish a target object from its context. However, when an artificial system refers to objects in the course of interactive, embodied dialogue with a human partner, this is a very different setting; the references found in situated dialogue are able to take into account aspects of the physical, interactive and task-level context, and are therefore unlike those found in corpora of stand-alone references. Also, the dominant method of evaluating generated references involves measuring corpus similarity. In an interactive context, though, other extrinsic measures such as task success and user preference are more relevant, and numerous studies have repeatedly found little or no correlation between such extrinsic metrics and the predictions of commonly used corpus-similarity metrics. To explore these issues, we introduce a humanoid robot designed to cooperate with a human partner on a joint construction task. We then describe the context-sensitive reference-generation algorithm that was implemented for use on this robot, which was inspired by the referring phenomena found in the Joint Construction Task corpus of human-human joint construction dialogues. The context-sensitive algorithm was evaluated through two user studies comparing it to a baseline algorithm, using a combination of objective performance measures and subjective user satisfaction scores. In both studies, the objective task performance and dialogue quality were found to be the same for both versions of the system; however, in both cases, the context-sensitive system scored more highly on subjective measures of interaction quality.
Article
Contents: Preface; 1. Introduction; 2. Natural Language Generation in practice; 3. The architecture of a Natural Language Generation system; 4. Document planning; 5. Microplanning; 6. Surface realisation; 7. Beyond text generation; Appendix; References; Index.
Article
This paper reviews “socially interactive robots”: robots for which social human–robot interaction is important. We begin by discussing the context for socially interactive robots, emphasizing the relationship to other research fields and the different forms of “social robots”. We then present a taxonomy of design methods and system components used to build socially interactive robots. Finally, we describe the impact of these robots on humans and discuss open issues. An expanded version of this paper, which contains a survey and taxonomy of current applications, is available as a technical report [T. Fong, I. Nourbakhsh, K. Dautenhahn, A survey of socially interactive robots: concepts, design and applications, Technical Report No. CMU-RI-TR-02-29, Robotics Institute, Carnegie Mellon University, 2002].
Article
One of the main challenges in automatically generating textual weather forecasts is choosing appropriate English words to communicate numeric weather data. A corpus-based analysis of how humans write forecasts showed that there were major differences in how individual writers performed this task, that is, in how they translated data into words. These differences included both different preferences between potential near-synonyms that could be used to express information, and also differences in the meanings that individual writers associated with specific words. Because we thought these differences could confuse readers, we built our SumTime-Mousam weather-forecast generator to use consistent data-to-word rules, which avoided words which were only used by a few people, and words which were interpreted differently by different people. An evaluation by forecast users suggested that they preferred SumTime-Mousam's texts to human-generated texts, in part because of better word choice; this may be the first time that an evaluation has shown that NLG texts are better than human-authored texts.
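The consistent data-to-word rules described here can be sketched in a few lines of Python: each numeric condition maps to exactly one verb, avoiding near-synonyms (e.g. "easing" versus "decreasing") that human forecasters used inconsistently. The thresholds and wording below are illustrative assumptions, not SumTime-Mousam's actual rules.

def wind_change_verb(old_speed, new_speed):
    """Map a numeric change in wind speed to exactly one verb."""
    delta = new_speed - old_speed
    if delta >= 5:
        return "increasing"
    if delta <= -5:
        return "easing"
    return "steady"

def wind_phrase(direction, old_speed, new_speed):
    verb = wind_change_verb(old_speed, new_speed)
    return f"{direction} {old_speed}-{new_speed}, {verb}"

print(wind_phrase("SW", 10, 18))  # -> "SW 10-18, increasing"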
Book
Natural language generation (NLG) is a subfield of natural language processing (NLP) that is often characterized as the study of automatically converting non-linguistic representations (e.g., from databases or other knowledge sources) into coherent natural language text. In recent years the field has evolved substantially. Perhaps the most important new development is the current emphasis on data-oriented methods and empirical evaluation. Progress in related areas such as machine translation, dialogue system design and automatic text summarization and the resulting awareness of the importance of language generation, the increasing availability of suitable corpora in recent years, and the organization of shared tasks for NLG, where different teams of researchers develop and evaluate their algorithms on a shared, held out data set have had a considerable impact on the field, and this book offers the first comprehensive overview of recent empirically oriented NLG research.
Article
Social intelligence in robots has a quite recent history in artificial intelligence and robotics. However, it has become increasingly apparent that social and interactive skills are necessary requirements in many application areas and contexts where robots need to interact and collaborate with other robots or humans. Research on human-robot interaction (HRI) poses many challenges regarding the nature of interactivity and 'social behaviour' in robots and humans. The first part of this paper addresses dimensions of HRI, discussing requirements on social skills for robots and introducing the conceptual space of HRI studies. In order to illustrate these concepts, two examples of HRI research are presented. First, research is surveyed which investigates the development of a cognitive robot companion. The aim of this work is to develop social rules for robot behaviour (a 'robotiquette') that is comfortable and acceptable to humans. Second, robots are discussed as possible educational or therapeutic toys for children with autism. The concept of interactive emergence in human-child interactions is highlighted. Different types of play among children are discussed in the light of their potential investigation in human-robot experiments. The paper concludes by examining different paradigms regarding 'social relationships' of robots and people interacting with them.
DE-ENIGMA: playfully empowering autistic children. Poster presented at the Autism-Europe Int.
  • DE-ENIGMA Project
The E2E NLG shared task
  • J Novikova
  • O Dušek
  • V Rieser
The MuMMER project: engaging human-robot interaction in real-world public spaces
  • M E Foster
  • R Alami
  • O Gestranius
  • O Lemon
  • M Niemelä
  • J-M Odobez
  • A K Pandey
NLG4DS: SIGDIAL 2017 Special Session on Natural Language Generation for Dialogue Systems
  • M Walker
  • V Rieser
  • V Demberg
  • D Klakow
  • D Hakkani-Tür
Natural language generation
  • F Mairesse
Shaking hands and cooperation in tele-present human-robot negotiation
  • C Bevan
  • D Stanton Fraser