Bonnie L. Webber, The University of Edinburgh | UoE · School of Informatics
About
267
Publications
28,484
Reads
8,203
Citations
Additional affiliations
August 1998 - December 2013
September 1978 - December 2013
Publications (267)
Implicit discourse relation recognition involves determining relationships that hold between spans of text that are not linked by an explicit discourse connective. In recent years, the pre-train, prompt, and predict paradigm has emerged as a promising approach for tackling this task. However, previous work solely relied on manual verbalizers for im...
Discourse relations play a pivotal role in establishing coherence within textual content, uniting sentences and clauses into a cohesive narrative. The Penn Discourse Treebank (PDTB) stands as one of the most extensively utilized datasets in this domain. In PDTB-3, the annotators can assign multiple labels to an example, when they believe that multi...
We examine how the language of online reviews has changed over the past 20 years. The corpora we use for this analysis consist of online reviews, each of which is paired with a numerical rating. This allows us to control for the perceived sentiment of a review when examining its linguistic features. Our findings show that reviews have become less c...
Implicit discourse relation recognition is a challenging task that involves identifying the sense or senses that hold between two adjacent spans of text, in the absence of an explicit connective between them. In both PDTB-2 and PDTB-3, discourse relational senses are organized into a three-level hierarchy ranging from four broad top-level senses, t...
Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising number of annotation errors or inconsistencies. To alleviate this i...
Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising amount of annotation errors or inconsistencies. To alleviate this i...
In the PDTB-3, several thousand implicit discourse relations were newly annotated within individual sentences, adding to the over 15,000 implicit relations annotated across adjacent sentences in the PDTB-2. Given that the position of the arguments to these intra-sentential implicits is no longer as well-defined as with ...
Many NLG tasks such as summarization, dialogue response, or open domain question answering focus primarily on a source text in order to generate a target response. This standard approach falls short, however, when a user's intent or context of work is not easily recoverable based solely on that source text -- a scenario that we argue is more of the ru...
Many NLG tasks such as summarization, dialogue response, or open domain question answering focus primarily on a source text in order to generate a target response. This standard approach falls short, however, when a user's intent or context of work is not easily recoverable based solely on that source text -- a scenario that we argue is more of the...
Lexical cohesion is a fundamental mechanism for text which requires a pair of words to be interpreted as a certain type of lexical relation (e.g., similarity) to understand a coherent context; we refer to such relations as the contextual lexical relation. However, work on lexical cohesion has not modeled context comprehensively in considering lexic...
Because the 2020 ACL Lifetime Achievement Award presentation could not be done in person, we replaced the usual LTA talk with an interview between Professor Kathy McKeown (Columbia University) and the recipient, Bonnie Webber. The following is an edited version of the interview, with added citations.
Multi-sentence questions (MSQs) are sequences of questions connected by relations which, unlike sequences of standalone questions, need to be answered as a unit. Following Rhetorical Structure Theory (RST), we recognise that different "question discourse relations" between the subparts of MSQs reflect different speaker intents, and consequently eli...
The PDTB-3 contains many more Implicit discourse relations than the previous PDTB-2. This is in part because implicit relations have now been annotated within sentences as well as between them. In addition, some now co-occur with explicit discourse relations, instead of standing on their own. Here we show that while this can complicate the problem...
It is well-known that abstractive summaries are subject to hallucination -- including material that is not supported by the original text. While summaries can be made hallucination-free by limiting them to general phrases, such summaries would fail to be very informative. Alternatively, one can try to avoid hallucinations by verifying that any speci...
Text corpora annotated with language-related properties are an important resource for the development of Language Technology. The current work contributes a new resource for Chinese Language Technology and for Chinese-English translation, in the form of a set of TED talks (some originally given in English, some in Chinese) that have been annotated...
Ellipsis and co-reference are common and ubiquitous especially in multi-turn dialogues. In this paper, we treat the resolution of ellipsis and co-reference in dialogue as a problem of generating omitted or referred expressions from the dialogue context. We therefore propose a unified end-to-end Generative Ellipsis and CO-reference Resolution model...
The availability of corpora to train semantic parsers in English has led to significant advances in the field. Unfortunately, for languages other than English, annotation is scarce and so are developed parsers. We then ask: could a parser trained in English be applied to a language that it hasn't been trained on? To answer this question we explore z...
This paper serves as a short overview of the JNLE special issue on representation of the meaning of the sentence, bringing together traditional symbolic and modern continuous approaches. We indicate notable aspects of sentence meaning and their compatibility with the two streams of research and then summarize the papers selected for this special is...
Negation scope has been annotated in several English and Chinese corpora, and highly accurate models for this task in these languages have been learned from these annotations. Unfortunately, annotations are not available in other languages. Could a model that detects negation scope be applied to a language that it hasn't been trained on? We develop...
Conversational agents are gaining popularity with the increasing ubiquity of smart devices. However, training agents in a data driven manner is challenging due to a lack of suitable corpora. This paper presents a novel method for gathering topical, unstructured conversational data in an efficient way: self-dialogues through crowd-sourcing. Alongsid...
Idiom translation is a challenging problem in machine translation because the meaning of idioms is non-compositional, and a literal translation is likely to be wrong. In this paper, we assess the quality of idiom translation of a modern neural MT system. We introduce a new evaluation method based on an idiom-specific blacklist of literal translatio...
We present Edina, the University of Edinburgh's social bot for the Amazon Alexa Prize competition. Edina is a conversational agent whose responses utilize data harvested from Amazon Mechanical Turk (AMT) through an innovative new technique we call self-dialogues. These are conversations in which a single AMT Worker plays both participants in a dial...
Understanding discourse relies to a great extent on correctly interpreting relations holding between the eventualities and facts mentioned in discourse. These discourse relations, such as causal, contrastive and temporal relations, can be expressed explicitly or implicitly in the discourse, and are the subject of annotation in the Penn Discourse Tr...
Many language technology applications would benefit from the ability to represent negation and its scope on top of widely-used linguistic resources. In this paper, we investigate the possibility of obtaining a first-order logic representation with negation scope marked using Universal Dependencies. To do so, we enhance UDepLambda, a framework that...
The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either lexically-grounded in explicit discourse connectives or associated with sentential adjacency has not only facilitated its use in language technology...
Smartphones are increasingly ubiquitous, offer many location based services, and act as a conduit for the gathering and delivery of (spatial) information. There is considerable interest in Smartphones that support dialogue based interaction (e.g. Siri), as a way of learning about their surroundings (Bartie & Mackaness 2006, Janarthanam et al. 2013)...
Although the performance of SMT systems has improved over a range of different linguistic phenomena, negation has not yet received adequate treatment. Previous works have considered the problem of translating negative data as one of data sparsity (Wetzel and Bond (2012)) or of structural differences between source and target language with respec...
We present a French to English translation system for Wikipedia biography articles. We use training data from out-of-domain corpora and adapt the system for biographies. We propose two forms of domain adaptation. The first biases the system towards words likely in biographies and encourages repetition of words across the document as a whole. Since...
We present a city navigation and tourist information mobile dialogue app with integrated question-answering (QA) and geographic information system (GIS) modules that helps pedestrian users to navigate in and learn about urban environments. In contrast to existing mobile apps which treat these problems independently, our Android app addresses the pr...
In their paper, A Probabilistic Reconciliation of Coherence-Driven and Centering-Driven Theories of Pronoun Interpretation, Kehler & Rohde consider the important question "What would the discourse processing architecture have to look like to allow for a fairly simple theory of pronoun interpretation?" (p. 4). Their answer is "a probabilistic model t...
Manual corpus annotation and automated corpus analysis began with a focus on what was seen as “low-hanging fruit”, such as morphological tags, part-of-speech tags, and syntactic structures. But horizons change, and attention is now focussed somewhat higher—on problems in understanding and reliably annotating other sorts of semantic and pragmatic ph...
We present methods for reducing the worst-case and typical-case complexity of a context-free parsing pipeline via hard constraints derived from finite-state pre-processing. We perform O(n) predictions to determine if each word in the input ...
The discourse properties of text have long been recognized as critical to language technology, and over the past 40 years, our understanding of and ability to exploit the discourse properties of text has grown in many ways. This essay briefly recounts these developments, the technology they employ, the applications they support, and the new cha...
Every text has at least one topic and at least one genre. Evidence for a text's topic and genre comes, in part, from its lexical and syntactic features—features used in both Automatic Topic Classification and Automatic Genre Classification (AGC). Because an ideal AGC system should be stable in the face of changes in topic distribution, we assess fi...
Studies of discourse relations have not, in the past, attempted to characterize what serves as evidence for them, beyond lists of frozen expressions, or markers, drawn from a few well-defined syntactic classes. In this paper, we describe how the lexicalized discourse relation annotations of the Penn Discourse Treebank (PDTB) led to the discovery of...
Bases for discourse structure; Properties of discourse structure relevant to LT; Evidence for discourse structure
What is Question Answering?; Current State of the Art in Open Domain QA; Current Directions; Further Reading; Note
We present an approach to automatically identifying the arguments of discourse connectives based on data from the Penn Discourse Treebank. Of the two arguments of connectives, called Arg1 and Arg2, we focus on Arg1, which has proven more challenging to identify. Our approach employs a sentence-based representation of arguments, and distinguishes in...
The goal of understanding how discourse is more than a sequence of sentences has engaged researchers for many years. Researchers in the 1970s attempted to gain such understanding by identifying and classifying the phenomena involved in discourse. This was followed by attempts in the 1980s and early 1990s to explain discourse phenomena in terms of...
In this introduction, we present our overview of interactive question answering (IQA). We contextualize IQA in the wider field of question answering, and establish connections to research in Information Retrieval and Dialogue Systems. We highlight the development of QA as a field, and identify challenges in the present research paradigm for which I...
Articles in the Penn TreeBank were identified as being reviews, summaries, letters to the editor, news reportage, corrections, wit and short verse, or quarterly profit reports. All but the latter three were then characterised in terms of features manually annotated in the Penn Discourse TreeBank: discourse connectives and their senses. S...
computational work;language processing;linguistics;language technology;research and development
We present the second version of the Penn Discourse Treebank, PDTB-2.0, describing its lexically-grounded annotations of discourse relations and their two abstract object arguments over the 1 million word Wall Street Journal corpus. We describe all aspects of the annotation, including (a) the argument structure of discourse relations, (b) the sense...
The method of Topic Indexing and Retrieval for QA presented in this paper enables fast and efficient QA for questions with named entity answers. This is achieved by identifying all possible named entity answers in a corpus off-line and gathering all possible evidence for their direct retrieval as answer candidates using standard IR techniques. An ev...
This article is a perspective on some important developments in semantics and in computational linguistics over the past forty years. It reviews two lines of research that lie at opposite ends of the field: semantics and morphology. The semantic part ...
In strategic management there has been a debate over many years. Already in 1962 Alfred Chandler had stated: Structure follows Strategy. In the nineteen eighties, Michael Porter modified Chandler's dictum about structure following strategy by introducing ...
This paper discusses how lexical resources based on semantic roles (i.e. FrameNet, PropBank, VerbNet) can be used for Question Answering, especially Web Question Answering. Two algorithms have been implemented to this end, with quite different characteristics. We discuss both approaches when applied to each of the resources and a combination of the...
In cooperative man-machine interaction, it is necessary but not sufficient for a system to respond truthfully and informatively to a user's question. In particular, if the system has reason to believe that its planned response might mislead the user, then it must block that conclusion by modifying its response. This paper focuses on identifying and...
We describe a framework for creating animated simulations of virtual human agents. The framework allows us to capture flexible patterns of activity, reactivity to a changing environment, and certain aspects of an agent personality model. Each leads to variation in how an animated simulation will be realized. As different parts of an activity make d...
We describe experiments carried out at the University of Edinburgh for our TREC 2006 QA participation. Our main effort was to develop an approach to QA that is based on frame semantics. Two algorithms were implemented to this end, building on the lexical resources FrameNet, PropBank and VerbNet. The first algorithm uses the resources to generate po...
At the start of my career, I had the good fortune of working with Ron Kaplan on Bill Woods' Lunar system (Woods et al., 1972). One day, in talking with Ron, I marvelled to him over the range of syntactic constructions I was able to implement in Lunar's ATN grammar formalism. Ron's reply was that you could implement anything in an ATN: the point was...
Anatomical information is crucial to human biomedical research but not all research is based on human tissues. However, exploiting discoveries in model organisms such as the mouse at a systems level, involving metabolic and developmental networks in tissues, requires the identification of the links between human and model organism anatomies. The qu...
COBrA is a Java-based ontology editor for bio-ontologies that distinguishes itself from other editors by supporting the linking of concepts between two ontologies, and providing sophisticated analysis and verification functions. In addition to the Gene Ontology and Open Biology Ontologies formats, COBrA can import and export ontologies in the Seman...
This report describes the system developed by the University of Edinburgh and the University of Sydney for the TREC-2005 question answering evaluation exercise. The backbone of our question-answering platform is QED, a linguistically-principled QA system. We experimented with external sources of knowledge, such as Google and Wikipedia, to en-...
This paper surveys work on applying the insights of lexicalized grammars to low-level discourse, to show the value of positing an autonomous grammar for low-level discourse in which words (or idiomatic phrases) are associated with discourse-level predicate–argument structures or modification structures that convey their syntactic-semantic meaning a...
The Penn Discourse TreeBank (PDTB) is a new resource built on top of the Penn Wall Street Journal corpus, in which discourse connectives are annotated along with their arguments. Its use of standoff annotation allows integration with a stand-off version of the Penn TreeBank (syntactic structure) and PropBank (verbs and their arguments), which adds...
Part-of relations are central to anatomy. However, the definition, formalisation and use of part-of in anatomy ontologies is problematic. This paper surveys existing formal approaches, as well as the use of part-of in the Open Biological Ontologies (OBO) anatomies of model species. Based on this analysis, we propose a minimal ontology for anatomy w...
This report describes the experiments of the University of Edinburgh and the University of Sydney at the TREC-2004 question answering evaluation exercise. Our system combines two approaches: one with deep linguistic analysis using IR on the AQUAINT corpus applied to answer extraction from text passages, and one with a shallow linguistic analysis an...
INTRODUCTION Anatomical ontologies play an increasingly important role in indexing data, including gene expression, in model organisms. However, the terms used to name anatomical concepts (tissues) differ between communities, and the ontologies of model organisms differ from one another in ways that do not correspond to differences between the orga...
The accelerating growth in biomedical literature has stimulated activity on automated classification of and information extraction from this literature. The work described here attempts to improve on an earlier classification study associating biological articles to GO codes. It demonstrates the need, under particular assumptions, for more access t...
This report describes a new open-domain answer retrieval system developed at the University of Edinburgh and gives results for the TREC question answering track. Phrasal answers are identified by increasingly narrowing down the search space from a large text collection to a single phrase. The system uses document retrieval, q...
This report describes a new open-domain answer retrieval system developed at the University of Edinburgh and gives results for the TREC-12 question answering track. Phrasal answers are identified by increasingly narrowing down the search space from a large text collection to a single phrase. The system uses document retrieval, query-based passage s...