About
86
Publications
9,241
Reads
1,868
Citations
Publications (86)
We describe the design, the evaluation setup, and the results of the 2016 WMT shared task on cross-lingual pronoun prediction. This is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a placeholder value in the target-language text, provided in lemmatised and PoS-tagged form. We...
Previous work on pronouns in SMT has focussed on third-person pronouns, treating them all as anaphoric. Little attention has been paid to other uses or other types of pronouns. Believing that further progress requires careful analysis of pronouns as a whole, we have analysed a parallel corpus of annotated English-German texts to highlight some of t...
We present ParCor, a parallel corpus of texts in which pronoun coreference – reduced coreference in which pronouns are used as referring expressions – has been annotated. The corpus is intended to be used both as a resource from which to learn systematic differences in pronoun use between languages and ultimately for developing and testing informed...
An increasing number of researchers and practitioners in Natural Language Engineering face the prospect of having to work with entire texts, rather than individual sentences. While it is clear that text must have useful structure, its nature may be less clear, making it more difficult to exploit in applications. This survey of work on discourse str...
The discourse properties of text have long been recognized as critical to language technology, and over the past 40 years, our understanding of and ability to exploit the discourse properties of text has grown in many ways. This essay briefly recounts these developments, the technology they employ, the applications they support, and the new challen...
All three approaches (Hobbs, L&L, GJ&W) incorporate syntactic preferences, but in different ways. Hobbs captures syntactic preferences in search order:
• Different initial search order for pronouns found before the verb and pronouns found after.
• Left-right order within a tree reflects syntactic-role preference (subject > object > adjuncts).
• Bread...
While many advances have been made in Natural Language Generation (NLG), the scope of the field has been somewhat restricted because of the lack of annotated corpora from which properties of texts can be automatically acquired and applied towards the development of generation systems. In this paper, we describe how the Penn Discourse Tree-Bank (PDT...
The term discourse structure is used to denote any structure of a text above that of the sentence. Trees have often been posited as a good abstraction when discourse is taken to have a hierarchical structure (Mann and Thompson 1987; Webber et al. 2003; Marcu 2000; Egg and Redeker 2008). Nevertheless, periodically researchers have commented on the n...
In the context of Question Answering (QA) on free text, we assess the value of answer comparison and information fusion in handling multiple answers. We report improvements in answer re-ranking using fusion on a set of location questions and show the advantages of considering candidates as allies rather than competitors. We conclude with some obser...
In developmental biology, to support reasoning about cause and effect, it is critical to link genetic pathways with processes at the cellular and tissue level that take place beforehand, simultaneously or subsequently. While researchers have worked on resolving with respect to absolute time, events mentioned in medical texts such as clinical narrat...
The classical "success story" of corpus annotation are the various syntax treebanks that provide structural analyses of sentences and have enabled researchers to develop a range of new and highly successful data-oriented approaches to sentence parsing. In recent years, however, a number of corpora have been constructed that provide annotations on t...
Current research in developmental biology aims to link developmental genetic pathways with the processes going on at cellular and tissue level. Normal processes will only take place under specific sequential conditions at the level of the pathways. Disrupting or altering pathways may mean disrupted or altered development. This paper is part of a...
An emerging task in text understanding and generation is to categorize information as fact or opinion and to further attribute it to the appropriate source. Corpus annotation schemes aim to encode such distinctions for NLP applications concerned with such tasks, such as information extraction, question answering, summarization, and generation. We d...
The annotations of the Penn Discourse Treebank (PDTB) include (1) discourse connectives and their arguments, and (2) attribution of each argument of each connective and of the relation it denotes. Because the PDTB covers the same text as the Penn TreeBank WSJ corpus, syntactic and discourse annotation can be compared. This has revealed signific...
D-LTAG is a discourse-level extension of lexicalized tree-adjoining grammar (LTAG), in which discourse syntax is projected by different types of discourse connectives and discourse interpretation is a product of compositional rules, anaphora resolution, and inference. In this paper, we present a D-LTAG extension of ongoing work on an LTAG syntax-se...
In this paper, we describe an annotation scheme for the attribution of abstract objects (propositions, facts, and eventualities) associated with discourse relations and their arguments annotated in the Penn Discourse TreeBank. The scheme aims to capture both the source and degrees of factuality of the abstract objects through the annotation of text...
Discourse connectives can be analysed as encoding predicate-argument relations whose arguments derive from the interpretation of discourse units. These arguments can be anaphoric or structural. Although structural arguments can be encoded in a parse tree, anaphoric arguments must be resolved by other means. A study of nine connectives, annotating t...
In open domain Question Answering, answer candidates are ranked according to individual features such as matching the answer type expected by the question. We report on a technique based on the fusion of candidate answers and their context into answer neighbourhoods to provide better features for ranking and allow shallow reasoning.
The accelerating growth in biomedical literature has stimulated activity on automated classification of and information extraction from this literature. The work described here attempts to improve on an earlier classification study associating biological articles to GO codes. It demonstrates the need, under particular assumptions, for more access t...
We describe our use of an existing resource, the Mouse Anatomical Nomenclature, to improve a symbolic interface to anatomically-indexed gene expression data. The goal is to reduce user effort in specifying anatomical structures of interest and increase precision and recall.
Reading comprehension tests are receiving increased attention within the NLP community as a controlled test-bed for developing, evaluating and comparing robust question answering (NLQA) methods.
The Penn Discourse TreeBank (PDTB) is a new resource built on top of the complete Penn Wall Street Journal corpus, in which discourse connectives are annotated along with their arguments. Its use of stand-off annotation allows integration with a standoff version of the Penn TreeBank (syntactic structure) and PropBank (verbs and their arguments), w...
Discourse connectives can be analyzed as encoding predicate-argument relations whose arguments derive from the interpretation of discourse units. These arguments can be anaphoric or structural. Although structural arguments can be encoded in a parse tree, anaphoric arguments must be resolved by other means. A study of nine connectives, annotating t...
This paper describes a new, large scale discourse-level annotation project -- the Penn Discourse TreeBank (PDTB). We present an approach to annotating a level of discourse structure that is based on identifying discourse connectives and their arguments. The PDTB is being built directly on top of the Penn TreeBank and Propbank, thus supporting the e...
This paper describes a new discourse-level annotation project -- the Penn Discourse Treebank (PDTB) -- that aims to produce a large-scale corpus in which discourse connectives are annotated, along with their arguments, thus exposing a clearly defined level of discourse structure.
Objects in Discourse. Kluwer, Boston.
Part-of relations are central to anatomy. However, the definition, formalisation and use of part-of in anatomy ontologies is problematic. This paper surveys existing formal approaches, as well as the use of part-of in the Open Biological Ontologies (OBO) anatomies of model species. Based on this analysis, we propose a minimal ontology for anatomy...
This report describes a new open-domain answer retrieval system developed at the University of Edinburgh and gives results for the TREC-12 question answering track. Phrasal answers are identified by increasingly narrowing down the search space from a large text collection to a single phrase. The system uses document retrieval, query-based passage s...
The task of named entity annotation of unseen text has recently been successfully automated with near-human performance. But the full task involves more than annotation, i.e. identifying the scope of each (continuous) text span and its class (such as place name). It also involves grounding the named entity (i.e. establishing its denotation with res...
The task of named entity annotation of unseen text has recently been successfully automated with near-human performance.
Recently, reading comprehension tests for students and adult language learners have received increased attention within the NLP community as a means to develop and evaluate robust question answering (NLQA) methods. We present our ongoing work on automatically creating richly annotated corpus resources for NLQA and on comparing automatic methods for...
Reading comprehension tests are receiving increased attention within the NLP community as a controlled test-bed for developing, evaluating and comparing robust question answering (NLQA) methods. To support this, we have enriched the MITRE CBC4Kids corpus with multiple XML annotation layers recording the output of various tokenizers, lemmatizers...
We present an implementation of a discourse parsing system for a lexicalized Tree-Adjoining Grammar for discourse, specifying the integration of sentence and discourse level processing. Our system is based on the assumption that the compositional aspects of semantics at the discourse-level parallel those at the sentence-level. This coupling is achie...
This paper describes our research on producing justifications. ("U" refers to the user, "S" to the system.) U: Is John taking four courses? Si: No. John can't take any courses: he's not a student
One can only exploit inference in Question-Answering (QA) and assess its contribution systematically, if one knows what inference is contributing to. Thus we identify a set of tasks specific to QA and discuss what inference could contribute to their achievement. We conclude with a proposal for graduated test suites as a tool for assessing the perfo...
We have argued extensively in prior work that discourse connectives can be analyzed as encoding predicate-argument relations whose arguments derive from the interpretation of discourse units. All adverbial connectives we have analyzed to date have expressed binary relations. But they are special in taking one of their two arguments structurally...
In this paper, however, we will use a graphic presentation, as in (2) above, which is easier to read than conjunctions of logical formulae.
Introduction This is a brief, informal and, because of Christmas holidays, regrettably partial survey of current research in computational linguistics at the University of Pennsylvania. Any inaccuracies can be blamed on the departmental egg nog. 2. Extending the Range of Interactive Behavior Perhaps the most activity here in computational linguisti...
In this paper, we investigate this revised principle as applied to question answering. In particular the goals of the research described here are to: 1. characterize tractable cases in which the system as respondent (R) can anticipate the possibility of the user/questioner (Q) drawing false conclusions from its response and can hence alter or expand...
We show that discourse structure need not bear the full burden of conveying discourse relations by showing that many of them can be explained nonstructurally in terms of the grounding of anaphoric presuppositions (Van der Sandt, 1992). This simplifies discourse structure, while still allowing the realisation of a full range of discourse relations....
We address the question of why certain adverb and preposition phrases are only interpretable with respect to the discourse, and not just their own matrix clause. We show that, in many cases, an adverbial's compositional semantics explains why. We close by reporting on an annotation study aimed at providing specific evidence for how adverbials are i...
We argue in this article that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure. This allows a simpler discourse structure to provide scaffoldin...
The process of microplanning in natural language generation (NLG) encompasses a range of problems in which a generator must bridge underlying domain‐specific representations and general linguistic representations. These problems include constructing linguistic referring expressions to identify domain objects, selecting lexical items to express doma...
The process of microplanning encompasses a range of problems in Natural Language Generation (NLG), such as referring expression generation, lexical choice, and aggregation, problems in which a generator must bridge underlying domain-specific representations and general linguistic representations. In this paper, we describe a uniform approach to mic...
In this paper, we claim that the disambiguation of referring expressions in discourse can be formulated in terms that automated reasoners can address. Specifically, we show that consistency, informativity and minimality are criteria which can be (i) implemented using automated reasoning tools and (ii) used to disambiguate noun-noun compounds, meton...
Lexicalized grammars such as TAG (Joshi, 1987; XTAG-Group, 1998) and CCG (Steedman, 1996) have been very successful in showing how clause-level syntax and semantics project from the lexicon. What drives the current enterprise is the hypothesis that the same can be shown true, at some level, for discourse syntax and semantics. Here we demonstrate ou...
this paper on interactions among these elements, since we feel that there is still much that needs to be learned about them from empirical and experimental studies. However, our use of anaphoric presupposition in an account of discourse relations suggests a unified account of discourse connectives, tense (whose anaphoric nature has long been argued...
We show how discourse structure can be relieved of some of the burden of discourse semantics by seeing part of it as arising from anaphoric presupposition (Van der Sandt, 1992). This allows the discourse structure that supports discourse semantics to be no more complicated than that which is already needed within the clause. 1 Introduction Research...
Managing a patient with multiple injuries is a cognitively intense task. While protocols provide invaluable support for maintaining quality care, they generally address a single condition, while multiple trauma generally involves many. The TraumAID system tries to address this by providing tools for reasoning, planning, plan recognition and text ge...
We focus on the production of efficient descriptions of objects, actions and events. We define a type of efficiency, textual economy, that exploits the hearer's recognition of inferential links to material elsewhere within a sentence. Textual economy leads to efficient descriptions because the material that supports such inferences has been included...
This paper examines an approach for integrating 3D structural reasoning, using computer models of the human anatomy, with diagnostic reasoning based on Bayesian networks in order to probabilistically predict injuries to anatomic structures from gunshot wounds. An interactive 3D graphical system has been created which allows the user to visualize di...
In this paper, we show that a description-based approach is also useful in discourse, supporting the incremental construction of discourse structure for monologic discourse and the semantics to be associated with such structure.
This report describes some methods underlying automation of maintenance instructions of the sort found in maintenance technical orders. The key problem under review here is the interface between the geometry of the device and the verbal description of the maintenance actions required for the human maintainer. Language generation underlies both the...
this document to examining text generation rather than understanding, though we have current and historical interest in the instruction understanding problem as well [BWKE91, BPW93, WBE
The overall goals of the Center for Human Modeling and Simulation are the investigation of computer graphics modeling, animation, and rendering techniques. Major foci are in behavior-based animation of human movement, modeling through physics-based techniques, applications of control theory techniques to dynamic models, illumination models for imag...
This thesis presents the Intentional Planning System (ItPlanS), a hierarchical planner that is designed for domains in which plans are required as rapidly as possible, with limited knowledge. These constraints argue for an incremental approach to the planning that does not require the system to make commitments beyond the information that is current...
Animating realistic human agents involves more than just creating movements that look "real". A principal characteristic of humans is their ability to plan and make decisions based on intentions and the local environmental context. "Animated agents" must therefore react to and deliberate about their environment and other agents. Our agent animation...
In producing realistic, animatable models of the human body, we see much to be gained from developing a functional anatomy that links the anatomical and physiological behavior of the body through fundamental causal principles. This paper describes our current Finite Element Method implementation of a simplified lung and chest cavity during normal...
We present an integrated approach between anatomical and physiological modeling that is useful in medical training and visualization of internal body organ function. In particular, we model the breathing mechanism in the respiratory system because it involves physiological change, such as gas exchange, that depends on gross anatomical deformations...
Much planning research assumes that the goals for which one plans are known in advance. That is not true of trauma management, which involves both a search for relevant goals and reasoning about how to achieve them.
TraumAID is a consultation system for the diagnosis and treatment of multiple trauma. It has been under development jointly at the Uni...
Research on the factors and processes involved in pronoun interpretation has to date concentrated on anaphoric pronouns. Results have supported the now widely-held view that discourse understanding involves the creation of a partial, mental model of the situation described through the discourse. Anaphoric pronouns are taken to refer to elements of...
An oft-heard argument for Natural Language interfaces is their promise of reducing the effort an infrequent user would have to exert in using a computer system. This viewpoint has led to a research concentration on removing “artificial” constraints on a user's freedom of expression, allowing it to move closer to everyday speech. This paper discusse...
Large scale annotated corpora have played a critical role in speech and natural language research. However, while existing annotated corpora such as the Penn Treebank have been highly successful at the sentence-level, we also need large-scale annotated resources that reliably encode key aspects of discourse. In this paper, we detail (1) our plan...
Discourse connectives can be analyzed as discourse level predicates which project predicate-argument structure on a par with verbs at the sentence level. The Penn Discourse Treebank (PDTB) reflects this view in its design providing annotation of the discourse connectives and their arguments. Like verbs, discourse connectives have multiple senses. W...
This paper investigates the complexity of dependencies at the discourse level, in particular the dependencies between discourse connectives and their arguments. Our study is based on data from the Penn Discourse Treebank (PDTB) and is therefore an exploration into the ways treebanks can inform linguistic issues. We observe that, unlike in syntax,...
For many reasons, Question Answering must deal with questions that have multiple answers. To this end, we have built a framework for answer comparison based on a technique using information fusion that has successfully been applied in automated summarization. The architecture of this system is a direct application of the Model-View-Controller de-...
We describe an experiment involving reranking a set of answer extractions for where-questions based on computing equivalence and inclusion relations among a question and its set of answer extractions.
Taking discourse connectives to be the predicates of binary discourse relations, the goal of Penn Discourse Treebank (PDTB) is to annotate the million word WSJ corpus in the Penn TreeBank with each of its discourse connectives and their arguments. The paper describes the linguistic observations and ideas that led to the PDTB, the decisions that s...