Eleni Miltsakaki
University of Pennsylvania | UP
PhD in Computational Linguistics, UPenn
About
58 Publications
16,390 Reads
2,857 Citations
Additional affiliations
July 2003 - December 2012
June 2013 - present
Publications (58)
We conduct a feasibility study into the applicability of answer-unaware question generation models to textbook passages. We show that a significant portion of errors in such systems arise from asking irrelevant or uninterpretable questions and that such errors can be ameliorated by providing summarized input. We find that giving these models human-...
We consider the problem of Vision-and-Language Navigation (VLN). The majority of current methods for VLN are trained end-to-end using either unstructured memory such as LSTM, or using cross-modal attention over the egocentric observations of the agent. In contrast to other works, our key insight is that the association between language and vision i...
Sentence simplification is the task of rewriting texts so they are easier to understand. Recent research has applied sequence-to-sequence (Seq2Seq) models to this task, focusing largely on training-time improvements via reinforcement learning and memory augmentation. One of the main problems with applying generic Seq2Seq models for simplification i...
Concession is one of the trickiest semantic discourse relations appearing in natural language. Many have tried to sub-categorize Concession and to define formal criteria both to distinguish its subtypes and to distinguish Concession from the (similar) semantic relation of Contrast. But there is still a lack of consensus among the differe...
Readability formulas are methods used to match texts with the readers' reading level. Several methodological paradigms have previously been investigated in the field. The most popular paradigm dates back several decades and gave rise to well-known readability formulas such as the Flesch formula (among several others). This paper compares this appro...
In this paper we report the results of a web search study. The goal of the study is to record and analyze student web search behavior. We recorded a) how many distinct keywords were used per student and b) how many sites were explored. We also kept field notes to supplement the quantitative data. The study was conducted over a period of six months...
How people refer to objects in the world, how people comprehend reference, and how children acquire an understanding of and an ability to use reference.
This volume brings together contributions by prominent researchers in the fields of language processing and language acquisition on topics of common interest: how people refer to objects in the wor...
We are developing a web-search application to locate and evaluate potential reading material on the internet. Our application, Read-X, performs a keyword search of the internet, analyzes the readability of text from each resulting website and classifies the text according to theme. This tool will be useful to adolescent and adult low-level reading...
Antelogue is a pronoun resolution prototype designed to be released as off-the-shelf software to be used autonomously or integrated with larger anaphora resolution or other NLP systems. It has modules to handle pronouns in both text and dialogue. In Antelogue, the problem of pronoun resolution is addressed as a two-step process: a) acquiring inform...
We present Antelogue, a novel pronoun resolution architecture for dialogues based on efficient filtering of potential antecedents through a simple look-up of information using existing resources (gender, number, NER, etc). Our system does not require large labelled datasets for training or complex handcrafted rules. We will demo the system's real t...
This paper describes Read-X, a system designed to identify text that is appropriate for the reader given his thematic choices and the reading ability associated with his educational background. To our knowledge, Read-X is the first web-based system that performs real-time searches and returns results classified thematically and by reading level w...
Movies and TV are a rich source of diverse and complex video of people, objects, actions and locales “in the wild”. Harvesting automatically labeled sequences of actions from video would enable creation of large-scale and highly-varied datasets. To enable such collection, we focus on the task of recovering scene structure in movies and TV series fo...
While many advances have been made in Natural Language Generation (NLG), the scope of the field has been somewhat restricted because of the lack of annotated corpora from which properties of texts can be automatically acquired and applied towards the development of generation systems. In this paper, we describe how the Penn Discourse Tree-Bank (PDT...
The automatic analysis and categorization of web text has witnessed a booming interest due to the increased text availability of different formats, content, genre and authorship. We present a new tool that searches the web and performs in real-time a) html-free text extraction, b) classification for thematic content and c) evaluation of expected...
An important aspect of discourse understanding and generation involves the recognition and processing of discourse relations. These are conveyed by discourse connectives, i.e., lexical items like because and as a result or implicit connectives expressing an inferred discourse relation. The Penn Discourse TreeBank (PDTB) provides annotations of...
We present the second version of the Penn Discourse Treebank, PDTB-2.0, describing its lexically-grounded annotations of discourse relations and their two abstract object arguments over the 1 million word Wall Street Journal corpus. We describe all aspects of the annotation, including (a) the argument structure of discourse relations, (b) the sense...
The most recent release of PDTB 2.0 contains annotations of senses of connectives. The PDTB 2.0 manual describes the hierarchical set of senses used in the annotation and offers rough semantic descriptions of each label. In this paper, we refine the semantics of concession substantially and offer a formal description of concessive relations and t...
This report contains the guidelines for the annotation of discourse relations in the Penn Discourse Treebank (http://www.seas.upenn.edu/~pdtb), PDTB. Discourse relations in the PDTB are annotated in a bottom up fashion, and capture both lexically realized relations as well as implicit relations. Guidelines in this report are provided for all aspect...
We describe the problem of anaphora resolution and discuss approaches to modeling this problem. Centering Theory (CT), which is an approach to modeling certain aspects of local coherence in discourse, includes within it the component that models anaphora resolution. However, CT itself is not a theory of anaphora resolution. It was developed as part...
In this paper we present a corpus study and a sentence completion experiment designed to evaluate the discourse prominence of entities evoked in relative clauses. The corpus study shows a preference for referring expressions after a sentence final relative clause to select a matrix clause entity as their antecedents. In the sentence completion expe...
The annotations of the Penn Discourse Treebank (PDTB) include (1) discourse connectives and their arguments, and (2) attribution of each argument of each connective and of the relation it denotes. Because the PDTB covers the same text as the Penn TreeBank WSJ corpus, syntactic and discourse annotation can be compared. This has revealed signific...
Centering Theory (Grosz et al 1995) was developed as a model of local coherence in discourse. Coherence in Centering is evaluated in terms of center transitions. Four transitions are defined and ranked to reflect four degrees of coherence: Continue>Retain>Smooth-Shift>Rough-Shift. Center transitions are computed for each processed 'utterance' by me...
Discourse connectives can be analysed as encoding predicate-argument relations whose arguments derive from the interpretation of discourse units. These arguments can be anaphoric or structural. Although structural arguments can be encoded in a parse tree, anaphoric arguments must be resolved by other means. A study of nine connectives, annotating t...
The Penn Discourse TreeBank (PDTB) is a new resource built on top of the Penn Wall Street Journal corpus, in which discourse connectives are annotated along with their arguments. Its use of standoff annotation allows integration with a stand-off version of the Penn TreeBank (syntactic structure) and PropBank (verbs and their arguments), which adds...
The Penn Discourse TreeBank (PDTB) is a new resource built on top of the complete Penn Wall Street Journal corpus, in which discourse connectives are annotated along with their arguments. Its use of stand-off annotation allows integration with a standoff version of the Penn TreeBank (syntactic structure) and PropBank (verbs and their arguments), w...
Discourse connectives can be analyzed as encoding predicate-argument relations whose arguments derive from the interpretation of discourse units. These arguments can be anaphoric or structural. Although structural arguments can be encoded in a parse tree, anaphoric arguments must be resolved by other means. A study of nine connectives, annotating t...
This paper describes a new, large scale discourse-level annotation project -- the Penn Discourse TreeBank (PDTB). We present an approach to annotating a level of discourse structure that is based on identifying discourse connectives and their arguments. The PDTB is being built directly on top of the Penn TreeBank and Propbank, thus supporting the e...
This paper describes a new discourse-level annotation project -- the Penn Discourse Treebank (PDTB) -- that aims to produce a large-scale corpus in which discourse connectives are annotated, along with their arguments, thus exposing a clearly defined level of discourse structure.
Existing software systems for automated essay scoring can provide NLP researchers with opportunities to test certain theoretical hypotheses, including some derived from Centering Theory. In this study we employ the Educational Testing Service's e-rater essay scoring system to examine whether local discourse coherence, as defined by a measure of Cen...
We present an implementation of a discourse parsing system for a lexicalized Tree-Adjoining Grammar for discourse, specifying the integration of sentence and discourse level processing. Our system is based on the assumption that the compositional aspects of semantics at the discourse-level parallel those at the sentence-level. This coupling is achie...
We have argued extensively in prior work that discourse connectives can be analyzed as encoding predicate-argument relations whose arguments derive from the interpretation of discourse units. All adverbial connectives we have analyzed to date have expressed binary relations. But they are special in taking one of their two arguments structurally...
The aim of this paper is to investigate the distribution of the structural and semantic focusing effect (e.g., Stevenson et al (1994) and Grosz, Joshi and Weinstein (1995) respectively) on pronominal interpretation, and determine the conditions under which one prevails over the other. We propose that the syntactic locality created by subordinate cla...
The central claim of this thesis is that, unlike main clauses, adjunct subordinate clauses do not form independent processing units in the computation of entity-based topic continuity (attention structure) in discourse. This claim has two primary consequences. First, discourse entities in adjunct subordinate clauses are assigned lower salience than...
Existing software systems for automated essay scoring can provide NLP researchers with opportunities to test certain theoretical hypotheses, including some derived from Centering Theory. In this study we employ ETS's e-rater essay scoring system to examine whether local discourse coherence, as defined by a measure of Rough-Shift transitions, might...
The problem of proposing referents for anaphoric expressions has been extensively researched in the literature and significant insights have been gained through the various approaches. However, no single model is capable of handling all the cases. We argue that this is due to a failure of the models to identify two distinct processes. Drawing on cu...
Discourse connectives can be analyzed as encoding predicate-argument relations whose arguments derive from the interpretation of discourse units. These arguments can be anaphoric or structural. Although structural arguments can be encoded in a parse tree, anaphoric arguments must be resolved by other means. A study of nine connectives, annotating t...
This paper presents the results of empirical studies on five goal-oriented discourses, in which Centering Shifts and Cue phrases are used to retrieve embedded segment boundaries. Discourse segments are defined as fulfilled goals according to the Stack Model of Discourse. A four-step procedure for producing expandable sets of PopCues and PushCues is pre...
In this paper we are concerned with the location of topics in text processing and the determination of the update unit in looking up topic continuations and topic shifts. Using key elements of the Centering Model of local discourse coherence and empirical evidence from Modern Greek and Japanese we argue that the appropriate update unit for topic tr...
This paper explores the role of Centering Theory, in particular Rough-Shift identification, in locating abrupt topic shifts in student essays. Rough-Shifts within student paragraphs are generated by short-lived topics and are therefore indicative of poor topic development. We develop a Rough-Shift-based metric of incoherence to represent a coherenc...
According to prior work on the interpretation of anaphoric elements, the form of linguistic expression used for reference to a discourse entity reflects the entity's degree of salience. When both fully specified and underspecified forms of reference are available the most underspecified forms are used for reference to the most salient entity. In th...
Large scale annotated corpora have played a critical role in speech and natural language research. However, while existing annotated corpora such as the Penn Treebank have been highly successful at the sentence-level, we also need large-scale annotated resources that reliably encode key aspects of discourse. In this paper, we detail (1) our plan...
Discourse connectives can be analyzed as discourse level predicates which project predicate-argument structure on a par with verbs at the sentence level. The Penn Discourse Treebank (PDTB) reflects this view in its design providing annotation of the discourse connectives and their arguments. Like verbs, discourse connectives have multiple senses. W...
Taking discourse connectives to be the predicates of binary discourse relations, the goal of Penn Discourse Treebank (PDTB) is to annotate the million word WSJ corpus in the Penn TreeBank with each of its discourse connectives and their arguments. The paper describes the linguistic observations and ideas that led to the PDTB, the decisions that s...
This paper presents a corpus-based analysis on the discourse functions of weak and strong forms of referring in Greek. We focus on null subjects as well as overt weak and strong pronominal forms. The distribution of the pronominal paradigms in a Greek corpus reveals multiple discourse functions. Specifically, null pronouns signify continuation on t...