About
123 Publications
13,429 Reads
3,578 Citations
Additional affiliations
September 2012 - present
Publications (123)
We combine logical and distributional representations of natural language meaning by transforming distributional similarity judgments into weighted inference rules using Markov Logic Networks (MLNs). We show that this framework supports both judging sentence similarity and recognizing textual entailment by appropriately adapting the MLN impleme...
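The core move in this abstract, converting a distributional similarity judgment into a weighted inference rule, can be illustrated in a few lines. A minimal sketch, not the paper's MLN system: the log-odds weighting and the toy vectors are assumptions made here for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_rule(pred_a, pred_b, vec_a, vec_b):
    """Turn a distributional similarity judgment into a weighted
    first-order rule of the kind an MLN could consume. The log-odds
    weighting is one plausible convention, not necessarily the paper's."""
    sim = min(max(cosine(vec_a, vec_b), 1e-6), 1 - 1e-6)
    weight = np.log(sim / (1 - sim))
    return f"{weight:.3f}  {pred_a}(x) => {pred_b}(x)"

# hypothetical toy vectors
grab, catch = np.array([0.9, 0.1, 0.3]), np.array([0.8, 0.2, 0.35])
print(similarity_rule("grab", "catch", grab, catch))
# -> a weighted rule, roughly: 4.48  grab(x) => catch(x)
```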
We address the task of computing vector space representations for the meaning of word occurrences, which can vary widely according to context. This task is a crucial step towards a robust, vector-based compositional account of sentence meaning. We argue that existing models for this task do not take syntactic structure sufficiently into account...
The vast majority of work on word senses has relied on predefined sense inventories and an annotation schema where each word instance is tagged with the best fitting sense. This paper examines the case for a graded notion of word meaning in two experiments, one which uses WordNet senses in a graded fashion, contrasted with the "winner takes all...
Interpreting and assessing goal-driven actions is vital to understanding and reasoning over complex events. It is important to be able to acquire the knowledge needed for this understanding, though doing so is challenging. We argue that such knowledge can be elicited through a participant achievement lens. We analyze a complex event in a narrative...
We study semantic construal in grammatical constructions using large language models. First, we project contextual word embeddings into three interpretable semantic spaces, each defined by a different set of psycholinguistic feature norms. We validate these interpretable spaces and then use them to automatically derive semantic characterizations of...
Developing methods to adversarially challenge NLP systems is a promising avenue for improving both model performance and interpretability. Here, we describe the approach of the team "longhorns" on Task 1 of the First Workshop on Dynamic Adversarial Data Collection (DADC), which asked teams to manually fool a model on an Extractive Question Answ...
Property inference involves predicting properties for a word from its distributional representation. We focus on human-generated resources that link words to their properties and on the task of predicting these properties for unseen words. We introduce the use of label propagation, a semi-supervised machine learning approach, for this task and, in...
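Since label propagation is the named technique, a minimal sketch of the semi-supervised setup follows, using scikit-learn. The word list, the two-dimensional "embeddings", and the is_an_animal property are hypothetical placeholders, not the resources used in the paper.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# hypothetical toy embeddings; real work would use corpus-derived vectors
words = ["cat", "dog", "sparrow", "table", "chair", "lamp"]
X = np.array([
    [0.90, 0.10], [0.85, 0.15], [0.80, 0.20],   # animal-like region
    [0.10, 0.90], [0.15, 0.85], [0.20, 0.80],   # artifact-like region
])
# property "is_an_animal": 1/0 where known, -1 where unlabeled
y = np.array([1, 1, -1, 0, 0, -1])

model = LabelPropagation(kernel="rbf", gamma=20)
model.fit(X, y)
for word, label in zip(words, model.transduction_):
    print(word, "->", label)   # "sparrow" and "lamp" inherit neighbors' labels
```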
Discourse signals are often implicit, leaving it up to the interpreter to draw the required inferences. At the same time, discourse is embedded in a social context, meaning that interpreters apply their own assumptions and beliefs when resolving these inferences, leading to multiple, valid interpretations. However, current discourse data and framew...
Humans use language to accomplish a wide variety of tasks - asking for and giving advice being one of them. In online advice forums, advice is mixed in with non-advice, like emotional support, and is sometimes stated explicitly, sometimes implicitly. Understanding the language of advice would equip systems with a better grasp of language pragmatics...
We propose a method for controlled narrative/story generation where we are able to guide the model to produce coherent narratives with user-specified target endings by interpolation: for example, we are told that Jim went hiking and at the end Jim needed to be rescued, and we want the model to incrementally generate steps along the way. The core of...
Recent progress in NLP witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art results, approaching human performance. This clearly demonstrates the power of the stacked self-attention architectur...
Recent progress in NLP witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art results, approaching human performance. This demonstrates the power of the stacked self-attention architecture when p...
The news coverage of events often contains not one but multiple incompatible accounts of what happened. We develop a query-based system that extracts compatible sets of events (scenarios) from such data, formulated as one-class clustering. Our system incrementally evaluates each event's compatibility with already selected events, taking order into...
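The incremental compatibility evaluation described here can be sketched as a greedy loop. This is a simplification under assumed inputs: compatibility is hard-coded as cosine similarity over event embeddings with a fixed threshold, whereas the system in the paper learns its compatibility judgments.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def extract_scenario(query_vec, event_vecs, threshold=0.5):
    """Greedily grow one compatible event set (a scenario): visit
    events in order and keep each one only if it is compatible with
    the query and with everything selected so far."""
    selected = []
    for i, vec in enumerate(event_vecs):
        if cosine(query_vec, vec) < threshold:
            continue
        if all(cosine(event_vecs[j], vec) >= threshold for j in selected):
            selected.append(i)
    return selected
```

Because selection is incremental, the order in which events are visited changes which scenario is extracted, which is why order handling matters in this setting.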
Implicit arguments, which cannot be detected solely through syntactic cues, make it harder to extract predicate-argument tuples. We present a new model for implicit argument prediction that draws on reading comprehension, casting the predicate-argument tuple with the missing argument as a query. We also draw on pointer networks and multi-hop comput...
Discourse structure is integral to understanding a text and is helpful in many NLP tasks. Learning latent representations of discourse is an attractive alternative to acquiring expensive labeled discourse data. Liu and Lapata (2018) propose a structured attention mechanism for text classification that derives a tree over a text, akin to an RST disc...
The first step in discourse analysis involves dividing a text into segments. We annotate the first high-quality small-scale medical corpus in English with discourse segments and analyze how well news-trained segmenters perform on this domain. While we expectedly find a drop in performance, the nature of the segmentation errors suggests some problem...
During natural disasters and conflicts, information about what happened is often confusing, messy, and distributed across many sources. We would like to be able to automatically identify relevant information and assemble it into coherent narratives of what happened. To make this task accessible to neural models, we introduce Story Salads, mixtures...
Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since this is never attested. However both are physically plausible events. This paper introduces the task of semantic plausibility: recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility ju...
We test whether distributional models can do one-shot learning of definitional properties from text only. Using Bayesian models, we find that first learning overarching structure in the known data, regularities in textual contexts and in properties, helps one-shot learning, and that individual context items can be highly informative. Our experimen...
NLP tasks differ in the semantic information they require, and at this time no single semantic representation fulfills all requirements. Logic-based representations characterize sentence structure, but do not capture the graded aspect of meaning. Distributional models give graded similarity ratings for words and phrases, but do not capture sentence...
We consider the task of predicting lexical entailment using distributional vectors. We perform a novel qualitative analysis of one existing model which was previously shown to only measure the prototypicality of word pairs. We find that the model strongly learns to identify hypernyms using Hearst patterns, which are well known to be predictive of l...
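Hearst patterns, the signal the model is found to exploit, are simple lexico-syntactic templates. A minimal regex sketch, simplified to single-word terms (real extractors work over POS-tagged or parsed text):

```python
import re

def hearst_pairs(text):
    """Extract (hyponym, hypernym) candidates with two classic
    Hearst patterns; deliberately simplified."""
    pairs = []
    # "animals such as dogs" -> (dogs, animals)
    for m in re.finditer(r"(\w+) such as (\w+)", text):
        pairs.append((m.group(2), m.group(1)))
    # "dogs and other pets" -> (dogs, pets)
    for m in re.finditer(r"(\w+) and other (\w+)", text):
        pairs.append((m.group(1), m.group(2)))
    return pairs

print(hearst_pairs("We saw animals such as dogs, and dogs and other pets."))
# [('dogs', 'animals'), ('dogs', 'pets')]
```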
Distributional models describe the meaning of a word in terms of its observed contexts. They have been very successful in computational linguistics. They have also been suggested as a model for how humans acquire (partial) knowledge about word meanings. But that raises the question of what, exactly, distributional models can learn, and the question...
Word sense disambiguation and the related field of automated word sense induction traditionally assume that the occurrences of a lemma can be partitioned into senses. But this seems to be a much easier task for some lemmas than others. Our work builds on recent work that proposes describing word meaning in a graded fashion rather than through a str...
NLP tasks differ in the semantic information they require, and at this time no single semantic representation fulfills all requirements. Logic-based representations characterize sentence structure, but do not capture the graded aspect of meaning. Distributional models give graded similarity ratings for words and phrases, but do not adequately captu...
As a format for describing the meaning of natural language sentences, probabilistic logic combines the expressivity of first-order logic with the ability to handle graded information in a principled fashion. But practical probabilistic logic frameworks usually assume a finite domain in which each entity corresponds to a constant in the logic (domai...
We test the Distributional Inclusion Hypothesis, which states that hypernyms tend to occur in a superset of contexts in which their hyponyms are found. We find that this hypothesis only holds when it is applied to relevant dimensions. We propose a robust supervised approach that achieves accuracies of .84 and .85 on two existing datasets and that c...
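For context, a minimal unsupervised inclusion score in the spirit of WeedsPrec is sketched below; the paper's point is that such tests work only when restricted to relevant dimensions, and its actual method is supervised, which this sketch is not. The toy count vectors are invented.

```python
import numpy as np

def inclusion_score(hypo_vec, hyper_vec):
    """Share of the hyponym's context mass that falls on dimensions
    the hypernym also occupies. Under the Distributional Inclusion
    Hypothesis this is high for true hyponym -> hypernym pairs."""
    shared = hypo_vec * (hyper_vec > 0)
    return float(shared.sum() / hypo_vec.sum())

# hypothetical context-count vectors over a shared context vocabulary
dog    = np.array([4.0, 2.0, 0.0, 1.0])
animal = np.array([5.0, 3.0, 2.0, 4.0])
print(inclusion_score(dog, animal))   # 1.0: dog's contexts are covered
print(inclusion_score(animal, dog))   # ~0.86: the relation is asymmetric
```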
We present the first large-scale English "all-words lexical substitution" corpus. The size of the corpus provides a rich resource for investigations into word meaning. We investigate the nature of lexical substitute sets, comparing them to WordNet synsets. We find them to be consistent with, but more fine-grained than, synsets. We also identify sig...
Probabilistic Soft Logic (PSL) is a recently developed framework for probabilistic logic. We use PSL to combine logical and distributional representations of natural-language meaning, where distributional information is represented in the form of weighted inference rules. We apply this framework to the task of Semantic Textual Similarity (STS)...
First-order logic provides a powerful and flexible mechanism for representing natural language semantics. However, it is an open question of how best to integrate it with uncertain, weighted knowledge, for example regarding word meaning. This paper describes a mapping between predicates of logical form and points in a vector space. This mapping is...
We represent natural language semantics by combining logical and distributional information in probabilistic logic. We use Markov Logic Networks (MLN) for the RTE task, and Probabilistic Soft Logic (PSL) for the STS task. The system is evaluated on the SICK dataset. Our best system achieves 73% accuracy on the RTE task, and a Pearson’s correlation...
Word Sense Induction (WSI) is the task of identifying the different uses (senses) of a target word in a given text in an unsupervised manner, i.e. without relying on any external resources such as dictionaries or sense-tagged data. This paper presents ...
Word sense disambiguation (WSD) is an old and important task in computational linguistics that still remains challenging, to machines as well as to human annotators. Recently there have been several proposals for representing word meaning in context that diverge from the traditional use of a single best sense for each occurrence. They represent wor...
Graded models of word meaning in context characterize the meaning of individual usages (occurrences) without reference to dictionary senses. We introduce a novel approach that frames the task of computing word meaning in context as a probabilistic inference problem. The model represents the meaning of a word as a probability distribution over poten...
Distributional representations have recently been proposed as a general-purpose representation of natural language meaning, to replace logical form. There is, however, one important difference between logical and distributional representations: Logical languages have a clear semantics, while distributional representations do not. In this paper, we...
Distributional models represent a word through the contexts in which it has been observed. They can be used to predict similarity in meaning, based on the distributional hypothesis, which states that two words that occur in similar contexts tend to have similar meanings. Distributional approaches are often implemented in vector space models. They r...
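As a concrete illustration of the distributional hypothesis stated here, the sketch below builds count-based context vectors from a toy corpus and compares them with cosine similarity. Corpus, window size, and tokenization are placeholder choices.

```python
from collections import Counter
from math import sqrt

def context_vector(target, corpus, window=2):
    """Count words co-occurring with `target` within +/- `window` tokens."""
    counts = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok == target:
                neighborhood = sent[max(0, i - window):i + window + 1]
                counts.update(t for t in neighborhood if t != target)
    return counts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    num = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    den = sqrt(sum(v * v for v in c1.values())) * \
          sqrt(sum(v * v for v in c2.values()))
    return num / den if den else 0.0

corpus = [["the", "cat", "chased", "the", "mouse"],
          ["the", "dog", "chased", "the", "ball"],
          ["the", "dog", "ate", "the", "bone"]]
print(cosine(context_vector("cat", corpus), context_vector("dog", corpus)))
```

Here "cat" and "dog" come out similar because they share contexts ("the", "chased"), which is the distributional hypothesis at work.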
First-order logic provides a powerful and flexible mechanism for representing natural language semantics. However, it is an open question of how best to integrate it with uncertain, probabilistic knowledge, for example regarding word meaning. This paper describes the first steps of an approach to recasting first-order semantics into the probabilist...
We consider a new subproblem of unsupervised parsing from raw text, unsupervised partial parsing---the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer's (2007) CCL...
We present a vector space–based model for selectional preferences that predicts plausibility scores for argument headwords. It does not require any lexical resources (such as WordNet). It can be trained either on one corpus with syntactic annotation, or on a combination of a small semantically annotated primary corpus and a large, syntactically ana...
We present an approach to unsupervised partial parsing: the identification of low-level constituents (which we dub clumps) in unannotated text. We begin by showing that CCLParser (Seginer 2007), an unsupervised parsing model, is particularly adept at identifying clumps, and that, surprisingly, building a simple right-branching structure above its c...
In this paper, we argue in favor of reconsidering models for word meaning, using as a basis results from cognitive science on human concept representation. More specifically, we argue for a more flexible representation of word meaning than the assignment of a single best-fitting dictionary sense to each occurrence: Either use dictionary senses,...
With the urgent need to document the world’s dying languages, it is important to explore ways to speed up language documentation efforts. One promising avenue is to use techniques from computational linguistics to automate some of the process. Here we consider unsupervised morphological segmentation and active learning for creating interlinear glos...
This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vector-per-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.
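The exemplar idea is compact enough to state in code: store one vector per observed occurrence and, for a new context, average only the most similar stored occurrences instead of using a single per-word vector. A minimal sketch, with random vectors standing in for real occurrence representations:

```python
import numpy as np

def meaning_in_context(context_vec, exemplar_vecs, k=5):
    """Activate only the k stored occurrences (exemplars) most similar
    to the current context, and average them."""
    sims = exemplar_vecs @ context_vec / (
        np.linalg.norm(exemplar_vecs, axis=1) * np.linalg.norm(context_vec))
    top_k = np.argsort(sims)[-k:]
    return exemplar_vecs[top_k].mean(axis=0)

rng = np.random.default_rng(0)
exemplars = rng.normal(size=(100, 50))  # one vector per occurrence of a word
context = rng.normal(size=50)           # representation of the new context
print(meaning_in_context(context, exemplars).shape)  # (50,)
```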
We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distr...
We describe an approach for connecting language and geography that anchors natural language expressions to specific regions of the Earth, implemented in our TextGrounder system. The core of the system is a region-topic model, which we use to learn word distributions for each region discussed in a given corpus. This model performs toponym resoluti...
Vector space models of word meaning typically represent the meaning of a word as a vector computed by summing over all its corpus occurrences. Words close to this point in space can be assumed to be similar to it in meaning. But how far around this point does the region of similar meaning extend? In this paper we discuss two models that represent w...
The appropriateness of paraphrases for words depends often on context: "grab" can replace "catch" in "catch a ball", but not in "catch a cold". Structured Vector Space (SVS) (Erk and Padó, 2008) is a model that computes word meaning in context in order to assess the appropriateness of such paraphrases. This paper investigates "best-practice" parame...
Semantic space models represent the meaning of a word as a vector in high-dimensional space. They offer a framework in which the meaning representation of a word can be computed from its context, but the question remains how they support inferences. While there has been some work on paraphrase-based inferences in semantic space, it is not clear ho...
Many approaches to unsupervised morphology acquisition incorporate the frequency of character sequences with respect to each other to identify word stems and affixes. This typically involves heuristic search procedures and calibrating multiple arbitrary thresholds. We present a simple approach that uses no thresholds other than those invo...
Word sense disambiguation is typically phrased as the task of labeling a word in context with the best-fitting sense from a sense inventory such as WordNet. While questions have often been raised over the choice of sense inventory, computational linguists have readily accepted the best-fitting sense methodology despite the fact that the case for di...
Both vector space models and graph random walk models can be used to determine similarity between concepts. Noting that vectors can be regarded as local views of a graph, we directly compare vector space models and graph random walk models on standard tasks of predicting human similarity ratings, concept categorization, and semantic pr...
Semantic space models represent the meaning of a word as a vector in high-dimensional space. They offer a framework in which the meaning representation of a word can be computed from its context, but the question remains how they support inferences. While there has been some work on paraphrase-based inferences in semantic space, it is not clear how...
In this article, we address the task of comparing and combining different semantic verb classifications within one language. We present a methodology for the manual analysis of individual resources on the level of semantic features. The resulting representations can be aligned across resources, and allow a contrastive analysis of these resources. I...
We describe course adaptation and development for teaching computational linguistics for the diverse body of undergraduate and graduate students in the Department of Linguistics at the University of Texas at Austin. We also discuss classroom tools and teaching aids we have used and created, and we mention our efforts to develop a campus-wide computati...
We propose a new XML format for representing interlinearized glossed text (IGT), particularly in the context of the documentation and description of endangered languages. The proposed representation, which we call IGT-XML, builds on previous models but provides a more loosely coupled and flexible representation of different annotation layers. Desig...
We express dominance constraints in the once-only nesting fragment of stratified context unification, which therefore is NP-complete.
We propose a new, simple model for the automatic induction of selectional preferences, using corpus-based semantic similarity metrics. Focusing on the task of semantic role labeling, we compute selectional preferences for semantic roles. In evaluations the similarity-based model shows lower error rates than both Resnik's WordNet-based model a...
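A minimal sketch of a similarity-based selectional preference model in this spirit: score a candidate headword by its frequency-weighted similarity to headwords previously seen filling the role slot. The cosine measure and the weighting scheme are assumptions here; the paper evaluates several corpus-based similarity metrics.

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def selpref(candidate_vec, seen_vecs, seen_freqs):
    """Plausibility of a candidate headword for a role slot: the
    frequency-weighted average similarity to headwords already seen
    filling that slot."""
    sims = np.array([cos(candidate_vec, v) for v in seen_vecs])
    return float((sims * seen_freqs).sum() / seen_freqs.sum())

# hypothetical vectors for headwords seen in one role slot
seen = [np.array([1.0, 0.2]), np.array([0.9, 0.3])]
freqs = np.array([10.0, 3.0])
print(selpref(np.array([0.95, 0.25]), seen, freqs))  # high: like seen fillers
print(selpref(np.array([0.10, 1.00]), seen, freqs))  # low: unlike seen fillers
```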
This task consists of recognizing words and phrases that evoke semantic frames as defined in the FrameNet project (http://framenet.icsi.berkeley.edu), and their semantic dependents, which are usually, but not always, their syntactic dependents (including subjects). The training data was FN annotated sentences. In testing, participants automatically...
We address the problem of unknown word sense detection: the identification of corpus occurrences that are not covered by a given sense inventory. We model this as an instance of outlier detection, using a simple nearest neighbor-based approach to measuring the resemblance of a new item to a training set. In combination with a method that alleviat...
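The nearest neighbor-based outlier measurement lends itself to a short sketch: score a new occurrence by its mean distance to its k closest training items and flag high scores as possible unknown senses. The Euclidean distance, the value of k, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def outlier_score(new_vec, train_vecs, k=3):
    """Mean distance to the k nearest training items; the higher the
    score, the less the new occurrence resembles the training set."""
    dists = np.linalg.norm(train_vecs - new_vec, axis=1)
    return float(np.sort(dists)[:k].mean())

rng = np.random.default_rng(1)
train = rng.normal(size=(200, 20))      # occurrences of known senses
inlier = rng.normal(size=20)
novel = rng.normal(loc=5.0, size=20)    # far from the training cloud
print(outlier_score(inlier, train) < outlier_score(novel, train))  # True
```

A threshold on this score (calibrated, say, on held-out known-sense data) then turns the score into a detector.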
In this paper, we describe the SALTO tool. It was originally developed for the annotation of semantic roles in the frame semantics paradigm, but can be used for graphical annotation of treebanks with general relational information in a simple drag-and-drop fashion. The tool additionally supports corpus management and quality control.
This paper presents SHALMANESER, a software package for shallow semantic parsing, the automatic assignment of semantic classes and roles to free text. SHALMANESER is a toolchain of independent modules communicating through a common XML format. System output can be inspected graphically. SHALMANESER can be used either as a "black box" to obtain sema...
This paper describes the SALSA corpus, a large German corpus manually annotated with role-semantic information, based on the syntactically annotated TIGER newspaper corpus (Brants et al., 2002). The first release, comprising about 20,000 annotated predicate instances (about half the TIGER corpus), is scheduled for mid-2006. In this paper we discuss...
We analyze models for semantic role assignment by defining a meta-model that abstracts over features and learning paradigms. This meta-model is based on the concept of role confusability, is defined in information-theoretic terms, and predicts that roles realized by less specific grammatical functions are more difficult to assign. We find that co...
In this paper, we present a rule-based system for the assignment of FrameNet frames by way of a "detour via WordNet". The system can be used to overcome sparse-data problems of statistical systems trained on current FrameNet data. We devise a weighting scheme to select the best frame(s) out of a set of candidate frames, and present first figures of...
This paper presents a manual pilot study in cross-linguistic analysis at the predicate-argument level. Looking at translation pairs differing in their parts of speech, we find that predicate-argument structure abstracts somewhat from morphosyntactic language idiosyncrasies, but there is still considerable variation in the distribution of semantic...
We describe a statistical approach to semantic role labelling that employs only shallow information.
This thesis studies the Constraint Language for Lambda Structures (CLLS), which is interpreted over lambda terms represented as tree-like structures. Our main focus is on the processing of parallelism constraints, a construct of CLLS. A parallelism constraint states that two pieces of a tree have the same structure. We present a sound and complete...
We present an LFG syntax-semantics interface for the semi-automatic annotation of frame semantic roles for German in the SALSA project. The architecture is intended to support a bootstrapping cycle for the acquisition of stochastic models for frame semantic role assignment, starting from manual annotations on the basis of the syntactically annotate...
We present two XML formats for the description and encoding of semantic role information in corpora. The TIGER/SALSA XML format provides a modular representation for semantic roles and syntactic structure. The Text-SALSA XML format is a lightweight version of TIGER/SALSA XML designed for manual annotation with an XML editor rather than a special to...
We describe the ongoing construction of a large, semantically annotated corpus resource as reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. The backbone of the annotation are semantic roles in the frame semantics paradigm. We report experiences and evaluate the annotated...
The Constraint Language for Lambda Structures (CLLS) is an expressive tree description language. It provides a uniform framework for underspecified semantics, covering scope, ellipsis, and anaphora. Efficient algorithms exist for the sublanguage that models scope. But so far no terminating algorithm exists for sublanguages that model ellipsis. We i...
We describe the ongoing construction of a large, semantically annotated corpus resource as reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. The backbone of the annotation are semantic roles in the frame semantics paradigm. We report experiences and evaluate the annotate...