Lillian Lee's research while affiliated with Cornell University and other places

Publications (92)

Preprint
We challenge AI models to "demonstrate understanding" of the sophisticated multimodal humor of The New Yorker Caption Contest. Concretely, we develop three carefully circumscribed tasks for which it suffices (but is not necessary) to grasp potentially complex and unexpected relationships between image and caption, and similarly complex and unexpect...
Preprint
We propose a transition-based bubble parser to perform coordination structure identification and dependency-based syntactic analysis simultaneously. Bubble representations were proposed in the formal linguistics literature decades ago; they enhance dependency trees by encoding coordination boundaries and internal relationships within coordination s...
Preprint
We present our contribution to the IWPT 2021 shared task on parsing into enhanced Universal Dependencies. Our main system component is a hybrid tree-graph parser that integrates (a) predictions of spanning trees for the enhanced graphs with (b) additional graph edges not present in the spanning trees. We also adopt a finetuning strategy where we fi...
Preprint
Naturally-occurring bracketings, such as answer fragments to natural language questions and hyperlinks on webpages, can reflect human syntactic intuition regarding phrasal boundaries. Their availability and approximate correspondence to syntax make them appealing as distant information sources to incorporate into unsupervised constituency parsing....
Preprint
Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches. But can richer supervision help even more? To explore the utility of fine-grained, lexical-level supervision, we introduce Squall, a dataset that enriches 11,276 WikiTableQuestions English-language questions with manually created...
Preprint
Modeling expressive cross-modal interactions seems crucial in multimodal tasks, such as visual question answering. However, sometimes high-performing black-box algorithms turn out to be mostly exploiting unimodal signals in the data. We propose a new diagnostic tool, empirical multimodally-additive function projection (EMAP), for isolating whether...
Preprint
An interesting and frequent type of multi-word expression (MWE) is the headless MWE, for which there are no true internal syntactic dominance relations; examples include many named entities ("Wells Fargo") and dates ("July 5, 2020") as well as certain productive constructions ("blow for blow", "day after day"). Despite their special status and prev...
Article
Moderators of online communities often employ comment deletion as a tool. We ask here whether, beyond the positive effects of shielding a community from undesirable content, does comment removal actually cause the behavior of the comment's author to improve? We examine this question in a particularly well-moderated community, the ChangeMyView subre...
Preprint
Moderators of online communities often employ comment deletion as a tool. We ask here whether, beyond the positive effects of shielding a community from undesirable content, does comment removal actually cause the behavior of the comment's author to improve? We examine this question in a particularly well-moderated community, the ChangeMyView subre...
Article
The echoes of power Understanding social interaction within groups is key to analyzing online communities. Most current work focuses on structural properties: who talks to whom, and how such interactions form larger network structures. The interactions themselves, however, generally take place in the form of natural language – either spoken or writ...
Preprint
Images and text co-occur everywhere on the web, but explicit links between images and sentences (or other intra-document textual units) are often not annotated by users. We present algorithms that successfully discover image-sentence relationships without relying on any explicit multimodal annotation. We explore several variants of our approach on...
Preprint
Controversial posts are those that split the preferences of a community, receiving both significant positive and significant negative feedback. Our inclusion of the word "community" here is deliberate: what is controversial to some audiences may not be so to others. Using data from several different communities on reddit.com, we predict the ultimat...
Preprint
Shi, Huang, and Lee (2017) obtained state-of-the-art results for English and Chinese dependency parsing by combining dynamic-programming implementations of transition-based dependency parsers with a minimal set of bidirectional LSTM features. However, their results were limited to projective parsing. In this paper, we extend their approach to suppo...
Preprint
We generalize Cohen, Gómez-Rodríguez, and Satta's (2011) parser to a family of non-projective transition-based dependency parsers allowing polynomial-time exact inference. This includes novel parsers with better coverage than Cohen et al. (2011), and even a variant that reduces time complexity to $O(n^6)$, improving over the known bounds in exa...
Article
Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the ap...
Article
We first present a minimal feature set for transition-based dependency parsing, continuing a recent trend started by Kiperwasser and Goldberg (2016a) and Cross and Huang (2016a) of using bi-directional LSTM features. We plug our minimal feature set into the dynamic-programming framework of Huang and Sagae (2010) and Kuhlmann et al. (2011) to produc...
Conference Paper
Group discussions are a way for individuals to exchange ideas and arguments in order to reach better decisions than they could on their own. One of the premises of productive discussions is that better solutions will prevail, and that the idea selection process is mediated by the (relative) competence of the individuals involved. However, since peo...
Article
The content of today's social media is becoming more and more rich, increasingly mixing text, images, videos, and audio. It is an intriguing research question to model the interplay between these different modes in attracting user attention and engagement. But in order to pursue this study of multimodal content, we must also account for context: ti...
Article
Group discussions are a way for individuals to exchange ideas and arguments in order to reach better decisions than they could on their own. One of the premises of productive discussions is that better solutions will prevail, and that the idea selection process is mediated by the (relative) competence of the individuals involved. However, since peo...
Article
When large social-media platforms allow users to easily form and self-organize into interest groups, highly related communities can arise. For example, the Reddit site hosts not just a group called food, but also HealthyFood, foodhacks, foodporn, and cooking, among others. Are these highly related communities created for similar classes of reasons...
Article
In meetings where important decisions get made, what items receive more attention may influence the outcome. We examine how different types of rhetorical (de-)emphasis -- including hedges, superlatives, and contrastive conjunctions -- correlate with what gets revisited later, controlling for item frequency and speaker. Our data consists of transcri...
Article
Gender bias is an increasingly important issue in sports journalism. In this work, we propose a language-model-based approach to quantify differences in questions posed to female vs. male athletes, and apply it to tennis post-match interviews. We find that journalists ask male players questions that are generally more focused on the game when compa...
Conference Paper
Despite the existence of highly successful Internet collaborations on complex projects, including open-source software, little is known about how Internet collaborations work for solving "extremely" difficult problems, such as open-ended research questions. We quantitatively investigate a series of efforts known as the Polymath projects, which tack...
Article
Changing someone's opinion is arguably one of the most important challenges of social interaction. The underlying process proves difficult to study: it is hard to know how someone's opinions are formed and whether and how someone's views shift. Fortunately, ChangeMyView, an active community on Reddit, provides a platform where users present their o...
Article
Most social network analysis works at the level of interactions between users. But the vast growth in size and complexity of social networks enables us to examine interactions at larger scale. In this work we use a dataset of 76M submissions to the social network Reddit, which is organized into distinct sub-communities called subreddits. We measure...
Article
Although analyzing user behavior within individual communities is an active and rich research domain, people usually interact with multiple communities both on- and off-line. How do users act in such multi-community environments? Although there are a host of intriguing aspects to this question, it has received much less attention in the research co...
Article
The strength with which a statement is made can have a significant impact on the audience. For example, international relations can be strained by how the media in one country describes an event in another; and papers can be rejected because they overstate or understate their findings. It is thus important to understand the effects of statement str...
Article
Consider a person trying to spread an important message on a social network. He/she can spend hours trying to craft the message. Does it actually matter? While there has been extensive prior work looking into predicting popularity of social-media content, the effect of wording per se has rarely been studied since it is often confounded with the pop...
Conference Paper
As we all know, more and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This availability offers us the opportunity to glean user-modeling information from individual users’ linguistic behaviors. This talk will discuss the particular phenomeno...
Article
Discussion threads form a central part of the experience on many Web sites, including social networking sites such as Facebook and Google Plus and knowledge creation sites such as Wikipedia. To help users manage the challenge of allocating their attention among the discussions that are relevant to them, there has been a growing need for the algorit...
Article
Understanding the ways in which participants in public discussions frame their arguments is important in understanding how public opinion is formed. In this paper, we adopt the position that it is time for more computationally-oriented research on problems involving framing. In the interests of furthering that goal, we propose the following specifi...
Article
Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased --- the choice of words and sentence structure --- can affect this process. To this end, we develop an analysis framework and build a corpus of mov...
Article
Understanding social interaction within groups is key to analyzing online communities. Most current work focuses on structural properties: who talks to whom, and how such interactions form larger network structures. The interactions themselves, however, generally take place in the form of natural language --- either spoken or written --- and one co...
Article
We show that information about social relationships can be used to improve user-level sentiment analysis. The main motivation behind our approach is that users that are somehow "connected" may be more likely to hold similar opinions; therefore, relationship information can complement what we can extract about a user's viewpoints from their utteranc...
Article
Conversational participants tend to immediately and unconsciously adapt to each other's language styles: a speaker will even adjust the number of articles and other function words in their next utterance in response to the number in their partner's immediately preceding utterance. This striking level of coordination is thought to have arisen as a w...
Article
The ad hoc retrieval task is to find documents in a corpus that are relevant to a query. Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural reranking approach to ad-hoc retrieval that applies to settings with no hyperlink information. We reorder the documents in an initially retrieved set by...
Article
Researchers in textual entailment have begun to consider inferences involving 'downward-entailing operators', an interesting and important class of lexical items that change the way inferences are made. Recent work proposed a method for learning English downward-entailing operators that requires access to a high-quality collection of 'negative pola...
Article
We report on work in progress on extracting lexical simplifications (e.g., "collaborate" -> "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using...
Article
There are many on-line settings in which users publicly express opinions. A number of these offer mechanisms for other users to evaluate these opinions; a canonical example is Amazon.com, where reviews come with annotations like "26 of 32 people found the following review helpful." Opinion evaluation appears in many off-line settings as well, inclu...
Article
An important part of textual inference is making deductions involving monotonicity, that is, determining whether a given assertion entails restrictions or relaxations of that assertion. For instance, the statement 'We know the epidemic spread quickly' does not entail 'We know the epidemic spread quickly via fleas', but 'We doubt the epidemic spread...
Article
The language-modeling approach to information retrieval provides an effective statistical framework for tackling various problems and often achieves impressive empirical performance. However, most previous work on language models for information retrieval focused on document-specific characteristics, and therefore did not take into account the stru...
Conference Paper
An important part of textual inference is making deductions involving monotonicity, that is, determining whether a given assertion entails restrictions or relaxations of that assertion. For instance, the statement 'We know the epidemic spread quickly' does not entail 'We know the epidemic spread quickly via fleas', but 'We doubt the epidemic spread...
Article
We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform re-ranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing ent...
Article
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and un...
Conference Paper
Treating classification as seeking minimum cuts in the appropriate graph has proven effective in a number of applications. The power of this approach lies in its ability to incorporate label-agreement preferences among pairs of instances in a provably tractable way. Label disagreement preferences are another potentially rich source of informa...
Conference Paper
We report on work in progress on using very simple statistics in an unsupervised fashion to re-rank search engine results when review-oriented queries are issued; the goal is to bring opinionated or subjective results to the top of the results list. We find that our proposed technique performs comparably to methods that rely on sophisticated pr...
Conference Paper
We describe an array of novel introductory-level courses based on exciting topics in modern artificial intelligence. All present a great deal of often research-level technical content in a rigorous manner while keeping the material accessible to lower-level students. On the other hand, they differ in subject matter and style, since a "one-size-fits...
Article
There have been a number of prior attempts to theoretically justify the effectiveness of the inverse document frequency (IDF). Those that take as their starting point Robertson and Sparck Jones's probabilistic model are based on strong or complex assumptions. We show that a more intuitively plausible assumption suffices. Moreover, the new assumption,...
Article
We investigate whether one can determine from the transcripts of U.S. Congressional floor debates whether the speeches represent support of or opposition to proposed legislation. To address this problem, we exploit the fact that these speeches occur as part of a discussion; this allows us to use sources of information regarding relationships betwee...
Article
We present a novel approach to pseudo-feedback-based ad hoc retrieval that uses language models induced from both documents and clusters. First, we treat the pseudo-feedback documents produced in response to the original query as a set of pseudo-queries that themselves can serve as input to the retrieval process. Observing that the documents return...
Conference Paper
We investigate whether one can determine from the transcripts of U.S. Congressional floor debates whether the speeches represent support of or opposition to proposed legislation. To address this problem, we exploit the fact that these speeches occur as part of a discussion; this allows us to use sources of information regarding relationships betwee...
Conference Paper
Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural re-ranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language mode...
Article
Sentiment analysis seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as "thumbs up" or "thumbs down". To determine this sentiment polarity, we propose a novel machine-learning method that applies text-categorization techniques to just the subjective portions of the document. Extrac...
Article
An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, labor-intensive knowledge-based methods are used to construct the dictionary. We instead propose to acquire it automatically via a novel multiple-pass algorithm employing...
Article
We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. We first present an effective knowledge-lean method for learning content models from unannotated documents, utilizing a novel adaptation of algorithms for Hidden Markov Mod...
Article
Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the i...
Article
We address the text-to-text generation problem of sentence-level paraphrasing --- a phenomenon distinct from and more difficult than word- or phrase-level paraphrasing. Our approach applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of paraphrasing patterns represented by word lattice pair...
Article
We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, ma...
Article
Distributional similarity is a useful notion in estimating the probabilities of rare joint events. It has been employed both to cluster events according to their distributions, and to directly compute averages of estimates for distributional neighbors of a target event. Here, we examine the tradeoffs between model size and prediction accuracy for cl...
Article
This paper describes a new Cornell University course serving as a non-programming introduction to computer science, with natural language processing and information retrieval forming a crucial part of the syllabus. Material was drawn from a wide variety of topics (such as theories of discourse structure and random graph models of the World Wide Web...
Article
In 1975, Valiant showed that Boolean matrix multiplication can be used for parsing context-free grammars (CFGs), yielding the asymptotically fastest (although not practical) CFG parsing algorithm known. We prove a dual result: any CFG parser with time complexity $O(g n^{3 - \epsilon})$, where $g$ is the size of the grammar and $n$ is the length of t...
Article
We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in th...
Article
We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are repr...
Article
Distributional similarity is a useful notion in estimating the probabilities of rare joint events. It has been employed both to cluster events according to their distributions, and to directly compute averages of estimates for distributional neighbors of a target event. Here, we examine the tradeoffs between model size and prediction accuracy for c...
Article
We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its perf...
Article
Estimating word co-occurrence probabilities is a problem underlying many applications in statistical natural language processing. Distance-weighted (or similarity-weighted) averaging has been shown to be a promising approach to the analysis of novel co-occurrences. Many measures of distributional similarity have been proposed for use in th...
Article
Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and grammar or on pre-segmented data. In contrast, we introduce a novel statistical method utilizing unsegmented training data, with per...
Article
this paper as "the first clear demonstration of a probabilistic parser outperforming a trigram model" (pg. 457), it does not discuss what features of the algorithm lead to its superior results
Article
Natural language generation is usually divided into separate text planning and linguistic components. This division, though, assumes that the two components can operate independently, which is not always true. The IGEN generator eliminates the ...
Article
We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in th...
Article
Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character n-gram counts from an unannotated corpus. The performance was often better than that of rule-b...
Article
Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character n-gram counts from an unannotated corpus. The performance was often better than that of rule-b...
Article
We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in th...
Article
In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach” and “eat a beach” is more likely. Statistical NLP methods determine the likelihood of a word combination from its f...
Article
In many applications of natural language processing it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination according to its...
Conference Paper
In many applications of natural language processing it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination according to its...

Citations

... Recently, some of the research efforts propose to further train the fine-tuned embeddings with specific training data or in a larger model architecture to improve model performance. Shi and Lee (2021) proposed two-stage fine-tuning, which first trains a general multilingual Enhanced Universal Dependency (Bouma et al., 2021) parser and then finetunes on each specific language separately. Wang et al. (2021a) proposed to train models through concatenating fine-tuned embeddings. ...
... Wikipedia is comprised of documents with rich metadata, which can be used as naturally-occurring supervision for a variety of NLP tasks. One example is hyperlinks, which have been used for parsing (Spitkovsky et al., 2010; Søgaard, 2017; Shi et al., 2021a), named entity recognition (Kazama and Torisawa, 2007; Nothman et al., 2008; Richman and Schone, 2008; Ghaddar and Langlais, 2017), entity disambiguation and linking (Bunescu and Paşca, 2006; Cucerzan, 2007; Mihalcea, 2007; Mihalcea and Csomai, 2007; Milne and Witten, 2008; Hoffart et al., 2011; Le and Titov, 2019), coreference resolution (Rahman and Ng, 2011; Singh et al., 2012a; Zheng et al., 2013; Eirew et al., 2021), and generating Wikipedia articles. ...
... The second group in Table 1 concerns analyzing AITA using statistical methods, such as creating a taxonomy of moral discussions [37], analyzing the correlation between users' demographics and blame assignments [9], and identifying linguistic features in moral judgment [50]. However, no work has studied the effects of the descriptions on individuals' agentiveness reflected in social media. ...
... The possible reason lies in the relatively more implicit and obscure image-text relations therein (Vempala and Preotiuc-Pietro, 2019), whereas the image-text pairs in the widely used datasets outside social media (e.g., COCO dataset (Lin et al., 2014), VQA dataset (Antol et al., 2015), VCR dataset (Zellers et al., 2019)) tend to present explicit information overlap. This issue is nevertheless ignored in many previous solutions, which follow the common practice to fuse visual and lingual features (Vempala and Preotiuc-Pietro, 2019; Hessel and Lee, 2020; Botelho et al., 2021), making it hard for a multimodal model to well align cross-modal semantics attributed to their weak correlations (Fei et al., 2022). ...
... SQUALL [80] is based on a previous dataset named WikiTableQuestions [67], consisting of NL Questions posed on Wikipedia tables along with the expected answers. In contrast to WikiSQL, there are no structured queries in the WikiTableQuestions dataset. ...
... questioned the adequacy of these policies [67] (i.e., which content should be deleted) or surfaced instances of objectionable content that was not removed [59]. Prior research studied potential biases in moderation of YouTube comments [41,42], the impact of content moderation on user behavior on Reddit [40,58], and the magnitude of content moderation regarding third-party links to conspiracy theory stories [49]. It is still an open question how quickly content is removed, independent of the specific policy. ...
... Most benchmark data in vision and language assumes strong image-text correlations, and many multimodal models are hence designed to explore the common semantics shared by the two modalities. However, it has been recently pointed out that many real-world scenarios, including social media, tend to present image-text pairs with weak and intricate cross-modal interactions (Vempala and Preotiuc-Pietro, 2019; Hessel et al., 2019; Fei et al., 2022). ...
... These antisocial behaviors include toxicity (Pavlopoulos et al. 2020; Ive, Anuchitanukul, and Specia 2021), abusive language and content (Vidgen et al. 2019), hate speech (Fortuna and Nunes 2018; Mozafari, Farahbakhsh, and Crespi 2019), trolling (Mojica 2017), offense (Meaney et al. 2021), and racism (Field et al. 2021). Earlier work primarily relies on hand-crafted features (Hessel and Lee 2019; Zhang et al. 2018), whereas recent work takes advantage of deep neural networks (Chang and Danescu-Niculescu-Mizil 2019). In contrast to detection of antisocial behaviors, another line of work (Bao et al. 2021) takes a different perspective to study early cues and design metrics for quantifying and predicting prosocial outcomes in online conversations. ...
... Transition-based dependency parsing algorithms are widely used and extensively explored in the field of Natural Language Processing (NLP) [3,4,5]. The majority of research activity has aimed to increase the accuracy of the algorithms, and much less research has been done with the aim of providing less energy-consuming variants. ...
... Dependency parsing [1] is an important task in natural language processing (NLP) and a large number of methods have been proposed, most of which can be divided into two categories: graph-based methods [2], [3] and transition-based methods [4]-[6]. In this paper, we focus on graph-based dependency parsing, which traditionally has higher parsing accuracy. ...