Su Wang
  • PhD student in Computational Linguistics, UT Austin
  • Research Assistant at University of Texas at Austin

About

Publications: 16
Reads: 22,716
Citations: 190
Introduction
PhD student in the Department of Linguistics and the Department of Statistics and Data Science, University of Texas at Austin. Research Assistant to Dr. Katrin Erk.
Current institution
University of Texas at Austin
Current position
  • Research Assistant
Additional affiliations
August 2016 - present
University of Texas at Austin
Position
  • Research Assistant
Description
  • Interests: Machine Learning, Bayesian Statistics, Natural Language Processing
Education
August 2015 - August 2020
University of Texas at Austin
Field of study
  • Computational Linguistics and Statistics
August 2012 - June 2014

Publications

Publications (16)
Preprint
We propose a method for controlled narrative/story generation in which we guide the model to produce coherent narratives with user-specified target endings by interpolation: for example, we are told that Jim went hiking and that at the end Jim needed to be rescued, and we want the model to incrementally generate the steps along the way. The core of...
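
A minimal sketch of the generation-by-interpolation idea, assuming a pretrained GPT-2 via HuggingFace transformers; the sample-then-rerank-by-ending heuristic below is illustrative, not necessarily the paper's exact model:

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def avg_ll(text):
        """Approximate (length-normalized) log-likelihood under the LM."""
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = lm(ids, labels=ids).loss  # mean per-token NLL
        return -loss.item()

    def next_step(story, ending, n_candidates=5):
        """Sample candidate next sentences; keep the one under which the
        required ending remains most likely."""
        ids = tok(story, return_tensors="pt").input_ids
        outs = lm.generate(ids, do_sample=True, top_p=0.9, max_new_tokens=25,
                           num_return_sequences=n_candidates,
                           pad_token_id=tok.eos_token_id)
        cands = [tok.decode(o[ids.size(1):], skip_special_tokens=True)
                 for o in outs]
        return max(cands, key=lambda c: avg_ll(story + c + " " + ending))

    story = "Jim went hiking in the mountains."
    ending = "In the end, Jim had to be rescued."
    for _ in range(3):  # interpolate three steps toward the target ending
        story += next_step(story, ending)
    print(story, ending)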
Preprint
The news coverage of events often contains not one but multiple incompatible accounts of what happened. We develop a query-based system that extracts compatible sets of events (scenarios) from such data, formulated as one-class clustering. Our system incrementally evaluates each event's compatibility with already selected events, taking order into...
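
The incremental selection loop lends itself to a short sketch. The toy below assumes pre-computed event vectors and uses mean cosine similarity as a stand-in for the paper's learned compatibility score:

    import numpy as np

    def compatibility(candidate, selected):
        """Score a candidate event against the events chosen so far (here:
        mean cosine similarity; the paper learns this score instead)."""
        sims = [s @ candidate / (np.linalg.norm(s) * np.linalg.norm(candidate))
                for s in selected]
        return float(np.mean(sims))

    def extract_scenario(query_vec, event_vecs, threshold=0.5):
        """One-class clustering: incrementally admit events compatible with
        the query and with everything already selected; the order in which
        events are considered matters, as in the paper."""
        selected, scenario = [query_vec], []
        for i, e in enumerate(event_vecs):
            if compatibility(e, selected) >= threshold:
                selected.append(e)
                scenario.append(i)
        return scenario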
Article
Paraphrasing is rooted in semantics. We show the effectiveness of transformers (Vaswani et al. 2017) for paraphrase generation and further improvements by incorporating PropBank labels via a multi-encoder. Evaluating on MSCOCO and WikiAnswers, we find that transformers are fast and effective, and that semantic augmentation for both transformers and...
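
A rough sketch of the multi-encoder idea using stock PyTorch transformer modules; dimensions, vocabulary sizes, and the alignment of SRL labels to tokens are placeholder assumptions, not the paper's configuration:

    import torch
    import torch.nn as nn

    class MultiEncoderParaphraser(nn.Module):
        """Two encoders (tokens, PropBank labels); decoder attends to both.
        Positional encodings and target masking omitted for brevity."""
        def __init__(self, vocab=10000, n_labels=30, d=256, nhead=4, layers=2):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab, d)
            self.lbl_emb = nn.Embedding(n_labels, d)
            def encoder():
                layer = nn.TransformerEncoderLayer(d, nhead, batch_first=True)
                return nn.TransformerEncoder(layer, layers)
            self.tok_enc, self.lbl_enc = encoder(), encoder()
            self.dec = nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d, nhead, batch_first=True), layers)
            self.out = nn.Linear(d, vocab)

        def forward(self, src_tokens, src_labels, tgt_tokens):
            mem_tok = self.tok_enc(self.tok_emb(src_tokens))   # token memory
            mem_lbl = self.lbl_enc(self.lbl_emb(src_labels))   # SRL-label memory
            memory = torch.cat([mem_tok, mem_lbl], dim=1)      # decoder sees both
            return self.out(self.dec(self.tok_emb(tgt_tokens), memory))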
Conference Paper
Full-text available
Paraphrasing is rooted in semantics. We show the effectiveness of transformers (Vaswani et al. 2017) for paraphrase generation and further improvements by incorporating PropBank labels via a multi-encoder. Evaluating on MSCOCO and WikiAnswers, we find that transformers are fast and effective, and that semantic augmentation for both transformers and...
Preprint
During natural disasters and conflicts, information about what happened is often confusing, messy, and distributed across many sources. We would like to be able to automatically identify relevant information and assemble it into coherent narratives of what happened. To make this task accessible to neural models, we introduce Story Salads, mixtures...
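
The data construction is easy to sketch; the random-interleave policy below is a hypothetical simplification of how a story salad might be mixed:

    import random

    def make_salad(doc_a, doc_b, seed=0):
        """Mix two sentence lists, keeping each document's internal order,
        and return the mixture plus gold labels for evaluation."""
        rng = random.Random(seed)
        salad, labels = [], []
        a, b = list(doc_a), list(doc_b)
        while a or b:
            take_a = bool(a) and (not b or rng.random() < 0.5)
            salad.append(a.pop(0) if take_a else b.pop(0))
            labels.append(0 if take_a else 1)
        return salad, labels

    salad, gold = make_salad(
        ["A storm hit the coast.", "Rescuers arrived by boat."],
        ["The election results came in.", "The new mayor gave a speech."])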
Preprint
Paraphrasing is rooted in semantics. We show the effectiveness of transformers (Vaswani et al. 2017) for paraphrase generation and further improvements by incorporating PropBank labels via a multi-encoder. Evaluating on MSCOCO and WikiAnswers, we find that transformers are fast and effective, and that semantic augmentation for both transformers and...
Preprint
Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since this is never attested. However, both are physically plausible events. This paper introduces the task of semantic plausibility: recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility ju...
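
For concreteness, a minimal baseline one might run on such a dataset; the word-vector lookup `vec` is hypothetical, and the paper's argument is precisely that purely distributional features like these fall short without injected world knowledge:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def featurize(triple, vec):
        """Concatenate word vectors for a (subject, verb, object) event."""
        s, v, o = triple
        return np.concatenate([vec(s), vec(v), vec(o)])

    def train_plausibility(triples, labels, vec):
        """Fit a plausible-vs-implausible classifier on labeled triples."""
        X = np.stack([featurize(t, vec) for t in triples])
        return LogisticRegression(max_iter=1000).fit(X, labels)

    # e.g. clf = train_plausibility([("man", "swallow", "candy"), ...], [1, ...], vec)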
Conference Paper
Full-text available
Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since this is never attested. However, both are physically plausible events. This paper introduces the task of semantic plausibility: recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility ju...
Article
Full-text available
We explore techniques to maximize the effectiveness of discourse information in the task of authorship attribution. We present a novel method to embed discourse features in a Convolutional Neural Network text classifier, which achieves a state-of-the-art result by a substantial margin. We empirically investigate several featurization methods to und...
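
One way the discourse-feature embedding could look, sketched with stock PyTorch; the feature inventory and layer sizes are placeholders, not the paper's settings:

    import torch
    import torch.nn as nn

    class DiscourseCNN(nn.Module):
        """CNN text classifier where each token carries a word embedding
        concatenated with an embedding of its discourse feature."""
        def __init__(self, vocab=10000, n_disc=20, d_word=100, d_disc=20,
                     n_filters=100, widths=(3, 4, 5), n_authors=10):
            super().__init__()
            self.word_emb = nn.Embedding(vocab, d_word)
            self.disc_emb = nn.Embedding(n_disc, d_disc)
            d = d_word + d_disc
            self.convs = nn.ModuleList(nn.Conv1d(d, n_filters, w) for w in widths)
            self.fc = nn.Linear(n_filters * len(widths), n_authors)

        def forward(self, words, disc):
            # (batch, seq, d_word + d_disc) -> (batch, d, seq) for Conv1d
            x = torch.cat([self.word_emb(words), self.disc_emb(disc)], dim=-1)
            x = x.transpose(1, 2)
            pooled = [c(x).relu().max(dim=2).values for c in self.convs]
            return self.fc(torch.cat(pooled, dim=1))  # author logits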
Conference Paper
Full-text available
We explore techniques to maximize the effectiveness of discourse information in the task of authorship attribution. We present a novel method to embed discourse features in a Convolutional Neural Network text classifier, which achieves a state-of-the-art result by a substantial margin. We empirically investigate several featurization methods to un...
Conference Paper
Full-text available
We test whether distributional models can do one-shot learning of definitional properties from text only. Using Bayesian models, we find that first learning overarching structure in the known data (regularities in textual contexts and in properties) helps one-shot learning, and that individual context items can be highly informative. Our experimen...
Working Paper
Full-text available
Conference Paper
Full-text available
I present a detailed introduction to Topic Models (TM), a family of probabilistic models for (mainly) document modeling. I introduce and motivate the model and illustrate its applications in Natural Language Processing (NLP), with a particular focus on a thorough description and derivation of the common inference algorithms proposed for TMs. I...
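
As a taste of the inference machinery such a tutorial derives, here is a compact collapsed Gibbs sampler for LDA, one of the standard TM inference algorithms (the derivation itself is what the tutorial works through):

    import numpy as np

    def lda_gibbs(corpus, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
        """Collapsed Gibbs sampling for LDA. `corpus`: list of lists of
        word ids in [0, V); returns doc-topic and topic-word counts."""
        rng = np.random.default_rng(seed)
        z = [rng.integers(K, size=len(doc)) for doc in corpus]  # assignments
        ndk = np.zeros((len(corpus), K))  # doc-topic counts
        nkw = np.zeros((K, V))            # topic-word counts
        nk = np.zeros(K)                  # tokens per topic
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        for _ in range(iters):
            for d, doc in enumerate(corpus):
                for i, w in enumerate(doc):
                    k = z[d][i]  # remove current assignment from the counts
                    ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                    # Full conditional: p(k) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ)
                    p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                    k = rng.choice(K, p=p / p.sum())
                    z[d][i] = k
                    ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        return ndk, nkw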
Presentation
Full-text available
In this tutorial, I present an intuitive introduction to the Generative Adversarial Network (GAN), invented by Ian Goodfellow of Google Brain, give an overview of the general idea of the model, and describe the algorithm for training it as per the original work. I further briefly introduce the application of GAN in Natural Language Processing to show its flex...
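
The training algorithm from the original work alternates two gradient steps; a minimal PyTorch sketch on toy 1-D data (the non-saturating generator loss is the variant Goodfellow recommends):

    import torch
    import torch.nn as nn

    # Toy setup: G maps 8-d noise to 1-d samples; D scores samples real/fake.
    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(2000):
        real = torch.randn(64, 1) * 0.5 + 2.0  # target distribution N(2, 0.5^2)
        z = torch.randn(64, 8)
        # Discriminator step: push D(real) -> 1 and D(G(z)) -> 0.
        fake = G(z)
        d_loss = (bce(D(real), torch.ones(64, 1))
                  + bce(D(fake.detach()), torch.zeros(64, 1)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator step (non-saturating loss): push D(G(z)) -> 1.
        g_loss = bce(D(G(z)), torch.ones(64, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()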
