Yikang Shen
Université de Montréal | UdeM · Department of Computer Science and Operations Research

About

29 Publications · 2,587 Reads
347 Citations

Publications (29)
Preprint
Full-text available
Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Tran...
Preprint
Full-text available
Many complex real-world tasks are composed of several levels of sub-tasks. Humans leverage these hierarchical structures to accelerate the learning process and achieve better generalization. In this work, we study the inductive bias and propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration. The di...
Preprint
Full-text available
There are two major classes of natural language grammars -- the dependency grammar that models one-to-one correspondences between words and the constituency grammar that models the assembly of one or several corresponded words. While previous unsupervised parsing methods mostly focus on only inducing one class of grammars, we introduce a novel mode...
Preprint
Transformers do not scale very well to long sequence lengths, largely because of quadratic self-attention complexity. In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To date, there is...
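The bottleneck these works target is visible in a minimal NumPy sketch of vanilla scaled dot-product attention (shapes and names are illustrative): every token scores against every other token, so time and memory grow with the square of the sequence length.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Vanilla scaled dot-product self-attention for N tokens.
    X: (N, d) embeddings; Wq/Wk/Wv: projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (N, N): the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

The efficient variants surveyed above replace or approximate the (N, N) score matrix to break this quadratic barrier.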
Preprint
Syntax is fundamental to our thinking about language. Although neural networks are very successful in many tasks, they do not explicitly model syntactic structure. Failing to capture the structure of inputs could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Or...
Preprint
We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves, allowing us to compute the likelihood of a sequence of $N$ tokens under a latent tree model, which we maximise to train...
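The marginalisation described here has the shape of a CKY-style inside algorithm. A sketch, assuming placeholder per-leaf probabilities and a per-span composition score standing in for the paper's actual parameterisation:

import numpy as np

def marginal_likelihood(leaf_prob, compose_score):
    """Inside algorithm over half-open spans [i, j): sums the sequence
    likelihood over every latent binary tree with N leaves.
    leaf_prob[i] and compose_score(i, k, j) are illustrative stand-ins."""
    N = len(leaf_prob)
    inside = np.zeros((N + 1, N + 1))
    for i in range(N):
        inside[i][i + 1] = leaf_prob[i]
    for width in range(2, N + 1):
        for i in range(N - width + 1):
            j = i + width
            inside[i][j] = sum(inside[i][k] * inside[k][j] * compose_score(i, k, j)
                               for k in range(i + 1, j))
    return inside[0][N]

The chart sums over all Catalan(N-1) binary bracketings in O(N^3) time, and the resulting likelihood can be maximised directly with gradient-based training.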
Preprint
It is commonly believed that knowledge of syntactic structure should improve language modeling. However, incorporating syntactic structure into neural language models both effectively and computationally efficiently has remained challenging. In this paper, we make use of a multi-task objective, i.e., the models simultaneously predict words as well a...
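A minimal sketch of such a joint objective (the weighting and the exact syntactic prediction head are assumptions, not the paper's specification):

import torch.nn.functional as F

def multitask_loss(word_logits, word_targets, syn_logits, syn_targets, lam=0.5):
    """Joint training signal: next-word prediction plus a syntactic
    prediction task, traded off by an illustrative weight lam."""
    lm_loss = F.cross_entropy(word_logits, word_targets)
    syntax_loss = F.cross_entropy(syn_logits, syn_targets)
    return lm_loss + lam * syntax_loss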
Preprint
Stack-augmented recurrent neural networks (RNNs) have been of interest to the deep learning community for some time. However, the difficulty of training memory models remains a problem obstructing the widespread use of such models. In this paper, we propose the Ordered Memory architecture. Inspired by Ordered Neurons (Shen et al., 2018), we introdu...
Preprint
The ability to understand logical relationships between sentences is an important task in language understanding. To aid progress on this task, researchers have collected datasets for machine learning and for evaluation of current systems. However, as in the crowdsourced Visual Question Answering (VQA) task, some biases in the data inevitably occ...
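A standard diagnostic for such annotation artifacts, offered here as background rather than as the paper's method, is a partial-input baseline: a classifier that never sees the premise should perform at chance on unbiased data.

import torch.nn as nn

class HypothesisOnly(nn.Module):
    """Bag-of-words classifier over the hypothesis alone; sizes are
    illustrative. Above-chance accuracy signals label-correlated bias."""
    def __init__(self, vocab_size=20000, dim=100, n_labels=3):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)  # mean-pools token embeddings
        self.out = nn.Linear(dim, n_labels)

    def forward(self, hyp_token_ids):                # (batch, seq_len) int64
        return self.out(self.emb(hyp_token_ids))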
Preprint
Recurrent neural network (RNN) models are widely used for processing sequential data governed by a latent tree structure. Previous work shows that RNN models (especially Long Short-Term Memory (LSTM) based models) can learn to exploit the underlying tree structure. However, their performance consistently lags behind that of tree-based models. This...
Preprint
In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically generated extractive labels. We call our approach BanditSum, as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and choo...
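The bandit formulation can be sketched as a single REINFORCE update (reward_fn, the sample size k, and the without-replacement log-probability approximation are illustrative assumptions):

import torch

def reinforce_step(sent_scores, reward_fn, k=3):
    """One policy-gradient step: sample k sentences as the extractive
    summary, score it (e.g. ROUGE against a reference), and reinforce
    the sampled action. sent_scores must require gradients."""
    probs = torch.softmax(sent_scores, dim=-1)
    picks = torch.multinomial(probs, k, replacement=False)  # sampled extract
    reward = reward_fn(picks.tolist())                      # scalar quality signal
    # product of marginals: a common approximation to the exact
    # without-replacement log-probability of the sampled set
    loss = -torch.log(probs[picks]).sum() * reward
    loss.backward()
    return loss.item()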
Preprint
Full-text available
In this work, we propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances, for each split position in the input sentence. The syntactic distances specify the order in which the split points will be selected, recursively partitioning the input, in a top-down fashion. Compared to tradi...
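Decoding a tree from predicted distances works exactly as described: split recursively at the largest remaining distance, top-down.

def distances_to_tree(words, dists):
    """Top-down decode: dists[i] scores the split point between
    words[i] and words[i+1]; larger distances split earlier."""
    if len(words) == 1:
        return words[0]
    split = max(range(len(dists)), key=dists.__getitem__)
    left = distances_to_tree(words[:split + 1], dists[:split])
    right = distances_to_tree(words[split + 1:], dists[split + 1:])
    return (left, right)

For example, distances_to_tree(["the", "cat", "sat"], [1.0, 2.0]) yields (("the", "cat"), "sat"), since the larger distance between "cat" and "sat" is split first.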
Article
Full-text available
Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to learn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on...
Article
Full-text available
We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-str...
Conference Paper
Full-text available
Recently, variants of neural networks for computational linguistics have been proposed and successfully applied to neural language modeling and neural machine translation. These neural models can leverage knowledge from massive corpora but they are extremely slow as they predict candidate words from a large vocabulary during training and inference....
Article
Full-text available
We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactical and semantic meaning during the language model training process. We provide experiments on...
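A generic two-level factorisation shows the mechanics, with the caveat that the fixed word-to-cluster map below is a placeholder; in the paper the clustering itself is learned during training.

import torch
import torch.nn as nn

class TwoLevelSoftmax(nn.Module):
    """log p(w | h) = log p(c | h) + log p(w | c, h): predict a cluster,
    then a word inside it, cutting the per-step cost from O(|V|) to
    roughly O(sqrt(|V|)). All sizes are illustrative."""
    def __init__(self, hidden, n_clusters, cluster_size):
        super().__init__()
        self.cluster_head = nn.Linear(hidden, n_clusters)
        # one output matrix per cluster; only the relevant one is touched
        self.word_weight = nn.Parameter(0.02 * torch.randn(n_clusters, cluster_size, hidden))
        self.cluster_size = cluster_size

    def log_prob(self, h, word):
        cluster, within = word // self.cluster_size, word % self.cluster_size
        log_pc = torch.log_softmax(self.cluster_head(h), -1)[cluster]
        word_logits = self.word_weight[cluster] @ h       # (cluster_size,)
        log_pw = torch.log_softmax(word_logits, -1)[within]
        return log_pc + log_pw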
Article
Full-text available
With the development of community-based question answering (Q&A) services, large-scale Q&A archives have been accumulated and have become an important information and knowledge resource on the web. Question and answer matching has attracted much attention for its ability to reuse the knowledge stored in these systems: it can be useful in enhanci...
Article
Community-based Question Answering (CQA) has become popular on knowledge-sharing sites since it allows users to get answers to complex, detailed, and personal questions directly from other users. Large archives of historical questions and associated answers have been accumulated. Retrieving relevant historical answers that best match a question is...
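For orientation, the task reduces to ranking archived questions against a new one; a plain TF-IDF cosine baseline (a generic baseline, not the paper's model) looks like this:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(query, archive, top_k=5):
    """Rank archived CQA questions by TF-IDF cosine similarity to the
    incoming question and return the indices of the top matches."""
    vec = TfidfVectorizer().fit(archive + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(archive)).ravel()
    return sims.argsort()[::-1][:top_k]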
Conference Paper
Feature learning plays an important role in many machine learning tasks. As a common implementation of feature learning, the auto-encoder has shown excellent performance. However, it also faces several challenges, a notable one being how to reduce its generalization error. Different approaches have been proposed to solve this problem and Ba...
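The setup under discussion is the standard reconstruction objective; a minimal sketch (layer sizes are illustrative):

import torch.nn as nn

class AutoEncoder(nn.Module):
    """Plain auto-encoder: the bottleneck code is the learned feature,
    and reconstruction error is the training signal."""
    def __init__(self, dim_in=784, dim_code=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, dim_code), nn.ReLU())
        self.dec = nn.Linear(dim_code, dim_in)

    def forward(self, x):
        code = self.enc(x)               # feature representation
        return self.dec(code), code

Reducing the generalization error of this reconstruction mapping, e.g. via input noise or other regularisation, is the challenge the paper addresses.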
