Article

Abstract

Data augmentation has been an important ingredient for boosting the performance of learned models. Prior data augmentation methods for few-shot text classification have led to great performance boosts. However, they have not been designed to capture the intricate compositional structure of natural language. As a result, they fail to generate samples with plausible and diverse sentence structures. Motivated by this, we present data Augmentation using Lexicalized Probabilistic context-free grammars (ALP), which generates augmented samples with diverse syntactic structures and plausible grammar. The lexicalized PCFG parse trees consider both constituents and dependencies to produce a syntactic frame that maximizes the variety of word choices in a syntax-preserving manner, without requiring domain experts. Experiments on few-shot text classification tasks demonstrate that ALP enhances many state-of-the-art classification methods. As a second contribution, we delve into train-val splitting methodologies when a data augmentation method comes into play. We argue empirically that the traditional splitting of training and validation sets is sub-optimal compared to our novel augmentation-based splitting strategies, which further expand the training split with the same amount of labeled data. Taken together, our contributions on data augmentation strategies yield a strong training recipe for few-shot text classification tasks.
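To make the idea of grammar-based augmentation concrete, the following is a minimal sketch of sampling sentences from a toy probabilistic context-free grammar. It is not the authors' method: the grammar, the `sample` and `augment` helpers, and the vocabulary are all hypothetical, and the grammar is a plain (unlexicalized) PCFG written by hand rather than one extracted from parse trees of labeled data.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (right-hand side, probability).
# Any symbol that is not a key of GRAMMAR is treated as a terminal word.
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("Det", "N"), 0.7), (("Det", "Adj", "N"), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
    "Det": [(("the",), 1.0)],
    "Adj": [(("slow",), 0.5), (("clever",), 0.5)],
    "N": [(("plot",), 0.5), (("film",), 0.5)],
    "V": [(("works",), 0.5), (("drags",), 0.5)],
}


def sample(symbol, rng):
    """Expand `symbol` top-down, drawing one rule per nonterminal by its probability."""
    if symbol not in GRAMMAR:  # terminal word
        return [symbol]
    rules = GRAMMAR[symbol]
    rhs = rng.choices([r for r, _ in rules], weights=[p for _, p in rules], k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample(child, rng))
    return words


def augment(n, seed=0):
    """Return n sampled sentences; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [" ".join(sample("S", rng)) for _ in range(n)]


if __name__ == "__main__":
    for sentence in augment(3):
        print(sentence)
```

Because every sentence is derived from the grammar's rules, the samples vary in structure (with or without an adjective, with or without an object) while staying grammatical with respect to that grammar.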

... Data Augmentation creates synthetic data from the existing data. Traditional data augmentation approaches focus on expanding lexical diversity (Wei and Zou, 2019; Feng et al., 2020; Ng et al., 2020) or syntax variation (Kim et al., 2022; Loem et al., 2022; Hussein et al., 2022; Wang et al., 2023a). Post selection or representative selection (Edwards et al., 2021) helps to prevent a waste of resources and time in generating new documents. ...
... In contrast to previous data augmentation approaches (Wei and Zou, 2019; Feng et al., 2020; Ng et al., 2020; Kim et al., 2022; Loem et al., 2022; Hussein et al., 2022; Wang et al., 2023a), we have improved upon the conventional conditional generation method by transitioning from random sampling to a targeted selection strategy. The targeted augmentation module serves as a mechanism to ensure diversity. ...
... However, traditional machine learning methods based on statistical features and deep learning methods driven by big data struggle with learning under few-shot conditions. Moreover, existing few-shot learning approaches based on transfer learning [2] and data augmentation [3], [4] have certain limitations in adapting to new scenarios and dealing with imbalanced data categories, small inter-class differences, and the complexity of newly emerging unknown types. Meta-learning [5], [6] has advantages in few-shot traffic classification, such as strong generalization capabilities, low resource overhead, and ease of scenario transfer [7]-[10], but these methods lack the ability to detect unknown new types of data. ...
... End For
15: End For
16: For $(x_q, y_q)$ in $D_{OOD}$ Do
17: Calculate $L_{out}$ by using Eqn (7) on $D_{OOD}$;
18: $L \leftarrow L + \frac{1}{N_{OOD}} L_{out}$;
19: End For
... as representations of the classes. Subsequently, the distance metric is calculated according to Equation (3). The distance metric is used to compute the distance between each ID sample in the query set and their corresponding prototypes in the support set, which is then used to calculate the naive classification loss $L_1$ and the margin loss $L_{in}$. ...
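The prototype-and-distance step described in this excerpt can be illustrated with a minimal sketch. It assumes class prototypes are the mean of the support embeddings and that the distance is Euclidean; the excerpt's Equation (3) and the margin losses are not shown in the text, so they are not reproduced here, and the helper names `prototypes`, `euclidean`, and `classification_loss` are hypothetical.

```python
import math


def prototypes(support):
    """support: dict label -> list of embedding vectors. Returns label -> mean vector."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos


def euclidean(a, b):
    """Euclidean distance (an assumed choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classification_loss(query, label, protos):
    """Cross-entropy of a softmax over negative distances to every prototype."""
    logits = {c: -euclidean(query, p) for c, p in protos.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[label]


if __name__ == "__main__":
    protos = prototypes({"a": [[0, 0], [2, 0]], "b": [[10, 10]]})
    print(classification_loss([1, 0], "a", protos))
```

A query that lies close to its own class prototype gets a loss near zero, and the loss grows as the query moves toward another class's prototype.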
Article
Traffic classification has always been one of the important research directions in the field of cyber security. Achieving rapid traffic classification and detecting unknown traffic are critical for preventing network attacks, malicious software, transaction fraud, and other types of cyber security threats. However, most existing models are based on large-scale data and are unable to quickly learn and recognize unknown traffic. Some methods based on few-shot learning solve the problem of rapidly learning new types of traffic, but they cannot detect out-of-distribution samples. Based on this, this paper proposes a few-shot traffic multi-classification method that supports out-of-distribution detection, named SPN. It improves performance by integrating twin networks into the meta-learning framework based on the idea of metric learning, and introduces a margin loss to ensure detection performance. We conduct two types of experiments and compare SPN with the relevant baseline methods. The results show that SPN has excellent performance in few-shot multi-classification and out-of-distribution detection, and performs well in intrusion detection.
... In tests on five datasets, training with EDA on only 50% of the available training set achieved the same accuracy as training with all available data. EDA is a simple method that is widely used as a baseline (Yoo et al., 2021; Kim et al., 2022; Anaby-Tavor et al., 2020). ...
... In future work, we will explore open-source models for text generation and investigate other prompting strategies. We also plan to devote time to investigating the development of Lexicalized Probabilistic context-free grammars (ALP) (Kim et al., 2022) for Portuguese. Finally, we plan to add other DA methods to our research, mainly based on unsupervised learning and conditioning training. ...
Article
Text classification is a very common and important task in Natural Language Processing. In many domains and real-world settings, a few labeled instances are the only resource available to train classifiers. Models trained on small datasets tend to overfit and produce inaccurate results. Data augmentation (DA) techniques offer a way to mitigate this problem. DA generates synthetic instances that can be fed to the classification algorithm during training. In this article, we explore a variety of DA methods, including back translation, paraphrasing, and text generation. We assess the impact of the DA methods over simulated low-data scenarios using well-known public datasets in English with classifiers built by fine-tuning BERT models. We describe the means to adapt these DA methods to augment a small Portuguese dataset containing tweets labeled with smart city dimensions (e.g., transportation, energy, water, etc.). Our experiments showed that some classes were noticeably improved by DA, with an improvement of 43% in terms of F1 compared to the baseline with no augmentation. In a qualitative analysis, we observed that the DA methods were able to preserve the label but failed to preserve the semantics in some cases and that generative models were able to produce high-quality synthetic instances.
... That is still an open challenge for object recognition due to a lack of training samples to tune the parameters of a learning model, which might result in a serious overfitting problem. Learning techniques based on different theories have been proposed for tackling the problem, such as transfer learning [5], meta-learning [6], metric learning [7], and data augmentation-based learning [8,9]; however, the problem remains open. Many scholars have conducted relevant research in this field. ...
Article
Learning a deep model from small data is an open and challenging problem. In high-dimensional spaces, a few samples occupy only an extremely small portion of the space, often exhibiting sparsity issues. Classifying in this globally sparse sample space poses significant challenges. However, by using a single sample category as a reference object for comparing and recognizing other samples, it is possible to construct a local space. Conducting contrastive learning in this local space can overcome the sparsity issue of a few samples. Based on this insight, we proposed a novel deep learning approach named Local Contrast Learning (LCL). This is analogous to a key insight into human cognitive behavior, where humans identify the objects in a specific context by contrasting them with the objects in that context or from their memory. LCL is used to train a deep model that can contrast the recognized sample with a couple of contrastive samples that are randomly drawn and shuffled. On a one-shot classification task on Omniglot, the deep model-based LCL with 86 layers and 1.94 million parameters, which was trained on a tiny dataset with only 60 classes and 20 samples per class, achieved an accuracy of 98.95%. Furthermore, it achieved an accuracy of 99.24% at 156 classes and 20 samples per class. LCL is a fundamental idea that can be applied to alleviate the parametric model’s overfitting resulting from a lack of training samples.
... The diversity of data structures is increased by the use of data augmentation. In our research, we classify data enhancement technologies into three categories according to the direction of enhancing features: those utilizing unlabeled data [15][16][17], those employing data synthesis [18][19][20], and those focusing on feature enhancement [21]. However, increasing the generalization capabilities of few-shot learning models through data augmentation still presents challenges as the semantics of the synthesized training samples closely resemble those of the original samples. ...
Article
Text classification is a machine learning technique employed to assign a given text to predefined categories, facilitating the automatic analysis and processing of textual data. However, the number of new text categories is growing faster than the amount of human-annotated data, leaving many new categories with little annotated data. As a result, conventional deep neural networks tend to over-fit, which harms their application in the real world. As a solution to this problem, academics recommend addressing data scarcity through few-shot learning. One efficient method is prompt-tuning, which transforms the input text into a mask prediction problem featuring [MASK]. By utilizing descriptors, the model maps output words to labels, enabling accurate prediction. Nevertheless, previous prompt-based adaptation approaches often relied on manually produced verbalizers or a single label to represent the entire label vocabulary, which makes the mapping granularity low, resulting in words not being accurately mapped to their label. To address these issues, we propose to enhance the verbalizer and construct the refined external knowledge into a prompt-tuning (REKP) model. We employ external knowledge bases to increase the mapping space of tagged terms and design three refinement methods to remove noisy data. We conduct comprehensive experiments on four benchmark datasets, namely AG’s News, Yahoo, IMDB, and Amazon. The results demonstrate that REKP can outperform the state-of-the-art baselines in terms of Micro-F1 on knowledge-enhanced text classification. In addition, we conduct an ablation study to ascertain the functionality of each module in our model, revealing that the refinement module significantly contributes to enhancing classification accuracy.
... Wei et al. [35] increased the number of samples by modifying the raw text. Kim et al. [36] are concerned that existing data augmentation methods neglect to capture the structural information of language. They generated augmented instances with diverse syntactic structures with plausible grammar. ...
Article
Training a deep-learning text classification model usually requires a large amount of labeled data, yet labeling data is usually labor-intensive and time-consuming. Few-shot text classification focuses on predicting unknown samples using only a few labeled samples. Recently, metric-based meta-learning methods have achieved promising results in few-shot text classification. They use episodic training on labeled samples to enhance the model’s generalization ability. However, existing models only focus on learning from a few labeled samples but neglect to learn from a large number of unlabeled samples. In this paper, we exploit the knowledge learned by the model from unlabeled samples to improve the generalization performance of the meta-network. Specifically, we introduce a novel knowledge distillation method that expands and enriches the meta-learning representation with self-supervised information. Meanwhile, we design a graph aggregation method that lets the query set information interact efficiently with the support set information in each task and outputs a more discriminative representation. We conducted experiments on three public few-shot text classification datasets. The experimental results show that our model performs better than the state-of-the-art models in 5-way 1-shot and 5-way 5-shot cases.
... There are several popular approaches to stabilize the self-training process, such as using sample selection (Mukherjee and Awadallah 2020; Sohn et al. 2020) and reweighting strategies (Zhou, Kantarcioglu, and Thuraisingham 2012; Wang et al. 2021) to filter noisy labels or designing noise-aware loss functions (Yu et al. 2022a; Tsai, Lin, and Fu 2022) to improve the model's robustness against incorrectly labeled data. In addition, data augmentation methods (Kim et al. 2022a; Chen, Yang, and Yang 2020; Zhang, Yu, and Zhang 2020) are also combined with self-training to improve the model's generalization ability. ...
Article
Training deep neural networks (DNNs) with limited supervision has been a popular research topic as it can significantly alleviate the annotation burden. Self-training has been successfully applied in semi-supervised learning tasks, but one drawback of self-training is that it is vulnerable to the label noise from incorrect pseudo labels. Inspired by the fact that samples with similar labels tend to share similar representations, we develop a neighborhood-based sample selection approach to tackle the issue of noisy pseudo labels. We further stabilize self-training via aggregating the predictions from different rounds during sample selection. Experiments on eight tasks show that our proposed method outperforms the strongest self-training baseline with 1.83% and 2.51% performance gain for text and graph datasets on average. Our further analysis demonstrates that our proposed data selection strategy reduces the noise of pseudo labels by 36.8% and saves 57.3% of the time when compared with the best baseline. Our code and appendices will be uploaded to: https://github.com/ritaranx/NeST.
... The most intuitive solutions are data augmentation methods [13,14], such as easy data augmentation. Recently, Kim et al. [15] demonstrated a data augmentation approach using a lexicalized probabilistic context-free grammar that generates augmented samples with different syntactic structures and plausible grammar. Many works have also used adequate label information [16], demonstrating that we can utilize class-label information to extract more discriminative feature representations from input texts, achieving a performance upgrade when the samples are scarce. ...
Article
Few-Shot Text Classification (FSTC) is a fundamental natural language processing problem that aims to classify small amounts of text with high accuracy. Mainstream methods model the superficial statistical relationships between text and labels. However, distributional imbalance problems are encountered during few-shot learning; therefore, questions remain regarding its robustness and generalization. The above problems can be addressed by intrinsic causal mechanisms. We introduce a general structural causal model to formalize the FSTC problem. To extract causal associations from text and reconstruct information to achieve a better classification effect, we propose a causal representation for few-shot learning (CRFL) framework to force representations to be causally related. Our framework performs well when the number of training examples is small or when it generalizes to the data transfer situation. CRFL is orthogonal to the existing fine-tuning and few-shot meta-learning methods and can be applied to any task. Extensive experimental results obtained on several widely used datasets validate the effectiveness of our approach, which can be attributed to our model’s stability and logical reasoning.
... Based on the above limitations and recent advances in NLP, there are three directions for improving the current study. (1) Enhance the domain adaptability of 3DCPN with technologies such as adversarial training (Han et al., 2021; Wang et al., 2022) or data augmentation (Kim et al., 2022; Sun et al., 2021). (2) Design or introduce models that are better at handling long sequences, such as BiLSTM and Transformer (Vaswani et al., 2017), and prevent overfitting. ...
... In the few-shot text classification task, Easy Data Augmentation (EDA) [12] combines synonym replacement, random deletion, random swapping, and random insertion to achieve excellent performance on several single text classification tasks. Kim et al. [13] implemented data augmentation of text using lexicalized probabilistic context-free grammars, which consider both constituents and dependencies to produce a syntactic frame that can generate augmented samples with plausible grammar and different syntactic structures. FlipDA [14] used generated label-flipped text data to improve its generalization ability with respect to few-shot natural language understanding. ...
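Two of the EDA-style word-level operations mentioned here, swapping and deletion, can be sketched without any external resources. This is an illustrative approximation rather than the reference EDA implementation: the function names and parameters are hypothetical, and synonym replacement and insertion are omitted because they need a thesaurus such as WordNet.

```python
import random


def random_swap(words, n_swaps, rng):
    """Swap the words at two distinct random positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word independently with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the film has a clever plot".split()
    print(" ".join(random_swap(tokens, 1, rng)))
    print(" ".join(random_deletion(tokens, 0.2, rng)))
```

Both operations change only surface word order or coverage, which is why grammar-aware methods are motivated as a complement: nothing here guarantees the output remains grammatical.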
Article
In recent years, action recognition has become a subject of focus in the field of computer vision. Interest has emerged regarding the recognition of previously unseen classes given a few labeled examples; this is known as few-shot video recognition (F-SVR). However, it is particularly challenging to learn the class representation in this kind of setting. In response to this difficulty, we present an optimal transport distribution enhancement (OTDE) mechanism that enables networks to adaptively enhance the given support videos. Our main idea is to design an optimal transport method by using the base classes of data to calibrate the biased distribution of the support set in F-SVR and generate enhanced samples to better model the distribution of intra-class features and estimate the similarity between them in an accurate and robust manner. In addition, the proposed OTDE component is a simple yet flexible approach and is adaptable to multiple existing F-SVR frameworks. By adopting OTDE, our design brings substantial performance improvements to a variety of current works, achieving competitive results on the Kinetics, UCF101 and HMDB51 datasets under various evaluation settings.
Conference Paper
In the era of artificial intelligence, data is gold but costly to annotate. The paper demonstrates a groundbreaking solution to this dilemma using ChatGPT for text augmentation in sentiment analysis. We leverage ChatGPT's generative capabilities to create synthetic training data that significantly improves the performance of smaller models, making them competitive with, or even outperforming, their larger counterparts. This innovation enables models to be both efficient and effective, thereby reducing computational cost, inference time, and memory usage without compromising on quality. Our work marks a key advancement in the cost-effective development and deployment of robust sentiment analysis models.
Article
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization capabilities, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used to protect privacy. Based on a precise description of the goals and applications of data augmentation and a taxonomy for existing works, this survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners. Derived from the taxonomy, we divide more than 100 methods into 12 different groupings and give state-of-the-art references expounding which methods are highly promising by relating them to each other. Finally, research perspectives that may constitute a building block for future work are provided.
Article
Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of pre-trained transformer based models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. On three classification benchmarks, the pre-trained Seq2Seq model outperforms other models. Further, we explore how different pre-trained model based data augmentation methods differ in terms of data diversity, and how well such methods preserve the class-label information.
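The label-prepending idea can be shown with a small formatting sketch. The separator token, the helper names `prepend_label` and `split_generated`, and the exact string layout are assumptions made for illustration; the abstract does not specify them, and no model is called here.

```python
SEP = "[SEP]"  # hypothetical separator between the class label and the text


def prepend_label(label, text, sep=SEP):
    """Build a training sequence in which the class label precedes the text."""
    return f"{label} {sep} {text}"


def split_generated(sequence, sep=SEP):
    """Recover (label, text) from a generated sequence, or None if no separator."""
    label, found, text = sequence.partition(f" {sep} ")
    if not found:
        return None
    return label, text


if __name__ == "__main__":
    seq = prepend_label("positive", "a clever and moving film")
    print(seq)
    print(split_generated(seq))
```

At generation time, a model fine-tuned on such sequences would be prompted with a label and separator, and the text after the separator would be kept as the augmented example for that label.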
Conference Paper
This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
Article
In automatic speech recognition, language models can be represented by Probabilistic Context Free Grammars (PCFGs). In this lecture we review some known algorithms which handle PCFGs; in particular an algorithm for the computation of the total probability that a PCFG generates a given sentence (Inside), an algorithm for finding the most probable parse tree (Viterbi), and an algorithm for the estimation of the probabilities of the rewriting rules of a PCFG given a corpus (Inside-Outside). Moreover, we introduce the Left-to-Right Inside algorithm, which computes the probability that successive applications of the grammar rewriting rules (beginning with the sentence start symbol s) produce a word string whose initial substring is a given one.
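As an illustration of the Inside algorithm named in this abstract, here is a minimal sketch for a grammar in Chomsky normal form. The toy grammar, its probabilities, and the function name `inside_probability` are hypothetical; the chart recursion itself is the standard one, summing over all split points and binary rules.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
BINARY = {  # (parent, left child, right child) -> rule probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
LEXICAL = {  # (parent, word) -> rule probability
    ("NP", "she"): 0.5,
    ("NP", "fish"): 0.5,
    ("V", "eats"): 1.0,
}


def inside_probability(words, start="S"):
    """Total probability that `start` derives the whole word sequence."""
    n = len(words)
    # beta[(i, j)][A] = probability that nonterminal A derives words[i:j]
    beta = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                beta[(i, i + 1)][a] += p
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (a, b, c), p in BINARY.items():
                    beta[(i, j)][a] += p * beta[(i, k)][b] * beta[(k, j)][c]
    return beta[(0, n)][start]


if __name__ == "__main__":
    print(inside_probability(["she", "eats", "fish"]))
```

The three nested loops over span length, start position, and split point give the cubic dependence on sentence length. Replacing the sum with a max (and recording back-pointers) would turn the same chart into the Viterbi search for the most probable parse.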
Conference Paper
Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term--document information as well as rich sentiment content. The proposed model can leverage both continuous and multi-dimensional sentiment information as well as non-sentiment annotations. We instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents (e.g. star ratings). We evaluate the model using small, widely used sentiment and subjectivity corpora and find it out-performs several previously introduced methods for sentiment classification. We also introduce a large dataset of movie reviews to serve as a more robust benchmark for work in this area.
Conference Paper
We introduce Texygen, a benchmarking platform to support research on open-domain text generation models. Texygen not only implements a majority of text generation models, but also covers a set of metrics that evaluate the diversity, quality, and consistency of the generated texts. The Texygen platform could help standardize research on text generation and improve the reproducibility and reliability of future research work in text generation.
Article
Using an entropy argument, it is shown that stochastic context-free grammars (SCFGs) can model sources with hidden branching processes more efficiently than stochastic regular grammars (or equivalently HMMs). However, the automatic estimation of SCFGs using the Inside-Outside algorithm is limited in practice by its $O(n^3)$ complexity. In this paper, a novel pre-training algorithm is described which can give significant computational savings. Also, the need for controlling the way that non-terminals are allocated to hidden processes is discussed and a solution is presented in the form of a grammar minimization procedure.
Conference Paper
Traditionally, text categorization has been studied as the problem of training of a classifier using labeled data. However, people can categorize documents into named categories without any explicit training because we know the meaning of category names. In this paper, we introduce Dataless Classification, a learning protocol that uses world knowledge to induce classifiers without the need for any labeled data. Like humans, a dataless classifier interprets a string of words as a set of semantic concepts. We propose a model for dataless classification and show that the label name alone is often sufficient to induce classifiers. Using Wikipedia as our source of world knowledge, we get 85.29% accuracy on tasks from the 20 Newsgroup dataset and 88.62% accuracy on tasks from a Yahoo! Answers dataset without any labeled or unlabeled data from the datasets. With unlabeled data, we can further improve the results and show quite competitive performance to a supervised learning algorithm that uses 100 labeled examples. Using a semantic knowledge source that is based on Wikipedia, we experimentally demonstrate that dataless classification can categorize text without any annotated data. We show the results of classification on the standard 20 Newsgroups dataset and a new Yahoo! Answers dataset. Without even looking at unlabeled instances, dataless classification outperforms supervised methods. Moreover, when unlabeled instances are available during training, our methods are comparable to the supervised methods that need 100 training examples. We can perform on the fly text categorization for previously unseen labels, since our classifier was not trained on any particular labels. Furthermore, since we do not need previously labeled data, the model is not committed into any particular domain. Therefore, we can use dataless classification across different data sets. We experimentally show that our method works equally well across domains.
Bai, Y.; Chen, M.; Zhou, P.; Zhao, T.; Lee, J. D.; Kakade, S. M.; Wang, H.; and Xiong, C. 2021. How Important is the Train-Validation Split in Meta-Learning? In Proceedings of the 38th International Conference on Machine Learning, ICML, Proceedings of Machine Learning Research, 543-553. PMLR.
Chen, P.; Liao, B.; Chen, G.; and Zhang, S. 2019. Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels. In Proceedings of the 36th International Conference on Machine Learning, ICML, 1062-1070. PMLR.
Feurer, M.; and Hutter, F. 2019. Hyperparameter optimization. In Automated machine learning, 3-33. Springer, Cham.
Gencoglu, O.; van Gils, M.; Guldogan, E.; Morikawa, C.; Süzen, M.; Gruber, M.; Leinonen, J.; and Huttunen, H.
Li, J.; Socher, R.; and Hoi, S. C. H. 2020. DivideMix: Learning with Noisy Labels as Semi-supervised Learning. In 8th International Conference on Learning Representations, ICLR.
Mukherjee, S.; and Awadallah, A. H. 2020. Uncertainty-aware Self-training for Few-shot Text Classification. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS.
Ng, N.; Cho, K.; and Ghassemi, M. 2020. SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP, 1268-1283.
Saunshi, N.; Gupta, A.; and Hu, W. 2021. A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning. In International Conference on Machine Learning, 9333-9343. PMLR.
Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C. D.; Ng, A.; and Potts, C. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631-1642.
Wei, J. W.; and Zou, K. 2019. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP, 6381-6387. Association for Computational Linguistics.
Xie, Q.; Dai, Z.; Hovy, E. H.; Luong, T.; and Le, Q. 2020. Unsupervised Data Augmentation for Consistency Training. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems.
Yarowsky, D. 1995. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In 33rd Annual Meeting of the Association for Computational Linguistics, 189-196.
Yu, A. W.; Dohan, D.; Luong, M.; Zhao, R.; Chen, K.; Norouzi, M.; and Le, Q. V. 2018. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. In 6th International Conference on Learning Representations, ICLR 2018.
Zhang, R.; Yu, Y.; and Zhang, C. 2020. SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP, 8566-8579. Association for Computational Linguistics.