Rare word filtering on the Amazon dataset, across various levels. Scores are the relative performance of each method over the no preprocessing baseline. Results are the average (and std) relative performance of the four models, across the five dataset seeds. Bold indicates statistical similarity to the best score, from a two-sample t-test with α = 0.05.
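
As a rough illustration of how the scores in this caption could be computed, the sketch below uses only the Python standard library. It assumes that "relative performance" means the ratio of a method's score to the no-preprocessing baseline score for the same model and seed, and that the two-sample t-test is the equal-variance (pooled) form; the caption confirms neither. All function names are hypothetical, and the critical value must be supplied by the caller (for two samples of 20 runs each, i.e. four models times five seeds, the two-sided critical value at α = 0.05 with 38 degrees of freedom is about 2.024).

```python
import math
import statistics


def relative_scores(method_scores, baseline_scores):
    """Per-run ratio of method score to baseline score (assumed definition)."""
    return [m / b for m, b in zip(method_scores, baseline_scores)]


def summarize(rel):
    """Mean and sample standard deviation of the relative scores."""
    return statistics.mean(rel), statistics.stdev(rel)


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb)
    )


def similar_to_best(candidate, best, t_critical):
    """True if the candidate is not significantly different from the best."""
    return abs(pooled_t_statistic(candidate, best)) < t_critical
```

Under these assumptions, a table cell would be set in bold whenever `similar_to_best` returns `True` for that filtering level against the best-scoring level.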


Source publication
Preprint
Text classification is a significant branch of natural language processing, and has many applications including document classification and sentiment analysis. Unsurprisingly, those who do text classification are concerned with the run-time of their algorithms, many of which depend on the size of the corpus' vocabulary due to their bag-of-words rep...

Context in source publication

Context 1
... introduction to kernel and nearest-neighbor nonparametric regression. Tables 3 and 4 show the results of rare word filtering on the Amazon and AP News datasets. We filtered at levels corresponding to the geometric progression of values from 1 to half the size of the corpus (we refer to these as levels 1 to 9, with higher numbers being more filtered). ...
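
The filtering levels described in this context can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the "size of the corpus" is the number of documents, that each level is a minimum document-frequency threshold, and that a word is kept when it appears in at least that many documents. The function names and the rounding of thresholds to integers are also assumptions.

```python
from collections import Counter


def filter_levels(num_docs, num_levels=9):
    """Geometric progression of thresholds from 1 to num_docs / 2."""
    low, high = 1.0, num_docs / 2
    ratio = (high / low) ** (1 / (num_levels - 1))
    return [round(low * ratio ** i) for i in range(num_levels)]


def filter_rare_words(docs, min_doc_freq):
    """Drop words that occur in fewer than min_doc_freq documents.

    docs is a list of tokenized documents (lists of strings).
    """
    # Count each word once per document to get its document frequency.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return [[w for w in doc if doc_freq[w] >= min_doc_freq] for doc in docs]


if __name__ == "__main__":
    docs = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
    print(filter_levels(512))
    print(filter_rare_words(docs, 2))
```

Because every retained word must clear the threshold, higher levels shrink the vocabulary more aggressively, which matches the description of higher numbers being more filtered.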

Similar publications

Conference Paper
The development of accurate sentiment analysis and aspect detection for the Bengali language is crucial due to the rise of Bengali language usage in digital media. Sentiment analysis and aspect detection are essential tasks in Natural Language Processing (NLP) as they allow us to extract meaningful information from textual data. In this study, we...
Preprint
We propose a new paradigm for zero-shot learners that is format agnostic, i.e., it is compatible with any format and applicable to a list of language tasks, such as text classification, commonsense reasoning, coreference resolution, and sentiment analysis. Zero-shot learning aims to train a model on a given task such that it can address new learnin...
Chapter
The initial stage in natural language processing is to break down the text into separate tokens. When the text corpus is huge, covering all words is inefficient with regard to vocabulary size. The effectiveness of a specific tokenization method depends on various factors, such as the size of the dataset, the nature of the task, and the morphological co...
Article
The subject matter of this article revolves around the exploration of neural network architectures to enhance the accuracy of text classification, particularly within the realm of natural language processing. The significance of text classification has grown notably in recent years due to its pivotal role in various applications like sentiment anal...
Preprint
COVID-19 has produced significant fluctuations and impacts on the Chinese stock market, and the sentiment analysis of stock reviews is important for the study of economic recovery. Due to the lack of a large amount of labeled data in the existing Chinese stock review data, and the currently popular BERT model mostly failed to consider contextual in...