Chapter

Cluster Based Text Classification Model

12/2010; DOI:10.1007/978-3-7091-0388-3_14 pp.265-283

ABSTRACT We propose a cluster-based classification model for suspicious email detection and other text classification tasks. Text
classification tasks involve many training examples, which require a complex classification model. Using clusters for classification
makes the model simpler and, at the same time, increases its accuracy: a test example is classified using a smaller, simpler model.
The training examples in a particular cluster share a common vocabulary. The labels of the training examples are not taken into
account during clustering. After the clusters have been created, a classifier is trained on each cluster, which has reduced
dimensionality and fewer examples. Experimental results show that the proposed model outperforms existing classification models
on suspicious email detection and on topic categorization with the Reuters-21578 and 20 Newsgroups datasets. Our model also
outperforms the ADCC (A Decision Cluster Classification) and DCFC (Decision Cluster Forest Classification) models on the
Reuters-21578 dataset.
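The abstract describes the pipeline only at a high level: cluster the training texts without looking at their labels, train one classifier per cluster on that cluster's reduced vocabulary, and classify a test example with a small per-cluster model. The Python sketch below is one possible realization under stated assumptions, not the chapter's implementation. The clustering algorithm (k-means with cosine similarity and first-k initialization), the per-cluster classifier (multinomial Naive Bayes with add-one smoothing), and the rule that routes a test example to the cluster with the most similar centroid are all assumptions, and every function and class name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would also drop stop words and punctuation.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors stored as dicts.
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iters=10):
    # Assumed clustering step: k-means with cosine similarity. Labels are never used here.
    # Deterministic initialization from the first k documents (requires k <= len(vectors)).
    centroids = [dict(v) for v in vectors[:k]]
    assign = []
    for _ in range(iters):
        assign = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            # Summed counts point in the same direction as the mean, so cosine is unaffected.
            total = Counter()
            for v, a in zip(vectors, assign):
                if a == c:
                    total.update(v)
            if total:  # keep the previous centroid if the cluster became empty
                centroids[c] = dict(total)
    return centroids, assign


class ClusterNB:
    # Hypothetical per-cluster model: multinomial Naive Bayes over this cluster's vocabulary only.
    def __init__(self, docs, labels):
        self.vocab = {t for d in docs for t in d}  # reduced dimensionality
        self.priors = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)
        for d, y in zip(docs, labels):
            self.counts[y].update(d)

    def predict(self, doc):
        best, best_score = None, -math.inf
        v_size = len(self.vocab)
        for y, prior in self.priors.items():
            total = sum(self.counts[y].values())
            score = math.log(prior / self.n)
            for t, c in doc.items():
                if t in self.vocab:  # words outside the cluster vocabulary are ignored
                    score += c * math.log((self.counts[y][t] + 1) / (total + v_size))
            if score > best_score:
                best, best_score = y, score
        return best


class ClusterBasedClassifier:
    def __init__(self, k):
        self.k = k

    def fit(self, texts, labels):
        vecs = [Counter(tokenize(t)) for t in texts]
        self.centroids, self.assign = kmeans(vecs, self.k)  # unsupervised step
        self.models = {}
        for c in range(self.k):
            idx = [i for i, a in enumerate(self.assign) if a == c]
            if idx:  # one small classifier per non-empty cluster
                self.models[c] = ClusterNB([vecs[i] for i in idx], [labels[i] for i in idx])
        return self

    def predict(self, text):
        v = Counter(tokenize(text))
        # Route the test example to the most similar cluster, then use only that cluster's model.
        cid = max(self.models, key=lambda c: cosine(v, self.centroids[c]))
        return self.models[cid].predict(v)


if __name__ == "__main__":
    texts = ["stock market report today", "football match report today",
             "stock tips guaranteed profit", "football tickets guaranteed cheap"]
    labels = ["ham", "ham", "spam", "spam"]
    clf = ClusterBasedClassifier(k=2).fit(texts, labels)
    print(clf.assign)                              # [0, 1, 0, 1]: grouped by topic, not label
    print(clf.predict("guaranteed stock profit"))  # spam
    print(clf.predict("football match report"))    # ham
```

In the toy data the clusters follow topic (finance vs. football) even though each cluster contains both labels, and each Naive Bayes model sees only its own cluster's vocabulary, which is where the reduced dimensionality comes from.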

Keywords

20 Newsgroups datasets, ADCC, classification model, clustering, clusters, common vocabulary, complex classification model, DCFC,
Decision Cluster Classification, existing classification models, experimental results, suspicious email detection, test example,
text classification tasks, training examples