Conference Paper

Practical Application of Associative Classifier for Document Classification.

DOI: 10.1007/11562382_36
Conference: Information Retrieval Technology, Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005, Proceedings
Source: DBLP

ABSTRACT In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify accurately. The associative classifier has favorable characteristics: rapid training, good classification accuracy, and excellent interpretability. However, the associative classifier has some obstacles to overcome when it is applied in the area of text classification. First of all, the training process of the associative classifier produces a huge number of classification rules, which makes the prediction for a new document ineffective. We resolve this by pruning the rules according to their contribution to correct classifications. In addition, since the target text collection generally has a high dimension, the training process might take a very long time. We propose mutual information between the word and class variables as a feature selection measure to reduce the space dimension. Experimental classification results using the 20-newsgroups dataset show many benefits of the associative classification in both training and predicting.
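The abstract names mutual information between the word and class variables as the feature selection measure but does not give the formula. As a rough illustration only, the sketch below uses the standard definition I(W; C) = Σ P(w, c) · log2( P(w, c) / (P(w) · P(c)) ), assuming W is a binary variable (the word occurs in the document or not) and that probabilities are estimated by simple counting. The function names `mutual_information` and `select_features`, the set-of-tokens document representation, and the top-k cutoff are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def mutual_information(docs, labels, word):
    """Return I(W; C) in bits for one word.

    Assumption: W is 1 if `word` occurs in a document and 0 otherwise,
    and each document in `docs` is a set of tokens.
    """
    n = len(docs)
    # Joint counts over (word present?, class label).
    joint = Counter((word in doc, label) for doc, label in zip(docs, labels))
    # Marginal counts for the word variable and the class variable.
    word_counts = Counter()
    class_counts = Counter()
    for (present, label), count in joint.items():
        word_counts[present] += count
        class_counts[label] += count
    mi = 0.0
    # Only observed (present, label) pairs contribute; unseen pairs add 0.
    for (present, label), count in joint.items():
        p_joint = count / n
        p_word = word_counts[present] / n
        p_class = class_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_word * p_class))
    return mi


def select_features(docs, labels, k):
    """Keep the k words with the highest mutual information (hypothetical cutoff)."""
    vocab = sorted(set().union(*docs))
    scores = {w: mutual_information(docs, labels, w) for w in vocab}
    return sorted(vocab, key=lambda w: scores[w], reverse=True)[:k]


if __name__ == "__main__":
    # Toy corpus invented for illustration; not data from the paper.
    docs = [{"goal", "match"}, {"goal", "team"}, {"cpu", "disk"}, {"cpu", "team"}]
    labels = ["sport", "sport", "tech", "tech"]
    print(select_features(docs, labels, 2))
```

A word that appears equally often in every class scores zero and is dropped first, which is what makes the measure useful for shrinking the vocabulary before rule mining.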

  • ABSTRACT: Textual Feature Selection (TFS) is an important phase in the process of text classification. It aims to identify the most significant textual features (i.e. key words and/or phrases), in a textual dataset, that serve to distinguish between text categories. In TFS, basic techniques can be divided into two groups: linguistic vs. statistical. For the purpose of building a language-independent text classifier, the study reported here is concerned with statistical TFS only. In this paper, we propose a novel statistical TFS approach that hybridizes the ideas of two existing techniques, DIAAF (Darmstadt Indexing Approach Association Factor) and RS (Relevancy Score). With respect to associative (text) classification, the experimental results demonstrate that the proposed approach can produce greater classification accuracy than other alternative approaches. Keywords: Associative Classification; (Language-independent) Text Classification; Text Mining; Textual Feature Selection
    06/2010: pages 222-236;
  • ABSTRACT: Association rule mining and classification are important tasks in data mining. Using association rules has proved to be a good approach for classification. In this paper, we propose an accurate classifier based on class association rules (CARs), called CAR-IC, which introduces a new pruning strategy for mining CARs, which allows building specific rules with high confidence. Moreover, we propose and prove three propositions that support the use of a confidence threshold for computing rules that avoids ambiguity at the classification stage. This paper also presents a new way for ordering the set of CARs based on rule size and confidence. Finally, we define a new coverage strategy, which reduces the number of non-covered unseen-transactions during the classification stage. Results over several datasets show that CAR-IC beats the best classifiers based on CARs reported in the literature.
    Expert Systems with Applications 01/2012; 39(12):11203-11211. · 1.85 Impact Factor
