Conference Paper

Practical Application of Associative Classifier for Document Classification.

DOI: 10.1007/11562382_36 Conference: Information Retrieval Technology, Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005, Proceedings
Source: DBLP


In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify exactly. The associative classifier has favorable characteristics, rapid training, good classification accuracy, and excellent interpretation. However, the associative classifier has some obstacles to overcome when it is applied in the area of text classification. First of all, the training process of the associative classifier produces a huge amount of classification rules, which makes the prediction for a new document ineffective. We resolve this by pruning the rules according to their contribution to correct classifications. In addition, since the target text collection generally has a high dimension, the training process might take a very long time. We propose mutual information between the word and class variables as a feature selection measure to reduce the space dimension. Experimental classification results using the 20-newsgroups dataset show many benefits of the associative classification in both training and predicting.
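
The feature selection step named in the abstract can be illustrated with a minimal sketch. It assumes each document is a set of words and that the word variable is binary (present or absent); the abstract does not give the paper's exact formulation, so the function names `mutual_information` and `select_features`, the toy documents, and the cutoff `k` are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, word):
    """Mutual information (in bits) between presence of `word` and the class label.

    Assumption: documents are sets of words and the word variable is binary.
    """
    n = len(docs)
    # Joint counts over (word present?, class label).
    joint = Counter((word in doc, label) for doc, label in zip(docs, labels))
    word_marginal = Counter()
    class_marginal = Counter()
    for (present, label), count in joint.items():
        word_marginal[present] += count
        class_marginal[label] += count
    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_independent = (word_marginal[present] / n) * (class_marginal[label] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi


def select_features(docs, labels, k):
    """Return the k words with the highest mutual information with the class."""
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary,
        key=lambda w: (-mutual_information(docs, labels, w), w),
    )
    return ranked[:k]


# Toy data, purely illustrative (not taken from 20-newsgroups).
docs = [{"goal", "team"}, {"team", "score"}, {"cpu", "disk"}, {"cpu", "team"}]
labels = ["sport", "sport", "tech", "tech"]
top_words = select_features(docs, labels, 2)
```

In this toy collection the word "cpu" occurs in exactly the "tech" documents, so it separates the two classes perfectly and ranks first. Keeping only the top-ranked words shrinks the vocabulary before any rules are mined, which is the dimensionality reduction the abstract describes.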


Available from: Gary Geunbae Lee, Feb 10, 2014
  • Source
    • "Associative classification has been used in different tasks, for example: text classification [9] [39], determination of DNA splice junction types [6], text segmentation [10], automatic image annotation [29], mammalian mesenchymal stem cell differentiation [34] and prediction of protein-protein interaction types [23], among others. Currently, all classifiers based on CARs use the Support and Confidence measures for computing and ordering the set of CARs [15] [21] [22] [30] [31] [38]. "
    ABSTRACT: In this paper, an accurate classifier based on Class Association Rules (CARs), called CAR-NF, is proposed. CAR-NF introduces a new strategy for computing CARs, using the Netconf as measure of interest, that allows to prune the CAR search space for building specific rules with high Netconf. Moreover, we propose and prove a proposition that supports the use of a Netconf threshold value equal to 0.5 for mining the CARs. Additionally, a new way for ordering the set of CARs based on their rule sizes and Netconf values is introduced in CAR-NF. The ordering strategy together with the "Best K rules" satisfaction mechanism allows CAR-NF to have better accuracy than CBA, CMAR, CPAR, TFPC and HARMONY classifiers, the best classifiers based on CARs reported in the literature.
    Intelligent Data Analysis 01/2012; 16(1):49-68. DOI:10.3233/IDA-2011-0510 · 0.61 Impact Factor
  • Source
    ABSTRACT: Data pre-processing is an important topic in Text Classification (TC). It aims to convert the original textual data in a data-mining-ready structure, where the most significant text-features that serve to differentiate between text-categories are identified. Broadly speaking, textual data pre-processing techniques can be divided into three groups: (i) linguistic, (ii) statistical, and (iii) hybrid (i) & (ii). With regard to language-independent TC, our study relates to the statistical aspect only. The nature of textual data pre-processing includes: Document-base Representation (DR) and Feature Selection (FS). In this paper, we propose a hybrid statistical FS approach that integrates two existing (statistical FS) techniques, DIAAF (Darmstadt Indexing Approach Association Factor) and GSSC (Galavotti-Sebastiani-Simi Coefficient). Our proposed approach is presented under a statistical "bag of phrases" DR setting. The experimental results, based on the well-established associative text classification approach, demonstrate that our proposed technique outperforms existing mechanisms with respect to the accuracy of classification.
    Advanced Data Mining and Applications, 5th International Conference, ADMA 2009, Beijing, China, August 17-19, 2009. Proceedings; 01/2009