Conference Paper

Practical Application of Associative Classifier for Document Classification.

DOI: 10.1007/11562382_36 Conference: Information Retrieval Technology, Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005, Proceedings
Source: DBLP


In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify accurately. The associative classifier has favorable characteristics: rapid training, good classification accuracy, and excellent interpretability. However, the associative classifier has some obstacles to overcome when it is applied to text classification. First, the training process of the associative classifier produces a huge number of classification rules, which makes prediction for a new document ineffective. We resolve this by pruning the rules according to their contribution to correct classifications. In addition, since the target text collection generally has a high dimension, the training process might take a very long time. We propose mutual information between the word and class variables as a feature selection measure to reduce the space dimension. Experimental classification results using the 20-newsgroups dataset show many benefits of associative classification in both training and prediction.
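A minimal sketch of mutual-information feature selection, assuming each document is a set of tokens and the word variable W is a binary presence indicator (word present or absent in the document). The abstract does not give the paper's exact estimator or any smoothing, so this uses plain maximum-likelihood counts; `mutual_information` and `select_features` are hypothetical names.

```python
import math
from collections import Counter


def mutual_information(docs, labels, word):
    """I(W; C) between the binary presence variable W of `word` and the class C.

    Assumption: probabilities are unsmoothed relative frequencies over documents.
    """
    n = len(docs)
    # Joint counts over (word present?, class label)
    joint = Counter((word in tokens, label) for tokens, label in zip(docs, labels))
    # Marginal counts for W and for C
    count_w = Counter()
    count_c = Counter()
    for (w, c), k in joint.items():
        count_w[w] += k
        count_c[c] += k
    mi = 0.0
    for (w, c), k in joint.items():
        p_wc = k / n
        # Only non-zero joint cells are visited, so the log is always defined
        mi += p_wc * math.log(p_wc / ((count_w[w] / n) * (count_c[c] / n)))
    return mi


def select_features(docs, labels, k):
    """Keep the k words with the highest MI (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))
    return ranked[:k]
```

Words that appear only in one class (such as "goal" below) score highest, while words spread across classes score lower, so keeping the top k words shrinks the feature space before rule mining.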
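The rule-pruning idea can be sketched as follows. This is an illustrative assumption, not the paper's exact algorithm: rules are ranked by confidence, then support, then shorter antecedent (a CBA-style ordering); each training document credits only the highest-ranked rule that matches it, and only when that rule predicts the correct label; rules whose credit falls below a hypothetical `min_contribution` threshold are dropped. The `Rule` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # words that must all occur in the document
    label: str             # class predicted by the rule
    confidence: float
    support: int


def rank(rules):
    # Assumed ordering: higher confidence, then higher support, then shorter antecedent
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))


def prune_by_contribution(rules, docs, labels, min_contribution=1):
    """Keep rules that correctly classify at least `min_contribution` training docs."""
    ranked = rank(rules)
    contribution = [0] * len(ranked)
    for tokens, label in zip(docs, labels):
        for i, rule in enumerate(ranked):
            if rule.antecedent <= tokens:  # rule matches the document
                if rule.label == label:
                    contribution[i] += 1
                break  # only the top matching rule is credited
    return [r for r, c in zip(ranked, contribution) if c >= min_contribution]


def classify(rules, tokens, default=None):
    """Predict with the highest-ranked matching rule, else `default`."""
    for rule in rank(rules):
        if rule.antecedent <= tokens:
            return rule.label
    return default
```

Rules that are always shadowed by a higher-ranked rule never receive credit, so they disappear without changing the predictions on the training set, which is what keeps the pruned rule set small.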



Available from: Gary Geunbae Lee, Feb 10, 2014
  • ABSTRACT: Data pre-processing is an important topic in Text Classification (TC). It aims to convert the original textual data into a data-mining-ready structure, where the most significant text features that serve to differentiate between text categories are identified. Broadly speaking, textual data pre-processing techniques can be divided into three groups: (i) linguistic, (ii) statistical, and (iii) hybrid of (i) and (ii). With regard to language-independent TC, our study relates to the statistical aspect only. Textual data pre-processing comprises Document-base Representation (DR) and Feature Selection (FS). In this paper, we propose a hybrid statistical FS approach that integrates two existing statistical FS techniques, DIAAF (Darmstadt Indexing Approach Association Factor) and GSSC (Galavotti-Sebastiani-Simi Coefficient). Our proposed approach is presented under a statistical "bag of phrases" DR setting. The experimental results, based on the well-established associative text classification approach, demonstrate that our proposed technique outperforms existing mechanisms with respect to classification accuracy.
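    For reference, the two component measures named in this abstract are commonly defined over the document counts of a term t and a category c: DIAAF as P(c | t), and the GSS coefficient as P(t, c)P(t̄, c̄) − P(t, c̄)P(t̄, c). The sketch below computes both from a 2×2 contingency table; how the cited paper combines them into its hybrid score is not stated in the abstract and is not shown. Function names are hypothetical, and each document is assumed to be a set of terms (single words or phrases, as in a "bag of phrases" setting).

```python
def contingency(docs, labels, term, cls):
    """Return (a, b, c, d): docs with t in c, t not in c, no t in c, no t not in c."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def diaaf(docs, labels, term, cls):
    """DIA association factor P(c | t), estimated as a / (a + b)."""
    a, b, _, _ = contingency(docs, labels, term, cls)
    return a / (a + b) if a + b else 0.0


def gss(docs, labels, term, cls):
    """GSS coefficient P(t,c)P(~t,~c) - P(t,~c)P(~t,c) = (ad - bc) / n^2."""
    a, b, c, d = contingency(docs, labels, term, cls)
    n = a + b + c + d
    return (a * d - b * c) / (n * n)
```

    DIAAF only looks at documents containing the term, whereas GSS also rewards a term for being absent from documents outside the category, which is one reason the two measures can rank terms differently.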
    Advanced Data Mining and Applications, 5th International Conference, ADMA 2009, Beijing, China, August 17-19, 2009. Proceedings; 01/2009
  • ABSTRACT: In this paper, data mining is used to analyze the differentiation of mammalian Mesenchymal Stem Cells (MSCs). A database comprising the key parameters which, we believe, influence the destiny of mammalian MSCs has been constructed. This paper introduces Classification Association Rule Mining (CARM) as a data mining technique in the domain of tissue engineering and initiates a new promising research field. The experimental results show that the proposed approach performs well with respect to the accuracy of (classification) prediction. Moreover, it was found that some rules mined from the constructed MSC database are meaningful and useful.
    Advances in Data Mining. Applications and Theoretical Aspects, 9th Industrial Conference, ICDM 2009, Leipzig, Germany, July 20-22, 2009. Proceedings; 01/2009