Difference-Similitude Matrix in Text Classification

Chapter in Lecture Notes in Computer Science 3614:487-487 · January 1970
Impact Factor: 0.51 · DOI: 10.1007/11540007_3 · Source: DBLP
In book: Fuzzy Systems and Knowledge Discovery, pp.487-487
Abstract

Text classification can greatly improve the performance of information retrieval and information filtering, but the high dimensionality of documents hinders the application of most classification approaches. This paper proposes a Difference-Similitude Matrix (DSM) based method to address the problem. The method represents a pre-classified collection as an item-document matrix, in which documents in the same category are described by their similarities and documents in different categories by their differences. Using the DSM reduction algorithm, which is simpler and more efficient than rough set reduction, we reduce the dimensionality of the document space and generate rules for text classification.
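The abstract does not spell out how the matrix is built or reduced, so the following is only a minimal sketch of the idea under stated assumptions. It assumes binary term presence, records the terms on which two same-category documents agree and the terms on which two different-category documents disagree, and then uses a greedy covering step as a stand-in for the DSM reduction algorithm, which the abstract does not describe. The names TERMS, DOCS, build_dsm and greedy_reduct, and the toy data, are hypothetical.

```python
from itertools import combinations

# Hypothetical toy collection: (set of terms in the document, category label).
TERMS = ["ball", "goal", "vote", "tax"]
DOCS = [
    ({"ball", "goal"}, "sport"),
    ({"ball"}, "sport"),
    ({"vote", "tax"}, "politics"),
    ({"vote", "ball"}, "politics"),
]


def term_vector(doc_terms):
    """Binary presence vector over TERMS (an assumed representation)."""
    return tuple(int(t in doc_terms) for t in TERMS)


def build_dsm(docs):
    """Map each document pair (i, j), i < j, to a (kind, terms) entry.

    Same category:      kind = "similar",   terms = terms with equal values.
    Different category: kind = "different", terms = terms with unequal values.
    """
    vectors = [term_vector(terms) for terms, _ in docs]
    labels = [label for _, label in docs]
    dsm = {}
    for i, j in combinations(range(len(docs)), 2):
        agree = {t for k, t in enumerate(TERMS) if vectors[i][k] == vectors[j][k]}
        if labels[i] == labels[j]:
            dsm[(i, j)] = ("similar", agree)
        else:
            dsm[(i, j)] = ("different", set(TERMS) - agree)
    return dsm


def greedy_reduct(dsm):
    """Greedy stand-in for the reduction step (not the paper's algorithm).

    Repeatedly picks the term that appears in the most still-uncovered
    "different" entries, until every cross-category pair is distinguished
    by at least one selected term.
    """
    uncovered = [terms for kind, terms in dsm.values()
                 if kind == "different" and terms]
    selected = []
    while uncovered:
        best = max(TERMS, key=lambda t: sum(t in entry for entry in uncovered))
        selected.append(best)
        uncovered = [entry for entry in uncovered if best not in entry]
    return selected


dsm = build_dsm(DOCS)
reduct = greedy_reduct(dsm)
```

In this toy collection every cross-category pair differs on "vote", so the greedy step keeps that single term and drops the other three. A real collection would need a weighting scheme and a rule-generation step, neither of which is sketched here.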