Difference-Similitude Matrix in Text Classification

School of Electronic Information, Wuhan University, Wuhan, Hubei, China
DOI: 10.1007/11540007_3
In book: Fuzzy Systems and Knowledge Discovery, pp. 487-487


Text classification can greatly improve the performance of information retrieval and information filtering, but the high dimensionality
of document representations hinders the application of most classification approaches. This paper proposes a Difference-Similitude Matrix
(DSM) based method to address this problem. The method represents a pre-classified collection as an item-document matrix, in
which documents in the same category are described by their similarities and documents in different categories by their differences.
Using the DSM reduction algorithm, which is simpler and more efficient than rough set reduction, we reduce the dimensionality of the
document space and generate rules for text classification.
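To make the difference/similitude idea concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's DSM reduction algorithm. Each document is assumed to be a set of terms (binary presence). A same-category pair contributes a similitude entry, taken here to be the terms both documents contain (an assumption: shared absence is ignored). A cross-category pair contributes a difference entry, the terms exactly one of them contains. The reduction step is a swapped-in simplified heuristic: forced "core" terms from singleton difference entries, then a greedy hitting set over the remaining difference entries, with ties broken by similitude frequency. Every kept cross-category pair therefore stays distinguishable on the reduced term set. The names `build_dsm`, `greedy_reduct`, `make_rules` and `classify`, and the toy data, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_dsm(docs, labels):
    """Split every document pair into a difference or a similitude entry.

    docs: list of sets of terms (binary presence representation, assumed).
    labels: category label of each document.
    """
    differences, similitudes = [], []
    for i, j in combinations(range(len(docs)), 2):
        if labels[i] == labels[j]:
            # Same category: terms both documents contain (assumed similitude).
            similitudes.append(frozenset(docs[i] & docs[j]))
        else:
            # Different categories: terms present in exactly one document.
            diff = frozenset(docs[i] ^ docs[j])
            if diff:  # identical documents with different labels cannot be separated
                differences.append(diff)
    return differences, similitudes


def greedy_reduct(differences, similitudes):
    """Pick a small term set that intersects every difference entry.

    Simplified heuristic, not the paper's exact DSM reduction algorithm.
    """
    sim_freq = Counter(t for entry in similitudes for t in entry)
    # Core terms: a singleton difference entry can only be covered by its one term.
    reduct = {next(iter(d)) for d in differences if len(d) == 1}
    uncovered = [d for d in differences if not (d & reduct)]
    while uncovered:
        hits = Counter(t for d in uncovered for t in d)
        # Most uncovered entries first, then more similitude occurrences;
        # sorted() makes remaining ties resolve alphabetically.
        best = max(sorted(hits), key=lambda t: (hits[t], sim_freq[t]))
        reduct.add(best)
        uncovered = [d for d in uncovered if best not in d]
    return reduct


def make_rules(docs, labels, reduct):
    """Map each document's presence pattern over the reduct to its categories."""
    rules = {}
    for doc, label in zip(docs, labels):
        rules.setdefault(frozenset(doc & reduct), set()).add(label)
    return rules


def classify(doc, rules, reduct):
    """Return the category of an unambiguous matching pattern, else None."""
    matched = rules.get(frozenset(doc & reduct), set())
    return next(iter(matched)) if len(matched) == 1 else None


docs = [
    {"goal", "match", "team"},
    {"team", "score", "league"},
    {"stock", "market", "share"},
    {"market", "price", "share"},
]
labels = ["sport", "sport", "finance", "finance"]

differences, similitudes = build_dsm(docs, labels)
reduct = greedy_reduct(differences, similitudes)
rules = make_rules(docs, labels, reduct)
print(sorted(reduct))  # ['market']
print(classify({"market", "bond"}, rules, reduct))  # finance
```

On the toy data, a single term already separates the two categories, so the nine-term vocabulary reduces to one rule attribute. Storing the entries as two lists, rather than a full pairwise matrix, keeps the sketch short. It still costs O(n²) pairs, so a realistic collection would need a sparser representation.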
