Difference-Similitude Matrix in Text Classification

DOI: 10.1007/11540007_3
Source: DBLP


Text classification can greatly improve the performance of information retrieval and information filtering, but the high dimensionality
of documents hinders the application of most classification approaches. This paper proposes a Difference-Similitude Matrix
(DSM) based method to address the problem. The method represents a pre-classified collection as an item-document matrix, in
which documents in the same category are described by their similarities and documents in different categories by their differences.
Using the DSM reduction algorithm, which is simpler and more efficient than rough set reduction, we reduce the dimensionality of the
document space and generate rules for text classification.
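
The matrix construction described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: documents are reduced to sets of items (binary presence or absence), a "similarity" is an item on which two documents agree, and a "difference" is an item on which they disagree. The name build_dsm and the sample data are hypothetical, and the DSM reduction algorithm itself is not shown because the abstract gives no details of it.

```python
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple


def build_dsm(
    docs: List[Set[str]], labels: List[Hashable]
) -> Dict[Tuple[int, int], FrozenSet[str]]:
    """Build a pairwise difference-similitude structure (illustrative only).

    Assumption: each document is a set of items, so an item is either
    present or absent. For a pair in the same category the entry holds the
    items on which the two documents agree; for a pair in different
    categories it holds the items on which they disagree.
    """
    vocab: Set[str] = set().union(*docs)
    dsm: Dict[Tuple[int, int], FrozenSet[str]] = {}
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            # Items present in exactly one of the two documents.
            differing = docs[i] ^ docs[j]
            if labels[i] == labels[j]:
                # Same category: record similarities (agreeing items).
                dsm[(i, j)] = frozenset(vocab - differing)
            else:
                # Different categories: record differences.
                dsm[(i, j)] = frozenset(differing)
    return dsm


# Hypothetical pre-classified collection, for illustration only.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}]
labels = ["sport", "sport", "finance"]
dsm = build_dsm(docs, labels)
for pair, items in sorted(dsm.items()):
    print(pair, sorted(items))
```

In this sketch, agreement includes items absent from both documents, which follows from the binary-presence assumption. A weighted representation would need a different notion of similarity.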
