Conference Proceeding

An empirical comparison of fast and efficient tools for mining textual data

06/2010; In proceedings of: ISCSE 2010, 1st International Symposium on Computing in Science and Engineering, Kusadasi, Aydin, Turkey

ABSTRACT To manage and retrieve the information contained in vast amounts of text documents effectively, powerful text mining tools and techniques are essential. In this paper we evaluate and compare two state-of-the-art data mining tools for clustering high-dimensional text data, Cluto and Gmeans. Several experiments were conducted on three benchmark datasets, and the results are analysed in terms of clustering quality, memory consumption, and CPU time. We show empirically that Gmeans offers high scalability at the cost of clustering quality, while Cluto delivers better clustering quality at the expense of memory and CPU time.

Full-text available from 29 May 2012

Keywords

benchmark datasets
clustering high-dimensional text data
clustering quality
Cluto presents
CPU time
CPU time consumption
powerful text mining tools
sacrificing clustering quality
scalability
state-of-the-art data mining tools
text documents
vast amount