An empirical comparison of fast and efficient tools for mining textual data
ABSTRACT In order to effectively manage and retrieve the information contained in vast amounts of text documents, powerful text mining tools and techniques are essential. In this paper we evaluate and compare two state-of-the-art data mining tools for clustering high-dimensional text data, Cluto and Gmeans. Several experiments were conducted on three benchmark datasets, and the results are analysed in terms of clustering quality, memory consumption and CPU time. We empirically show that Gmeans offers high scalability at the cost of clustering quality, whereas Cluto achieves better clustering quality at the expense of memory and CPU time.
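To make "clustering high-dimensional text data" concrete, the sketch below implements spherical k-means. This k-means variant is commonly applied to sparse, unit-normalized document vectors, and it compares documents by cosine similarity. It is a minimal illustration under stated assumptions: documents are already converted to term-weight dictionaries (for example TF-IDF), and `purity` is just one possible quality measure. This is not the code of Cluto or Gmeans, and it is not necessarily the evaluation measure used in the paper. All function names are hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def cosine(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(docs, k, iters=20, seed=0):
    """Cluster sparse document vectors by cosine similarity.

    Returns one cluster index per document. (Hypothetical helper, not a tool API.)
    """
    docs = [normalize(d) for d in docs]
    rng = random.Random(seed)
    # Initialize the k concept vectors with k distinct randomly chosen documents.
    centroids = [dict(docs[i]) for i in rng.sample(range(len(docs)), k)]
    labels = [-1] * len(docs)
    for _ in range(iters):
        # Assignment step: each document joins its most similar concept vector.
        new_labels = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: each concept vector becomes the normalized sum of its members.
        for c in range(k):
            total = {}
            for d, lab in zip(docs, labels):
                if lab == c:
                    for t, w in d.items():
                        total[t] = total.get(t, 0.0) + w
            if total:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = normalize(total)
    return labels


def purity(labels, classes):
    """Fraction of documents that belong to the majority class of their cluster."""
    counts = {}
    for lab, cls in zip(labels, classes):
        counts.setdefault(lab, {}).setdefault(cls, 0)
        counts[lab][cls] += 1
    return sum(max(c.values()) for c in counts.values()) / len(labels)
```

On a toy input such as `[{"apple": 2, "fruit": 1}, {"apple": 1, "fruit": 2}, {"goal": 3, "match": 1}, {"goal": 1, "match": 2}]` with `k=2`, the two topics end up in separate clusters. Real tools add sparse-matrix storage and better initialization, which is where the memory and CPU trade-offs discussed in the abstract come from.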
Full text available from: Volkan Tunalı, May 29, 2015
Source available from: Ramiz Aliguliyev
ABSTRACT: Clustering algorithms are used to assess the interaction among documents by organizing them into clusters such that documents within a cluster are more similar to each other than to documents belonging to different clusters. Document clustering has traditionally been investigated as a means of improving the performance of search engines by pre-clustering the entire corpus, and also as a post-retrieval document browsing technique. It has long been studied as a post-retrieval document visualization technique as well. The purpose of the present paper is to show that assigning weights to documents improves the clustering solution.
Expert Systems with Applications 05/2009; 36(4):7904-7916. DOI:10.1016/j.eswa.2008.11.017 · 1.97 Impact Factor
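One simple way to see how document weights can change a clustering solution is through the centroid update. When each document carries a weight, a cluster's concept vector can be taken as the normalized weighted sum of its members, so heavier documents pull the centroid toward themselves. The sketch below assumes the weights are already given. The abstract does not state how they are computed, so this is not the weighting scheme of the cited paper. The function name is hypothetical.

```python
import math


def weighted_concept_vector(members, weights):
    """Return the unit-length weighted sum of a cluster's document vectors.

    members: sparse vectors (dict term -> weight); weights: one float per document.
    The weights are assumed inputs; how to derive them is outside this sketch.
    """
    total = {}
    for doc, w in zip(members, weights):
        for term, x in doc.items():
            total[term] = total.get(term, 0.0) + w * x  # scale each document by its weight
    norm = math.sqrt(sum(v * v for v in total.values()))
    return {t: v / norm for t, v in total.items()} if norm > 0 else total
```

With equal weights this reduces to the ordinary spherical k-means update. Giving one document three times the weight of another tilts the centroid toward that document's terms, which in turn changes which documents are most similar to it in the next assignment step.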
05/2005; Addison Wesley.
Article: A Brief Survey of Text Mining