ABSTRACT: Partitional graph clustering algorithms like K-means and Star necessitate a priori decisions on the number of clusters and
threshold for the weight of edges to be considered, respectively. These decisions are difficult to make and their impact on
clustering performance is significant. We propose a family of algorithms for weighted graph clustering that requires neither
a predefined number of clusters, unlike K-means, nor a threshold for the weight of edges, unlike Star. To do so, we use re-assignment
of vertices as a halting criterion, as in K-means, and a metric for selecting clusters’ seeds, as in Star. Pictorially, the
algorithms’ strategy resembles the rippling of stones thrown in a pond, thus the name ‘Ricochet’. We evaluate the performance
of our proposed algorithms using standard datasets and evaluate the impact of removing constraints by comparing the performance
of our algorithms with that of the constrained algorithms K-means and Star and of the unconstrained Markov clustering algorithm.
Database Systems for Advanced Applications, 14th International Conference, DASFAA 2009, Brisbane, Australia, April 21-23, 2009. Proceedings; 01/2009
ABSTRACT: Although PageRank was designed to estimate the popularity of Web pages, it is a general algorithm that can be applied to the analysis of graphs other than that of hypertext documents. In this paper, we explore its application to sentiment analysis and opinion mining, i.e. the ranking of items based on user textual reviews. We first propose various techniques using collocation and pivot words to extract a weighted graph of terms from user reviews and to account for positive and negative opinions. We refer to this graph as the sentiment graph. Using PageRank and a very small set of adjectives (such as ‘good’, ‘excellent’, etc.) we rank the different
Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008, Napa Valley, California, USA, October 26-30, 2008; 01/2008
ABSTRACT: The Star algorithm is an effective and efficient algorithm for graph clustering. We propose a series of novel, yet simple,
metrics for the selection of Star centers in the Star algorithm and its variants. We empirically study the performance of
off-line, standard and extended, and on-line versions of the Star algorithm adapted to the various metrics and show that one
of the proposed metrics outperforms all others in both effectiveness and efficiency of clustering. We empirically study the
sensitivity of the metrics to the threshold value of the algorithm and show improvement with respect to this aspect too.
Database and Expert Systems Applications, 18th International Conference, DEXA 2007, Regensburg, Germany, September 3-7, 2007, Proceedings; 01/2007
ABSTRACT: Querying search engines with the keyword “jaguars” returns results as diverse as web sites about cars, computer games, attack planes, American football, and animals. More and more search engines offer options to organize query results by categories or, given a document, to return a list of links to topically related documents. While information retrieval traditionally defines similarity of documents in terms of contents, it seems natural to expect that the very structure of the Web carries important information about the topical similarity of documents. Here we study the role of a matrix constructed from weighted co-citations (documents referenced by the same document), weighted couplings (documents referencing the same document), incoming, and outgoing links for the clustering of documents on the Web. We present and discuss three methods of clustering based on this
matrix construction using three clustering algorithms, K-means, Markov and Maximum Spanning Tree, respectively. Our main contribution is a clustering technique based on the Maximum Spanning Tree and an evaluation of its effectiveness compared with the two most robust alternatives: K-means and Markov clustering.
International Journal of Web Information Systems 01/2006; 2:69-76.