An Information-Theoretic Derivation of Min-Cut-Based Clustering

Dept. of Appl. Phys. & Appl. Math., Columbia Univ., New York, NY, USA
IEEE Transactions on Pattern Analysis and Machine Intelligence (Impact Factor: 5.69). 07/2010; 32(6):988 - 995. DOI: 10.1109/TPAMI.2009.124
Source: IEEE Xplore

ABSTRACT Min-cut clustering, based on minimizing one of two heuristic cost functions proposed by Shi and Malik nearly a decade ago, has spawned tremendous research, both analytic and algorithmic, in the graph partitioning and image segmentation communities over the last decade. It is, however, unclear if these heuristics can be derived from a more general principle, facilitating generalization to new problem settings. Motivated by an existing graph partitioning framework, we derive relationships between optimizing relevance information, as defined in the Information Bottleneck method, and the regularized cut in a K-partitioned graph. For fast-mixing graphs, we show that the cost functions introduced by Shi and Malik can be well approximated as the rate of loss of predictive information about the location of random walkers on the graph. For graphs drawn from a generative model designed to describe community structure, the optimal information-theoretic partition and the optimal min-cut partition are shown to be the same with high probability.
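As a rough illustration of the quantities the abstract refers to, the sketch below computes two cut-based costs for a two-way partition of a small weighted graph. It assumes the two cost functions are the normalized cut and the average cut in their usual textbook forms. It also checks the standard random-walk reading of the normalized cut: the sum of the one-step probabilities that a stationary walker leaves each side. The toy graph and all function names are hypothetical, and this is not the information-theoretic derivation from the paper itself.

```python
# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

def degree(W, i):
    """Weighted degree of node i."""
    return sum(W[i])

def cut_weight(W, part_a, part_b):
    """Total weight of edges running from part_a to part_b."""
    return sum(W[i][j] for i in part_a for j in part_b)

def normalized_cut(W, part_a, part_b):
    """cut/vol(A) + cut/vol(B), where vol is the sum of degrees (assumed form)."""
    c = cut_weight(W, part_a, part_b)
    vol_a = sum(degree(W, i) for i in part_a)
    vol_b = sum(degree(W, i) for i in part_b)
    return c / vol_a + c / vol_b

def average_cut(W, part_a, part_b):
    """cut/|A| + cut/|B|, normalizing by node counts (assumed form)."""
    c = cut_weight(W, part_a, part_b)
    return c / len(part_a) + c / len(part_b)

def escape_probability(W, source, target):
    """P(walker in target at t+1 | walker in source at t) for the
    stationary random walk with transition matrix P_ij = W_ij / d_i."""
    total = sum(degree(W, i) for i in range(len(W)))
    pi = [degree(W, i) / total for i in range(len(W))]  # stationary distribution
    joint = sum(pi[i] * W[i][j] / degree(W, i) for i in source for j in target)
    return joint / sum(pi[i] for i in source)

A, B = [0, 1, 2], [3, 4, 5]
ncut = normalized_cut(W, A, B)
acut = average_cut(W, A, B)
walk = escape_probability(W, A, B) + escape_probability(W, B, A)
print(ncut, acut, walk)
```

On this graph the normalized cut and the summed escape probabilities coincide, which is the kind of link between cuts and random walkers that the abstract builds on.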

  • Source
    ABSTRACT: Clustering is an effective data mining technique for generating groups of interest. Among the various clustering approaches, the k-means and min-cut families of algorithms are the most popular because of their simplicity and efficacy. The classical k-means algorithm partitions a set of data points into several subsets by iteratively updating the cluster centers and their associated data points. By contrast, min-cut algorithms construct a weighted undirected graph and partition its vertices into two sets. However, existing clustering algorithms tend to assign a small minority of the data points to one subset, which should be avoided when the target dataset is balanced. To achieve more accurate clustering for balanced datasets, we propose to leverage the exclusive lasso on k-means and min-cut to regulate the balance of the clustering results. By optimizing objective functions built on the exclusive lasso, we can make the clustering result as balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms over state-of-the-art clustering algorithms.
  • Source
    ABSTRACT: Consider the problem of approximating a Markov chain by another Markov chain with a smaller state space that is obtained by partitioning the original state space. An information-theoretic cost function is proposed that is based on the relative entropy rate between the original Markov chain and a Markov chain defined by the partition. The state space aggregation problem can be sub-optimally solved by using the information bottleneck method.
  • Source
    ABSTRACT: In this paper, we present a method for reducing a regular, discrete-time Markov chain (DTMC) to another DTMC with a given, typically much smaller number of states. The cost of reduction is defined as the Kullback-Leibler divergence rate between a projection of the original process through a partition function and a DTMC on the correspondingly partitioned state space. Finding the reduced model with minimal cost is computationally expensive, as it requires an exhaustive search among all state space partitions and exact evaluation of the reduction cost for each candidate partition. In our approach, we optimize an upper bound on the reduction cost instead of the exact cost. The proposed upper bound is easy to compute, and it is tight when the original chain is lumpable with respect to the partition. We then express the problem in the form of an information bottleneck optimization and propose the agglomerative information bottleneck algorithm for finding a locally optimal solution. The theory is illustrated with examples and one application scenario in the context of modeling bio-molecular interactions.
    IEEE Transactions on Automatic Control 04/2013; 60(4). DOI:10.1109/TAC.2014.2364971 · 3.17 Impact Factor
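
The lumpability condition mentioned in the abstract above can be illustrated with a small check. This is a minimal sketch assuming the standard notion of strong lumpability: for every pair of blocks, each state in the source block has the same total transition probability into the target block. The transition matrix and function names are hypothetical and are not taken from the cited paper.

```python
def is_lumpable(P, partition, tol=1e-12):
    """Return True if the chain with transition matrix P is (strongly)
    lumpable with respect to the given partition of its states."""
    for block in partition:
        for target in partition:
            # Probability of jumping into `target` from each state in `block`.
            sums = [sum(P[i][j] for j in target) for i in block]
            if max(sums) - min(sums) > tol:
                return False
    return True

def lumped_chain(P, partition):
    """Transition matrix on the blocks; meaningful only if the chain is lumpable."""
    return [
        [sum(P[block[0]][j] for j in target) for target in partition]
        for block in partition
    ]

# Hypothetical 3-state chain in which states 1 and 2 behave alike.
P = [
    [0.5, 0.25, 0.25],
    [0.2, 0.3, 0.5],
    [0.2, 0.5, 0.3],
]

good = [[0], [1, 2]]
bad = [[0, 1], [2]]
print(is_lumpable(P, good), is_lumpable(P, bad))
print(lumped_chain(P, good))
```

Here the partition that merges states 1 and 2 passes the check, while merging states 0 and 1 does not, because those two states move to state 2 with different probabilities.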
