An Information-Theoretic Derivation of Min-Cut-Based Clustering

Dept. of Appl. Phys. & Appl. Math., Columbia Univ., New York, NY, USA
IEEE Transactions on Pattern Analysis and Machine Intelligence (Impact Factor: 5.78). 07/2010; 32(6):988 - 995. DOI: 10.1109/TPAMI.2009.124
Source: IEEE Xplore


Min-cut clustering, based on minimizing one of two heuristic cost functions proposed by Shi and Malik nearly a decade ago, has spawned extensive analytic and algorithmic research in the graph partitioning and image segmentation communities. It is, however, unclear whether these heuristics can be derived from a more general principle, which would facilitate their generalization to new problem settings. Motivated by an existing graph partitioning framework, we derive relationships between optimizing relevance information, as defined in the Information Bottleneck method, and the regularized cut in a K-partitioned graph. For fast-mixing graphs, we show that the cost functions introduced by Shi and Malik are well approximated by the rate of loss of predictive information about the location of random walkers on the graph. For graphs drawn from a generative model designed to describe community structure, we show that the optimal information-theoretic partition and the optimal min-cut partition coincide with high probability.
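The abstract builds on Shi and Malik's normalized cut. As a reference point, here is a minimal sketch (not the paper's information-theoretic derivation) that evaluates the K-way normalized cut, Ncut = sum_k cut(A_k, V - A_k) / vol(A_k), for a hard partition of a weighted graph. It also checks the well-known random-walk reading of that cost: each term equals the probability that a stationary random walk with transitions P = D^-1 W leaves A_k in one step. The function names `ncut` and `escape_probabilities` are hypothetical, and the code assumes a symmetric, nonnegative weight matrix with no isolated vertices.

```python
import numpy as np

# Illustrative sketch with hypothetical helper names; not the paper's method.

def ncut(W, labels):
    """K-way normalized cut: sum over clusters of cut(A_k, V - A_k) / vol(A_k)."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    d = W.sum(axis=1)                         # weighted vertex degrees
    total = 0.0
    for k in np.unique(labels):
        in_k = labels == k
        cut_k = W[np.ix_(in_k, ~in_k)].sum()  # weight of edges leaving A_k
        vol_k = d[in_k].sum()                 # total degree (volume) of A_k
        total += cut_k / vol_k
    return total

def escape_probabilities(W, labels):
    """P(X_1 not in A_k | X_0 in A_k) for a stationary walk with P = D^-1 W."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    d = W.sum(axis=1)
    P = W / d[:, None]                        # row-stochastic transition matrix
    pi = d / d.sum()                          # stationary distribution, pi_i = d_i / vol(V)
    probs = []
    for k in np.unique(labels):
        in_k = labels == k
        # probability mass that starts in A_k and steps outside it
        leave = (pi[in_k][:, None] * P[np.ix_(in_k, ~in_k)]).sum()
        probs.append(leave / pi[in_k].sum())
    return np.array(probs)

# Two triangles {0,1,2} and {3,4,5} joined by one weak edge (2, 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

good = np.array([0, 0, 0, 1, 1, 1])
bad = np.array([0, 1, 0, 1, 0, 1])
print(ncut(W, good))                          # 0.2 / 6.1, about 0.0328
print(ncut(W, bad))                           # much larger: this split cuts both triangles
print(escape_probabilities(W, good).sum())    # equals ncut(W, good)
```

Summing the one-step escape probabilities reproduces Ncut exactly, because pi_i P_ij = W_ij / vol(V); this is the single-step random-walk view, whereas the paper relates the cost to a rate of information loss for fast-mixing walks.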

    • "Motivated by this fact, discriminative k-means (DKM) [15] is proposed to incorporate discriminative analysis and clustering into a single framework to formalize the clustering as a trace maximization problem. By contrast, the min-cut clustering is realized by constructing a weighted undirected graph and then partitioning its vertices into two sets so that the total weight of the set of edges with endpoints in different sets is minimized [16] [17]. Among several graph clustering methods, min-cut tends to provide more balanced clusters as compared to other graph clustering criterion. "
    ABSTRACT: Clustering is an effective data-mining technique for generating groups of interest. Among the various clustering approaches, the family of k-means algorithms and min-cut algorithms are the most popular because of their simplicity and efficacy. The classical k-means algorithm partitions data points into several subsets by iteratively updating the cluster centers and the assignment of data points to them. By contrast, min-cut algorithms construct a weighted undirected graph and partition its vertices into two sets. However, existing clustering algorithms tend to place only a small minority of the data points in some subset, which should be avoided when the target dataset is balanced. To achieve more accurate clustering of balanced datasets, we propose to apply the exclusive lasso to k-means and min-cut to regulate the degree of balance of the clustering results. By optimizing objective functions built on the exclusive lasso, we make the clustering results as balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms over state-of-the-art clustering algorithms.
    • "State space aggregation has attracted much attention during the last years, e.g., in chemical reaction networks [14], control theory [24], or in [21], which used total variational distance for aggregation. Most relevant to our work are information-theoretic cost functions [2] [7] [11] [23] and information-theoretic graph clustering [4] [16] [20]. Partitioning the state space does not suffice for aggregation. "
    ABSTRACT: Consider the problem of approximating a Markov chain by another Markov chain with a smaller state space that is obtained by partitioning the original state space. An information-theoretic cost function is proposed that is based on the relative entropy rate between the original Markov chain and a Markov chain defined by the partition. The state space aggregation problem can be sub-optimally solved by using the information bottleneck method.
    • "In works related to (graph) clustering, information-theoretic cost functions are often used for error quantification. In particular, in [10], the authors use the information bottleneck method for partitioning a graph via assuming continuous-time graph diffusion. Moreover, in [11] and [12] pairwise distance measures between data points were used to define a stationary Markov chain, whose statistics are then used for clustering the data points. "
    ABSTRACT: In this paper, we present a method for reducing a regular, discrete-time Markov chain (DTMC) to another DTMC with a given, typically much smaller, number of states. The cost of reduction is defined as the Kullback-Leibler divergence rate between a projection of the original process through a partition function and a DTMC on the correspondingly partitioned state space. Finding the reduced model with minimal cost is computationally expensive, as it requires an exhaustive search over all state-space partitions and exact evaluation of the reduction cost for each candidate partition. In our approach, we optimize an upper bound on the reduction cost instead of the exact cost; the proposed upper bound is easy to compute and is tight when the original chain is lumpable with respect to the partition. We then express the problem in the form of an information bottleneck optimization and propose the agglomerative information bottleneck algorithm for finding a locally optimal solution. The theory is illustrated with examples and one application scenario in the context of modeling bio-molecular interactions.
    IEEE Transactions on Automatic Control 04/2013; 60(4). DOI:10.1109/TAC.2014.2364971 · 2.78 Impact Factor
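The two aggregation abstracts above score a partition of a Markov chain's state space by a relative-entropy (Kullback-Leibler divergence) rate. The following is a minimal sketch, not the method of either paper, assuming a finite, irreducible chain with transition matrix P and a hard partition g given as integer labels. It builds the aggregated chain Q from the stationary joint distribution of (Y_0, Y_1), with Y_n = g(X_n), and evaluates the easily computed quantity I(X_0; Y_1) - I(Y_0; Y_1). Because the entropy rate of Y is at least H(Y_1 | X_0) by the Markov property of X, this quantity upper-bounds the KL divergence rate H(Y_1 | Y_0) - H_rate(Y) between the projected process and Q. It equals I(X_0; Y_1 | Y_0) and vanishes exactly when the chain is lumpable with respect to the partition. We do not claim it is the exact bound used in the cited paper; the names `stationary`, `mutual_information` and `aggregate` are hypothetical.

```python
import numpy as np

# Illustrative sketch with hypothetical helper names; not the cited papers' algorithm.

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def mutual_information(joint):
    """I(A; B) in nats from a 2-D joint probability table."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])).sum())

def aggregate(P, labels, K):
    """Aggregated chain Q and the bound I(X_0; Y_1) - I(Y_0; Y_1), with Y = g(X)."""
    P = np.asarray(P, dtype=float)
    pi = stationary(P)
    G = np.eye(K)[labels]                  # membership matrix: G[i, k] = 1 iff g(i) = k
    J_xy = (pi[:, None] * P) @ G           # joint P(X_0 = i, Y_1 = l)
    J_yy = G.T @ J_xy                      # joint P(Y_0 = k, Y_1 = l)
    Q = J_yy / J_yy.sum(axis=1, keepdims=True)  # transition matrix of the aggregated chain
    bound = mutual_information(J_xy) - mutual_information(J_yy)
    return Q, bound

P = np.array([[0.50, 0.30, 0.10, 0.10],
              [0.20, 0.60, 0.15, 0.05],
              [0.10, 0.10, 0.40, 0.40],
              [0.05, 0.15, 0.30, 0.50]])

Q, b = aggregate(P, [0, 0, 1, 1], 2)       # lumpable partition {0,1}, {2,3}
Q2, b2 = aggregate(P, [0, 1, 0, 1], 2)     # partition {0,2}, {1,3}: not lumpable
print(Q)
print(b, b2)
```

The first partition satisfies the lumpability condition (every state in a block has the same probability of jumping into each block), so the bound is zero up to rounding; the second partition breaks that condition and yields a positive value.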