An Information-Theoretic Derivation of Min-Cut-Based Clustering

Dept. of Appl. Phys. & Appl. Math., Columbia Univ., New York, NY, USA
IEEE Transactions on Pattern Analysis and Machine Intelligence (Impact Factor: 5.69). 07/2010; DOI: 10.1109/TPAMI.2009.124
Source: IEEE Xplore

ABSTRACT Min-cut clustering, based on minimizing one of two heuristic cost functions proposed by Shi and Malik nearly a decade ago, has spawned tremendous research, both analytic and algorithmic, in the graph partitioning and image segmentation communities over the last decade. It is, however, unclear if these heuristics can be derived from a more general principle, facilitating generalization to new problem settings. Motivated by an existing graph partitioning framework, we derive relationships between optimizing relevance information, as defined in the Information Bottleneck method, and the regularized cut in a K-partitioned graph. For fast-mixing graphs, we show that the cost functions introduced by Shi and Malik can be well approximated as the rate of loss of predictive information about the location of random walkers on the graph. For graphs drawn from a generative model designed to describe community structure, the optimal information-theoretic partition and the optimal min-cut partition are shown to be the same with high probability.
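
As a concrete illustration of the cost functions the abstract refers to, the sketch below evaluates a two-way partition of a small weighted graph. It assumes the two Shi and Malik heuristics are the average cut and the normalized cut in their commonly stated forms; the abstract does not name them, so that identification, the function name `cut_costs`, and the toy graph are all assumptions made for illustration. It does not implement the information-theoretic derivation itself.

```python
def cut_costs(weights, labels):
    """Return (cut, average_cut, normalized_cut) for a two-way partition.

    weights: symmetric adjacency matrix (list of lists, non-negative entries).
    labels:  list of 0/1 cluster labels, one per node.
    Assumes both clusters are non-empty and have positive total degree.
    """
    n = len(weights)
    cut = 0.0
    size = [0, 0]         # number of nodes in each cluster
    volume = [0.0, 0.0]   # summed degree of the nodes in each cluster
    for i in range(n):
        size[labels[i]] += 1
        for j in range(n):
            volume[labels[i]] += weights[i][j]
            # Count each crossing edge once, from cluster 0 to cluster 1.
            if labels[i] == 0 and labels[j] == 1:
                cut += weights[i][j]
    average_cut = cut / size[0] + cut / size[1]
    normalized_cut = cut / volume[0] + cut / volume[1]
    return cut, average_cut, normalized_cut


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    W[i][j] = W[j][i] = 1.0

balanced = cut_costs(W, [0, 0, 0, 1, 1, 1])   # split along the bridge edge
lopsided = cut_costs(W, [0, 1, 1, 1, 1, 1])   # isolate a single node
print(balanced)
print(lopsided)
```

On this toy graph both regularized costs are lower for the split along the bridge edge than for the split that isolates one node, which is the balancing behavior that distinguishes these costs from the raw cut weight.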

  • ABSTRACT: Clustering is an effective data mining technique for generating groups of interest. Among the various clustering approaches, the families of k-means and min-cut algorithms are the most popular because of their simplicity and efficacy. The classical k-means algorithm partitions a set of data points into several subsets by iteratively updating the cluster centers and the data points associated with them. By contrast, min-cut algorithms construct a weighted undirected graph and partition its vertices into two sets. However, existing clustering algorithms tend to assign a small minority of data points to a subset, which should be avoided when the target dataset is balanced. To achieve more accurate clustering for balanced datasets, we propose to apply the exclusive lasso to k-means and min-cut to regulate the degree of balance of the clustering results. By optimizing objective functions built on the exclusive lasso, we can make the clustering result as balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms over state-of-the-art clustering algorithms.
  • ABSTRACT: Segmentation of remote sensing images is a challenging task, not only because of the intrinsic complexity of the imaged scenes but also because of their multiple-scale interpretation. Hierarchical techniques, which provide a sequence of nested segmentation maps for the scene at different scales, are therefore very promising. The Texture Fragmentation and Reconstruction (TFR) technique carries out a hierarchical image segmentation based mainly on textural image properties. In this work we consider its improved version, Recursive-TFR, based on recursive binary segmentation, assess its performance experimentally on a suitable segmentation benchmark, prove its potential for remote-sensing imagery, and point out promising developments.
    Signal Image Technology and Internet Based Systems (SITIS), 2012 Eighth International Conference on; 01/2012
  • ABSTRACT: Fixed thresholds mostly fail in two circumstances, because they search only a certain range of skin colors: (i) any skin-like object may be classified as skin if its colors fall within the fixed threshold range; (ii) true skin of different races may be mistakenly classified as non-skin if its colors fall outside the fixed threshold range. In this paper, graph cuts (GC) are first extended to skin color segmentation. Although the result is acceptable, a complex environment with skin-like objects, different skin colors, or different lighting conditions often yields only partial success. It is also known that the probability neural network (PNN) has the advantage of recognizing different skin colors in cluttered environments. Therefore, many images with skin-like objects, different skin colors, or different lighting conditions are segmented by the proposed algorithm (i.e., the combination of the GC algorithm and PNN classification with other functions, e.g., morphology filtering, labeling, and area constraint). Comparative results for the GC algorithm, PNN classification, and the proposed algorithm are presented to verify not only the accurate segmentation of these images but also the reduction in computation time. Finally, the application to the classification of hand gestures in complex environments with different lighting conditions further confirms the effectiveness and efficiency of our method.
    Neural Processing Letters 02/2012; 37(1). DOI:10.1007/s11063-012-9275-4 · 1.24 Impact Factor
