Joint Manifold Distance: a new approach to appearance based clustering

Source: CiteSeer


We wish to match sets of images to sets of images where both sets are undergoing various distortions such as viewpoint and lighting changes.

  • Source
    • "Particularly in image processing, manifold models have been seen to provide more compact and efficient signal representations as well as assisting the analysis of data. Manifold models for signals have recently been studied in several research areas such as dimensionality reduction [1], [2], image registration [3], [4] and transformation-invariant pattern classification [5]. A signal manifold in a high-dimensional signal space is a set of signals that can be mapped to a lower-dimensional parameter space. "
    ABSTRACT: Transformation-invariant analysis of signals often requires the computation of the distance from a test pattern to a transformation manifold. In particular, the estimation of the distances between a transformed query signal and several transformation manifolds representing different classes provides essential information for the classification of the signal. In many applications, the computation of the exact distance to the manifold is costly, whereas an efficient practical solution is the approximation of the manifold distance with the aid of a manifold grid. In this paper, we consider a setting with transformation manifolds of known parameterization. We first present an algorithm for selecting samples from a single manifold so as to minimize the average error in the manifold distance estimation. Then we propose a method for the joint discretization of multiple manifolds that represent different signal classes, where we optimize the transformation-invariant classification accuracy yielded by the discrete manifold representation. Experimental results show that sampling each manifold individually by minimizing the manifold distance estimation error outperforms baseline sampling solutions with respect to registration and classification accuracy. Performing an additional joint optimization on all samples improves the classification performance further. Moreover, given a fixed total number of samples to be selected from all manifolds, an asymmetric distribution of samples across the manifolds, depending on their geometric structures, may also increase the classification accuracy compared with an equal distribution of samples.
    Full-text · Article · Jan 2012 · IEEE Transactions on Image Processing
  • Source
    • "Existing classification methods using image sets differ in the ways in which they model the sets and compute distances between them. Fitzgibbon and Zisserman [10] (see also [3]) use image sets to recognize the principal characters in movies. They model faces detected in contiguous frames as affine subspaces in feature space, use Joint Manifold Distance (JMD) to measure distances between these, then apply a JMD-based clustering algorithm to discover the principal cast of the movie. "
    ABSTRACT: We introduce a novel method for face recognition from image sets. In our setting each test and training example is a set of images of an individual's face, not just a single image, so recognition decisions need to be based on comparisons of image sets. Methods for this have two main aspects: the models used to represent the individual image sets; and the similarity metric used to compare the models. Here, we represent images as points in a linear or affine feature space and characterize each image set by a convex geometric region (the affine or convex hull) spanned by its feature points. Set dissimilarity is measured by geometric distances (distances of closest approach) between convex models. To reduce the influence of outliers we use robust methods to discard input points that are far from the fitted model. The kernel trick allows the approach to be extended to implicit feature mappings, thus handling complex and nonlinear manifolds of face images. Experiments on two public face datasets show that our proposed methods outperform a number of existing state-of-the-art ones.
    Full-text · Conference Paper · Jul 2010
  • Source
    • "Therefore, like in television titles, same person's faces do not exist in narrow range in the feature space [12]. In [6], subspace is constructed not from whole faces of a person but from face sequences detected from successive frames. It clusters face sequences using a distance function that is invariant to affine transformations [5] to make it robust against transforms. "
    ABSTRACT: In this paper, we propose a new approach for clustering faces of characters in a recorded television title. The clustering results are used to catalog video clips based on subjects' faces for quick scene access. The main goal is to obtain a result for cataloging in tolerable waiting time after the recording, which is less than 3 minutes per hour of video clips. Although conventional face recognition-based clustering methods can obtain good results, they require considerable processing time. To enable high-speed processing, we use similarities of shots where the characters appear to estimate corresponding faces instead of calculating distance between each facial feature. Two similar shot-based clustering (SSC) methods are proposed. The first method only uses SSC and the second method uses face thumbnail clustering (FTC) as well. The experiment shows that the average processing time per hour of video clips was 350 ms and 31 seconds for SSC and SSC+FTC, respectively, despite the decrease in the average number of different person's faces in a catalog being 6.0% and 0.9% compared to face recognition-based clustering.
    Preview · Article · Mar 2010 · Progress in Informatics
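
The IEEE Transactions on Image Processing abstract above approximates the distance from a query to a transformation manifold by the distance to the nearest sample on a manifold grid. The following is a minimal sketch of that idea only, not the authors' sample-selection algorithm. It assumes a hypothetical one-parameter manifold (a Gaussian bump translated along a 1-D signal) and uniformly spaced grids; the names `gaussian_pattern` and `grid_manifold_distance` are illustrative.

```python
import math

def gaussian_pattern(shift, n=64, width=3.0):
    # Hypothetical transformation manifold: a Gaussian bump translated by `shift`.
    return [math.exp(-((i - shift) ** 2) / (2 * width ** 2)) for i in range(n)]

def l2(u, v):
    # Euclidean distance between two equal-length signals.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def grid_manifold_distance(query, grid_params, generate=gaussian_pattern):
    # Estimate the manifold distance as the distance to the nearest grid sample.
    # Returns (estimated distance, parameter of the nearest sample).
    best = min(grid_params, key=lambda t: l2(query, generate(t)))
    return l2(query, generate(best)), best

# The query lies on the manifold, so its exact manifold distance is zero;
# whatever the grid reports is pure discretization error.
query = gaussian_pattern(20.3)
coarse_grid = [float(t) for t in range(0, 64, 8)]   # 8 samples
fine_grid = [t / 2 for t in range(0, 128)]          # 128 samples
d_coarse, t_coarse = grid_manifold_distance(query, coarse_grid)
d_fine, t_fine = grid_manifold_distance(query, fine_grid)
print(d_coarse, t_coarse)
print(d_fine, t_fine)
```

The grid estimate can only overestimate the true distance, and a denser grid shrinks the error at the cost of more comparisons. That trade-off is what motivates choosing the sample locations carefully rather than uniformly.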
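
The 2010 conference abstract above characterizes each image set by the affine hull of its feature points and measures set dissimilarity as the distance of closest approach between hulls. A minimal sketch of that distance for plain (unbounded) affine hulls follows. It is not the authors' implementation: it omits the robust outlier rejection, the convex-hull variant, and the kernel extension, and every function name here is an assumption made for illustration.

```python
import math

def mean(points):
    # Component-wise mean of a list of equal-length vectors.
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def remove_components(v, basis):
    # Subtract from v its projection onto each orthonormal basis vector.
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

def orthonormal_basis(vectors, tol=1e-10):
    # Modified Gram-Schmidt; directions with near-zero residual are dropped.
    basis = []
    for v in vectors:
        r = remove_components(v, basis)
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            basis.append([ri / norm for ri in r])
    return basis

def affine_hull_distance(set_a, set_b):
    # Closest approach between the affine hulls of two point sets:
    # the part of (mean_a - mean_b) orthogonal to all directions of both hulls.
    mu_a, mu_b = mean(set_a), mean(set_b)
    directions = [sub(p, mu_a) for p in set_a] + [sub(p, mu_b) for p in set_b]
    basis = orthonormal_basis(directions)
    residual = remove_components(sub(mu_a, mu_b), basis)
    return math.sqrt(dot(residual, residual))

# Two skew lines in 3-D: the x-axis, and a line along y at height z = 1.
line_x = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
line_y = [(0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(affine_hull_distance(line_x, line_y))
```

Because affine hulls extend without bound, two sets whose hulls intersect get distance zero even if their sample points are far apart, which is one reason the abstract also considers the tighter convex hull model.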