Conference Paper

Action Classification on Product Manifolds

DOI: 10.1109/CVPR.2010.5540131 Conference: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010
Source: DBLP


Videos can be naturally represented as multidimensional arrays known as tensors. However, the geometry of the tensor space is often ignored. In this paper, we argue that the underlying geometry of the tensor space is an important property for action classification. We characterize a tensor as a point on a product manifold and perform classification in this space. First, we factorize a tensor with respect to each order using a modified High Order Singular Value Decomposition (HOSVD). We recognize each factorized space as a Grassmann manifold. Consequently, a tensor is mapped to a point on a product manifold, and the geodesic distance on the product manifold is computed for tensor classification. We assess the proposed method on two public video databases, namely the Cambridge-Gesture and KTH human action data sets. Experimental results reveal that the proposed method performs very well on these data sets. In addition, our method is generic in the sense that no prior training is needed.
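The pipeline in the abstract lends itself to a compact sketch. The following is a minimal illustration, not the authors' implementation: each mode unfolding of a video tensor is reduced to an orthonormal basis (a point on a Grassmann manifold), per-mode geodesic distances are computed from principal angles, and their combination serves as a product-manifold distance for nearest-neighbour classification. The subspace dimension `k`, the tensor sizes, and all helper names are illustrative assumptions; the paper's modified HOSVD and its exact product-manifold metric may differ in detail.

```python
# Minimal sketch (not the authors' code) of mapping a video tensor to a point
# on a product of Grassmann manifolds and classifying it by geodesic distance.
import numpy as np

def mode_unfold(tensor, mode):
    """Matricize a tensor along the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def grassmann_point(tensor, mode, k):
    """Orthonormal basis (n x k) spanning the dominant mode-`mode` subspace."""
    U, _, _ = np.linalg.svd(mode_unfold(tensor, mode), full_matrices=False)
    return U[:, :k]

def grassmann_geodesic(U1, U2):
    """Arc-length distance on a Grassmann manifold via principal angles."""
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    theta = np.arccos(np.clip(cosines, -1.0, 1.0))
    return np.linalg.norm(theta)

def product_manifold_distance(T1, T2, k=5):
    """Combine per-mode Grassmann distances into one product-manifold distance."""
    per_mode = [
        grassmann_geodesic(grassmann_point(T1, m, k), grassmann_point(T2, m, k))
        for m in range(T1.ndim)
    ]
    return float(np.sqrt(np.sum(np.square(per_mode))))

# Training-free 1-NN classification against labelled exemplar videos
# (random tensors stand in for height x width x frames video clips).
rng = np.random.default_rng(0)
gallery = [(rng.standard_normal((20, 20, 32)), label) for label in ("wave", "clap")]
probe = rng.standard_normal((20, 20, 32))
prediction = min(gallery, key=lambda item: product_manifold_distance(probe, item[0]))[1]
print("predicted action:", prediction)
```

Because classification reduces to geodesic distances against labelled exemplars, no model is fitted beforehand, which matches the training-free claim in the abstract.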

    • "To preserve characteristics of individual submanifolds, factor-dependent submanifolds are learned and local coordinates are aligned for a joint parameter space [31]. Product manifold embedding techniques are applied for action recognition [32] and temporal motion sequence analysis [33]. In human motion analysis from video data, manifold learning with spatial and temporal constraints is applied for cyclic motion using a multiple kernel learning framework [ 34]. "
    ABSTRACT: The problem we address in this paper is how to learn a joint representation from data lying on multiple manifolds. We are given multiple data sets, and there is an underlying common manifold among them. Each data set is considered to be an instance of this common manifold. The goal is to embed all the points on all the manifolds in a way that preserves the local structure of each manifold and, at the same time, collapses all the different manifolds into one manifold in the embedding space while preserving the implicit correspondences between points across different data sets. We propose a framework to learn an embedding of such data that preserves the intra-manifold local geometric structure and the inter-manifold correspondence structure. The proposed solution extends current state-of-the-art spectral-embedding approaches to handle multiple manifolds.
    Pattern Recognition 08/2015; DOI:10.1016/j.patcog.2015.08.024 · 3.10 Impact Factor
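The joint-embedding abstract quoted above describes itself as an extension of spectral-embedding methods to multiple manifolds. As a rough, generic illustration only (not the cited paper's framework), the sketch below builds an intra-manifold kNN graph per data set, links the data sets through correspondence edges, and embeds everything with the eigenvectors of a joint normalized graph Laplacian. The data, the neighbourhood size `k`, the correspondence weight `mu`, and all function names are assumptions.

```python
# Generic joint spectral embedding of two data sets with known point-wise
# correspondences (illustrative only; not the cited paper's algorithm).
import numpy as np
from scipy.sparse.csgraph import laplacian

def knn_affinity(X, k=8):
    """Symmetric 0/1 kNN affinity matrix for one data set (rows = samples)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros_like(D)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]          # skip the self-neighbour
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)                        # symmetrize

def joint_embedding(X1, X2, dim=2, mu=1.0):
    """Embed two corresponding data sets into one shared low-dimensional space."""
    n1, n2 = len(X1), len(X2)
    W = np.zeros((n1 + n2, n1 + n2))
    W[:n1, :n1] = knn_affinity(X1)                   # intra-manifold edges
    W[n1:, n1:] = knn_affinity(X2)
    C = mu * np.eye(n1, n2)                          # point-wise correspondence edges
    W[:n1, n1:] = C
    W[n1:, :n1] = C.T
    L = laplacian(W, normed=True)
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:dim + 1]                           # drop the trivial eigenvector
    return Y[:n1], Y[n1:]

# Two noisy instances of the same underlying 1-D manifold (a circle).
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X1 = np.c_[np.cos(t), np.sin(t)] + 0.02 * np.random.randn(100, 2)
X2 = 3.0 * np.c_[np.cos(t), np.sin(t)] + 0.02 * np.random.randn(100, 2)
Y1, Y2 = joint_embedding(X1, X2)
print("mean embedded gap between corresponding points:",
      np.linalg.norm(Y1 - Y2, axis=1).mean())
```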
    • "Method Set1 Set2 Set3 Set4 Overall TCCA [16] 0.81 0.81 0.78 0.86 0.82 PM [14] 0.89 0.86 0.89 0.87 0.88 TB [15] 0.93 0.88 0.90 0.91 0.91 Cov3D [13] 0.92 0.94 0.94 0.93 0.93 Our method 0.92 0.93 0.97 0.95 0.94 (a) Dislike gesture. (b) Point gesture. "
    ABSTRACT: We introduce a novel approach to the cultural heritage experience: by means of ego-vision embedded devices, we develop a system that offers a more natural and entertaining way of accessing museum knowledge. Our method is based on distributed self-gesture and artwork recognition, and does not need fixed cameras or radio-frequency identification sensors. We propose the use of dense trajectories sampled around the hand region to perform self-gesture recognition, understanding the way a user naturally interacts with an artwork, and demonstrate that our approach can benefit from distributed training. We test our algorithms on publicly available data sets and extend our experiments to both virtual and real museum scenarios, where our method shows robustness when challenged with real-world data. Furthermore, we run an extensive performance analysis on our ARM-based wearable device.
    IEEE Sensors Journal 05/2015; 15(5):1-1. DOI:10.1109/JSEN.2015.2411994 · 1.76 Impact Factor
    • "81% 81% 78% 86 % - RLPP [5] 86% 86% 85% 88 % - PM 1-NN [29] 89% 86% 89% 87 % - PMLSR [28] 93% 89% 91% 94 % - Our before alignment 94% 91% 90% 88% 77% after alignment 99% 97% 97% 96% 98% Improvement 5% 6% 7% 8% 21% "
    ABSTRACT: Statistical classification of actions in videos is mostly performed by extracting relevant features, particularly covariance features, from image frames and studying time series associated with the temporal evolution of these features. A natural mathematical representation of activity videos is in the form of parameterized trajectories on the covariance manifold, i.e., the set of symmetric positive-definite matrices (SPDMs). The variable execution rates of actions imply variable parameterizations of the resulting trajectories, which complicates their classification. Since action classes are invariant to execution rates, one requires rate-invariant metrics for comparing trajectories. A recent paper represented trajectories using their transported square-root vector fields (TSRVFs), defined by parallel translating scaled velocity vectors of trajectories to a reference tangent space on the manifold. To avoid the arbitrariness of selecting the reference and to reduce the distortion introduced during this mapping, we develop a purely intrinsic approach where SPDM trajectories are represented by redefining their TSRVFs at the starting points of the trajectories and analyzed as elements of a vector bundle on the manifold. Using a natural Riemannian metric on vector bundles of SPDMs, we compute geodesic paths and geodesic distances between trajectories in the quotient space of this vector bundle with respect to the re-parameterization group. This makes the resulting comparison of trajectories invariant to their re-parameterization. We demonstrate this framework on two applications involving video classification: visual speech recognition (lip-reading) and hand-gesture recognition. In both cases we achieve results either comparable to or better than the current literature.
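The abstract above constructs a rate-invariant metric on SPDM trajectories via TSRVFs on a vector bundle. The sketch below is only a simplified stand-in for that construction: it compares SPD-matrix trajectories with the affine-invariant Riemannian distance and removes execution-rate differences with dynamic time warping rather than the TSRVF quotient-space metric. Trajectory lengths, the matrix dimension, and all helper names are assumptions.

```python
# Simplified, rate-invariant comparison of SPD-matrix trajectories
# (DTW over affine-invariant distances; NOT the TSRVF construction above).
import numpy as np
from scipy.linalg import eigh

def spd_distance(A, B):
    """Affine-invariant Riemannian distance ||log(A^{-1/2} B A^{-1/2})||_F."""
    lam = eigh(B, A, eigvals_only=True)    # generalized eigenvalues of (B, A)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def dtw_trajectory_distance(traj1, traj2):
    """Rate-invariant distance between two SPD trajectories via dynamic time warping."""
    n, m = len(traj1), len(traj2)
    cost = np.array([[spd_distance(a, b) for b in traj2] for a in traj1])
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    return acc[n, m]

def random_spd(d, rng):
    """Random symmetric positive-definite matrix (covariance-like feature)."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

# The same trajectory sampled at two different execution rates should
# be (nearly) indistinguishable under the warped distance.
rng = np.random.default_rng(1)
base = [random_spd(5, rng) for _ in range(10)]
slow = [M for M in base for _ in range(2)]          # each frame repeated twice
print("DTW distance (rate-invariant):", round(dtw_trajectory_distance(base, slow), 3))
```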