Conference Paper

Action Classification on Product Manifolds

DOI: 10.1109/CVPR.2010.5540131
Conference: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), San Francisco, CA, USA, 13-18 June 2010
Source: DBLP

ABSTRACT: Videos can be naturally represented as multidimensional arrays known as tensors, yet the geometry of the tensor space is often ignored. In this paper, we argue that the underlying geometry of the tensor space is an important property for action classification. We characterize a tensor as a point on a product manifold and perform classification in this space. First, we factorize a tensor along each mode using a modified High Order Singular Value Decomposition (HOSVD). We identify each factorized space as a Grassmann manifold; consequently, a tensor is mapped to a point on a product manifold, and the geodesic distance on the product manifold is computed for tensor classification. We assess the proposed method on two public video databases, namely the Cambridge-Gesture and KTH human action data sets. Experimental results show that the proposed method performs very well on these data sets. In addition, our method is generic in the sense that no prior training is needed.
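The pipeline in the abstract is compact enough to illustrate end to end. Below is a minimal sketch, not the authors' code: it assumes NumPy, a grayscale video tensor of shape (height, width, frames), an illustrative subspace dimension k, and an unweighted product metric; the paper's modified HOSVD may differ in detail.

```python
import numpy as np

def mode_unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def factorize(tensor, k=5):
    """HOSVD-style factorization: a truncated orthonormal basis per mode.

    Each basis spans a k-dimensional subspace, i.e. a point on the
    Grassmann manifold G(k, d_n), where d_n is the size of mode n.
    """
    factors = []
    for mode in range(tensor.ndim):
        U, _, _ = np.linalg.svd(mode_unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :k])
    return factors

def grassmann_dist(U1, U2):
    """Arc-length geodesic distance between subspaces via principal angles."""
    s = np.clip(np.linalg.svd(U1.T @ U2, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))

def product_dist(f1, f2):
    """Geodesic distance on the product manifold: root sum of squares
    of the per-mode Grassmann distances."""
    return np.sqrt(sum(grassmann_dist(a, b) ** 2 for a, b in zip(f1, f2)))

# Toy usage: two random stand-ins for real video clips.
clip_a, clip_b = np.random.rand(20, 20, 32), np.random.rand(20, 20, 32)
print(product_dist(factorize(clip_a), factorize(clip_b)))
```

Because classification then reduces to nearest-neighbour search under this distance, the "no prior training" claim in the abstract follows directly.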

    • "Method Set1 Set2 Set3 Set4 Overall TCCA [16] 0.81 0.81 0.78 0.86 0.82 PM [14] 0.89 0.86 0.89 0.87 0.88 TB [15] 0.93 0.88 0.90 0.91 0.91 Cov3D [13] 0.92 0.94 0.94 0.93 0.93 Our method 0.92 0.93 0.97 0.95 0.94 (a) Dislike gesture. (b) Point gesture. "
    ABSTRACT: We introduce a novel approach to the cultural heritage experience: by means of ego-vision embedded devices, we develop a system that offers a more natural and entertaining way of accessing museum knowledge. Our method is based on distributed self-gesture and artwork recognition, and needs neither fixed cameras nor radio-frequency identification sensors. We propose the use of dense trajectories sampled around the hand region to perform self-gesture recognition (see the sketch after this entry), understanding the way a user naturally interacts with an artwork, and demonstrate that our approach can benefit from distributed training. We test our algorithms on publicly available data sets and extend our experiments to both virtual and real museum scenarios, where our method shows robustness when challenged with real-world data. Furthermore, we run an extensive performance analysis on our ARM-based wearable device.
    IEEE Sensors Journal 05/2015; 15(5):1-1. DOI:10.1109/JSEN.2015.2411994 · 1.85 Impact Factor
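A minimal sketch of dense trajectories restricted to the hand region, assuming OpenCV, grayscale frames, and a per-frame boolean `hand_masks` array from some hand detector; the detector itself, and the descriptors this paper computes along each trajectory, are out of scope here, and all names are mine.

```python
import numpy as np
import cv2

def hand_trajectories(frames, hand_masks, step=5, length=15):
    """Seed points on a regular grid inside the hand mask, then follow
    them through Farneback dense optical flow for `length` frames."""
    h, w = frames[0].shape
    # Dense flow between consecutive frames, computed once up front.
    flows = [cv2.calcOpticalFlowFarneback(a, b, None,
                                          0.5, 3, 15, 3, 5, 1.2, 0)
             for a, b in zip(frames, frames[1:])]
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    trajectories = []
    for t in range(len(frames) - length):
        # Keep only grid points that fall on the hand in frame t.
        tracks = [[(float(x), float(y))]
                  for x, y in zip(xs.ravel(), ys.ravel())
                  if hand_masks[t][y, x]]
        for dt in range(length):
            for tr in tracks:
                x, y = tr[-1]
                xi = min(max(int(round(x)), 0), w - 1)
                yi = min(max(int(round(y)), 0), h - 1)
                dx, dy = flows[t + dt][yi, xi]
                tr.append((x + dx, y + dy))   # advect along the flow
        trajectories.extend(tracks)
    return trajectories
```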
    • "81% 81% 78% 86 % - RLPP [5] 86% 86% 85% 88 % - PM 1-NN [29] 89% 86% 89% 87 % - PMLSR [28] 93% 89% 91% 94 % - Our before alignment 94% 91% 90% 88% 77% after alignment 99% 97% 97% 96% 98% Improvement 5% 6% 7% 8% 21% "
    ABSTRACT: Statistical classification of actions in videos is mostly performed by extracting relevant features, particularly covariance features, from image frames and studying time series associated with the temporal evolution of these features. A natural mathematical representation of activity videos is in the form of parameterized trajectories on the covariance manifold, i.e. the set of symmetric, positive-definite matrices (SPDMs). The variable execution rates of actions imply variable parameterizations of the resulting trajectories and complicate their classification. Since action classes are invariant to execution rates, one requires rate-invariant metrics for comparing trajectories. A recent paper represented trajectories using their transported square-root vector fields (TSRVFs), defined by parallel translating scaled-velocity vectors of trajectories to a reference tangent space on the manifold. To avoid the arbitrariness of selecting the reference and to reduce the distortion introduced during this mapping, we develop a purely intrinsic approach where SPDM trajectories are represented by redefining their TSRVFs at the starting points of the trajectories, and analyzed as elements of a vector bundle on the manifold. Using a natural Riemannian metric on vector bundles of SPDMs, we compute geodesic paths and geodesic distances between trajectories in the quotient space of this vector bundle with respect to the re-parameterization group. This makes the resulting comparison of trajectories invariant to their re-parameterization. We demonstrate this framework on two applications involving video classification: visual speech recognition (lip-reading) and hand-gesture recognition. In both cases we achieve results either comparable to or better than the current literature.
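The TSRVF construction can be miniaturized. Below is a minimal sketch, assuming the affine-invariant metric on SPD matrices and SciPy's matrix functions; the paper's vector-bundle representation and re-parameterization quotient are not reproduced, and all function names are mine.

```python
import numpy as np
from scipy.linalg import sqrtm, logm, inv

def _sqrtm(P):
    # Matrix square root; discard the tiny imaginary noise sqrtm can emit.
    return np.real(sqrtm(P))

def log_map(P, Q):
    """Riemannian log of Q at P under the affine-invariant metric on SPDMs."""
    Ph = _sqrtm(P)
    Pih = inv(Ph)
    return Ph @ np.real(logm(Pih @ Q @ Pih)) @ Ph

def transport(V, P, R):
    """Parallel-transport the tangent vector V from T_P to T_R."""
    E = _sqrtm(R @ inv(P))
    return E @ V @ E.T

def tsrvf(trajectory):
    """Square-root-scaled velocity field of an SPDM trajectory, transported
    to the trajectory's own starting point (the intrinsic variant the
    abstract describes) rather than to an arbitrary reference."""
    P0 = trajectory[0]
    q = []
    for P, Q in zip(trajectory, trajectory[1:]):
        v = log_map(P, Q)                    # discrete velocity at P
        n = np.linalg.norm(v)
        q.append(transport(v / np.sqrt(n) if n > 0 else v, P, P0))
    return np.array(q)
```

Rate invariance then comes from comparing such fields modulo re-parameterization (e.g. by optimizing over time warps), which this sketch leaves out.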
    • "This is in contrast with [13] which uses a much richer method of Canonical Correlation Analysis (CCA), to represent the statistics of the volume and [7] [19] who use more complex kinematic features based on optical flow. 3. Unlike [14] [13] we do not assume a fixed number of frames in the spatiotemporal volume. Also, our approach does not rely on extensive learning procedures applied, for example, in [21] [20] [13]. "
    ABSTRACT: We propose a novel approach for online action recognition. The action is represented in a low-dimensional (15D) space using a covariance descriptor of shape and motion features: spatio-temporal coordinates and optical flow of pixels belonging to extracted silhouettes. We analyze the applicability of the descriptor in online scenarios, where action classification is performed on incomplete spatio-temporal volumes. To enable our online action classification algorithm to run in real time, we introduce two modifications, namely an incremental covariance update and on-demand nearest-neighbor classification (see the sketch after this entry). In our experiments we use quality measures, such as latency, designed especially for the online scenario to report the algorithm's performance. We evaluate the descriptor on standard, publicly available gesture recognition datasets, namely the Cambridge-Gestures dataset and the ChaLearn One-Shot-Learning dataset, and show that its performance is comparable to the state of the art despite its relative simplicity. The evaluation on the UCF-101 action recognition dataset demonstrates that the descriptor is applicable in challenging unconstrained environments.
    Computer Vision and Image Understanding 08/2014; 129. DOI:10.1016/j.cviu.2014.08.001 · 1.36 Impact Factor
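The incremental covariance update is the ingredient that makes the online setting tractable: the descriptor is refreshed one feature vector at a time instead of being recomputed over the whole spatio-temporal volume. A minimal sketch, with a hypothetical class name and Welford-style running moments (the paper's exact update rule may differ):

```python
import numpy as np

class IncrementalCovariance:
    """Running mean and covariance over feature vectors (one per pixel
    of an incoming silhouette), updated a sample at a time."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))   # sum of centered outer products

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)

    @property
    def covariance(self):
        return self.scatter / max(self.n - 1, 1)
```

With the abstract's 15-D features, each new frame simply streams its per-pixel feature vectors through `update`, so a valid covariance descriptor (and hence a nearest-neighbor decision) is available at any cut-off point of the action.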