Action Classification on Product Manifolds
DOI: 10.1109/CVPR.2010.5540131 Conference: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010
Videos can be naturally represented as multidimensional arrays known as tensors. However, the geometry of the tensor space is often ignored. In this paper, we argue that the underlying geometry of the tensor space is an important property for action classification. We characterize a tensor as a point on a product manifold and perform classification on this space. First, we factorize a tensor along each order using a modified Higher Order Singular Value Decomposition (HOSVD). We identify each factorized space as a Grassmann manifold. Consequently, a tensor is mapped to a point on a product manifold, and the geodesic distance on the product manifold is computed for tensor classification. We assess the proposed method using two public video databases, namely the Cambridge-Gesture and KTH human action data sets. Experimental results reveal that the proposed method performs very well on these data sets. In addition, our method is generic in the sense that no prior training is needed.
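The pipeline in the abstract is compact enough to sketch end to end. Below is a minimal NumPy sketch of that idea: each mode unfolding of a video tensor is reduced by SVD to an orthonormal basis (a point on a Grassmann manifold), and two tensors are compared by combining the per-mode Grassmann geodesic distances. The function names, toy tensor sizes, and subspace ranks are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the product-manifold distance, assuming toy tensor
# sizes and subspace ranks chosen purely for illustration.
import numpy as np

def unfold(tensor, mode):
    """Mode-k unfolding: move axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_subspaces(tensor, ranks):
    """HOSVD-style factor per mode: the leading left singular vectors of
    each unfolding, treated as a point on a Grassmann manifold."""
    subspaces = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        subspaces.append(u[:, :r])            # orthonormal basis
    return subspaces

def grassmann_distance(u1, u2):
    """Geodesic distance on the Grassmann manifold via principal angles."""
    cosines = np.linalg.svd(u1.T @ u2, compute_uv=False)
    theta = np.arccos(np.clip(cosines, -1.0, 1.0))
    return np.linalg.norm(theta)

def product_manifold_distance(t1, t2, ranks):
    """Combine per-mode Grassmann geodesic distances (l2 product metric)."""
    pairs = zip(mode_subspaces(t1, ranks), mode_subspaces(t2, ranks))
    return np.sqrt(sum(grassmann_distance(a, b) ** 2 for a, b in pairs))

# Two toy "videos" (height x width x frames); a 1-NN classifier over
# labeled exemplars would use this distance directly.
rng = np.random.default_rng(0)
a = rng.standard_normal((20, 20, 16))
b = rng.standard_normal((20, 20, 16))
print(product_manifold_distance(a, b, ranks=(5, 5, 4)))
```

Because the distance requires no learned model, nearest-neighbor classification against labeled exemplars suffices, which is consistent with the abstract's claim that no prior training is needed.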
Available from: Marwan Torki
- "To preserve characteristics of individual submanifolds, factor-dependent submanifolds are learned and local coordinates are aligned for a joint parameter space . Product manifold embedding techniques are applied for action recognition  and temporal motion sequence analysis . In human motion analysis from video data, manifold learning with spatial and temporal constraints is applied for cyclic motion using a multiple kernel learning framework [ 34]. "
ABSTRACT: The problem we address in this paper is how to learn a joint representation from data lying on multiple manifolds. We are given multiple data sets, and there is an underlying common manifold among them; each data set is considered an instance of this common manifold. The goal is to embed all the points of all the manifolds in a way that preserves the local structure of each manifold and that, at the same time, collapses the different manifolds into one manifold in the embedding space while preserving the implicit correspondences between points across data sets. We propose a framework to learn such an embedding, which preserves the intra-manifold local geometric structure and the inter-manifold correspondence structure. The proposed solution works as an extension to current state-of-the-art spectral-embedding approaches, enabling them to handle multiple manifolds.
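As a rough illustration of this kind of joint spectral embedding, the sketch below builds one graph over all data sets: k-NN heat-kernel affinities inside each set preserve local structure, while strong links between corresponding points pull the manifolds together, and the embedding is read off the generalized eigenvectors of the joint graph Laplacian. All names and parameter choices (k, sigma, mu) are illustrative assumptions, not the authors' formulation.

```python
# A rough sketch of a joint embedding over multiple data sets, assuming
# point-wise correspondences across equally sized sets.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def knn_affinity(X, k=5, sigma=1.0):
    """Symmetric k-NN heat-kernel affinity within one data set."""
    D = cdist(X, X)
    W = np.exp(-D ** 2 / (2 * sigma ** 2))
    keep = np.zeros_like(W, dtype=bool)
    for i, nbrs in enumerate(np.argsort(D, axis=1)[:, 1:k + 1]):
        keep[i, nbrs] = True                  # column 0 is the point itself
    return np.where(keep | keep.T, W, 0.0)

def joint_embedding(datasets, dim=2, mu=10.0):
    """Intra-set blocks preserve local structure; inter-set identity blocks
    (weight mu) pull corresponding points together in the embedding."""
    n, m = datasets[0].shape[0], len(datasets)
    W = np.zeros((m * n, m * n))
    for a in range(m):
        W[a*n:(a+1)*n, a*n:(a+1)*n] = knn_affinity(datasets[a])
        for b in range(a + 1, m):
            W[a*n:(a+1)*n, b*n:(b+1)*n] = mu * np.eye(n)
            W[b*n:(b+1)*n, a*n:(a+1)*n] = mu * np.eye(n)
    d = W.sum(axis=1)
    L = np.diag(d) - W                        # joint graph Laplacian
    _, vecs = eigh(L, np.diag(d))             # generalized eigenproblem
    return vecs[:, 1:dim + 1]                 # drop the trivial eigenvector

rng = np.random.default_rng(0)
base = rng.standard_normal((50, 3))
sets = [base + 0.05 * rng.standard_normal(base.shape) for _ in range(3)]
Y = joint_embedding(sets)   # rows 0..49 = set 1, 50..99 = set 2, ...
```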
Available from: Lorenzo Baraldi
- "Method Set1 Set2 Set3 Set4 Overall TCCA  0.81 0.81 0.78 0.86 0.82 PM  0.89 0.86 0.89 0.87 0.88 TB  0.93 0.88 0.90 0.91 0.91 Cov3D  0.92 0.94 0.94 0.93 0.93 Our method 0.92 0.93 0.97 0.95 0.94 (a) Dislike gesture. (b) Point gesture. "
ABSTRACT: We introduce a novel approach to the cultural heritage experience: by means of ego-vision embedded devices, we develop a system that offers a more natural and entertaining way of accessing museum knowledge. Our method is based on distributed self-gesture and artwork recognition, and needs neither fixed cameras nor radio-frequency identification sensors. We propose the use of dense trajectories sampled around the hand region to perform self-gesture recognition, understanding the way a user naturally interacts with an artwork, and demonstrate that our approach can benefit from distributed training. We test our algorithms on publicly available data sets and extend our experiments to both virtual and real museum scenarios, where our method shows robustness when challenged with real-world data. Furthermore, we run an extensive performance analysis on our ARM-based wearable device.
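As a minimal sketch of the trajectory-sampling step, the code below seeds a dense grid of points inside a caller-supplied hand mask and tracks them with Farneback optical flow. The hand detector itself, the trajectory length, and the sampling stride are assumptions outside the abstract, not details from the paper.

```python
# A minimal sketch of dense trajectories restricted to a hand region,
# assuming OpenCV and a binary hand mask for the first frame.
import cv2
import numpy as np

def hand_trajectories(frames, hand_mask, traj_len=15, stride=5):
    """Seed a dense grid inside the hand mask of the first frame and track
    the points with Farneback optical flow for traj_len frames."""
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    h, w = prev.shape
    ys, xs = np.nonzero(hand_mask[::stride, ::stride])
    pts = np.stack([xs * stride, ys * stride], axis=1).astype(np.float64)
    trajs = pts[:, None, :]                       # (n_points, 1, 2)
    for t in range(1, min(len(frames), traj_len + 1)):
        gray = cv2.cvtColor(frames[t], cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        xi = np.clip(pts[:, 0].round().astype(int), 0, w - 1)
        yi = np.clip(pts[:, 1].round().astype(int), 0, h - 1)
        pts = pts + flow[yi, xi]                  # advance by (dx, dy)
        trajs = np.concatenate([trajs, pts[:, None, :]], axis=1)
        prev = gray
    return trajs                                  # (n_points, T, 2)
```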
Available from: Eric Klassen
- "81% 81% 78% 86 % - RLPP  86% 86% 85% 88 % - PM 1-NN  89% 86% 89% 87 % - PMLSR  93% 89% 91% 94 % - Our before alignment 94% 91% 90% 88% 77% after alignment 99% 97% 97% 96% 98% Improvement 5% 6% 7% 8% 21% "
ABSTRACT: Statistical classification of actions in videos is mostly performed by extracting relevant features, particularly covariance features, from image frames and studying time series associated with the temporal evolution of these features. A natural mathematical representation of activity videos is in the form of parameterized trajectories on the covariance manifold, i.e. the set of symmetric positive-definite matrices (SPDMs). The variable execution rates of actions imply variable parameterizations of the resulting trajectories, which complicates their classification. Since action classes are invariant to execution rates, one requires rate-invariant metrics for comparing trajectories. A recent paper represented trajectories using their transported square-root vector fields (TSRVFs), defined by parallel translating scaled-velocity vectors of trajectories to a reference tangent space on the manifold. To avoid the arbitrariness of selecting the reference and to reduce the distortion introduced during this mapping, we develop a purely intrinsic approach in which SPDM trajectories are represented by redefining their TSRVFs at the starting points of the trajectories and analyzed as elements of a vector bundle on the manifold. Using a natural Riemannian metric on vector bundles of SPDMs, we compute geodesic paths and geodesic distances between trajectories in the quotient space of this vector bundle with respect to the re-parameterization group. This makes the resulting comparison of trajectories invariant to their re-parameterization. We demonstrate this framework on two applications involving video classification: visual speech recognition (lip-reading) and hand-gesture recognition. In both cases we achieve results either comparable to or better than the current literature.
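A faithful implementation needs parallel transport and vector-bundle geodesics on the SPDM manifold. The sketch below deliberately substitutes a log-Euclidean flattening for the intrinsic geometry and dynamic time warping for the optimization over the re-parameterization group, so it only illustrates the rate-invariance idea, not the authors' method.

```python
# A deliberately simplified sketch of rate-invariant comparison of SPD
# trajectories: matrix-log flattening + square-root velocity fields + DTW.
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def srvf(traj):
    """Square-root velocity field of a Euclidean curve (T x d array)."""
    v = np.gradient(traj, axis=0)
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.sqrt(np.maximum(speed, 1e-8))

def rate_invariant_distance(spd_seq1, spd_seq2):
    """Compare two SPD-matrix trajectories up to execution rate (via DTW)."""
    q1 = srvf(np.stack([spd_log(S).ravel() for S in spd_seq1]))
    q2 = srvf(np.stack([spd_log(S).ravel() for S in spd_seq2]))
    T1, T2 = len(q1), len(q2)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(q1[i - 1] - q2[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2]

def random_spd(d, rng):
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

rng = np.random.default_rng(0)
seq = [random_spd(3, rng) for _ in range(20)]
print(rate_invariant_distance(seq, seq[::2]))   # same action, twice as fast
```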