Conference Paper

Action classification on product manifolds.

DOI: 10.1109/CVPR.2010.5540131
Conference: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010
Source: DBLP

ABSTRACT Videos can be naturally represented as multidimensional arrays known as tensors. However, the geometry of the tensor space is often ignored. In this paper, we argue that the underlying geometry of the tensor space is an important property for action classification. We characterize a tensor as a point on a product manifold and perform classification on this space. First, we factorize a tensor along each of its modes (orders) using a modified High Order Singular Value Decomposition (HOSVD). We recognize each factorized space as a Grassmann manifold. Consequently, a tensor is mapped to a point on a product manifold, and the geodesic distance on the product manifold is computed for tensor classification. We assess the proposed method on two public video databases, the Cambridge-Gesture and KTH human action data sets. Experimental results show that the proposed method performs very well on these data sets. In addition, our method is generic in the sense that no prior training is needed.
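
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not confirm: it uses plain mode-k unfolding with an ordinary SVD rather than the paper's modified HOSVD, keeps the leading `p` left singular vectors of each unfolding as the subspace basis, and classifies with a simple nearest-neighbor rule. All function names and the rank `p` are hypothetical choices made for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: bring that axis to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def subspace_bases(tensor, p):
    """Return one orthonormal basis (n_k x p) per tensor mode.

    Each basis spans a p-dimensional subspace, i.e. a point on a Grassmann
    manifold. Assumption: leading left singular vectors of a plain SVD,
    not the paper's modified HOSVD.
    """
    bases = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        bases.append(u[:, :p])
    return bases


def principal_angles(a, b):
    """Principal angles between the subspaces spanned by orthonormal a and b."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def product_geodesic_distance(x, y, p=3):
    """Geodesic distance between two tensors on the product of Grassmannians.

    On each factor the geodesic distance is the 2-norm of the principal
    angles; on the product manifold the squared factor distances add up.
    """
    angles = [
        principal_angles(a, b)
        for a, b in zip(subspace_bases(x, p), subspace_bases(y, p))
    ]
    return float(np.linalg.norm(np.concatenate(angles)))


def nearest_neighbor_label(query, gallery, labels, p=3):
    """Assign the label of the gallery tensor closest to the query."""
    distances = [product_geodesic_distance(query, g, p) for g in gallery]
    return labels[int(np.argmin(distances))]
```

For a grayscale clip stored as a frames × height × width array, `nearest_neighbor_label(clip, gallery, labels)` compares the clip against labeled example clips of the same shape. No model is fitted beforehand, which is one way to read the abstract's remark that no prior training is needed.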

  • ABSTRACT: We present a novel method for monocular hand gesture recognition in ego-vision scenarios that handles both static and dynamic gestures and can achieve high accuracy using only a few positive samples. Specifically, we use and extend the dense trajectories approach that has been successfully introduced for action recognition. Dense features are extracted around regions selected by a new hand segmentation technique that integrates superpixel classification with temporal and spatial coherence. We extensively test our gesture recognition and segmentation algorithms on public datasets and propose a new dataset shot with a wearable camera. In addition, we demonstrate that our solution can work in near real-time on a wearable device.
    IEEE Computer Vision and Pattern Recognition (CVPR) Embedded Vision Workshop (EVW); 01/2014
  • ABSTRACT: A human action recognition framework is proposed that models motion variations corresponding to a particular class of actions without the need for sequence length normalization. The motion descriptors used in this framework are based on the optical flow vectors computed at every point on the silhouette of the human body. A histogram of flow (HOF) is computed from the optical flow vectors, giving the motion orientation in a local neighborhood. To capture the relationship between the motion vectors at a particular instant, the magnitude and direction of the optical flow vector are coded with local binary patterns (LBP). The concatenation of these histograms (HOF-LBP) is used as the action feature set in the proposed framework. We illustrate that this motion descriptor is suitable for classifying various human actions when used in conjunction with the proposed action recognition framework, which models the motion variations in time for each class using regression-based techniques. The feature vectors extracted from the training set are mapped to a lower-dimensional space using Empirical Orthogonal Functional Analysis. A regression-based technique, such as Generalized Regression Neural Networks (GRNN), is used to compute the functional mapping from the action feature vectors to their reduced eigenspace representation for each class, thereby obtaining separate action manifolds. The feature set obtained from a test sequence is compared with each of the action manifolds by comparing the test coefficients with the ones corresponding to the manifold (as estimated by GRNN), and the class is determined using the Mahalanobis distance.
    Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics; 10/2013
  • ABSTRACT: We propose a novel approach for online action recognition. The action is represented in a low-dimensional (15D) space using a covariance descriptor of shape and motion features: spatio-temporal coordinates and optical flow of pixels belonging to extracted silhouettes. We analyze the applicability of the descriptor for online scenarios where action classification is performed based on incomplete spatio-temporal volumes. To enable our online action classification algorithm to run in real time, we introduce two modifications, namely the incremental covariance update and the on-demand nearest neighbor classification. In our experiments we use quality measures, such as latency, designed specifically for the online scenario to report the algorithm’s performance. We evaluate the performance of our descriptor on standard, publicly available datasets for gesture recognition, namely the Cambridge-Gestures dataset and the ChaLearn One-Shot-Learning dataset, and show that its performance is comparable to the state of the art despite its relative simplicity. The evaluation on the UCF-101 action recognition dataset demonstrates that the descriptor is applicable in challenging unconstrained environments.
    Computer Vision and Image Understanding 08/2014; · 1.36 Impact Factor
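
The incremental covariance update mentioned in the last abstract above can be sketched as follows. This is a generic Welford-style running covariance, offered as an illustration of the idea rather than the cited authors' exact update rule; the class name `RunningCovariance` and its methods are hypothetical.

```python
import numpy as np


class RunningCovariance:
    """Sample covariance of feature vectors, updated one vector at a time."""

    def __init__(self, dim):
        self.count = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))  # running sum of outer products

    def update(self, x):
        """Fold one new feature vector into the running statistics."""
        x = np.asarray(x, dtype=float)
        self.count += 1
        delta = x - self.mean              # deviation from the old mean
        self.mean += delta / self.count    # new mean
        self.scatter += np.outer(delta, x - self.mean)

    def covariance(self):
        """Unbiased covariance estimate of all vectors seen so far."""
        if self.count < 2:
            raise ValueError("need at least two samples")
        return self.scatter / (self.count - 1)
```

Each call to `update` costs a fixed amount of work in the feature dimension, so a descriptor can be refreshed as new pixels or frames arrive without revisiting earlier data. In an online setting that is what makes classification on incomplete spatio-temporal volumes affordable.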