Action classification on product manifolds.
ABSTRACT: Videos can be naturally represented as multidimensional arrays known as tensors. However, the geometry of the tensor space is often ignored. In this paper, we argue that the underlying geometry of the tensor space is an important property for action classification. We characterize a tensor as a point on a product manifold and perform classification on this space. First, we factorize a tensor with respect to each order using a modified High Order Singular Value Decomposition (HOSVD). We recognize each factorized space as a Grassmann manifold. Consequently, a tensor is mapped to a point on a product manifold, and the geodesic distance on the product manifold is computed for tensor classification. We assess the proposed method using two public video databases, namely the Cambridge-Gesture and KTH human action data sets. Experimental results reveal that the proposed method performs very well on these data sets. In addition, our method is generic in the sense that no prior training is needed.
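The pipeline outlined in the abstract (factorize per order, treat each factor as a Grassmann point, compare by geodesic distance on the product manifold) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that each factor is the row space of the corresponding mode unfolding, that the Grassmann geodesic distance is the 2-norm of the principal angles, and that the product-manifold distance combines the per-factor distances in a root-sum-of-squares fashion. All function names are hypothetical, and nearest-neighbor matching is used only as one training-free way to classify.

```python
import numpy as np

def mode_unfold(tensor, mode):
    """Unfold a tensor along `mode` into a matrix of shape (n_mode, prod(other dims))."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def grassmann_points(tensor):
    """Return one orthonormal basis per mode.

    Assumption: each basis spans the row space of the mode unfolding,
    so tensors of equal shape yield subspaces of a common ambient space.
    """
    bases = []
    for mode in range(tensor.ndim):
        unfolded = mode_unfold(tensor, mode)
        # Thin SVD: the rows of vt are orthonormal and span the row space.
        _, _, vt = np.linalg.svd(unfolded, full_matrices=False)
        bases.append(vt.T)
    return bases

def principal_angles(q1, q2):
    """Principal angles between the subspaces spanned by orthonormal q1 and q2."""
    cosines = np.linalg.svd(q1.T @ q2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def product_geodesic_distance(a, b):
    """Assumed product-manifold distance: root of summed squared principal angles."""
    squared = 0.0
    for qa, qb in zip(grassmann_points(a), grassmann_points(b)):
        squared += np.sum(principal_angles(qa, qb) ** 2)
    return float(np.sqrt(squared))

def classify(query, gallery, labels):
    """Nearest-neighbor label under the product-manifold distance (no training step)."""
    distances = [product_geodesic_distance(query, g) for g in gallery]
    return labels[int(np.argmin(distances))]
```

With a gallery of labeled video tensors of identical shape (for example height x width x frames), `classify(query, gallery, labels)` returns the label of the closest gallery tensor. Because the distance depends only on subspaces, rescaling a tensor's intensities leaves it unchanged.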
- Available from: Lorenzo Baraldi
ABSTRACT: We present a novel method for monocular hand gesture recognition in ego-vision scenarios that deals with static and dynamic gestures and can achieve high accuracy using only a few positive samples. Specifically, we use and extend the dense trajectories approach that has been successfully introduced for action recognition. Dense features are extracted around regions selected by a new hand segmentation technique that integrates superpixel classification with temporal and spatial coherence. We extensively test our gesture recognition and segmentation algorithms on public datasets and propose a new dataset shot with a wearable camera. In addition, we demonstrate that our solution can work in near real-time on a wearable device. IEEE Computer Vision and Pattern Recognition (CVPR) Embedded Vision Workshop (EVW); 01/2014
- ABSTRACT: A human action recognition framework is proposed which models motion variations corresponding to a particular class of actions without the need for sequence length normalization. The motion descriptors used in this framework are based on the optical flow vectors computed at every point on the silhouette of the human body. A histogram of flow (HOF) is computed from the optical flow vectors and gives the motion orientation in a local neighborhood. To get a relationship between the motion vectors at a particular instant, the magnitude and direction of the optical flow vector are coded with local binary patterns (LBP). The concatenation of these histograms (HOF-LBP) is considered as the action feature set to be used in the proposed framework. We illustrate that this motion descriptor is suitable for classifying various human actions when used in conjunction with the proposed action recognition framework, which models the motion variations in time for each class using regression-based techniques. The feature vectors extracted from the training set are suitably mapped to a lower dimensional space using Empirical Orthogonal Functional Analysis. A regression-based technique, such as Generalized Regression Neural Networks (GRNN), is used to compute the functional mapping from the action feature vectors to their reduced eigenspace representation for each class, thereby obtaining separate action manifolds. The feature set obtained from a test sequence is compared with each of the action manifolds by comparing the test coefficients with the ones corresponding to the manifold (as estimated by GRNN) to determine the class using Mahalanobis distance. Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics; 10/2013
- ABSTRACT: Given a finite set of subspaces of R^n, perhaps of differing dimensions, we describe a flag of vector spaces (i.e., a nested sequence of vector spaces) that best represents the collection based on a natural optimization criterion, and we present an algorithm for its computation. The utility of this flag representation lies in its ability to represent a collection of subspaces of differing dimensions. When the subspaces all have the same dimension d, the flag mean is related to several commonly used subspace representations. For instance, the d-dimensional subspace in the flag corresponds to the extrinsic manifold mean. When the set of subspaces is both well clustered and equidimensional of dimension d, the d-dimensional component of the flag provides an approximation to the Karcher mean. An intermediate matrix used to construct the flag can also be used to recover the canonical components at the heart of Multiset Canonical Correlation Analysis. Two examples utilizing the Carnegie Mellon University Pose, Illumination, and Expression Database (CMU-PIE) serve as visual illustrations of the algorithm. Linear Algebra and its Applications 06/2014; 451:15–32. · 0.97 Impact Factor