Conference Paper

Action Classification on Product Manifolds

DOI: 10.1109/CVPR.2010.5540131
Conference: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010
Source: DBLP

ABSTRACT Videos can be naturally represented as multidimensional arrays known as tensors. However, the geometry of the tensor space is often ignored. In this paper, we argue that the underlying geometry of the tensor space is an important property for action classification. We characterize a tensor as a point on a product manifold and perform classification on this space. First, we factorize a tensor along each of its orders using a modified High Order Singular Value Decomposition (HOSVD). We recognize each factorized space as a Grassmann manifold. Consequently, a tensor is mapped to a point on a product manifold, and the geodesic distance on the product manifold is computed for tensor classification. We assess the proposed method using two public video databases, namely the Cambridge-Gesture and KTH human action data sets. Experimental results reveal that the proposed method performs very well on these data sets. In addition, our method is generic in the sense that no prior training is needed.
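
The pipeline described in the abstract (unfold the tensor along each order, treat each factor as a point on a Grassmann manifold, and measure a geodesic distance on the product manifold) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that each factor space is spanned by the right singular vectors of the corresponding unfolding and that the product-manifold distance is the Euclidean norm of all principal angles collected across the orders; the paper's modified HOSVD may differ in detail. The function names are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Unfold a tensor along `mode`: rows index that mode, columns the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_subspaces(tensor):
    """Return one orthonormal basis per mode.

    Assumption: the basis is formed by the right singular vectors of each
    unfolding, so every basis represents a point on a Grassmann manifold.
    """
    bases = []
    for mode in range(tensor.ndim):
        _, _, vt = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        bases.append(vt.T)  # columns are orthonormal
    return bases


def principal_angles(a, b):
    """Principal (canonical) angles between the column spaces of a and b."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def product_geodesic_distance(x, y):
    """Distance between two same-shaped tensors on the product manifold.

    Assumption: the distance is the norm of the principal angles gathered
    from every factor manifold.
    """
    angles = [
        principal_angles(a, b)
        for a, b in zip(mode_subspaces(x), mode_subspaces(y))
    ]
    return float(np.linalg.norm(np.concatenate(angles)))


def nearest_neighbor_label(query, gallery, labels):
    """Label of the gallery tensor closest to `query` (no training step)."""
    distances = [product_geodesic_distance(query, g) for g in gallery]
    return labels[int(np.argmin(distances))]
```

Because classification here reduces to a nearest-neighbor search under this distance, no model is fitted beforehand, which is one way to read the abstract's remark that no prior training is needed.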

  • ABSTRACT: Human behavior recognition is an important task in image processing and surveillance systems. One main challenge is how to model behaviors effectively in unconstrained videos, given the tremendous variations in camera motion, background clutter, object appearance, and so on. In this paper, we propose two novel Multi-Feature Hierarchical Latent Dirichlet Allocation models for human behavior recognition by extending bag-of-words topic models such as the Latent Dirichlet Allocation model and the Multi-Modal Latent Dirichlet Allocation model. The two proposed models have three hierarchies: low-level visual features, feature topics, and behavior topics. They can effectively fuse two different types of features (motion and static visual features), avoid detecting or tracking moving objects, and improve recognition performance even when the extracted features are very noisy. Finally, we adopt the variational EM algorithm to learn the parameters of these models. Experiments on the YouTube dataset demonstrate the effectiveness of our proposed models.
    Science China. Information Sciences 09/2014; 57(9):1-15. DOI:10.1007/s11432-013-4794-9 · 0.70 Impact Factor
  • ABSTRACT: In this paper, we consider the action recognition problem based on geometrical structure. Our method uses a low-dimensional structure on the Grassmannian manifold to represent video sequences by utilizing the linear structure of the tangent space. The approach is divided into a training (off-line computing) stage and a testing (on-line computing) stage, which makes the recognition algorithm scalable to large data sets. We test the proposed method on several benchmark data sets. The results show that the new approach requires less computation than previous work based on the same geometrical assumption, and has similar or even higher recognition accuracy.
    2013 28th International Conference of Image and Vision Computing New Zealand (IVCNZ); 11/2013
  • ABSTRACT: Statistical classification of actions in videos is mostly performed by extracting relevant features, particularly covariance features, from image frames and studying the time series associated with the temporal evolution of these features. A natural mathematical representation of activity videos is in the form of parameterized trajectories on the covariance manifold, i.e. the set of symmetric, positive-definite matrices (SPDMs). The variable execution rates of actions imply variable parameterizations of the resulting trajectories and complicate their classification. Since action classes are invariant to execution rates, one requires rate-invariant metrics for comparing trajectories. A recent paper represented trajectories using their transported square-root vector fields (TSRVFs), defined by parallel translating scaled-velocity vectors of trajectories to a reference tangent space on the manifold. To avoid the arbitrariness of selecting the reference and to reduce the distortion introduced during this mapping, we develop a purely intrinsic approach where SPDM trajectories are represented by redefining their TSRVFs at the starting points of the trajectories, and analyzed as elements of a vector bundle on the manifold. Using a natural Riemannian metric on vector bundles of SPDMs, we compute geodesic paths and geodesic distances between trajectories in the quotient space of this vector bundle, with respect to the re-parameterization group. This makes the resulting comparison of trajectories invariant to their re-parameterization. We demonstrate this framework on two applications involving video classification: visual speech recognition (lip-reading) and hand-gesture recognition. In both cases we achieve results either comparable to or better than the current literature.