Conference Paper

A distribution based video representation for human action recognition

Lab. of Adv. Comput. Res., Chinese Acad. of Sci., Beijing, China
DOI: 10.1109/ICME.2010.5582550 · Conference: 2010 IEEE International Conference on Multimedia and Expo (ICME)
Source: IEEE Xplore

ABSTRACT: Most current research on human action recognition in videos uses bag-of-words (BoW) representations based on vector quantization of local spatio-temporal features, owing to the simplicity and good performance of such representations. In contrast to BoW schemes, this paper explores a localized, continuous, and probabilistic video representation. Specifically, the proposed representation encodes the visual and motion information of an ensemble of local spatio-temporal (ST) features of a video into a distribution estimated by a generative probabilistic model such as the Gaussian mixture model. Furthermore, this probabilistic video representation naturally gives rise to an information-theoretic distance metric between videos. This makes the representation readily applicable as input to most discriminative classifiers, such as nearest neighbor schemes and kernel methods. Experiments on two datasets, KTH and UCF sports, show that the proposed approach delivers promising results.
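The core idea of the abstract, representing one video as a continuous distribution over its local descriptors rather than as a quantized BoW histogram, can be sketched as below. This is a minimal illustration, not the paper's implementation: the descriptors here are random stand-ins for real spatio-temporal features (e.g. HOG/HOF around interest points), and the component count and covariance type are arbitrary choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in for the local spatio-temporal (ST) descriptors
# extracted from one video (the real features would come from a detector).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 64))  # 500 local ST features, 64-D each

# Encode the video as a continuous distribution over feature space with a
# Gaussian mixture model, instead of quantizing against a BoW codebook.
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(descriptors)

# The fitted mixture (weights, means, covariances) is the video's representation.
print(gmm.weights_.shape, gmm.means_.shape)  # (8,) (8, 64)
```

Because the representation is a density rather than a histogram, no codebook is built or shared across videos; each clip carries its own fitted mixture.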

    ABSTRACT: In recent years, bag-of-words (BoW) video representations have achieved promising results in human action recognition in videos. By vector quantizing local spatio-temporal (ST) features, the BoW video representation brings simplicity and efficiency, but also limitations. First, the discretization of the feature space in BoW inevitably results in ambiguity and information loss in the video representation. Second, there exists no universal codebook for the BoW representation; the codebook needs to be rebuilt whenever the video corpus changes. To tackle these issues, this paper explores a localized, continuous, and probabilistic video representation. Specifically, the proposed representation encodes the visual and motion information of an ensemble of local ST features of a video into a distribution estimated by a generative probabilistic model. Furthermore, the probabilistic video representation naturally gives rise to an information-theoretic distance metric between videos. This makes the representation readily applicable to most discriminative classifiers, such as nearest neighbor schemes and kernel-based classifiers. Experiments on two datasets, KTH and UCF sports, show that the proposed approach delivers promising results. Keywords: Human action recognition; Probabilistic video representation; Information-theoretic video matching
    Multimedia Tools and Applications 06/2012; 58(3):1-23. DOI: 10.1007/s11042-011-0748-7
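The "information-theoretic distance metric" between two such video distributions can be sketched with a symmetrized Kullback-Leibler divergence. The paper's exact metric is not specified here, so this is an assumption: KL divergence between Gaussian mixtures has no closed form, so the sketch below uses a plain Monte-Carlo estimate, and all data and parameter choices are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mc_kl(p, q, n=2000):
    # Monte-Carlo estimate of KL(p || q) between two fitted GMMs:
    # sample from p and average the log-density ratio log p(x) - log q(x).
    x, _ = p.sample(n)
    return float(np.mean(p.score_samples(x) - q.score_samples(x)))

# Two toy "videos": GMMs fitted on well-separated synthetic descriptor sets.
rng = np.random.default_rng(1)
a = GaussianMixture(2, random_state=0).fit(rng.normal(0.0, 1.0, (300, 4)))
b = GaussianMixture(2, random_state=0).fit(rng.normal(3.0, 1.0, (300, 4)))

# Symmetrized KL as the video-to-video distance; it can feed a nearest-neighbour
# rule directly, or be exponentiated into a kernel such as exp(-gamma * d).
d = 0.5 * (mc_kl(a, b) + mc_kl(b, a))
print(d > 0)
```

Distances of this form are what make the distribution-based representation usable with both nearest neighbor schemes and kernel-based classifiers, as both abstracts note.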
    ABSTRACT: Human activity recognition is an important area of computer vision research. Its applications include surveillance systems, patient monitoring systems, and a variety of systems that involve interactions between persons and electronic devices, such as human-computer interfaces. The goal of human activity recognition is to automatically analyze ongoing activities from an unknown video. This paper provides a detailed overview of the recognition of human actions. We first define the Accumulated Motion Image (AMI) using the technique of frame differencing. Energy histograms for the horizontal and vertical directions are computed from the AMI, and those features are extracted for further processing. The Discrete Fourier Transform is computed from the energy histograms, and those features are also extracted. A trained multi-class SVM (Support Vector Machine) is used to recognize the various actions from all of these features. A public dataset is used for evaluation.
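The AMI pipeline described above can be sketched end to end on a toy clip. This is a rough reading of the abstract, not the authors' code: the exact accumulation weighting and histogram binning in the paper may differ, the frames here are random, and the SVM training step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(10, 32, 32)).astype(np.float64)  # toy clip

# Accumulated Motion Image (AMI): accumulate absolute frame differences
# (a simple reading of "frame differencing"; the paper's weighting may differ).
ami = np.abs(np.diff(frames, axis=0)).sum(axis=0)

# Energy histograms: project the AMI's energy onto the two image axes.
h_hist = ami.sum(axis=0)  # energy per column (horizontal direction)
v_hist = ami.sum(axis=1)  # energy per row (vertical direction)

# DFT magnitudes of both histograms form the feature vector that would be
# fed to a trained multi-class SVM (classifier step omitted in this sketch).
features = np.concatenate([np.abs(np.fft.fft(h_hist)),
                           np.abs(np.fft.fft(v_hist))])
print(features.shape)  # (64,)
```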