A distribution based video representation for human action recognition
Most current research on human action recognition in videos uses bag-of-words (BoW) representations built by vector quantization of local spatio-temporal features, because such representations are simple and perform well. In contrast to BoW schemes, this paper explores a localized, continuous and probabilistic video representation. Specifically, the proposed representation encodes the visual and motion information of an ensemble of local spatio-temporal (ST) features of a video into a distribution estimated by a generative probabilistic model such as a Gaussian Mixture Model (GMM). This probabilistic video representation naturally gives rise to an information-theoretic distance metric between videos, which makes the representation readily usable by most discriminative classifiers, such as nearest-neighbor schemes and kernel methods. Experiments on two datasets, KTH and UCF Sports, show that the proposed approach can deliver promising results.
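To make the idea concrete, here is a minimal sketch of the pipeline the abstract describes: one GMM per video, a distance between GMMs, and a nearest-neighbor classifier on top. The abstract does not state which information-theoretic metric or which GMM settings the paper uses, so everything below is an assumption: a diagonal-covariance GMM fitted by plain EM, a symmetric Kullback-Leibler divergence estimated by Monte Carlo sampling, and the hypothetical helpers `fit_gmm`, `video_distance`, `kernel` and `nearest_neighbor`. The "videos" in the usage example are synthetic random descriptors, not KTH or UCF Sports data.

```python
import numpy as np

def _logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def _component_logpdf(X, weights, means, var):
    # log(w_k) + log N(x | mu_k, diag(var_k)) for every sample/component pair -> (n, k)
    diff = X[:, None, :] - means[None, :, :]
    log_det = np.log(2 * np.pi * var).sum(axis=1)
    maha = (diff ** 2 / var[None, :, :]).sum(axis=2)
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + maha)

def fit_gmm(X, k=2, iters=50, seed=0, eps=1e-6):
    """Hypothetical per-video model: diagonal GMM fitted by EM to ST descriptors X (n, d)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()
    var = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: normalized responsibilities
        log_r = _component_logpdf(X, weights, means, var)
        log_r -= _logsumexp(log_r, axis=1)[:, None]
        r = np.exp(log_r)
        # M-step: weights, means, diagonal variances (floored at eps)
        nk = r.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (r.T @ X) / nk[:, None]
        var = np.maximum((r.T @ X ** 2) / nk[:, None] - means ** 2, 0.0) + eps
    return weights, means, var

def gmm_logpdf(X, gmm):
    return _logsumexp(_component_logpdf(X, *gmm), axis=1)

def sample_gmm(gmm, n, rng):
    weights, means, var = gmm
    comp = rng.choice(len(weights), size=n, p=weights)
    return means[comp] + rng.standard_normal((n, means.shape[1])) * np.sqrt(var[comp])

def kl_mc(p, q, n=2000, seed=0):
    # Monte Carlo estimate of KL(p || q) = E_{x~p}[log p(x) - log q(x)]
    x = sample_gmm(p, n, np.random.default_rng(seed))
    return float(np.mean(gmm_logpdf(x, p) - gmm_logpdf(x, q)))

def video_distance(p, q, n=2000, seed=0):
    # Assumed metric: symmetric KL, clipped at 0 because the MC estimate is noisy
    return max(0.0, kl_mc(p, q, n, seed) + kl_mc(q, p, n, seed))

def kernel(p, q, gamma=1.0):
    # exp(-d / gamma): a common heuristic kernel; not guaranteed positive semidefinite
    return float(np.exp(-video_distance(p, q) / gamma))

def nearest_neighbor(query, train_models, train_labels):
    d = [video_distance(query, m) for m in train_models]
    return train_labels[int(np.argmin(d))]

# Usage on synthetic descriptors (4-D, 200 per "video")
rng = np.random.default_rng(1)
walk = [fit_gmm(rng.normal(0.0, 1.0, (200, 4))) for _ in range(3)]
run = [fit_gmm(rng.normal(3.0, 1.0, (200, 4))) for _ in range(3)]
query = fit_gmm(rng.normal(3.0, 1.0, (200, 4)))
print(nearest_neighbor(query, walk + run, ["walk"] * 3 + ["run"] * 3))  # expected: run
```

Because each video becomes a distribution rather than a quantized histogram, no shared codebook is needed; the cost moves to the pairwise distance, which here requires sampling and is the part most worth replacing with whatever metric the paper actually specifies.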