Conference Paper

Localizing and recognizing action unit using position information of local feature.

DOI: 10.1109/ICME.2009.5202573 In proceeding of: Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, ICME 2009, June 28 - July 2, 2009, New York City, NY, USA
Source: DBLP

ABSTRACT Action recognition has attracted much attention for human behavior analysis in recent years. Local spatial-temporal (ST) features are widely adopted in many works. However, most existing works which represent action video by histogram of ST words fail to have a deep insight into a fine structure of actions because of the local nature of these features. In this paper, we propose a novel method to simultaneously localize and recognize action units (AU) by regarding them as 3D (x,y,t) objects. Firstly, we record all of the local ST features in a codebook with the information of action class labels and relative positions to the respective AU centers. This simulates the probability distribution of class label and relative position in a non-parameter manner. When a novel video comes, we match its ST features to the codebook entries and cast votes for positions of its AU centers. And we utilize the localization result to recognize these AUs. The presented experiments on a public dataset demonstrate that our method performs well.

0 Bookmarks
 · 
59 Views
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: A common trend in object recognition is to detect and leverage the use of sparse, informative feature points. The use of such features makes the problem more manageable while providing increased robustness to noise and pose variation. In this work we develop an extension of these ideas to the spatio-temporal case. For this purpose, we show that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and we propose an alternative. Anchoring off of these interest points, we devise a recognition algorithm based on spatio-temporally windowed data. We present recognition results on a variety of datasets including both human and rodent behavior.
    Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005. 2nd Joint IEEE International Workshop on; 11/2005
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Local space-time features capture local events in video and can be adapted to the size, the frequency and the velocity of moving patterns. In this paper, we demonstrate how such features can be used for recognizing complex motion patterns. We construct video representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition. For the purpose of evaluation we introduce a new video database containing 2391 sequences of six human actions performed by 25 people in four different scenarios. The presented results of action recognition justify the proposed method and demonstrate its advantage compared to other relative approaches for action recognition.
    Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on; 09/2004
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing-robustness against errors in low-level processing, view and rate-invariant representations at midlevel processing and semantic representation of human activities at higher level processing-make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) "actions" and 2) "activities." "Actions" are characterized by simple motion patterns typically executed by a single human. "Activities" are more complex and involve coordinated actions among a small number of humans. We will discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes known as atomic or primitive actions that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher level representation of complex activities.
    IEEE Transactions on Circuits and Systems for Video Technology 12/2008; · 1.82 Impact Factor