Conference Proceeding

Recognizing human actions from still images with latent poses.

Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), San Francisco, CA, USA, 13-18 June 2010. IEEE Computer Society, 01/2010. DOI: 10.1109/CVPR.2010.5539879
Source: DBLP

ABSTRACT We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that help with recognition. Unlike other work that learns separate systems for pose estimation and action recognition and then combines them in an ad hoc fashion, our system is trained in an integrated manner that jointly considers poses and actions. Our learning objective is designed to directly exploit the pose information for action recognition. Our experimental results demonstrate that by inferring the latent poses we can improve the final action recognition results.
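The abstract describes scoring an action by maximizing over an unobserved pose. A minimal sketch of that kind of latent-variable inference is below, assuming a linear model of the form f_w(x, y) = max_h w · φ(x, h, y); the feature map `joint_feature`, the candidate-pose set, and the weight vector are hypothetical placeholders, not the authors' actual implementation or training procedure.

```python
import numpy as np

def score(w, x, pose, action, joint_feature):
    """Linear score w . phi(image, pose, action) for one (pose, action) pair."""
    return float(np.dot(w, joint_feature(x, pose, action)))

def predict_action(w, x, candidate_poses, actions, joint_feature):
    """Pick the action whose best latent pose gives the highest score,
    i.e. argmax over y of f_w(x, y) = max_h w . phi(x, h, y)."""
    best = None
    for y in actions:
        # The pose h is latent: keep only the highest-scoring pose for this action.
        h_star = max(candidate_poses, key=lambda h: score(w, x, h, y, joint_feature))
        s = score(w, x, h_star, y, joint_feature)
        if best is None or s > best[0]:
            best = (s, y, h_star)
    return best[1], best[2]  # predicted action and the pose inferred along the way
```

In this style of model the pose is never required as a training label for the final task: it is inferred jointly with the action at prediction time, which is the integration the abstract argues for.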

  • ABSTRACT: We present a distributed representation of pose and appearance of people called the “poselet activation vector”. First we show that this representation can be used to estimate the pose of people defined by the 3D orientations of the head and torso in the challenging PASCAL VOC 2010 person detection dataset. Our method is robust to clutter, aspect and viewpoint variation and works even when body parts like faces and limbs are occluded or hard to localize. We combine this representation with other sources of information like interaction with objects and other people in the image and use it for action recognition. We report competitive results on the PASCAL VOC 2010 static image action classification challenge. (An illustrative sketch of this representation appears after this list.)
    The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011; 01/2011
  • ABSTRACT: Human upper body pose estimation plays a key role in applications related to human-computer interactions. We propose to develop an avatar based video conferencing system where a user’s avatar is animated following his/her gestures. Tracking gestures calls for human pose estimation through image based measurements. Our work is motivated by the pictorial structures approach and we use a 2D model as a collection of rectangular body parts. Stochastic search iterations are used to estimate the angles between these body parts through Orientation Similarity Maximization (OSiMa) along the outline of the body model. The proposed approach is validated on human upper body images with varying levels of background clutter and has shown (near) accurate pose estimation results in real time.
    06/2011: pages 200-205;
  • ABSTRACT: In this paper, we present a pose-based approach for locating and recognizing human actions in videos. In our method, human poses are detected and represented based on a deformable part model. To our knowledge, this is the first work exploring the effectiveness of deformable part models in combining human detection and pose estimation for action recognition. Compared with previous methods, ours has three main advantages. First, our method does not rely on any assumptions about video preprocessing quality, such as satisfactory foreground segmentation or reliable tracking. Second, we propose a novel compact representation for human pose that works together with human detection and can represent the spatial and temporal structure inside an action. Third, because human detection is taken into consideration in our framework, our method can locate and recognize multiple actions in the same scene. Experiments on benchmark datasets and recorded cluttered videos verify the efficacy of our method.
    Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on; 07/2011
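As a rough illustration of the “poselet activation vector” described in the first related abstract, the sketch below pools the strongest response of each poselet detector inside a person bounding box into one fixed-length vector. The function names, the box/score data layout, and the overlap test are assumptions for illustration only and do not reproduce the authors' actual construction.

```python
import numpy as np

def overlaps(a, b, thresh=0.2):
    """Crude intersection-over-union test between two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return union > 0 and inter / union >= thresh

def poselet_activation_vector(person_box, poselet_detections):
    """One slot per poselet type: the max detection score falling inside the person box.

    poselet_detections: dict mapping poselet_id -> list of (box, score) pairs,
    e.g. the output of a bank of poselet detectors run over the image.
    """
    vec = np.zeros(len(poselet_detections))
    for i, pid in enumerate(sorted(poselet_detections)):
        scores = [s for (b, s) in poselet_detections[pid] if overlaps(b, person_box)]
        vec[i] = max(scores) if scores else 0.0
    return vec
```

A vector built this way would then feed a standard classifier (for instance a linear SVM) to predict coarse pose or action labels, which is the role the abstract assigns to the representation.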
