Conference Paper

Recognizing Human Actions from Still Images with Latent Poses

DOI: 10.1109/CVPR.2010.5539879 Conference: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010
Source: DBLP


We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that help with recognition. Unlike previous work that learns separate systems for pose estimation and action recognition and then combines them in an ad hoc fashion, our system is trained in an integrated manner that jointly considers poses and actions. Our learning objective is designed to directly exploit pose information for action recognition. Our experimental results demonstrate that by inferring the latent poses, we improve the final action recognition results.
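The core idea in the abstract — scoring an (image, action) pair by maximizing over latent pose hypotheses — can be illustrated with a minimal sketch. This is not the paper's actual model or feature set: the feature map `phi`, the weight vector `w`, and the discrete candidate-pose set are hypothetical stand-ins, used only to show the max-over-latent-variables inference pattern.

```python
# Hedged sketch of latent-pose scoring (illustrative, not the
# paper's implementation): an (image, action) pair is scored by
# maximizing a linear model over candidate pose hypotheses, and
# the predicted action is the one whose best-explaining pose
# scores highest. phi, w, and the pose set are stand-ins.

def dot(w, f):
    """Plain dot product between weight and feature vectors."""
    return sum(wi * fi for wi, fi in zip(w, f))

def score(w, phi, x, y, candidate_poses):
    """max over latent poses h of w . phi(x, h, y)."""
    return max(dot(w, phi(x, h, y)) for h in candidate_poses)

def predict_action(w, phi, x, actions, candidate_poses):
    """Choose the action whose best latent pose scores highest."""
    return max(actions, key=lambda y: score(w, phi, x, y, candidate_poses))
```

At training time the paper's integrated objective would tune `w` so that the correct action's best pose outscores all others; the sketch above only covers the inference step.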

  • Source
    • "To recognize action classes from still images accurately, researchers tend to integrate it with the task of pose estimation [7], [8]. In the integrated framework, these two tasks can help each other. "
    ABSTRACT: Recognizing actions from still images has been widely studied in recent years. In this paper, we model an action class as a flexible number of spatial configurations of body parts by proposing a new spatial SPN (Sum-Product Network). First, we discover a set of parts in image collections via unsupervised learning. Then, our new spatial SPN is applied to model the spatial relationships and higher-order correlations of parts. To learn robust networks, we further develop a hierarchical spatial SPN method, which models pairwise spatial relationships between parts inside sub-images and models the correlation of sub-images via extra layers of the SPN. Our method is shown to be effective on two benchmark datasets.
  • Source
    • "These approaches have reported very interesting results on challenging still image action datasets such as those described in [7] [8] [9]. At the opposite end of the spectrum are approaches based on the explicit recovery of body parts and the incorporation of structural information in the recognition process [10] [11]. The baseline model is a latent part-based model akin to Pictorial Structure which can be estimated as a joint, conditional or max-margin model [12] [13]. "
    ABSTRACT: Action recognition from still images is an important task in computer vision applications such as image annotation, robotic navigation, video surveillance and several others. Existing approaches mainly rely on either bag-of-feature representations or articulated body-part models. However, the relationship between the action and the image segments remains substantially unexplored. For this reason, in this paper we propose to approach action recognition by leveraging an intermediate layer of "superpixels" whose latent classes can act as attributes of the action. In the proposed approach, the action class is predicted by a structural model (learnt by a Latent Structural SVM) based on measurements from the image superpixels and their latent classes. Experimental results on the challenging Stanford 40 Actions dataset report a significant average accuracy of 74.06% for the positive class and 88.50% for the negative class, giving evidence of the performance of the proposed approach.
  • Source
    • "Also, unlike our dataset, action recognition datasets are usually comprised of short videos that precisely encapsulate the action of interest. Activity recognition works can be categorized in recognition from still images [10] [11] [12] and videos [13]. They can also be divided to context [9] [12] or motion based methods [4, 14–16]. "
    ABSTRACT: Various sports video genre categorization methods have been proposed recently, mainly focusing on professional sports videos captured for TV broadcasting. This paper aims to categorize sports videos in the wild, captured using mobile phones by people watching a game or practicing a sport. Thus, no assumption is made about video production practices or the existence of field lining and equipment. Motivated by the distinctiveness of motions in sports activities, we propose a novel motion trajectory descriptor to effectively and efficiently represent a video. Furthermore, temporal analysis of local descriptors is proposed to integrate the categorization decision over time. Experiments on a newly collected dataset of amateur sports videos in the wild demonstrate that our trajectory descriptor is superior for sports video categorization and that temporal analysis further improves the categorization accuracy.
    IEEE International Conference on Image Processing (ICIP 2014), Paris, France; 10/2014

