Conference Paper

Recognizing human actions from still images with latent poses.

DOI: 10.1109/CVPR.2010.5539879 Conference: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010
Source: DBLP

ABSTRACT We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that help with recognition. Unlike other work that learns separate systems for pose estimation and action recognition and then combines them in an ad hoc fashion, our system is trained in an integrated fashion that jointly considers poses and actions. Our learning objective is designed to directly exploit the pose information for action recognition. Our experimental results demonstrate that by inferring the latent poses, we can improve the final action recognition results.
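Treating the pose as a latent variable typically means scoring each action by maximizing a linear model over candidate poses, in the style of a latent SVM. The sketch below illustrates only that inference step, under assumptions not spelled out in the abstract: the function names (`score_action`, `predict`, `joint_feature`) and the discrete set of candidate poses are hypothetical, not the paper's actual formulation.

```python
import numpy as np

def score_action(w, image_feats, candidate_poses, action, joint_feature):
    """Score one action by maximizing over latent poses:
    f(x, y) = max_h  w . phi(x, h, y)   (latent-SVM-style inference)."""
    best, best_pose = -np.inf, None
    for pose in candidate_poses:
        s = float(w @ joint_feature(image_feats, pose, action))
        if s > best:
            best, best_pose = s, pose
    return best, best_pose

def predict(w, image_feats, candidate_poses, actions, joint_feature):
    """Predict the action whose best latent pose attains the highest score."""
    scores = {a: score_action(w, image_feats, candidate_poses, a, joint_feature)[0]
              for a in actions}
    return max(scores, key=scores.get)
```

Because the pose that maximizes the score is recovered as a by-product, the model outputs a pose estimate along with the action label, which matches the abstract's claim that inferring latent poses aids recognition.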

  • ABSTRACT: In this paper, we propose a supervised dictionary learning algorithm for action recognition in still images, followed by a discriminative weighting model. The dictionary is learned based on Local Fisher Discrimination, which takes into account the local manifold structure and the discriminative information of local descriptors. The label information of local descriptors is considered in both the dictionary learning and sparse coding stages, which yields a supervised sparse coding algorithm and makes the coding coefficients discriminative. Instead of spatial pyramid features, sliding-window features with max-pooling are computed from the coding coefficients. A discriminative weighting model combining a max-margin classifier is then proposed on top of these features. Both the weighting coefficients and the model parameters can be jointly learned in the same way as in the Multiple Kernel Learning algorithm. We validate our model on the following action recognition datasets: the Willow 7 human actions dataset, the People Playing Musical Instruments (PPMI) dataset, and the Sports dataset. To show the generality of our model, we also validate it on the Scene15 dataset. The experimental results show that, using only single-scale local descriptors, our algorithm is comparable to several state-of-the-art algorithms.
    Neurocomputing 02/2015; DOI:10.1016/j.neucom.2015.01.024 · 2.01 Impact Factor
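The sliding-window max-pooling step in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the coding coefficients are arranged on a spatial grid of shape (H, W, K), where K is the dictionary size, and the names `sliding_window_max_pool`, `win`, and `stride` are hypothetical.

```python
import numpy as np

def sliding_window_max_pool(codes, win, stride):
    """Max-pool sparse coding coefficients over sliding windows.
    codes: (H, W, K) array of coefficients at each descriptor location.
    Returns an (num_windows, K) array, one pooled feature per window."""
    H, W, K = codes.shape
    feats = []
    for i in range(0, H - win + 1, stride):
        for j in range(0, W - win + 1, stride):
            patch = codes[i:i + win, j:j + win, :]
            # Keep, per dictionary atom, the strongest response in the window.
            feats.append(patch.reshape(-1, K).max(axis=0))
    return np.stack(feats)
```

Compared with spatial-pyramid pooling over a fixed partition, overlapping windows of this kind retain finer localization of strong responses, which is presumably why the paper pairs them with a per-window discriminative weighting model.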
  • ABSTRACT: Human activity recognition is one of the most challenging problems that has received considerable attention from the computer vision community in recent years. Its applications are diverse, spanning from activity understanding for intelligent surveillance systems to improving human-computer interaction. The goal of human activity recognition is to automatically recognize ongoing activities from an unknown video (i.e. a sequence of image frames). The challenges in solving this problem are multi-fold: the complexity of human motions, the spatial and temporal variations arising from differences in the duration of activities, the changing spatial characteristics of the human form, and the contextual information involved in performing each activity. A number of approaches have been proposed to address these challenges in recent years by designing effective, compact descriptors that encode activity characteristics together with context; however, the mechanisms for incorporating them are not unique. In this dissertation, I present efficient techniques for learning and recognizing human activities. The primary goal of this research is to design compact but rich descriptors, along with effective algorithms, that can accommodate useful activity representations for recognizing both a single human activity and a collective activity in a crowded scene. For single human activity recognition, I introduce subject-centric descriptors incorporating both local and global representations that provide robustness against noise and partial occlusion, and invariance to changes in image scale. For collective activity recognition, I present context-based descriptors that efficiently encode human activity characteristics with contextual information, leading to improved methods for analyzing group activities in a crowded scene. My results focus on recognizing single human activities and collective activities in a crowded scene. I demonstrate the efficiency of my proposed descriptors for encoding human activity on several public datasets. Moreover, I show how to incorporate contextual information into activity descriptors when analyzing human activities in a crowded scene.
    07/2013, Degree: Ph.D., Supervisor: Shishir K. Shah;Ioannis A. Kakadiaris
  • IEEE Transactions on Circuits and Systems for Video Technology 01/2015; DOI:10.1109/TCSVT.2015.2397200 · 2.26 Impact Factor

