Conference Paper

View-Invariant Human Action Detection Using Component-Wise HMM of Body Parts

DOI: 10.1007/978-3-540-70517-8_20
Conference: Articulated Motion and Deformable Objects, 5th International Conference, AMDO 2008, Port d'Andratx, Mallorca, Spain, July 9-11, 2008, Proceedings
Source: DBLP


This paper presents a framework for view-invariant action recognition in image sequences. Feature-based human detection becomes extremely challenging when the agent is observed from different viewpoints. Moreover, similar actions, such as walking and jogging, are hardly distinguishable when the human body is considered as a whole. In this work, we have developed a system which detects human body parts under different views and recognises similar actions by learning the temporal changes of the detected body part components. First, human body part detection is performed to locate three components of the human body separately, namely the head, the legs and the arms. We incorporate a number of sub-classifiers, each covering a specific range of viewpoints, to detect these body parts. Subsequently, we extend this approach to distinguish and recognise actions such as walking and jogging based on component-wise HMM learning.
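The component-wise pipeline lends itself to a compact illustration: one HMM per (action, body part) pair, with a test sequence assigned to the action whose part models give the highest summed log-likelihood. Below is a minimal sketch in that spirit, assuming the hmmlearn library and precomputed per-frame feature vectors for each detected part; the names and parameters (train_seqs, n_states=4, GaussianHMM) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of component-wise HMM action classification (illustrative only).
# Assumes hmmlearn and precomputed per-frame part features; part detection
# and feature extraction are out of scope here.
import numpy as np
from hmmlearn import hmm

COMPONENTS = ["head", "legs", "arms"]   # body parts detected separately
ACTIONS = ["walking", "jogging"]        # similar actions to distinguish

def train_models(train_seqs, n_states=4):
    """train_seqs[action][component] is a list of (T_i, D) feature arrays."""
    models = {}
    for action in ACTIONS:
        for comp in COMPONENTS:
            seqs = train_seqs[action][comp]
            X = np.vstack(seqs)               # stack all training sequences
            lengths = [len(s) for s in seqs]  # per-sequence lengths for fit()
            m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
            m.fit(X, lengths)
            models[(action, comp)] = m
    return models

def classify(models, test_seq):
    """test_seq[component] is one (T, D) feature array per body part.
    The action whose component HMMs jointly explain the sequence best wins."""
    scores = {
        action: sum(models[(action, comp)].score(test_seq[comp])
                    for comp in COMPONENTS)
        for action in ACTIONS
    }
    return max(scores, key=scores.get)
```

Summing per-part log-likelihoods treats the components as conditionally independent given the action, which is what makes walking and jogging separable even when whole-body appearance is similar.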



  • "It is of significant interest in many applications, such as automated surveillance, aerial video analysis, and sports video annotation and search. Various visual cues have been shown to be effective for representing human actions, including motion [8] [9], contours [3] [12], extremities [22], and body parts [5] [18]. Most of these features can be reliably extracted from image sequences of medium to high resolution."
    ABSTRACT: In this paper, we present a novel descriptor to characterize human action when it is observed from a far field of view. Visual cues are usually sparse and vague in this scenario. An action sequence is divided into overlapping spatial-temporal volumes to make reliable and comprehensive use of the observed features. Within each volume, we represent successive poses by time series of Histograms of Oriented Gradients (HOG) and movements by time series of Histograms of Oriented Optical Flow (HOOF). Supervised Principal Component Analysis (SPCA) is applied to seek a subset of discriminatively informative principal components (PCs) that reduces the dimension of the histogram vectors without loss of accuracy. The final action descriptor is formed by concatenating sequences of SPCA-projected HOG and HOOF features. A Support Vector Machine (SVM) classifier is trained to perform action classification. We evaluated our algorithm by testing it on one normal-resolution and two low-resolution datasets, and compared our results with those of other reported methods. Using less than 1/5 the dimension of a full-length descriptor, our method achieves perfect accuracy on two of the datasets and performs comparably to other methods on the third. (A rough sketch of this pipeline appears after this list.)
    2009 Workshop on Motion and Video Computing (WMVC '09); 01/2010
  • ABSTRACT: In the perspective of activity recognition systems operating for long periods of time in environments susceptible to upgrades, one question that arises is how to exploit a-priori unknown, newly discovered sensors for activity recognition. We present a methodology to exploit these unknown new sensors. This methodology uses sporadic interactions with primitive sensors, together with behavioral assumptions, to confer activity recognition capabilities on a newly discovered sensor. The behavioral assumptions are used to infer additional information from the primitive sensors (e.g. simple reed switches) in ways that go beyond their initially foreseen function (e.g. detecting walking). We explain the methodology on the example of learning how to use an unknown new on-body sensor to detect modes of locomotion, when the user sporadically interacts with instrumented furniture.
  • ABSTRACT: The ability to predict the intentions of people based solely on their visual actions is a skill performed only by humans and animals. This requires segmentation of the items in the field of view, tracking of moving objects, identifying the importance of each object, determining the current role of each important object individually and in collaboration with other objects, relating these objects to a predefined scenario, assessing the selected scenario against the information retrieved, and finally adjusting the scenario to better fit the data. This is all accomplished with great accuracy in less than a few seconds. The intelligence of current computer algorithms has not reached this level of complexity under the accuracy and time constraints that humans and animals meet, but several research efforts are working towards it by identifying new algorithms that solve parts of this problem. This survey paper lists several of these efforts, which rely mainly on image processing and the classification of a limited number of actions. It divides the activities into several groups and ends with a discussion of future needs. Keywords: Visual human action classification, Artificial intelligence, Hidden Markov Model, Grammars
    Artificial Intelligence Review 04/2011; 37(4):301-311. DOI: 10.1007/s10462-011-9232-z
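
The descriptor pipeline of the first cited entry (per-frame HOG and HOOF time series, dimensionality reduction, SVM classification) can also be sketched compactly. The sketch below is a loose illustration assuming OpenCV, scikit-image and scikit-learn; plain PCA stands in for the paper's Supervised PCA, the overlapping spatial-temporal volume partitioning is omitted for brevity, and every parameter value is an assumption rather than the authors' setting.

```python
# Illustrative sketch of a HOG + HOOF action descriptor with PCA and an SVM.
# Assumes each clip is a list of equal-size grayscale uint8 frames and that
# all clips have the same number of frames (so descriptors align).
import numpy as np
import cv2
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def hoof(prev_gray, gray, bins=8):
    """Histogram of Oriented Optical Flow between two consecutive frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)   # normalise to a distribution

def describe(frames):
    """Concatenate per-frame HOG and HOOF features into one clip descriptor."""
    feats = []
    for prev, cur in zip(frames, frames[1:]):
        h = hog(cur, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2))
        feats.append(np.concatenate([h, hoof(prev, cur)]))
    return np.concatenate(feats)

def train(clips, labels, n_pcs=40):
    """Fit PCA (stand-in for Supervised PCA) and a linear SVM on clip descriptors.
    n_pcs must not exceed the number of training clips."""
    X = np.stack([describe(c) for c in clips])
    pca = PCA(n_components=n_pcs).fit(X)
    clf = SVC(kernel="linear").fit(pca.transform(X), labels)
    return pca, clf
```

At test time a clip would be passed through describe(), projected with the fitted PCA, and classified with the SVM; the projection is what keeps the final descriptor at a fraction of the full concatenated histogram length.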