Gaussian Process Dynamical Models for Human Motion

Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4 Canada.
IEEE Transactions on Pattern Analysis and Machine Intelligence (Impact Factor: 5.78). 03/2008; 30(2):283-98. DOI: 10.1109/TPAMI.2007.1167
Source: PubMed


We introduce Gaussian process dynamical models (GPDM) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a low-dimensional latent space with associated dynamics, and a map from the latent space to an observation space. We marginalize out the model parameters in closed form, using Gaussian process priors for both the dynamics and the observation mappings. This results in a non-parametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach, and compare four learning algorithms on human motion capture data in which each pose is 50-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces.
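
As a rough illustration of the structure described above, the following Python sketch evaluates a simplified GPDM-style objective: one Gaussian process marginal likelihood for the map from latent points to observations, plus one for the first-order latent dynamics. It is a sketch under stated assumptions, not the authors' implementation. The function names (`rbf_kernel`, `gp_marginal_nll`, `gpdm_nll`), the fixed RBF-plus-noise kernel, and the fixed hyperparameter values are all illustrative choices. Priors on hyperparameters and on the initial latent point are left out.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_marginal_nll(inputs, targets, noise=1e-2):
    """Negative log marginal likelihood of targets (N x D) given inputs (N x Q).

    Each target column is treated as an independent GP sharing one kernel,
    so the mapping weights are integrated out in closed form.
    """
    n, d = targets.shape
    K = rbf_kernel(inputs, inputs) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, targets)          # L^{-1} targets
    logdet = 2.0 * np.sum(np.log(np.diag(L)))    # ln|K|
    # (D/2) ln|K| + (1/2) tr(K^{-1} T T^T) + constant
    return 0.5 * d * logdet + 0.5 * np.sum(alpha**2) + 0.5 * n * d * np.log(2.0 * np.pi)

def gpdm_nll(X, Y):
    """Simplified GPDM objective for latent trajectory X (N x Q) and poses Y (N x D)."""
    observation_term = gp_marginal_nll(X, Y)        # latent space -> observation space
    dynamics_term = gp_marginal_nll(X[:-1], X[1:])  # x_{t-1} -> x_t
    return observation_term + dynamics_term

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
    X = np.stack([np.cos(t), np.sin(t)], axis=1)   # toy 2-D latent trajectory
    Y = rng.standard_normal((20, 5))               # toy 5-D "poses"
    print(gpdm_nll(X, Y))
```

In a learning setting, an objective of this kind would be minimized with respect to the latent coordinates `X` and the kernel hyperparameters. Here both are held fixed to keep the example short.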



Available from: David J. Fleet, Sep 03, 2015
  • Source
    • "A better solution is to use Gaussian Processes (GPs) which are non-linear, non-parametric models [7]. They have been successfully applied in various tasks including speech and music processing [8] [9] [10]. Previously, we have also used GPs for static music emotion recognition [11]. "

  • Source
    • "Motion generation. Generation of naturalistic human motion using probabilistic models trained on motion capture data has previously been addressed in the context of computer graphics and machine learning. Prior work has tackled synthesis of stylized human motion using bilinear spatiotemporal basis models [1], Hidden Markov Models [3], linear dynamical systems [21], and Gaussian process latent variable models [46] [40], as well as multilinear variants thereof [12] [45]. Unlike methods based on Gaussian processes, we use a parametric representation and a simple, scalable supervised training method that makes it practical to train on large datasets. "
    ABSTRACT: We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers. We test instantiations of ERD architectures in the tasks of motion capture (mocap) generation, body pose labeling and body pose forecasting in videos. Our model handles mocap training data across multiple subjects and activity domains, and synthesizes novel motions while avoiding drift over long periods of time. For human pose labeling, ERD outperforms a per-frame body part detector by resolving left-right body part confusions. For video pose forecasting, ERD predicts body joint displacements across a temporal horizon of 400ms and outperforms a first-order motion model based on optical flow. ERDs extend previous Long Short Term Memory (LSTM) models in the literature to jointly learn representations and their dynamics. Our experiments show such representation learning is crucial for both labeling and prediction in space-time. We find this is a distinguishing feature of the spatio-temporal visual domain in comparison to 1D text, speech or handwriting, where straightforward hard-coded representations have shown excellent results when directly combined with recurrent units.
  • Source
    • "Gaussian process regression has been applied to trajectory analysis [11] and human motion modeling [24]. For the multi-object activity modeling, Loy et al. [12] formulated the non-linear relationships between decomposed image regions as a regression problem. "
    ABSTRACT: This paper presents a hierarchical framework for detecting local and global anomalies via hierarchical feature representation and Gaussian process regression. While local anomaly is typically detected as a 3D pattern matching problem, we are more interested in global anomaly that involves multiple normal events interacting in an unusual manner, such as a car accident. To simultaneously detect local and global anomalies, we formulate the extraction of normal interactions from training video as the problem of efficiently finding the frequent geometric relations of the nearby sparse spatio-temporal interest points. A codebook of interaction templates is then constructed and modeled using Gaussian process regression. A novel inference method for computing the likelihood of an observed interaction is also proposed. As such, our model is robust to slight topological deformations and can handle the noise and data unbalance problems in the training data. Simulations show that our system outperforms the main state-of-the-art methods on this topic and achieves at least 80% detection rates based on three challenging datasets.
    IEEE Conference on Computer Vision and Pattern Recognition; 06/2015