
GP-LVM for Data Consolidation


Abstract

Many machine learning tasks involve the transfer of information from one representation to a corresponding representation, or tasks where several different observations represent the same underlying phenomenon. A classical algorithm for feature selection using information from multiple sources or representations is Canonical Correlation Analysis (CCA). In CCA the objective is to select features in each observation space that are maximally correlated; this is in contrast to dimensionality reduction, where the objective is to re-represent the data in a more efficient form. We suggest a dimensionality reduction technique that builds on CCA. By extending the latent space with two additional spaces, each specific to a partition of the data, the model is capable of representing the full variance of the data. In this paper we suggest a generative model for shared dimensionality reduction analogous to that of CCA.
GP-LVM for Data Consolidation
Carl Henrik Ek, Philip H. S. Torr and Neil D. Lawrence
Learning from Multiple Sources Workshop, NIPS 2008
December 14, 2008
Motivation: Static pose rotating 360°
Data consists of the actual pose and features derived from the silhouette (data artificially generated in Poser).
Visualization on the left is from silhouette features; visualization on the right is from pose features.
Our Approach
Reduce dimensionality of the data.
- Non-linear dimensionality reduction.
- Underlying assumption that the data is really low-dimensional, e.g. a prototype with non-linear distortions.
Fusion of different modalities.
- Concatenate the data observations:
- $\mathbf{Y} = [\mathbf{y}_1 \ldots \mathbf{y}_N]^T \in \Re^{N \times D_Y}$ (silhouette)
- $\mathbf{Z} = [\mathbf{z}_1 \ldots \mathbf{z}_N]^T \in \Re^{N \times D_Z}$ (pose).
Fusion of the Data
Assume the data sets have intrinsic low dimensionality, $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^T$, where $\mathbf{x}_n \in \Re^q$, $q \ll D_Y$ and $q \ll D_Z$:

$$y_{ni} = f^Y_i(\mathbf{x}_n) + \epsilon^Y_{ni}, \qquad z_{ni} = f^Z_i(\mathbf{x}_n) + \epsilon^Z_{ni}.$$

For Gaussian process priors over $f^Y_i(\cdot)$ and $f^Z_i(\cdot)$ this is a shared latent space variant of the GP-LVM (Shon et al., 2006; Ek et al., 2007; Navaratnam et al., 2007).
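To make the generative assumption concrete, here is a minimal NumPy sketch of sampling from such a shared-latent-space model. This is our own illustration, not the authors' code: the RBF kernel, the noise level and all sizes are assumptions.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix over the rows of X."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(0)
N, q, D_Y, D_Z = 100, 2, 10, 30       # latent dim q << D_Y, D_Z (illustrative)
X = rng.normal(size=(N, q))           # shared latent points x_n
K = rbf_kernel(X) + 0.01 * np.eye(N)  # GP prior covariance plus noise

# Each output dimension is an independent GP draw over the SAME latent points:
# y_{:,i} ~ N(0, K) for each of the D_Y dimensions, and likewise for z.
L = np.linalg.cholesky(K)
Y = L @ rng.normal(size=(N, D_Y))     # silhouette-feature view
Z = L @ rng.normal(size=(N, D_Z))     # pose view
```

Both views are coupled only through the shared latent matrix `X`; this is exactly the structure the private spaces below will relax.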
Probabilistic CCA
If the $f_i(\cdot)$ are taken to be linear and
$$\boldsymbol{\epsilon}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{C}),$$
this model is probabilistic canonical correlation analysis (Bach and Jordan, 2005).
For non-linear $f_i(\cdot)$ with Gaussian process priors we have GPLVM-CCA (Leen and Fyfe, 2006).
New Model
$$y_{ni} = f^Y_i(\mathbf{x}^S_n, \mathbf{x}^Y_n) + \epsilon^Y_{ni}, \qquad z_{ni} = f^Z_i(\mathbf{x}^S_n, \mathbf{x}^Z_n) + \epsilon^Z_{ni}.$$

The mappings now act on a latent space which is split into three parts, $\mathbf{X}^Y = \{\mathbf{x}^Y_n\}_{n=1}^N$, $\mathbf{X}^Z = \{\mathbf{x}^Z_n\}_{n=1}^N$ and $\mathbf{X}^S = \{\mathbf{x}^S_n\}_{n=1}^N$.

The $\mathbf{X}^Y$ and $\mathbf{X}^Z$ take the role of $\mathbf{C}^Z$ and $\mathbf{C}^Y$.¹

¹ For linear mappings with $q_Y = D_Y - 1$ and $q_Z = D_Z - 1$, CCA is recovered.
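A hedged sketch of sampling from this split-latent model, extending the previous one: each view's GP now sees the shared coordinates concatenated with that view's private coordinates. Kernel choice and dimensions are again illustrative assumptions, not the authors' settings.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(1)
N, q_S, q_Y, q_Z, D_Y, D_Z = 100, 1, 2, 2, 10, 30
X_S = rng.normal(size=(N, q_S))   # shared latent coordinates
X_Y = rng.normal(size=(N, q_Y))   # silhouette-private coordinates
X_Z = rng.normal(size=(N, q_Z))   # pose-private coordinates

# Each view's GP is driven by [x_S, x_private]; only x_S couples the views,
# while the private parts absorb the view-specific (non-shared) variance.
K_Y = rbf_kernel(np.hstack([X_S, X_Y])) + 0.01 * np.eye(N)
K_Z = rbf_kernel(np.hstack([X_S, X_Z])) + 0.01 * np.eye(N)
Y = np.linalg.cholesky(K_Y) @ rng.normal(size=(N, D_Y))
Z = np.linalg.cholesky(K_Z) @ rng.normal(size=(N, D_Z))
```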
Non Linear CCA
Kernel-CCA (see e.g. Kuss and Graepel, 2003) implicitly assumes that
there is a smooth mapping from each of the data-spaces to a shared
latent space,
$$x^S_{ni} = g^Y_i(\mathbf{y}_n) = g^Z_i(\mathbf{z}_n).$$
We augment CCA to extract the private spaces, $\mathbf{X}^Y$ and $\mathbf{X}^Z$.
To do this we make a further assumption about the non-consolidating subspaces,
$$x^Y_{ni} = h^Y_i(\mathbf{y}_n), \qquad x^Z_{ni} = h^Z_i(\mathbf{z}_n),$$
where $h^Y_i(\cdot)$ and $h^Z_i(\cdot)$ are smooth functions.
Initialize the GP-LVM
Spectral methods used to initialize the GP-LVM (Lawrence, 2005).
Harmeling (2007) observed that high-quality embeddings are backed up by high GP-LVM log-likelihoods.
First step: apply kernel CCA to find the shared subspace.
Canonical Correlates Analysis
Find linear transformations $\mathbf{W}_Y$ and $\mathbf{W}_Z$ maximizing the correlation between $\mathbf{W}_Y \mathbf{Y}$ and $\mathbf{W}_Z \mathbf{Z}$:

$$\{\hat{\mathbf{W}}_Y, \hat{\mathbf{W}}_Z\} = \underset{\{\mathbf{W}_Y, \mathbf{W}_Z\}}{\operatorname{argmax}} \; \operatorname{tr}\left(\mathbf{W}_Y^T \boldsymbol{\Sigma}_{YZ} \mathbf{W}_Z\right)$$
$$\text{s.t.} \quad \mathbf{W}_Y^T \boldsymbol{\Sigma}_{YY} \mathbf{W}_Y = \mathbf{I}, \qquad \mathbf{W}_Z^T \boldsymbol{\Sigma}_{ZZ} \mathbf{W}_Z = \mathbf{I}.$$

The optimum is found through an eigenvalue problem.
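As an illustration of that eigenvalue problem, a minimal NumPy sketch of linear CCA (not the authors' implementation): whitening each block with a Cholesky factor and taking an SVD of the whitened cross-covariance is equivalent to solving the constrained problem above.

```python
import numpy as np

def cca(Y, Z, d, jitter=1e-8):
    """Return W_Y, W_Z (columns = canonical directions) and the correlations."""
    Yc, Zc = Y - Y.mean(0), Z - Z.mean(0)
    N = Y.shape[0]
    S_yy = Yc.T @ Yc / N + jitter * np.eye(Y.shape[1])  # jitter for stability
    S_zz = Zc.T @ Zc / N + jitter * np.eye(Z.shape[1])
    S_yz = Yc.T @ Zc / N
    Ly, Lz = np.linalg.cholesky(S_yy), np.linalg.cholesky(S_zz)
    # Whitened cross-covariance; its SVD solves the constrained problem.
    M = np.linalg.solve(Ly, S_yz @ np.linalg.inv(Lz).T)
    U, s, Vt = np.linalg.svd(M)
    W_Y = np.linalg.solve(Ly.T, U[:, :d])   # satisfies W_Y^T S_yy W_Y = I
    W_Z = np.linalg.solve(Lz.T, Vt[:d].T)   # satisfies W_Z^T S_zz W_Z = I
    return W_Y, W_Z, s[:d]                  # s holds the canonical correlations
```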
Non Linear Canonical Correlates Analysis
We apply CCA in the dominant principal subspace of each feature
space instead of directly in the feature space (Kuss and Graepel, 2003).
Applying CCA recovers two sets of bases, $\mathbf{W}_Y$ and $\mathbf{W}_Z$, explaining the correlated or shared variance between the two feature spaces.
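A sketch of this two-stage construction, under our own naming: compute kernel-PCA coordinates for each view, then feed them to the linear `cca` sketch above.

```python
import numpy as np

def kernel_pca(K, p):
    """Top-p kernel-PCA coordinates (rows = points) from a kernel matrix K."""
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N            # centring matrix
    vals, vecs = np.linalg.eigh(H @ K @ H)         # centred kernel is symmetric
    idx = np.argsort(vals)[::-1][:p]               # dominant principal subspace
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# F_Y = kernel_pca(K_Y, p); F_Z = kernel_pca(K_Z, p)
# W_Y, W_Z, corr = cca(F_Y, F_Z, d)   # cca() from the previous sketch
```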
NCCA I
Need to describe the private subspaces $(\mathbf{X}^Z, \mathbf{X}^Y)$.
Look for directions of maximum data variance that are orthogonal to the canonical correlates.
We call this procedure non-consolidating components analysis (NCCA).
Seek the first direction $\mathbf{v}_1$ of maximum variance orthogonal to $\mathbf{W}$:
$$\mathbf{v}_1 = \underset{\mathbf{v}_1}{\operatorname{argmax}} \; \mathbf{v}_1^T \mathbf{K} \mathbf{v}_1
\quad \text{subject to} \quad \mathbf{v}_1^T \mathbf{v}_1 = 1, \quad \mathbf{v}_1^T \mathbf{W} = \mathbf{0}.$$
The optimal $\mathbf{v}_1$ is found via an eigenvalue problem,
$$\left(\mathbf{K} - \mathbf{W}\mathbf{W}^T \mathbf{K}\right)\mathbf{v}_1 = \lambda_1 \mathbf{v}_1.$$
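A minimal sketch of extracting this first non-consolidating direction (a hypothetical helper, not the authors' code; we orthonormalize $\mathbf{W}$ with a QR step first, an assumption needed for the deflation projector to be valid):

```python
import numpy as np

def first_private_direction(K, W):
    """Leading eigenvector of (K - Q Q^T K): max variance orthogonal to span(W)."""
    Q, _ = np.linalg.qr(W)                 # orthonormal basis for span(W)
    deflated = K - Q @ Q.T @ K             # remove variance along the correlates
    vals, vecs = np.linalg.eig(deflated)   # deflated matrix need not be symmetric
    v1 = vecs[:, np.argmax(vals.real)].real
    return v1 / np.linalg.norm(v1)
```

Any eigenvector of the deflated matrix with nonzero eigenvalue is automatically orthogonal to span(W), so the constraint is satisfied by construction.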
NCCA II
For successive directions, further eigenvalue problems of the form
$$\left(\mathbf{K} - \left(\mathbf{W}\mathbf{W}^T + \sum_{i=1}^{k-1} \mathbf{v}_i \mathbf{v}_i^T\right)\mathbf{K}\right)\mathbf{v}_k = \lambda_k \mathbf{v}_k$$
need to be solved.
The embeddings then take the form:
$$\mathbf{X}^S = \tfrac{1}{2}\left(\mathbf{W}_Y \mathbf{F}_Y + \mathbf{W}_Z \mathbf{F}_Z\right) \quad (1)$$
$$\mathbf{X}^Y = \mathbf{V}_Y \mathbf{F}_Y, \qquad \mathbf{X}^Z = \mathbf{V}_Z \mathbf{F}_Z, \quad (2)$$
where $\mathbf{F}_Y$ and $\mathbf{F}_Z$ represent the kernel PCA representation of each observation space.
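In code, Eqs. (1)–(2) amount to three projections. This sketch assumes the rows of $\mathbf{F}_Y$ and $\mathbf{F}_Z$ are data points, so the slide's $\mathbf{W}\mathbf{F}$ products appear as `F @ W` in the transposed convention:

```python
import numpy as np

def ncca_embeddings(F_Y, F_Z, W_Y, W_Z, V_Y, V_Z):
    """Shared and private latent coordinates from kernel-PCA features."""
    X_S = 0.5 * (F_Y @ W_Y + F_Z @ W_Z)  # Eq. (1): consolidated/shared space
    X_Y = F_Y @ V_Y                      # Eq. (2): silhouette-private space
    X_Z = F_Z @ V_Z                      # Eq. (2): pose-private space
    return X_S, X_Y, X_Z
```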
Initialization of a GP-LVM
Purely spectral algorithm: the optimization problems are convex and
they lead to unique solutions.
Spectral methods are less useful in “inquisition” of the model.
The pre-image problem means that handling missing data can be
rather involved (Sanguinetti and Lawrence, 2006).
Build Gaussian process mappings from the latent to the data space.
This results in a GP-LVM model.
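For reference, a sketch of the GP-LVM log-likelihood $\log p(\mathbf{Y}\,|\,\mathbf{X})$ that the resulting model maximizes over the latent positions (and kernel parameters) with a gradient optimizer; the single shared noise term and the one-argument `kern` signature are our simplifications.

```python
import numpy as np

def gplvm_log_likelihood(X, Y, kern, noise=0.01):
    """log p(Y | X) with independent GP priors on each output dimension.

    kern(X) must return the N x N covariance of the latent positions.
    """
    N, D = Y.shape
    K = kern(X) + noise * np.eye(N)
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, Y)
    # -D/2 log|K| - 1/2 tr(Y^T K^{-1} Y) - ND/2 log(2 pi)
    return -0.5 * (D * logdet + np.sum(Y * alpha) + N * D * np.log(2 * np.pi))
```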
Inference I
Given a silhouette ($\mathbf{y}$), we can find the corresponding $\mathbf{x}^S$ position.
The likelihood of different poses ($\mathbf{z}$) can then be visualized in the private space for the poses, $\mathbf{x}^Z$.
Disambiguation (not dealt with here) can then be achieved through
e.g. temporal information.
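A hedged sketch of the scoring step, with hypothetical names and assuming $\mathbf{x}^S$ has already been recovered from the silhouette: fix the shared coordinate, sweep the pose-private coordinate $\mathbf{x}^Z$, and evaluate the GP predictive log-density of candidate poses.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * sq / ell**2)

def pose_log_density(z, x_star, X_latent, Z_train, noise=0.01):
    """GP predictive log-density of pose z at latent point x_star = [x_S, x_Z]."""
    N, D = Z_train.shape
    K = rbf(X_latent, X_latent) + noise * np.eye(N)
    k_s = rbf(X_latent, x_star[None, :])                 # N x 1 cross-covariance
    mu = (k_s.T @ np.linalg.solve(K, Z_train)).ravel()   # predictive mean pose
    var = 1.0 + noise - (k_s.T @ np.linalg.solve(K, k_s)).item()
    return -0.5 * (D * np.log(2 * np.pi * var) + np.sum((z - mu) ** 2) / var)
```

Local maxima of this score over the sweep correspond to the ambiguous pose hypotheses visualized in the figures that follow.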
Motivation
x-axes are the shared space for the two models and the y-axes are the
private space for the silhouettes (left) and the pose (right). Shading is
from the GP-LVM likelihood.
Toy Problem Result
Pose inference from silhouette using two different silhouettes from the
training data.
Left image: continuous leg ambiguity.
Right image: discrete leg ambiguity.
Video
Silhouette video (if time!).
Experiments
A walking sequence from the HumanEva database (Sigal and Black, 2006):
- Four cycles in a circular walk.
- Two cycles used for training and two for testing, for the same subject.
- Each image is represented using a 100-dimensional integral HOG descriptor (Zhu et al., 2006).
- The pose space is represented as the sum of an MVU kernel (Weinberger et al., 2004) applied to the full pose space and a linear kernel applied to the local motion.
- The HOG features are represented with an MVU kernel.
On HumanEva, a one-dimensional shared space explains 9% of the variance in the image space and 18% in the pose space.
To retain 95% of the total variance in each observation space, two dimensions are needed for the private spaces.
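The dimensionality figures above are retained-variance ratios; a short sketch of that selection rule, assuming the eigenvalue spectrum of each space is available:

```python
import numpy as np

def dims_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading directions retaining `target` total variance."""
    ratios = np.cumsum(np.sort(eigenvalues)[::-1]) / np.sum(eigenvalues)
    return int(np.searchsorted(ratios, target) + 1)
```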
Experiments
Computation time was about 10 minutes on an Intel Core Duo with 1 GB of RAM.
The inference procedure, using 20 nearest-neighbor initializations per image, took a few seconds to compute.
We compare with the shared GP-LVM.
HumanEva Sequence Results
Top row: original test set image. Second row: visualisation of
ambiguities. Bottom row: pose from mode closest to ground truth.
HumanEva Mode Exploration I
NCCA
HumanEva Mode Exploration I
Shared GP-LVM
HumanEva Mode Exploration II
NCCA
HumanEva Mode Exploration II
Shared GP-LVM
Discussion
Careful fusion of multimodal data at the training stage allows for elegant disambiguation when only part of the data is available at test time.
Further work:
- Refinement with the GP-LVM algorithm.
- Disambiguation with temporal information.
References I
F. R. Bach and M. I. Jordan. A probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California, Berkeley, 2005.
C. H. Ek, P. H. Torr, and N. D. Lawrence. Gaussian process latent variable
models for human pose estimation. In 4th Joint Workshop on Multimodal
Interaction and Related Machine Learning Algorithms (MLMI 2007), volume
LNCS 4892, pages 132–143, Brno, Czech Republic, Jun. 2007. Springer-Verlag.
S. Harmeling. Exploring model selection techniques for nonlinear dimensionality reduction. Technical Report EDI-INF-RR-0960, University of Edinburgh, 2007.
M. Kuss and T. Graepel. The geometry of kernel canonical correlation analysis. Technical Report TR-108, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2003.
N. D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J. Mach. Learn. Res., 6:1783–1816, 2005.
References II
G. Leen and C. Fyfe. A Gaussian process latent variable model formulation of canonical correlation analysis. In European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, April 2006.
R. Navaratnam, A. Fitzgibbon, and R. Cipolla. The joint manifold model. In IEEE
International Conference on Computer Vision (ICCV), 2007.
G. Sanguinetti and N. D. Lawrence. Missing data in kernel PCA. In ECML, Lecture Notes in Computer Science, Berlin, 2006. Springer-Verlag.
A. Shon, K. Grochow, A. Hertzmann, and R. Rao. Learning shared latent structure for image synthesis and robotic imitation. In Advances in Neural Information Processing Systems (NIPS), pages 1233–1240, 2006.
L. Sigal and M. Black. HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion. Technical report, Brown University, 2006.
K. Weinberger, F. Sha, and L. Saul. Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the 21st International Conference on Machine Learning (ICML), 2004.
Q. Zhu, S. Avidan, M. Yeh, and K. Cheng. Fast human detection using a cascade of histograms of oriented gradients. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.