Data Consolidation
Carl Henrik Ek et al. (NIPS Workshops 2008), December 14, 2008
Motivation: Static pose rotating 360°
Data consists of the actual pose and features derived from the silhouette (data artificially generated in Poser).
Visualization on the left is from silhouette features; visualization on the right is from pose features.
Our Approach
Reduce dimensionality of the data.
Non-linear dimensionality reduction.
Underlying assumption that the data is really low dimensional, e.g. a prototype with non-linear distortions.
Fusion of different modalities.
Concatenate data observations:
$\mathbf{Y} = [\mathbf{y}_1 \ldots \mathbf{y}_N]^\top \in \Re^{N \times D_Y}$ (silhouette)
$\mathbf{Z} = [\mathbf{z}_1 \ldots \mathbf{z}_N]^\top \in \Re^{N \times D_Z}$ (pose).
Fusion of the Data
Assume the data sets have intrinsic low dimensionality, $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^\top$ where $\mathbf{x}_n \in \Re^q$, $q \ll D_Y$ and $q \ll D_Z$.
$$y_{ni} = f^Y_i(\mathbf{x}_n) + \epsilon^Y_{ni}, \qquad z_{ni} = f^Z_i(\mathbf{x}_n) + \epsilon^Z_{ni}.$$
For Gaussian process priors over $f^Y_i(\cdot)$ and $f^Z_i(\cdot)$ this is a shared latent space variant of the GP-LVM (Shon et al., 2006; Ek et al., 2007; Navaratnam et al., 2007).
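As a concrete toy illustration of this generative assumption (not from the slides), the following minimal NumPy sketch draws a low-dimensional latent matrix and pushes it through two arbitrary smooth non-linear maps standing in for $f^Y$ and $f^Z$, with additive noise; the sizes and the sine/tanh warps are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, q, D_Y, D_Z = 200, 2, 10, 30           # toy sizes, not the HumanEva ones

X = rng.standard_normal((N, q))           # shared low-dimensional latent points

# Two arbitrary smooth non-linear maps standing in for f^Y and f^Z.
A_Y = rng.standard_normal((q, D_Y))
A_Z = rng.standard_normal((q, D_Z))
Y = np.sin(X @ A_Y) + 0.05 * rng.standard_normal((N, D_Y))   # silhouette-like features
Z = np.tanh(X @ A_Z) + 0.05 * rng.standard_normal((N, D_Z))  # pose-like features
```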
Probabilistic CCA
If the $f_i(\cdot)$ are taken to be linear and $\boldsymbol{\epsilon}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$, this model is probabilistic canonical correlates analysis (Bach and Jordan, 2005).
For non-linear $f_i(\cdot)$ with Gaussian process priors we have GP-LVM CCA (Leen and Fyfe, 2006).
New Model
$$y_{ni} = f^Y_i\!\left(\mathbf{x}^S_n, \mathbf{x}^Y_n\right) + \epsilon^Y_{ni}, \qquad z_{ni} = f^Z_i\!\left(\mathbf{x}^S_n, \mathbf{x}^Z_n\right) + \epsilon^Z_{ni}.$$
The mappings are from a latent space which is split into three parts, $\mathbf{X}^Y = \{\mathbf{x}^Y_n\}_{n=1}^N$, $\mathbf{X}^Z = \{\mathbf{x}^Z_n\}_{n=1}^N$ and $\mathbf{X}^S = \{\mathbf{x}^S_n\}_{n=1}^N$.
$\mathbf{X}^Y$ and $\mathbf{X}^Z$ take the role of $\mathbf{C}^Z$ and $\mathbf{C}^Y$.¹
¹For linear mappings and $q_Y = D_Y - 1$ and $q_Z = D_Z - 1$, CCA is recovered.
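A minimal sketch of this latent split, extending the toy example above; each observation stream sees the shared coordinates plus its own private coordinates. Sizes and maps are again illustrative assumptions, not the HumanEva settings.

```python
import numpy as np

rng = np.random.default_rng(1)
N, q_S, q_Y, q_Z, D_Y, D_Z = 200, 1, 2, 2, 10, 30   # toy sizes

X_S = rng.standard_normal((N, q_S))   # shared latent coordinates
X_Y = rng.standard_normal((N, q_Y))   # private to the silhouettes
X_Z = rng.standard_normal((N, q_Z))   # private to the poses

A_Y = rng.standard_normal((q_S + q_Y, D_Y))
A_Z = rng.standard_normal((q_S + q_Z, D_Z))

# y_n depends on (x^S_n, x^Y_n); z_n depends on (x^S_n, x^Z_n), each through its own map.
Y = np.sin(np.hstack([X_S, X_Y]) @ A_Y) + 0.05 * rng.standard_normal((N, D_Y))
Z = np.tanh(np.hstack([X_S, X_Z]) @ A_Z) + 0.05 * rng.standard_normal((N, D_Z))
```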
Non Linear CCA
Kernel CCA (see e.g. Kuss and Graepel, 2003) implicitly assumes that there is a smooth mapping from each of the data spaces to a shared latent space,
$$x^S_{ni} = g^Y_i(\mathbf{y}_n) = g^Z_i(\mathbf{z}_n).$$
We augment CCA to extract private spaces, $\mathbf{X}^Y$ and $\mathbf{X}^Z$.
To do this we make a further assumption about the non-consolidating subspaces,
$$x^Y_{ni} = h^Y_i(\mathbf{y}_n), \qquad x^Z_{ni} = h^Z_i(\mathbf{z}_n),$$
where $h^Y_i(\cdot)$ and $h^Z_i(\cdot)$ are smooth functions.
Initialize the GP-LVM
Spectral methods are used to initialize the GP-LVM (Lawrence, 2005).
Harmeling (2007) observed that high-quality embeddings are backed up by high GP-LVM log likelihoods (see the sketch below).
First step: apply kernel CCA to find the shared subspace.
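Harmeling's observation suggests a simple numerical check: score a candidate embedding by the GP-LVM marginal log likelihood of the data it is supposed to explain. The sketch below is a minimal NumPy version under an RBF-plus-noise covariance with hand-fixed hyperparameters (the full model would optimise them); it is an illustration, not the authors' code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gplvm_log_likelihood(X, Y, lengthscale=1.0, variance=1.0, noise=0.1):
    """GP-LVM marginal log likelihood of data Y (N x D) for a candidate embedding X (N x q)."""
    N, D = Y.shape
    sq_dists = cdist(X, X, "sqeuclidean")
    K = variance * np.exp(-0.5 * sq_dists / lengthscale**2) + noise * np.eye(N)
    L = np.linalg.cholesky(K)                        # K = L L^T
    log_det_K = 2.0 * np.sum(np.log(np.diag(L)))
    Kinv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return (-0.5 * D * N * np.log(2 * np.pi)
            - 0.5 * D * log_det_K
            - 0.5 * np.sum(Y * Kinv_Y))              # = -1/2 tr(K^{-1} Y Y^T)

# Usage: higher values indicate an embedding the GP mapping finds easier to explain.
rng = np.random.default_rng(0)
X_toy = rng.standard_normal((50, 2))
Y_toy = np.sin(X_toy @ rng.standard_normal((2, 5)))
print(gplvm_log_likelihood(X_toy, Y_toy))
```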
Canonical Correlates Analysis
Find linear transformations $\mathbf{W}_Y$ and $\mathbf{W}_Z$ maximizing the correlation between $\mathbf{W}_Y\mathbf{Y}$ and $\mathbf{W}_Z\mathbf{Z}$:
$$\{\hat{\mathbf{W}}_Y, \hat{\mathbf{W}}_Z\} = \operatorname*{argmax}_{\{\mathbf{W}_Y, \mathbf{W}_Z\}} \operatorname{tr}\!\left(\mathbf{W}_Y^\top \boldsymbol{\Sigma}_{YZ} \mathbf{W}_Z\right)$$
$$\text{s.t.} \quad \mathbf{W}_Y^\top \boldsymbol{\Sigma}_{YY} \mathbf{W}_Y = \mathbf{I}, \qquad \mathbf{W}_Z^\top \boldsymbol{\Sigma}_{ZZ} \mathbf{W}_Z = \mathbf{I}.$$
The optimum is found through an eigenvalue problem (sketched below).
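One concrete reading of the eigenvalue-problem remark, as a standard CCA sketch rather than the authors' implementation: the constrained trace maximisation reduces to a symmetric generalised eigenproblem, which `scipy.linalg.eigh` solves directly. The jitter terms are a numerical-stability assumption of mine.

```python
import numpy as np
from scipy.linalg import eigh

def cca(Y, Z, d):
    """Top-d canonical directions for row-wise paired data Y (N x D_Y) and Z (N x D_Z).

    Solves S_yz S_zz^{-1} S_zy w = rho^2 S_yy w as a symmetric generalised eigenproblem.
    """
    Yc = Y - Y.mean(0)
    Zc = Z - Z.mean(0)
    N = Y.shape[0]
    S_yy = Yc.T @ Yc / N + 1e-8 * np.eye(Y.shape[1])   # small jitter for stability
    S_zz = Zc.T @ Zc / N + 1e-8 * np.eye(Z.shape[1])
    S_yz = Yc.T @ Zc / N
    A = S_yz @ np.linalg.solve(S_zz, S_yz.T)           # S_yz S_zz^{-1} S_zy
    rho2, W_y = eigh(A, S_yy)                          # generalised eigenproblem, ascending
    W_y = W_y[:, ::-1][:, :d]                          # top-d directions for Y
    rho = np.sqrt(np.clip(rho2[::-1][:d], 1e-12, 1.0)) # canonical correlations
    W_z = np.linalg.solve(S_zz, S_yz.T @ W_y) / rho    # paired directions for Z
    return W_y, W_z, rho
```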
Non Linear Canonical Correlates Analysis
We apply CCA in the dominant principal subspace of each feature
space instead of directly in the feature space (Kuss and Graepel, 2003).
Applying CCA recovers two sets of bases, $\mathbf{W}_Y$ and $\mathbf{W}_Z$, explaining the correlated or shared variance between the two feature spaces.
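A sketch of this two-stage recipe, assuming the `cca` helper from the previous block: kernel PCA turns each kernel matrix into finite-dimensional coordinates, and ordinary CCA is then run on those coordinates. The component counts in the usage comment are placeholder choices.

```python
import numpy as np

def kernel_pca(K, n_components):
    """Coordinates of each point in the dominant principal subspace of a kernel matrix K (N x N)."""
    N = K.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N            # centring matrix
    Kc = J @ K @ J
    lam, U = np.linalg.eigh(Kc)                    # ascending eigenvalues
    lam, U = lam[::-1][:n_components], U[:, ::-1][:, :n_components]
    return U * np.sqrt(np.maximum(lam, 0.0)), lam  # F: N x n_components, plus the spectrum

# Usage sketch (K_Y, K_Z are the silhouette and pose kernel matrices):
# F_Y, lam_Y = kernel_pca(K_Y, 20)
# F_Z, lam_Z = kernel_pca(K_Z, 20)
# W_Y, W_Z, rho = cca(F_Y, F_Z, d=1)   # CCA applied in the dominant principal subspaces
```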
NCCA I
Need to describe the private subspaces ($\mathbf{X}^Z$, $\mathbf{X}^Y$).
Look for directions of maximum data variance that are orthogonal to the canonical correlates.
Call the procedure non-consolidating components analysis (NCCA).
Seek the first direction $\mathbf{v}_1$ of maximum variance orthogonal to $\mathbf{W}$:
$$\mathbf{v}_1 = \operatorname*{argmax}_{\mathbf{v}_1} \mathbf{v}_1^\top \mathbf{K} \mathbf{v}_1$$
subject to $\mathbf{v}_1^\top \mathbf{v}_1 = 1$ and $\mathbf{v}_1^\top \mathbf{W} = \mathbf{0}$.
The optimal $\mathbf{v}_1$ is found via an eigenvalue problem,
$$\left(\mathbf{K} - \mathbf{W}\mathbf{W}^\top \mathbf{K}\right)\mathbf{v}_1 = \lambda_1 \mathbf{v}_1.$$
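A sketch of extracting the first NCCA direction. Instead of the non-symmetric matrix above, it eigendecomposes the symmetric projected kernel $(\mathbf{I} - \mathbf{Q}\mathbf{Q}^\top)\mathbf{K}(\mathbf{I} - \mathbf{Q}\mathbf{Q}^\top)$ with $\mathbf{Q}$ an orthonormal basis for the span of $\mathbf{W}$; for orthonormal canonical directions this solves the same constrained problem, and the symmetric form keeps `eigh` applicable.

```python
import numpy as np

def first_private_direction(K, W):
    """Direction of maximum variance in K that is orthogonal to the canonical directions W.

    K: square covariance/kernel matrix; W: columns spanning the shared (canonical) directions
    in the same space.  Returns the leading orthogonal direction v_1 and its variance lambda_1.
    """
    Q, _ = np.linalg.qr(W)                   # orthonormal basis for span(W)
    P = np.eye(K.shape[0]) - Q @ Q.T         # projector onto the orthogonal complement
    lam, V = np.linalg.eigh(P @ K @ P)       # symmetric, so eigh is safe; ascending order
    return V[:, -1], lam[-1]
```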
NCCA II
For successive directions, further eigenvalue problems of the form
$$\left(\mathbf{K} - \left(\mathbf{W}\mathbf{W}^\top + \sum_{i=1}^{k-1} \mathbf{v}_i \mathbf{v}_i^\top\right)\mathbf{K}\right)\mathbf{v}_k = \lambda_k \mathbf{v}_k$$
need to be solved.
The embeddings then take the form
$$\mathbf{X}^S = \tfrac{1}{2}\left(\mathbf{W}_Y \mathbf{F}_Y + \mathbf{W}_Z \mathbf{F}_Z\right) \quad (1)$$
$$\mathbf{X}^Y = \mathbf{V}_Y \mathbf{F}_Y; \qquad \mathbf{X}^Z = \mathbf{V}_Z \mathbf{F}_Z, \quad (2)$$
where $\mathbf{F}_Y$ and $\mathbf{F}_Z$ represent the kernel PCA representations of each observation space.
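The deflation loop and the embedding assembly of equations (1)–(2), sketched with the same projection idea as before. The orientation of the matrix products in the comments (kernel PCA coordinates times direction matrices) is my reading of the notation, chosen so that the embeddings come out N × q.

```python
import numpy as np

def private_directions(K, W, n_dirs):
    """Successive maximum-variance directions orthogonal to W (and to each other)."""
    basis = [W]                                   # directions already accounted for
    V = []
    for _ in range(n_dirs):
        Q, _ = np.linalg.qr(np.hstack(basis))     # orthonormalise everything found so far
        P = np.eye(K.shape[0]) - Q @ Q.T
        lam, U = np.linalg.eigh(P @ K @ P)
        v = U[:, -1]                              # next deflated direction
        V.append(v)
        basis.append(v[:, None])
    return np.column_stack(V)

# Embedding assembly following (1)-(2), with F_Y, F_Z the kernel PCA coordinates:
# X_S = 0.5 * (F_Y @ W_Y + F_Z @ W_Z)
# X_Y = F_Y @ V_Y
# X_Z = F_Z @ V_Z
```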
Initialization of a GP-LVM
Purely spectral algorithm: the optimization problems are convex and
they lead to unique solutions.
Spectral methods are less useful for interrogating the model.
The pre-image problem means that handling missing data can be
rather involved (Sanguinetti and Lawrence, 2006).
Build Gaussian process mappings from the latent to the data space.
This results in a GP-LVM model.
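One way to realise this step in code, sketched with the GPy library (an assumption on my part; the slides do not name an implementation): fit GP regressors from the consolidated latent coordinates to each observation space on toy data. A full GP-LVM refinement would also re-optimise the latent coordinates, which this sketch does not do.

```python
import numpy as np
import GPy   # sketch only, assuming GPy is installed; not the authors' code

rng = np.random.default_rng(0)
X_init = rng.standard_normal((100, 3))              # stands in for the NCCA embedding [X^S, X^Y]
Y = np.sin(X_init @ rng.standard_normal((3, 10)))   # toy silhouette features
Z = np.tanh(X_init @ rng.standard_normal((3, 20)))  # toy pose features (would use [X^S, X^Z])

kern_Y = GPy.kern.RBF(input_dim=3, ARD=True)
kern_Z = GPy.kern.RBF(input_dim=3, ARD=True)
m_Y = GPy.models.GPRegression(X_init, Y, kern_Y)    # GP mapping: latent -> silhouette space
m_Z = GPy.models.GPRegression(X_init, Z, kern_Z)    # GP mapping: latent -> pose space
m_Y.optimize()                                      # hyperparameters only; the GP-LVM step
m_Z.optimize()                                      # would additionally re-optimise X_init
```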
Inference I
Given a silhouette ($\mathbf{y}_*$), we can find the corresponding $\mathbf{x}^S_*$ position.
The likelihood of different poses ($\mathbf{z}_*$) can then be visualized in the private space for the poses, $\mathbf{x}^Z_*$.
Disambiguation (not dealt with here) can then be achieved through, e.g., temporal information.
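A toy sketch of this inference step, again assuming GPy and entirely hypothetical data: with the shared coordinate $\mathbf{x}^S_*$ fixed by the silhouette, the pose-private coordinate is scanned on a grid and each grid point is scored by the GP predictive likelihood of a candidate pose $\mathbf{z}_*$.

```python
import numpy as np
import GPy   # sketch only; hypothetical set-up, not the authors' code

rng = np.random.default_rng(0)
# Toy consolidated latent space: 1-D shared part + 1-D pose-private part.
X_S = rng.uniform(-2, 2, (100, 1))
X_Z = rng.uniform(-2, 2, (100, 1))
Z = np.sin(3 * X_S) + 0.5 * np.cos(2 * X_Z) + 0.05 * rng.standard_normal((100, 1))

m_Z = GPy.models.GPRegression(np.hstack([X_S, X_Z]), Z, GPy.kern.RBF(2, ARD=True))
m_Z.optimize()

# Inference: the silhouette fixes x^S_*; scan the pose-private coordinate and score
# how likely a candidate pose z_* is at each grid point.
x_S_star = np.array([[0.3]])                      # from the shared-space mapping of y_* (toy value)
z_star = np.array([[0.7]])                        # candidate pose (toy value)
grid = np.linspace(-2, 2, 200)[:, None]
X_test = np.hstack([np.repeat(x_S_star, len(grid), 0), grid])

mu, var = m_Z.predict(X_test)                     # GP predictive mean and variance
log_lik = -0.5 * ((z_star - mu) ** 2 / var + np.log(2 * np.pi * var)).sum(1)
# Peaks of log_lik over `grid` mark plausible poses; ambiguity shows up as several peaks.
```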
Experiments
A walking sequence from the HumanEva database (Sigal and Black,
2006).
Four cycles in a circular walk.
Use two cycles for training and two for testing, for the same subject.
Each image is represented using a 100-dimensional integral HOG descriptor (Zhu et al., 2006).
Represent the pose space as the sum of an MVU kernel (Weinberger et al., 2004) applied to the full pose space and a linear kernel applied to the local motion.
Represent the HOG features with an MVU kernel.
On HumanEva, a one-dimensional shared space explains 9% of the variance in the image space and 18% in the pose space.
To retain 95% of the total variance in each observation space, two dimensions are needed for the private spaces (see the sketch below).
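The dimensionality figures quoted above follow from a cumulative-variance criterion on the kernel eigenvalue spectra; a minimal, illustrative helper for that computation:

```python
import numpy as np

def dims_to_retain(eigenvalues, fraction=0.95):
    """Smallest number of leading components whose eigenvalues cover `fraction` of the total variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)

# e.g. dims_to_retain(lam_Y, 0.95) applied to each private space's eigenvalue spectrum.
```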
Discussion
Careful fusion of multimodal data at the training stage allows for elegant disambiguation when only part of the data is available at test time.
Further work:
Refinement with the GP-LVM algorithm.
Disambiguation with temporal information.
References I
F. R. Bach and M. I. Jordan. A probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California, Berkeley, 2005.
C. H. Ek, P. H. Torr, and N. D. Lawrence. Gaussian process latent variable
models for human pose estimation. In 4th Joint Workshop on Multimodal
Interaction and Related Machine Learning Algorithms (MLMI 2007), volume
LNCS 4892, pages 132–143, Brno, Czech Republic, Jun. 2007. Springer-Verlag.
S. Harmeling. Exploring model selection techniques for nonlinear dimensionality reduction. Technical Report EDI-INF-RR-0960, University of Edinburgh, 2007.
M. Kuss and T. Graepel. The geometry of kernel canonical correlation analysis. Technical Report TR-108, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2003.
N. D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J. Mach. Learn. Res., 6:1783–1816, 2005. ISSN 1533-7928.
References II
G. Leen and C. Fyfe. A Gaussian process latent variable model formulation of canonical correlation analysis. Bruges, Belgium, 26–28 April 2006.
R. Navaratnam, A. Fitzgibbon, and R. Cipolla. The joint manifold model. In IEEE
International Conference on Computer Vision (ICCV), 2007.
G. Sanguinetti and N. D. Lawrence. Missing data in kernel PCA. In ECML, Lecture Notes in Computer Science, Berlin, 2006. Springer-Verlag.
A. Shon, K. Grochow, A. Hertzmann, and R. Rao. Learning shared latent
structure for image synthesis and robotic imitation. Proc. NIPS, pages
1233–1240, 2006.
L. Sigal and M. Black. HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion. Brown University TR, 2006.
K. Weinberger, F. Sha, and L. Saul. Learning a kernel matrix for nonlinear
dimensionality reduction. ACM International Conference Proceeding Series,
2004.
Q. Zhu, S. Avidan, M. Yeh, and K. Cheng. Fast human detection using a cascade of histograms of oriented gradients. In CVPR, 2006.