Spectral Properties of the Alignment Matrices in Manifold Learning.
SPECTRAL PROPERTIES OF THE ALIGNMENT MATRICES
IN MANIFOLD LEARNING
HONGYUAN ZHA∗ AND ZHENYUE ZHANG†
Abstract. Local methods for manifold learning generate a collection of local parameterizations
which is then aligned to produce a global parameterization of the underlying manifold. The alignment
procedure is carried out through the computation of a partial eigendecomposition of a so-called
alignment matrix. In this paper, we present an analysis of the eigen-structure of the alignment
matrix giving both necessary and sufficient conditions under which the null space of the alignment
matrix recovers the global parameterization. We show that the gap in the spectrum of the alignment
matrix is proportional to the square of the size¹ of the overlap of the local parameterizations, thus
deriving a quantitative measure of how stably the null space can be computed numerically. We also
give a perturbation analysis of the null space of the alignment matrix when the computation of the
local parameterizations is subject to error. Our analysis provides insights into the behaviors and
performance of local manifold learning algorithms.
1. Introduction. Consider the following unsupervised learning problem: we are
given a parameterized manifold $\mathcal{M}$ of dimension $d$ embedded into the $m$-dimensional
Euclidean space $\mathbb{R}^m$, $d < m$, and $\mathcal{M} = f(\Omega)$ with a mapping $f : \Omega \to \mathbb{R}^m$, where $\Omega$
is open in $\mathbb{R}^d$ [9, section 5.22]; suppose we have a set of points $x_1, \dots, x_N$, sampled
possibly with noise from the manifold $\mathcal{M}$,
\[ x_i = f(\tau_i) + \epsilon_i, \quad i = 1, \dots, N, \tag{1.1} \]
where the $\{\epsilon_i\}$ represent noise; we are interested in recovering the $\{\tau_i\}$ and/or the
mapping $f(\cdot)$ from the noisy data $\{x_i\}$. This problem is generally known as manifold
learning or nonlinear dimension reduction, and has generated much research interest
in the machine learning and statistics communities [10, 12]. A class of local methods
for manifold learning starts with estimating a collection of local structures around each
sample point $x_i$ and then aligns (either implicitly or explicitly) those local structures to
obtain estimates for {τi} by computing a partial eigendecomposition of an alignment
matrix. Examples of local methods include LLE (Locally Linear Embedding) [10],
manifold charting [2], geodesic null space analysis [3], Hessian LLE [5], LTSA (Local
Tangent Space Alignment) [17], and the modified LLE (MLLE) [13]. Those methods
have been applied to analyzing high-dimensional data arising from application areas
such as computer vision, speech analysis as well as molecular dynamics simulations.
In contrast to the ever-increasing use of manifold learning methods and the fre-
quent appearance of new algorithms, little has been done to assess the performance of
those methods, even though manifold learning methods in general tend to be rather
sensitive to the selection of several tuning parameters [3, 4, 17]. Usually one applies
a manifold learning algorithm to a high-dimensional data set; sometimes one recovers
satisfactory parameter vectors, and sometimes one obtains catastrophic folds in the
computed parameterization and needs to tune the parameters and try again. This
∗Division of Computational Science and Engineering, College of Computing, Georgia Institute of
Technology, Atlanta, GA, 30332-0280, zha@cc.gatech.edu. The work of this author was supported
in part by NSF grants CCF-0430349 and DMS-0736328. A preliminary version of a subset of the
results reported in this paper was published without proof in [16].
†Department of Mathematics, Zhejiang University, Yuquan Campus, Hangzhou, 310027, P. R.
China. zyzhang@zju.edu.cn. Corresponding author. The work of this author was supported in part
by NSFC (project 10771194) and NSF grant CCF-0305879.
¹ The precise definition of the size of the overlap is given in section 6.
is a rather unsatisfactory situation which calls for more research into the robustness
issues and the performance issues of manifold learning algorithms.
One source of the catastrophic folds in the computed parameterization is the
variety of errors involved in manifold learning including the noise in the data, the
approximation errors in learning the local structures, and the numerical errors in
computing the eigenspace of the alignment matrix. It is not surprising that these
errors will degrade the accuracy of the computed parameter vectors. However, in
addition to those issues, there is another important question that has been largely
ignored in the past: assuming, in the ideal noise-free case, that the local structures are
exactly known and the eigenvector space is exactly computed, will the local manifold
learning algorithms produce the true parameter vectors? The answer may actually be
negative: it very much depends on how the local structures overlap with one another.
If we cannot obtain the true parameter vectors in the noise-free case with all the
computations done without error, then we cannot expect to do something reasonable
when the data as well as the computations are subject to error.
The objective of this paper is to gain a better understanding of the key alignment
procedure used in local manifold learning methods by analyzing the eigen-structure
of the alignment matrix. We focus on the representative alignment matrix used in
LTSA, and we address two questions in particular: 1) under what conditions we can
recover the parameter vectors {τi} from the null space of the alignment matrix if
the alignment matrix is computed exactly, and 2) how stable this null space is if the
computation of the alignment matrix is subject to error. We motivate the importance
of addressing the two problems using a simple example in section 3 after a brief review
of LTSA in section 2. We then approach the two problems as follows: in section 4, we
address the issue of how errors in computing the local parameterizations will affect the
null space of the alignment matrix. This allows us to focus on the spectral properties
of the ideal alignment matrices and separate the local error issues from the rest of the
discussions; section 5 is the main part of the paper, where we propose the concept of
affine rigidity to precisely address the first question above. We then establish a variety
of conditions to characterize when an alignment matrix is affinely rigid. Along the way
we also prove some properties of the alignment matrix that will have computational
significance; in section 6, we address the second question by proving a lower bound
for the smallest nonzero eigenvalue of the alignment matrix.
Remark. Though only the alignment matrices of LTSA are discussed in detail, we
believe that similar approaches can be applied to the analysis of other local methods
such as LLE, Hessian LLE, or even Laplacian eigenmap [1]. (See Appendix A for a
brief discussion of the alignment matrices used in LLE and Laplacian eigenmap.)
Notation. We use $e$ to denote a column vector of all 1's, the dimension of which
should be clear from the context. $N(\cdot)$ and $\mathrm{span}(\cdot)$ denote the null space and the range
space of a matrix, respectively. For an index set $I_i = \{i_1, \dots, i_k\}$, $A(:, I_i)$ denotes
the submatrix of $A$ consisting of the columns of $A$ with indices in $I_i$. A similar definition
$A(I_i, :)$ is for the rows. We also represent the submatrix consisting of vectors $x_j$, $j \in I_i$,
by $X_i = [\dots, x_j, \dots]$ with $j \in I_i$ (in increasing order of the index $j$). For a set of
submatrices $T_i = T(:, I_i)$, $i \in J_j$, we denote by $T_{J_j}$ the submatrix $T(:, \cup_{i \in J_j} I_i)$. $\|\cdot\|_2$
is the spectral norm and $\|\cdot\|_F$ is the Frobenius norm of a matrix. The superscript
$T$ denotes matrix transpose. $A^\dagger$ denotes the Moore-Penrose generalized inverse of $A$.
For a Hermitian matrix $A$ of order $n$, $\lambda_1(A) \le \cdots \le \lambda_n(A)$ denote the eigenvalues
of $A$ in nondecreasing order. The identity matrix is denoted by $I$, or by $I^{(d)}$ if its order
$d$ is indicated. Finally, $\lambda^+_{\min}(A)$ denotes the smallest nonzero eigenvalue of a positive
semi-definite matrix $A$.
2. Alignment Matrices of LTSA. We first review how the LTSA alignment
matrices are constructed [17]. For a given set of sample points {xi}, we begin with
building a connectivity graph on top of those sample points which specifies, for each
sample point, which subset of the sample points constitutes its neighborhood [10].
Let the set of neighbors for the sample point $x_i$ be $X_i = [x_{i_1}, \dots, x_{i_{k_i}}]$, including $x_i$
itself. We approximate those neighbors using a $d$-dimensional (affine) linear subspace,
\[ x_{i_j} \approx \bar x_i + Q_i \theta_j^{(i)}, \quad Q_i = [q_1^{(i)}, \dots, q_d^{(i)}], \quad j = 1, \dots, k_i. \]
Here $d$ is the dimension of the manifold,² $\bar x_i \in \mathbb{R}^m$, $Q_i \in \mathbb{R}^{m \times d}$ is orthonormal, and
$\theta_j^{(i)} \in \mathbb{R}^d$ are the local coordinates of the $x_{i_j}$'s associated with the basis matrix $Q_i$. The
optimal least-squares fit is determined by solving the following problem
\[ \min_{c,\, Q,\, \{\theta_j\}:\ Q^T Q = I^{(d)}} \ \sum_{j=1}^{k_i} \| x_{i_j} - (c + Q \theta_j) \|_2^2. \]
That is, $\bar x_i$ is the mean of the $x_{i_j}$'s and $\theta_j^{(i)} = Q_i^T (x_{i_j} - \bar x_i)$. Using the singular value
decomposition of the centered matrix $X_i - \bar x_i e^T$, $Q_i$ can be computed as the matrix of
the left singular vectors corresponding to the $d$ largest singular values of $X_i - \bar x_i e^T$
[6]. We postulate that in each neighborhood, the corresponding global parameter
vectors $T_i = [\tau_{i_1}, \dots, \tau_{i_{k_i}}]$ of $T \in \mathbb{R}^{d \times N}$ differ from the local ones $\Theta_i = [\theta_1^{(i)}, \dots, \theta_{k_i}^{(i)}]$
by a local affine transformation. The errors of the optimal affine transformation are
then given by
\[ \min_{c_i, L_i} \sum_{j=1}^{k_i} \| \tau_{i_j} - (c_i + L_i \theta_j^{(i)}) \|_2^2 = \min_{c_i, L_i} \| T_i - (c_i e^T + L_i \Theta_i) \|_F^2 = \| T_i \widetilde\Phi_i \|_F^2, \tag{2.1} \]
where $\widetilde\Phi_i$ is the orthogonal projection whose null space is spanned by the columns
of $[e, \Theta_i^T]$.³ Note that if $\Theta_i$ is affinely equal to $T_i$, i.e., $\mathrm{span}([e, \Theta_i^T]) = \mathrm{span}([e, T_i^T])$,
then $T_i \widetilde\Phi_i = 0$. In general, $T_i \widetilde\Phi_i \ne 0$, and we seek to compute the parameter vectors
$\{\tau_i\}$ by minimizing the following objective function,
\[ \sum_{i=1}^{N} \Bigl( \min_{c_i, L_i} \sum_{j=1}^{k_i} \| \tau_{i_j} - (c_i + L_i \theta_j^{(i)}) \|_2^2 \Bigr) = \sum_{i=1}^{N} \| T_i \widetilde\Phi_i \|_F^2 = \mathrm{tr}(T \widetilde\Phi T^T) \tag{2.2} \]
over $T = [\tau_1, \dots, \tau_N]$. Here
\[ \widetilde\Phi = \sum_{i=1}^{N} S_i \widetilde\Phi_i S_i^T \tag{2.3} \]
is the alignment matrix with $S_i \in \mathbb{R}^{N \times k_i}$, the 0-1 selection matrix, such that $T_i = T S_i$.
Imposing certain normalization conditions on $T$ such as $T T^T = I^{(d)}$ and $T e = 0$, the
corresponding optimization problem,
\[ \min_{T T^T = I^{(d)},\ T e = 0} \mathrm{tr}(T \widetilde\Phi T^T) \tag{2.4} \]
² We assume $d$ is known; it can be estimated using a number of existing methods [8, 15].
³ $\widetilde\Phi_i$ can be represented as $\widetilde\Phi_i = I - [e, \Theta_i^T][e, \Theta_i^T]^\dagger \in \mathbb{R}^{k_i \times k_i}$.
[Figure 3.1: six panels in two rows. Left column: the spiral data points. Middle and right columns: $u_i$ plotted against arc length.]
Fig. 3.1. The spiral data points (left column) and the second eigenvector $u$ of $\widetilde\Phi$ when $\widetilde\Phi$ has
two small eigenvalues (middle column) or more than two small eigenvalues (right column) close to
zero.
is solved by computing the eigenvectors corresponding to $\lambda_2, \dots, \lambda_{d+1}$ of $\widetilde\Phi$; here the
eigenvalues are arranged in nondecreasing order, i.e., $\lambda_1 = 0 \le \lambda_2 \le \cdots \le \lambda_{d+1} \le \cdots \le \lambda_N$.
We remark that if $T$ is a solution to the problem (2.4), then $QT$ is also a
solution for any orthogonal matrix $Q$ of order $d$.
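The construction reviewed in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the random planar test data and the overlapping index windows are all hypothetical, and the sections are supplied directly as index lists rather than derived from a connectivity graph. Because $\widetilde\Phi e = 0$ always holds, the sketch shifts the eigenvalue of $e$ away from zero before extracting eigenvectors; that shift is an implementation choice made here so that the constraint $Te = 0$ is met even when several eigenvalues are numerically zero.

```python
import numpy as np

def local_coordinates(Xi, d):
    """Local PCA coordinates Theta_i (d x k) of one section Xi (m x k)."""
    Xc = Xi - Xi.mean(axis=1, keepdims=True)           # X_i - xbar_i e^T
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Q = U[:, :d]                                       # orthonormal basis Q_i
    return Q.T @ Xc                                    # theta_j = Q_i^T (x_ij - xbar_i)

def section_projection(Theta):
    """Orthogonal projection whose null space is span([e, Theta^T])."""
    k = Theta.shape[1]
    B = np.column_stack([np.ones(k), Theta.T])
    return np.eye(k) - B @ np.linalg.pinv(B)

def ltsa_alignment(X, sections, d):
    """Accumulate S_i Phi_i S_i^T over all sections, as in (2.3)."""
    N = X.shape[1]
    Phi = np.zeros((N, N))
    for idx in sections:
        Theta = local_coordinates(X[:, idx], d)
        Phi[np.ix_(idx, idx)] += section_projection(Theta)
    return Phi

def ltsa_embedding(Phi, d):
    """A solution T (d x N) of (2.4), with T T^T = I and T e = 0."""
    N = Phi.shape[0]
    e = np.ones((N, 1))
    # Assumption of this sketch: move the eigenvalue of e (always 0) above
    # all other eigenvalues, so the d smallest eigenvectors are orthogonal to e.
    shift = np.trace(Phi) + 1.0
    _, V = np.linalg.eigh(Phi + shift * (e @ e.T) / N)
    return V[:, :d].T

# Hypothetical noise-free test data: points on a 2-D plane embedded in R^3.
rng = np.random.default_rng(0)
N, d = 32, 2
T_true = rng.standard_normal((d, N))
X = rng.standard_normal((3, d)) @ T_true + rng.standard_normal((3, 1))
sections = [np.arange(i, i + 8) for i in range(0, N - 7, 3)]   # windows sharing 5 points
Phi = ltsa_alignment(X, sections, d)
T_hat = ltsa_embedding(Phi, d)
```

For this planar data the local coordinates are exactly affine in the true parameters, so the rows of `T_hat` should lie in $\mathrm{span}([e, T^T])$ up to rounding error.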
3. An Illustrative Example. In the ideal case when we have $\mathrm{span}([e, \Theta_i^T]) =
\mathrm{span}([e, T_i^T])$, it is not difficult to see that $T^T$ belongs to the null space of the alignment
matrix (cf. section 4), and it looks like we can just use the (approximate) null
space of the alignment matrix to compute the parameter vectors as suggested before.
Unfortunately, the null space may contain unwanted information in addition to
$\mathrm{span}([e, T^T])$, depending on how the neighborhoods overlap with each other. In the
following, we present a simple example to illustrate this phenomenon.
In what follows, we call each $X_i$ (or the corresponding $T_i$) a section. Our analysis
is general enough that the $X_i$ can correspond to an arbitrary subset of the sample
points. So henceforth, the sample points $x_1, \dots, x_N$ are grouped into $s$ (possibly
overlapping) sections $X_1, \dots, X_s$, and the $i$-th section $X_i$ is denoted by the points
$\{x_j \mid j \in I_i\}$ with the index subset $I_i \subseteq \{1, \dots, N\}$.
Example 1. We generate $N = 100$ two-dimensional points
\[ x_i = [t_i \cos(t_i),\ t_i \sin(t_i)]^T, \quad i = 1, \dots, 100, \]
sampled from the one-dimensional spiral curve with $t_1, \dots, t_N$ equally spaced in the
interval $[\pi/5, 2\pi]$ with $t_0 = \pi/5$ and $t_N = 2\pi$. See the upper-left panel of Figure
3.1 for the set of the two-dimensional sample points. It is well known that a regular
smooth curve is isometric to its arc length. The exact arc length coordinate $\tau_i$ for the
sample point $x_i$ on the spiral curve is given by
\[ \tau_i = \int_{t_0}^{t_i} \sqrt{1 + t^2}\, dt. \]
First we choose 19 sections $X_i = X(:, I_i)$, $i = 1, \dots, 19$, with the index subsets,
\[ I_i = (5(i-1)+1) : (5i+2), \quad i = 1, \dots, 18, \qquad I_{19} = 91 : 100. \]
Thus, each pair of two consecutive sections share exactly two points. We construct the
alignment matrix $\widetilde\Phi$ as defined by (2.3): initially set $\widetilde\Phi = 0$ and update its principal
submatrix $\widetilde\Phi(I_i, I_i)$ one-by-one as follows,
\[ \widetilde\Phi(I_i, I_i) := \widetilde\Phi(I_i, I_i) + \widetilde\Phi_i, \quad i = 1, \dots, 19. \]
The orthogonal projection $\widetilde\Phi_i$ is given by $\widetilde\Phi_i = I - P_i P_i^T$ with $P_i = [\frac{1}{\sqrt{k_i}} e,\ v_i]$. Here
$v_i$ is the eigenvector of $(X_i - \bar x_i e^T)^T (X_i - \bar x_i e^T)$ corresponding to its largest eigenvalue.
The resulting alignment matrix $\widetilde\Phi$ has two smallest eigenvalues $10^{-16}$ in magnitude
and the third smallest eigenvalue is about $10^{-5}$, distinguishable from the two smallest
eigenvalues. The solution of problem (2.4) with $d = 1$ is given by the eigenvector
$u = [u_1, \dots, u_N]^T$ of $\widetilde\Phi$ corresponding to the second smallest eigenvalue, that is, an
affine approximation of the arc length coordinates of the sample points. Ideally, $u$ is
affinely equivalent to the arc length $\tau$, i.e., there are $a \ne 0$ and $b$ such that $u_i = a\tau_i + b$
for all $i$. In the middle panel of the top row of Figure 3.1, we plot the computed $\{u_i\}$
against the arc length coordinates $\{\tau_i\}$. The plotted points are approximately on a
straight line, indicating an accurate recovery of $\{\tau_i\}$ within an affine transformation.
If the minimal number of the shared points among some of the consecutive sections
is reduced to one, $\widetilde\Phi$ may have more than two small eigenvalues close to zero. The
corresponding eigenvectors contain not only $e$ and $\tau = [\tau_1, \dots, \tau_N]^T$ but also other
unwanted vectors. For example, if we delete the last columns in the two sections $X_6$
and $X_{13}$, respectively, then the two consecutive sections $X_6$ and $X_7$ share one point
only. So do $X_{13}$ and $X_{14}$. This weakens the overlap between $X_6$ and $X_7$, as well as
that between $X_{13}$ and $X_{14}$. As a result, $\widetilde\Phi$ has four eigenvalues close to zero (there are
four computed smallest eigenvalues of magnitude $10^{-16}$) and four linearly independent
eigenvectors which include $e$, $\tau$ and two other vectors. Since the computed eigenvector
of the second smallest eigenvalue is a linear combination of the four eigenvectors, it
generally will not give the correct approximation to $\tau$. In the top right panel of Figure
3.1, we plot such a computed eigenvector $u$ against $\tau$, showing that it is no longer
proportional to $\tau$. A similar phenomenon occurs for noisy data as well; see the bottom
row of Figure 3.1 where we added noise to the spiral curve data.
This example clearly shows the importance of the null space structure of the
alignment matrix in recovering the parameter vectors. In particular, lack of overlap
among the sections will result in a null space producing incorrect parameter vectors.
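The experiment of Example 1 can be re-created along the following lines. This is a sketch under assumptions, not the authors' code: the helper names and the threshold `tol` used to decide which eigenvalues count as numerically zero are choices made here, and the index sets are the 0-based versions of the $I_i$ above. The arc length uses the closed form $\int \sqrt{1+t^2}\,dt = \tfrac12\bigl(t\sqrt{1+t^2} + \operatorname{arcsinh} t\bigr)$.

```python
import numpy as np

def local_projection(Xi):
    """Phi_i = I - P_i P_i^T with P_i = [e / sqrt(k_i), v_i], for d = 1."""
    k = Xi.shape[1]
    Xc = Xi - Xi.mean(axis=1, keepdims=True)           # X_i - xbar_i e^T
    _, V = np.linalg.eigh(Xc.T @ Xc)                   # eigenvalues in ascending order
    P = np.column_stack([np.ones(k) / np.sqrt(k), V[:, -1]])
    return np.eye(k) - P @ P.T

def alignment_matrix(X, sections):
    """Update the principal submatrices Phi(I_i, I_i) one section at a time."""
    N = X.shape[1]
    Phi = np.zeros((N, N))
    for idx in sections:
        Phi[np.ix_(idx, idx)] += local_projection(X[:, idx])
    return Phi

def spiral_sections():
    """0-based I_i = (5(i-1)+1):(5i+2) for i = 1..18, and I_19 = 91:100."""
    secs = [np.arange(5 * i, 5 * i + 7) for i in range(18)]
    secs.append(np.arange(90, 100))
    return secs

def count_small(Phi, tol=1e-10):
    """Number of eigenvalues treated as zero (tol is an assumption of this sketch)."""
    return int(np.sum(np.linalg.eigvalsh(Phi) < tol))

def arc_length(t, t0):
    """Exact arc length of the spiral between parameters t0 and t."""
    F = lambda s: 0.5 * (s * np.sqrt(1.0 + s * s) + np.arcsinh(s))
    return F(t) - F(t0)

t = np.linspace(np.pi / 5, 2 * np.pi, 100)
X = np.vstack([t * np.cos(t), t * np.sin(t)])
tau = arc_length(t, t[0])

strong = spiral_sections()              # consecutive sections share two points
weak = spiral_sections()
weak[5] = weak[5][:-1]                  # delete the last column of X_6
weak[12] = weak[12][:-1]                # delete the last column of X_13

n_strong = count_small(alignment_matrix(X, strong))
n_weak = count_small(alignment_matrix(X, weak))
print(n_strong, n_weak)
```

Because the 19 sections form a chain, the two-point overlaps pin down a two-dimensional null space, while each single-point junction adds one extra null direction; the counts printed by the sketch can be compared with the two and four near-zero eigenvalues reported in the example.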
4. Perturbation Analysis of the Alignment Matrix. Affine errors introduced
in the local coordinates will produce an inaccurate alignment matrix, which
determines whether the resulting parameter vectors are acceptable or not. In this section, we
consider the effects of local approximation errors on the alignment matrix and its null
space. We will make use of matrix perturbation analysis on the alignment matrix. Our
approach consists of the following two parts: 1) error estimation of the approximation
of the alignment matrix in terms of the local errors and 2) perturbation analysis of the null
space of the alignment matrix resulting from the approximation error. In particular, we
will show that the local errors are magnified by the condition numbers of the centered
sections $\hat T_i = T_i - \bar t_i e^T$, where $\bar t_i$ is the mean of the columns of $T_i$. In addition to the
error in the alignment matrix due to the local approximations, the smallest nonzero
eigenvalue of the exact alignment matrix $\Phi$ is also crucial to the determination of the
accuracy of the computed parameter vectors.
To this end, let $X_1, \dots, X_s$ be $s$ sections of the sample points $x_1, \dots, x_N$ given in
(1.1) and $T_1, \dots, T_s$ the corresponding sections of the parameter vectors $\tau_1, \dots, \tau_N$.
Denote by $I_i$ the index subset of the section $i$ with size $k_i = |I_i|$, i.e.,
\[ X_i = \{x_j \mid j \in I_i\}, \qquad T_i = \{\tau_j \mid j \in I_i\}. \]
The local coordinates, denoted by $\Theta_i$, of points in section $X_i$ are generally not equal
to $T_i$ within an affine transformation. The optimal affine error is
\[ \|E_i\|_2 = \min_{c, L} \| T_i - (c e^T + L \Theta_i) \|_2. \tag{4.1} \]
As shown in (2.1), $\|E_i\|_2 = \|T_i \widetilde\Phi_i\|_2$. We now consider how the local errors affect the
alignment matrix.
Denote by $\Phi$ the alignment matrix constructed by the exact parameter vector
sections $T_1, \dots, T_s$, and $\widetilde\Phi$ the alignment matrix constructed by the sections of local
coordinates $\Theta_1, \dots, \Theta_s$, as in (2.3),
\[ \widetilde\Phi = \sum_{i=1}^{s} S_i \widetilde\Phi_i S_i^T, \qquad \Phi = \sum_{i=1}^{s} S_i \Phi_i S_i^T, \]
where $\widetilde\Phi_i$ and $\Phi_i$ are the orthogonal projections with null spaces
\[ N(\widetilde\Phi_i) = \mathrm{span}([e, \Theta_i^T]), \qquad N(\Phi_i) = \mathrm{span}([e, T_i^T]), \]
respectively. We assume that both $[e, \Theta_i^T]$ and $[e, T_i^T]$ are of full column rank, and
$k_i \ge d + 2$ for all $i$ to ensure that $\widetilde\Phi_i$ and $\Phi_i$ are not identically zero. It is easy to verify
that $[e, T_i^T]$ is of full column rank if and only if the centered matrix $\hat T_i = T_i - \bar t_i e^T$
is of full row rank. In that case, $\hat T_i$ has a finite condition number defined by $\kappa(\hat T_i) =
\|\hat T_i\|_2 \|\hat T_i^\dagger\|_2$, which will appear in our error bound below.
Theorem 4.1. Let $\|E_i\|_2$ denote the local error defined in (4.1) and $\kappa(\hat T_i)$ the
condition number of $\hat T_i$. Then
\[ \|\widetilde\Phi - \Phi\|_2 \le \sum_{i=1}^{s} \frac{\|E_i\|_2}{\|\hat T_i\|_2}\, \kappa(\hat T_i). \tag{4.2} \]
Proof. The error matrix $\widetilde\Phi - \Phi$ is clearly given by $\widetilde\Phi - \Phi = \sum_{i=1}^{s} S_i (\widetilde\Phi_i - \Phi_i) S_i^T$,
and hence, $\|\widetilde\Phi - \Phi\|_2 \le \sum_i \|\widetilde\Phi_i - \Phi_i\|_2$. What we need to do is to bound the errors
$\|\widetilde\Phi_i - \Phi_i\|_2$. Since both $\widetilde\Phi_i$ and $\Phi_i$ are orthogonal projections with the same rank, by
Theorem 2.6.1 of [6] we have that
\[ \|\widetilde\Phi_i - \Phi_i\|_2 = \|(I - \Phi_i) \widetilde\Phi_i\|_2. \]
We can write $I - \Phi_i = \frac{1}{k_i} e e^T + \hat T_i^\dagger \hat T_i$, because $I - \Phi_i$ is the orthogonal projection onto
$\mathrm{span}([e, T_i^T])$. It follows from $e^T \widetilde\Phi_i = 0$ that
\[ \|\widetilde\Phi_i - \Phi_i\|_2 = \|\hat T_i^\dagger \hat T_i \widetilde\Phi_i\|_2 \le \|\hat T_i^\dagger\|_2 \|\hat T_i \widetilde\Phi_i\|_2 = \frac{\|E_i\|_2}{\|\hat T_i\|_2}\, \kappa(\hat T_i). \]
The error bound in (4.2) follows immediately by summing the above error bounds.
It is gratifying to see that the local errors affect the alignment matrix in a linear
fashion, albeit by a factor which is the condition number of $\hat T_i$. We remark that these
condition numbers may be made smaller if we increase the size of the neighborhoods.
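The bound (4.2) can be checked numerically on synthetic data. The sketch below is illustrative only: the random parameter vectors, the overlapping index windows and the error model (local coordinates taken as slightly perturbed copies of $T_i$) are assumptions made here, and the function names are hypothetical. It uses the identity $\frac{\|E_i\|_2}{\|\hat T_i\|_2}\kappa(\hat T_i) = \|E_i\|_2 / \sigma_{\min}(\hat T_i)$, which follows from the definition of $\kappa(\hat T_i)$.

```python
import numpy as np

def projection_from(Y):
    """I - B B^+ with B = [e, Y^T]; its null space is span([e, Y^T])."""
    k = Y.shape[1]
    B = np.column_stack([np.ones(k), Y.T])
    return np.eye(k) - B @ np.linalg.pinv(B)

def assemble(N, sections, blocks):
    """Sum of S_i block_i S_i^T over all sections."""
    Phi = np.zeros((N, N))
    for idx, block in zip(sections, blocks):
        Phi[np.ix_(idx, idx)] += block
    return Phi

def theorem_4_1_sides(T, Thetas, sections):
    """Return (||Phi~ - Phi||_2, right-hand side of (4.2))."""
    N = T.shape[1]
    exact, approx, bound = [], [], 0.0
    for idx, Theta in zip(sections, Thetas):
        Ti = T[:, idx]
        Phi_i = projection_from(Ti)
        Phi_tilde_i = projection_from(Theta)
        exact.append(Phi_i)
        approx.append(Phi_tilde_i)
        T_hat = Ti - Ti.mean(axis=1, keepdims=True)      # centered section
        E_norm = np.linalg.norm(Ti @ Phi_tilde_i, 2)      # ||E_i||_2 = ||T_i Phi~_i||_2
        sigma = np.linalg.svd(T_hat, compute_uv=False)
        bound += E_norm / sigma[-1]                       # = ||E_i|| kappa / ||T_hat||
    diff = assemble(N, sections, approx) - assemble(N, sections, exact)
    return np.linalg.norm(diff, 2), bound

# Hypothetical data: d = 2, windows of 10 points sharing 5 points with the next window.
rng = np.random.default_rng(1)
d, N = 2, 40
T = rng.standard_normal((d, N))
sections = [np.arange(i, i + 10) for i in range(0, N - 9, 5)]
Thetas = [T[:, idx] + 1e-3 * rng.standard_normal((d, len(idx))) for idx in sections]
lhs, rhs = theorem_4_1_sides(T, Thetas, sections)
print(lhs, rhs)
```

Shrinking the perturbation size in `Thetas` should shrink both sides roughly in proportion, in line with the linear dependence on the local errors noted above.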
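Now we consider the perturbation analysis of the null space of the alignment
matrix. The following theorem gives an error bound for this approximation (related
to Theorem 4.1 in [18]) in terms of the smallest nonzero eigenvalue $\lambda^+_{\min}(\Phi)$ of $\Phi$
and the approximation error $\|\widetilde\Phi - \Phi\|_2$.
Theorem 4.2. Let $r = \dim(N(\Phi))$ and let $U$ be an eigenvector matrix of $\widetilde\Phi$
corresponding to the $r$ smallest eigenvalues. Denote $\lambda^+_{\min} = \lambda^+_{\min}(\Phi)$, $\epsilon = \|\widetilde\Phi - \Phi\|_2$.
If $\epsilon < \frac{1}{4} \lambda^+_{\min}$ and $4\epsilon^2 (1 - \lambda^+_{\min} + 2\epsilon) < (\lambda^+_{\min} - 2\epsilon)^3$, then there exists an orthonormal
basis matrix $G$ of $N(\Phi)$ such that
\[ \|U - G\|_2 \le \frac{2\epsilon}{\lambda^+_{\min} - 2\epsilon}. \tag{4.3} \]
Proof. Let $G_0$ be an orthonormal basis matrix of $N(\Phi)$ and $G_1$ the orthogonal
complement of $G_0$, i.e., $[G_0, G_1]$ is an orthogonal matrix. By the standard perturbation
theory [11, Theorem V.2.7] for invariant subspaces, there is a matrix $P$ satisfying
\[ \|P\|_2 \le \frac{2\epsilon}{\lambda^+_{\min} - 2\epsilon} \tag{4.4} \]
such that $\widetilde U = (G_0 + G_1 P)(I + P^T P)^{-1/2}$ is an orthogonal basis matrix of an invariant
subspace of $\widetilde\Phi$. By simple calculation, we have that
\[ \begin{aligned}
\|\widetilde U - G_0\|_2
 &= \left\| \begin{bmatrix} (I + P^T P)^{-1/2} - I \\ P (I + P^T P)^{-1/2} \end{bmatrix} \right\|_2 \\
 &= \left\| \begin{bmatrix} (I + P^T P)^{-1/2} - I \\ P (I + P^T P)^{-1/2} \end{bmatrix}^T \begin{bmatrix} (I + P^T P)^{-1/2} - I \\ P (I + P^T P)^{-1/2} \end{bmatrix} \right\|_2^{1/2} \\
 &= \| 2 (I - (I + P^T P)^{-1/2}) \|_2^{1/2} \\
 &\le \|P\|_2.
\end{aligned} \]
The error bound (4.3) follows from the above bound and (4.4) if we can prove that
$U = \widetilde U Q^T$ holds with an orthogonal matrix $Q$ of order $r$ and we also set $G = G_0 Q^T$.
This is equivalent to proving that the invariant subspace $\mathrm{span}(\widetilde U)$ of $\widetilde\Phi$ is associated
with the $r$ smallest eigenvalues of $\widetilde\Phi$. We just need to show that $\|\widetilde\Phi \widetilde U\|_2 < \lambda_{r+1}(\widetilde\Phi)$.
We first estimate $\lambda_{r+1}(\widetilde\Phi)$. By eigenvalue perturbation theory of symmetric matrices
[11], $|\lambda_i(\widetilde\Phi) - \lambda_i(\Phi)| \le \|\widetilde\Phi - \Phi\|_2$. It follows that
\[ \lambda_{r+1}(\widetilde\Phi) \ge \lambda^+_{\min} - \epsilon, \]
since $\Phi$ is positive semidefinite and $\lambda_{r+1}(\Phi) = \lambda^+_{\min}$. On the other hand, by (4.4),
\[ \begin{aligned}
\|\widetilde\Phi \widetilde U\|_2 = \|\widetilde U^T \widetilde\Phi \widetilde U\|_2
 &\le \|\widetilde U^T (\widetilde\Phi - \Phi) \widetilde U\|_2 + \|\widetilde U^T \Phi \widetilde U\|_2 \\
 &< \|\widetilde\Phi - \Phi\|_2 + \frac{\|P\|_2^2}{1 + \|P\|_2^2} \\
 &\le \epsilon + \frac{4\epsilon^2}{4\epsilon^2 + (\lambda^+_{\min} - 2\epsilon)^2} \\
 &< \lambda^+_{\min} - \epsilon,
\end{aligned} \]
because $4\epsilon^2 (1 - \lambda^+_{\min} + 2\epsilon) < (\lambda^+_{\min} - 2\epsilon)^3$. Thus $\|\widetilde\Phi \widetilde U\|_2 < \lambda_{r+1}(\widetilde\Phi)$.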
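The "simple calculation" in the proof of Theorem 4.2 can be spelled out as follows. The shorthand $A = (I + P^T P)^{-1/2}$ is introduced here only for readability; $A$ is symmetric, commutes with $P^T P$, and satisfies $A^2 (I + P^T P) = I$. The last step uses that the eigenvalues of $2(I - A)$ are $2 - 2/\sqrt{1 + \sigma_j^2}$ for the singular values $\sigma_j$ of $P$, together with the elementary inequality $1/\sqrt{1 + x} \ge 1 - x/2$ for $x \ge 0$.

```latex
\begin{aligned}
\widetilde U - G_0
  &= G_0 (A - I) + G_1 P A
   = [G_0, G_1] \begin{bmatrix} A - I \\ P A \end{bmatrix}, \\
\begin{bmatrix} A - I \\ P A \end{bmatrix}^T
\begin{bmatrix} A - I \\ P A \end{bmatrix}
  &= (A - I)^2 + A P^T P A
   = A^2 (I + P^T P) - 2A + I
   = 2 (I - A), \\
\|\widetilde U - G_0\|_2
  &= \| 2 (I - A) \|_2^{1/2}
   = \Bigl( 2 - \frac{2}{\sqrt{1 + \|P\|_2^2}} \Bigr)^{1/2}
   \le \|P\|_2 .
\end{aligned}
```

The first line uses that $[G_0, G_1]$ is orthogonal, so multiplying by it does not change the spectral norm.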
We now explain why Theorem 4.2 illustrates the importance of $N(\Phi)$ and $\lambda^+_{\min}(\Phi)$
in understanding the alignment procedure in manifold learning. As we will show in
the next section, it is always true that $\mathrm{span}([e, T^T]) \subseteq N(\Phi)$. Theorem 4.2 shows
that the true parameter vectors can be obtained, up to the error bound in (4.3),
from the invariant subspace of the computed alignment matrix $\widetilde\Phi$ corresponding to
its smallest eigenvalues, provided the errors introduced to the alignment matrix are
relatively small. The smallest positive eigenvalue $\lambda^+_{\min}(\Phi)$ of the true alignment matrix
determines how much error is allowed in the computed alignment matrix for a reliable
recovery of the parameter vectors by LTSA. Specifically, if $N(\Phi) = \mathrm{span}([e, T^T])$,
good approximation in the local coordinate matrices $\Theta_i$ and a not too small $\lambda^+_{\min}(\Phi)$
will guarantee that the eigenvector matrix of $\widetilde\Phi$ corresponding to the $d + 1$ smallest
eigenvalues will give a good approximation of the parameter vectors $T$ up to an affine
transformation.
5. The Null Space of the Alignment Matrix. This section focuses on the
null space of the ideal alignment matrix $\Phi$. We will establish conditions under which
the equality $N(\Phi) = \mathrm{span}([e, T^T])$ holds. The section is divided into the following
five parts: 1) we first establish some general properties about the null space of the
alignment matrix; 2) we then present a necessary and sufficient condition for $N(\Phi) =
\mathrm{span}([e, T^T])$ in the special case when we have two sections, i.e., $s = 2$; 3) we give
necessary conditions for the general case $s \ge 3$; 4) we also present sufficient conditions
for the general case $s \ge 3$, and 5) finally we establish an interesting contraction
property of $N(\Phi)$ when some sections are merged into super-sections.
5.1. General properties of $N(\Phi)$. It follows from the definition of $\Phi$ that
\[ \Phi [e, T^T] = \sum_i S_i \Phi_i S_i^T [e, T^T] = \sum_i S_i \Phi_i [e, T_i^T] = 0, \]
which implies that
\[ \mathrm{span}([e, T^T]) \subseteq N(\Phi). \tag{5.1} \]
Consider a null vector $v \in N(\Phi)$. Denote by $v_i = S_i^T v$ the restriction of $v$ to
the section $T_i$, $i = 1, \dots, s$. Since each term $S_i \Phi_i S_i^T$ in $\Phi$ is positive semidefinite,
$\Phi v = 0$ implies $S_i \Phi_i S_i^T v = 0$, hence the restriction $v_i$ must be a null vector of $\Phi_i$. So
$v_i \in \mathrm{span}([e, T_i^T])$ by the definition of $\Phi_i$, and therefore, it can be represented as
\[ v_i = [e, T_i^T] w_i, \quad w_i \in \mathbb{R}^{d+1}. \tag{5.2} \]
The vector $w_i$ defines an affine transformation from $\mathbb{R}^d$ to $\mathbb{R}$, $w_i : \tau \to [1, \tau^T] w_i \equiv
w_i(\tau)$. Notice that the common part of each pair $v_i$ and $v_j$ should be equal, i.e.,
\[ [e, T_{ij}^T] w_i = [e, T_{ij}^T] w_j, \tag{5.3} \]
where $T_{ij}$ is the intersection of $T_i$ and $T_j$.
Definition 5.1. Let $w = \{w_1, \dots, w_s\}$ be a set of $(d+1)$-dimensional vectors.
We call $w$ a certificate for the collection $\{T_1, \dots, T_s\}$ if the conditions (5.3) hold for
all $i \ne j$. In particular, $w$ is a trivial certificate if all $w_i$'s are equal to each other.
As we mentioned above, each certificate $w = \{w_1, \dots, w_s\}$ defines a collection of
$s$ affine maps from $\mathbb{R}^d$ to $\mathbb{R}$: $\tau \to [1, \tau^T] w_i$. If we restrict the $i$-th map $w_i$ to
the columns of $T_i$, then $w$ defines a function from the $N$ columns of $T$ to $\mathbb{R}$:
\[ w : \tau \in T_i \to [1, \tau^T] w_i, \quad i = 1, \dots, s, \]
where $\tau \in T_i$ means that $\tau$ is a column of $T_i$. There is no ambiguity for vectors
belonging to the intersection of two sections, say $T_i$ and $T_j$, since the conditions (5.3)
hold. Thus, $w$ maps $T \in \mathbb{R}^{d \times N}$ to a vector $v \in \mathbb{R}^N$ whose $j$-th component is defined
by $w(\tau_j) = [1, \tau_j^T] w_i$ if the $j$-th column $\tau_j \in T_i$, i.e.,
\[ v = w(T) \equiv [w(\tau_1), \dots, w(\tau_N)]^T. \]
What we are interested in is the set $W = W_{\{T_i\}}$ of all certificates of a fixed collection
$\{T_i\}$ of $T$. It is easy to verify that $W$ is a linear space with the usual addition and
scalar multiplication operations. For the fixed collection $\{T_i\}$ of $T$, let us denote by
$\phi$ the mapping from $W$ to $\mathbb{R}^N$ determined by $w(T)$:
\[ \phi : w \to v = w(T), \]
and denote it as $v = \phi(w)$. It is easy to verify that $\phi$ is a linear map.
There is a close relation between $N(\Phi)$ and the certificate space $W$ through the
linear map $\phi$ for the considered collection $\{T_1, \dots, T_s\}$: for a given $w \in W$, consider
the restriction $v_i$ of the vector $v = \phi(w)$ to $T_i$. By definition, $v_i$ is given by (5.2) for
$i = 1, \dots, s$. It follows that $v$ is a null vector of $\Phi$. On the other hand, we have
shown that for each $v \in N(\Phi)$, there is a certificate $w = \{w_1, \dots, w_s\}$ satisfying (5.2).
This implies $v = \phi(w)$. Therefore, $\phi$ is an onto-map from $W$ to $N(\Phi)$. Since we
always assume that each $[e, T_i^T]$ is of full column rank, $\phi$ is also one-to-one and hence
an isomorphism. In particular, $\phi$ maps a trivial certificate to a vector in $\mathrm{span}([e, T^T]) \subset N(\Phi)$.
Theorem 5.2. 1) The null space $N(\Phi)$ and the certificate space $W$ are isomorphic
to each other and the linear transformation $\phi$ defined above is an isomorphism
between the two linear spaces. Moreover, the subspace of all trivial certificates is
isomorphic to the subspace $\mathrm{span}([e, T^T])$ of $N(\Phi)$.
2) The equality $N(\Phi) = \mathrm{span}([e, T^T])$ holds if and only if $\{T_1, \dots, T_s\}$ has only
trivial certificates.
We single out those collections that have only trivial certificates.
Definition 5.3. We call a collection $\{T_1, \dots, T_s\}$ affinely rigid if it has only
trivial certificates.
Geometrically, those are the collections whose sections overlap strongly enough
to exhibit a certain rigidity reminiscent of the graph rigidity discussed in [7]. In
particular, part 2) of Theorem 5.2 can be restated as
\[ N(\Phi) = \mathrm{span}([e, T^T]) \ \text{if and only if}\ \{T_1, \dots, T_s\} \ \text{is affinely rigid.} \tag{5.4} \]
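The isomorphism of Theorem 5.2 suggests two independent ways to compute the same number: the dimension of $N(\Phi)$ from the alignment matrix, and the dimension of the certificate space $W$ from the linear constraints (5.3). The sketch below compares the two on small hypothetical one-dimensional collections; the function names and the test data are assumptions made for illustration, and sections are given as lists of 0-based column indices. Since the trivial certificates always form a subspace of dimension $d + 1$, affine rigidity is tested as $\dim W = d + 1$.

```python
import numpy as np
from itertools import combinations

def affine_block(T, idx):
    """The matrix [e, T_i^T] for the columns of T listed in idx."""
    idx = list(idx)
    return np.column_stack([np.ones(len(idx)), T[:, idx].T])

def null_space_dim(T, sections):
    """dim N(Phi) for the ideal alignment matrix built from the sections."""
    N = T.shape[1]
    Phi = np.zeros((N, N))
    for idx in sections:
        B = affine_block(T, idx)
        Phi[np.ix_(idx, idx)] += np.eye(len(idx)) - B @ np.linalg.pinv(B)
    return N - np.linalg.matrix_rank(Phi)

def certificate_space_dim(T, sections):
    """dim W: stack the constraints (5.3) on (w_1, ..., w_s), count free directions."""
    d, s = T.shape[0], len(sections)
    rows = []
    for i, j in combinations(range(s), 2):
        shared = sorted(set(sections[i]) & set(sections[j]))
        if not shared:
            continue
        B = affine_block(T, shared)                    # [e, T_ij^T]
        C = np.zeros((len(shared), s * (d + 1)))
        C[:, i * (d + 1):(i + 1) * (d + 1)] = B        # + [e, T_ij^T] w_i
        C[:, j * (d + 1):(j + 1) * (d + 1)] = -B       # - [e, T_ij^T] w_j
        rows.append(C)
    if not rows:
        return s * (d + 1)
    return s * (d + 1) - np.linalg.matrix_rank(np.vstack(rows))

def is_affinely_rigid(T, sections):
    """Only trivial certificates exist exactly when dim W = d + 1."""
    return certificate_space_dim(T, sections) == T.shape[0] + 1

T = np.arange(6, dtype=float).reshape(1, -1)    # six points on a line, d = 1
weak = [[0, 1, 2], [2, 3, 4, 5]]                # the sections share one point
strong = [[0, 1, 2, 3], [2, 3, 4, 5]]           # the sections share two points
for name, secs in [("weak", weak), ("strong", strong)]:
    print(name, null_space_dim(T, secs), certificate_space_dim(T, secs))
```

The equality of the two dimensions relies on each $[e, T_i^T]$ having full column rank, as assumed throughout this section.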
5.2. Necessary and sufficient conditions of affine rigidity for s = 2.
Consider the case when $s = 2$, i.e., $\Phi = S_1 \Phi_1 S_1^T + S_2 \Phi_2 S_2^T$ for two sections $T_1$ and
$T_2$. In this case, we can characterize affine rigidity using an intuitive geometric notion
defined below.
Definition 5.4. We say two sections $T_i$ and $T_j$ are fully overlapped if the
intersection $T_{ij} = T_i \cap T_j$ is not empty and $[e, T_{ij}^T]$ is of full column rank.
Clearly, $T_i$ and $T_j$ are fully overlapped if they share at least two distinct points
in the one-dimensional case $d = 1$, or if they share at least three points that are
not co-linear in the two-dimensional case. Using this concept, we can establish the
following result.
Theorem 5.5. $\{T_1, T_2\}$ is affinely rigid if and only if $T_1$ and $T_2$ are fully overlapped.
[Figure 5.1: two panels, (a) and (b).]
Fig. 5.1. Two possible layouts for the global coordinates.
Proof. We only show the necessity. Let us assume that $T_1$ and $T_2$ are not fully
overlapped. Since $[e, T_{12}^T]$ is not of full column rank, we can find distinct $w_1$ and $w_2$
such that $[e, T_{12}^T] w_1 = [e, T_{12}^T] w_2$. Thus $w = \{w_1, w_2\}$ is a non-trivial certificate for $\{T_1, T_2\}$.
Hence, $\{T_1, T_2\}$ is not affinely rigid by Theorem 5.2.
We now illustrate the case when a pair of sections are not fully overlapped by a
simple example with $d = 1$.
Example 2. Consider the situation depicted in Figure 5.1. The data set has
five points marked by short vertical bars. Two sections are considered as shown in
panel (a) of Figure 5.1 with a thick line segment and a thin line segment connecting
the points in each section. The first section consists of the left three points, and the
second one consists of the right three points. The two sections share a single point
denoted by a circle. We can fold the second section around the point marked by the circle,
while keeping the first section unchanged; see the resulting layout shown in panel
(b). This example clearly shows that the collection of the two sections is not affinely
rigid.
The algebraic picture of the above is also clear. Let $\tau_1, \dots, \tau_5$ be real numbers
denoting the five different points, $T = [\tau_1, \dots, \tau_5]$, $T_1 = [\tau_1, \tau_2, \tau_3]$ and $T_2 = [\tau_3, \tau_4, \tau_5]$,
giving $T_{12} = \tau_3$. It is easy to verify that $[e, T_1^T]$ and $[e, T_2^T]$ are of full rank. However,
$[e, T_{12}^T]$ has a nonzero null vector $w_0 = [\tau_3, -1]^T$. Thus, for each certificate $w =
\{w_1, w_2\}$ of $\{T_1, T_2\}$, $w' = \{w_1, w_2 + w_0\}$ is a different certificate of $\{T_1, T_2\}$. One of
$w$ and $w'$ must be non-trivial, and hence the collection of sections is not affinely rigid.
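The folding in Example 2 can be verified numerically. In the sketch below the five coordinate values are hypothetical (the example leaves them unspecified, and any five distinct values would do), and the helper name is an assumption. The folded layout keeps the first section fixed and reflects the second section about the shared point $\tau_3$; its certificate is $w_1 = [0, 1]^T$ and $w_2 = [2\tau_3, -1]^T = w_1 + 2 w_0$.

```python
import numpy as np

def ideal_alignment(T, sections):
    """Phi = sum_i S_i Phi_i S_i^T with N(Phi_i) = span([e, T_i^T])."""
    N = T.shape[1]
    Phi = np.zeros((N, N))
    for idx in sections:
        B = np.column_stack([np.ones(len(idx)), T[:, idx].T])    # [e, T_i^T]
        Phi[np.ix_(idx, idx)] += np.eye(len(idx)) - B @ np.linalg.pinv(B)
    return Phi

# Hypothetical coordinates for the five points of Example 2.
tau = np.array([0.0, 1.0, 2.5, 3.0, 4.5])
T = tau.reshape(1, -1)
sections = [[0, 1, 2], [2, 3, 4]]            # T_1 and T_2 share only tau_3
Phi = ideal_alignment(T, sections)

# Layout (b): keep section 1, reflect section 2 about the shared point tau_3.
folded = tau.copy()
folded[3:] = 2 * tau[2] - tau[3:]

# w_0 = [tau_3, -1]^T is a null vector of [e, T_12^T] = [1, tau_3].
w0 = np.array([tau[2], -1.0])
shared_block = np.array([[1.0, tau[2]]])

# Distance from the folded layout to span([e, T^T]).
B = np.column_stack([np.ones(5), tau])
residual = folded - B @ np.linalg.lstsq(B, folded, rcond=None)[0]

null_dim = 5 - np.linalg.matrix_rank(Phi)
print(np.linalg.norm(Phi @ folded), np.linalg.norm(residual), null_dim)
```

The folded layout is annihilated by $\Phi$ although it is not an affine function of $\tau$, so $N(\Phi)$ is strictly larger than $\mathrm{span}([e, T^T])$.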
5.3. Necessary conditions of affine rigidity for s ≥ 3. For the case when a
collection has three or more sections, we can partition the sections into two subsets,
say $\{T_{i_1}, \dots, T_{i_k}\}$ and $\{T_{i_{k+1}}, \dots, T_{i_s}\}$, and consider the union of the sections in each
subset,
\[ T_{1:k} = T_{i_1} \cup \dots \cup T_{i_k}, \qquad T_{k+1:s} = T_{i_{k+1}} \cup \dots \cup T_{i_s}. \]
The following theorem shows that affine rigidity of the collection implies that $T_{1:k}$ and $T_{k+1:s}$ are
fully overlapped.
Theorem 5.6. If the collection $\{T_1, \dots, T_s\}$ is affinely rigid, then for any partition
$\{T_{i_1}, \dots, T_{i_k}\}$ and $\{T_{i_{k+1}}, \dots, T_{i_s}\}$ with $1 \le k < s$, $\cup_{j=1}^{k} T_{i_j}$ and $\cup_{j=k+1}^{s} T_{i_j}$ are
fully overlapped.
Proof. We prove this theorem by contradiction. If there is a partition,
which without loss of generality we denote by $\{T_1, \dots, T_k\}$ and $\{T_{k+1}, \dots, T_s\}$
($k < s$), such that the two super-sections $T_{1:k} = \cup_{j=1}^{k} T_j$ and $T_{k+1:s} = \cup_{j=k+1}^{s} T_j$ are
not fully overlapped, then there are $(d+1)$-dimensional vectors $w' \ne w''$ such that
\[ [e, T_{1:k,k+1:s}^T] w' = [e, T_{1:k,k+1:s}^T] w'', \]
where $T_{1:k,k+1:s}$ is the intersection of $T_{1:k}$ and $T_{k+1:s}$. Define $w = \{w_1, \dots, w_s\}$ with
\[ w_1 = \cdots = w_k = w', \qquad w_{k+1} = \cdots = w_s = w''. \]
The conditions (5.3) hold for this $w$: they are trivial when $i$ and $j$ lie in the same subset, and
when $i \le k < j$ the intersection $T_{ij}$ is contained in $T_{1:k,k+1:s}$. Hence
$w = \{w_1, \dots, w_s\}$ is a non-trivial certificate for $\{T_1, \dots, T_s\}$. By
Definition 5.3, $\{T_1, \dots, T_s\}$ is not affinely rigid, a contradiction to the assumption
of the theorem.
[Figure 5.2: four sections, labeled section 1 to section 4.]
Fig. 5.2. Overlapping patterns of four sections.
The necessary condition shown above is, however, not sufficient if $s > 2$. Below
is a counterexample for $s = 4$ and an arbitrary $k$ with $1 \le k < s$.
Example 3. Consider a data set of seven one-dimensional points
\[ \{-3, -2, -1, 0, 1, 2, 3\} \]
and an associated collection of four sections ($s = 4$),
\[ T_1 = [-3, -2, -1], \quad T_2 = [-1, 0, 1], \quad T_3 = [1, 2, 3], \quad T_4 = [-2, 0, 2]. \]
See Figure 5.2 for each section denoted by arrows emitting from a single point. Clearly
each section $T_i$ and the union of the rest are fully overlapped. Let $k = 1, 2$ or
3. Consider any partition $\{T_{i_1}, \dots, T_{i_k}\}$ and $\{T_{i_{k+1}}, \dots, T_{i_4}\}$. It is easy to verify
that the unions $\cup_{j=1}^{k} T_{i_j}$ and $\cup_{j=k+1}^{4} T_{i_j}$ are always fully overlapped, since they share
two or more distinct points on the line. We show, however, that $\{T_1, \dots, T_4\}$ is not
affinely rigid. To this end, we represent each $\Phi_i$ explicitly as follows: $\Phi_i = \frac{1}{6} q q^T$
with $q = [1, -2, 1]^T$, because each $[e, T_i^T]^T$ has the same null space $\mathrm{span}(q)$. Let
$z = [0, 0, 0, 1, 2, 2, 2]^T$. The restrictions $z_i = S_i^T z$ of $z$ corresponding to $T_i$ are
\[ S_1^T z = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad
   S_2^T z = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}, \quad
   S_3^T z = \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix}, \quad
   S_4^T z = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}, \]
respectively. Since $z_i^T q = 0$ for $i = 1, \dots, 4$, we conclude that $z$ is a null vector of $\Phi$.
However, $z \notin \mathrm{span}([e, T^T])$. So $N(\Phi) \ne \mathrm{span}([e, T^T])$, or equivalently, $\{T_1, \dots, T_4\}$ is
not affinely rigid.
In the above example, no pair of sections is fully overlapped. However, it
is also possible that a collection is affinely rigid even if no pair of its sections is
fully overlapped. Here is a simple example: Let $T$ be the matrix of three vertices of
a regular triangle and $T_1, T_2, T_3$ be three sections each consisting of two vertices. The
resulting collection is affinely rigid but $T_i$ and $T_j$ are not fully overlapped for $i \ne j$.
5.4. Sufficient conditions of affine rigidity for s ≥ 3. We associate a collection
of sections $\{T_1, \dots, T_s\}$ with a graph $G$ constructed as follows: its $s$ vertices
represent the $s$ sections, where there is an edge between vertices $i$ and $j$ if sections
$T_i$ and $T_j$ are fully overlapped. The following theorem gives a sufficient condition for
affine rigidity of a collection of sections based on the connectedness of its associated
graph $G$.
Theorem 5.7. The collection $\{T_1,\ldots,T_s\}$ is affinely rigid if its associated graph $G$ is connected.
Proof. We need to show that if $w = \{w_1,\ldots,w_s\}$ is a certificate of $T$ and $G$ is connected, then $w$ is a trivial certificate.
Consider any pair $w_i$ and $w_j$. Because $G$ is connected, there is a path, say $i_1 = i, \ldots, i_r = j$, connecting vertices $i$ and $j$. The adjacency between $i_k$ and $i_{k+1}$ implies that $T_{i_k}$ and $T_{i_{k+1}}$ are fully overlapped, i.e., $N([e, T_{i_k,i_{k+1}}^T]) = \{0\}$. It follows from (5.3) with $i = i_k$ and $j = i_{k+1}$ that $w_{i_k} = w_{i_{k+1}}$ for $k = 1,\ldots,r-1$. Hence
$$w_i \equiv w_{i_1} = w_{i_2} = \cdots = w_{i_r} \equiv w_j.$$
Now we consider the case when the graph $G$ of $\{T_1,\ldots,T_s\}$ is not connected. Let the connected components of $G$ be $\{G_1,\ldots,G_r\}$, i.e., each $G_j$ is a connected subgraph of $G$ and there are no edges between vertices in different subgraphs. We denote by $J_j$ the index set of the vertices in subgraph $G_j$, and merge the sections $T_k$, $k \in J_j$, into a super-section
$$T_{J_j} = \cup_{k \in J_j} T_k,$$
i.e., the matrix consisting of the column vectors in $\{T_k, k \in J_j\}$. This collection of super-sections $\{T_{J_1},\ldots,T_{J_r}\}$ produces an alignment matrix $\widetilde{\Phi}$. We show that $\Phi$ and $\widetilde{\Phi}$ share a common null space.
Theorem 5.8. Let $\{T_{J_1},\ldots,T_{J_r}\}$ be the super-sections obtained by merging connected components of $\{T_1,\ldots,T_s\}$. Then $N(\widetilde{\Phi}) = N(\Phi)$.
Proof. Consider a null vector $v$ of $\Phi$, $v = \phi(w)$ with a certificate $w = \{w_1,\ldots,w_s\}$ of $\{T_1,\ldots,T_s\}$. Due to the connectedness of the subgraphs $G_j$, the sub-collection $\{T_k, k \in J_j\}$ is affinely rigid by Theorem 5.7, and hence each subset $\{w_k, k \in J_j\}$ is a trivial certificate for the sub-collection $\{T_k, k \in J_j\}$, i.e., all $w_k$, $k \in J_j$, are equal to each other. We simply denote them by $w_{J_j}$, i.e., $w_k = w_{J_j}$ for $k \in J_j$, $j = 1,\ldots,r$. The set $\hat{w} = \{w_{J_1},\ldots,w_{J_r}\}$ is clearly a certificate of $\{T_{J_1},\ldots,T_{J_r}\}$. It is easy to verify that $\phi(w) = \hat{\phi}(\hat{w})$, where $\hat{\phi}$ is the isomorphic mapping from the certificate space of $\{T_{J_1},\ldots,T_{J_r}\}$ to the null space of the alignment matrix $\widetilde{\Phi}$. Thus $v = \phi(w) = \hat{\phi}(\hat{w}) \in N(\widetilde{\Phi})$. On the other hand, any null vector of $\widetilde{\Phi}$ also belongs to $N(\Phi)$.
The above theorem says that merging sections with connected associated graphs does not change the null space of the alignment matrix. Equivalently, the affine rigidity of the original sections can be detected from the affine rigidity of the resulting super-sections. This fact motivates us to consider the connectedness of the associated graph $\hat{G}$ for the collection of the super-sections $\{T_{J_1},\ldots,T_{J_r}\}$, where there is an edge between two vertices if the associated super-sections are fully overlapped. We call $\hat{G}$ a coarsening of $G$. By Theorem 5.7 and Theorem 5.8, $\{T_1,\ldots,T_s\}$ is affinely rigid if $\hat{G}$ is connected. This coarsening procedure can be repeated, i.e., by finding the connected components of $\hat{G}$ and so on. The coarsening procedure terminates if
1) the current graph has only one vertex, or
2) the current graph has two or more vertices and all vertices are isolated.
We call the graph obtained in the last step of the above coarsening procedure the coarsest graph and denote it by $G^*$. We also use $|G|$ to denote the number of vertices in a graph $G$. One can easily prove the following result by Theorems 5.5 and 5.7.
Theorem 5.9. Let $G^*$ be the coarsest graph of the collection $\{T_1,\ldots,T_s\}$. Then
(1) $\{T_1,\ldots,T_s\}$ is affinely rigid if $|G^*| = 1$, and
(2) $\{T_1,\ldots,T_s\}$ is not affinely rigid if $|G^*| = 2$, or if $|G^*| = 3$ and $d = 1$.
Proof. We only prove that if $d = 1$ and $|G^*| = 3$, then $\{T_1,T_2,T_3\}$ is not affinely rigid. We show this by constructing a non-trivial certificate for $T$.
Without loss of generality, we assume that the intersection $T_{ij}$ between $T_i$ and $T_j$ is not empty for $i \ne j$. Since $T_i$ and $T_j$ are not fully overlapped for $i \ne j$, $\mathrm{rank}([e,T_{ij}^T]) < d + 1 = 2$ and hence $\mathrm{rank}([e,T_{ij}^T]) = 1$.
Now for the construction of a non-trivial certificate $w = \{w_1,w_2,w_3\}$, we can assume $w_3 = 0$ without loss of generality. Thus, $w = \{w_1,w_2,w_3\}$ is non-trivial if and only if either $w_1$ or $w_2$ is not zero. The conditions given in (5.3) now state
$$[e,T_{12}^T]\,w_1 = [e,T_{12}^T]\,w_2,\qquad [e,T_{23}^T]\,w_2 = 0,\qquad [e,T_{31}^T]\,w_1 = 0.$$
We rewrite the equations in the following matrix form:
$$\begin{bmatrix} [e,T_{12}^T] & [e,T_{12}^T] \\ 0 & [e,T_{23}^T] \\ [e,T_{13}^T] & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ -w_2 \end{bmatrix} = 0. \tag{5.5}$$
Because the rank of the coefficient matrix is less than or equal to three and it has $2(d+1) = 4$ columns, the above linear equations have a nonzero solution
$$\begin{bmatrix} w_1 \\ -w_2 \end{bmatrix}.$$
Therefore, a non-trivial certificate exists for the collection $\{T_1,T_2,T_3\}$. By Theorem 5.2, $\{T_1,T_2,T_3\}$ is not affinely rigid.
Unfortunately, we still cannot conclude that the original collection is not affinely rigid for the more general case $|G^*| > 3$. Here is a counterexample with $d = 1$, derived from Example 3.
Example 4. We change the first section in Example 3 by adding the last point to it, and keep the other sections unchanged,
$$T_1 = [-3,-2,-1,3],\quad T_2 = [-1,0,1],\quad T_3 = [1,2,3],\quad T_4 = [-2,0,2].$$
No two sections in the collection are fully overlapped, since the size of each intersection $T_{ij}$ is one and $[e,T_{ij}^T]$ is a $1 \times 2$ matrix that is not of full column rank. So there are no edges in the associated graph, i.e., $G^* = G$ and $|G^*| = 4$. However, the collection is still affinely rigid.
Appendix B shows the existence of an affinely rigid collection with $|G^*| = s$ for any $s$ and $d$ satisfying $3 \le s \le d + 1$. Of course, one can also construct a collection with $|G^*| = s$ that is not affinely rigid. Appendix C gives geometric conditions for affinely rigid collections with $|G^*| = 3$ and $d = 2$.
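The coarsening procedure of this subsection, and the fact that $|G^*| > 3$ alone is inconclusive, can be illustrated with a short NumPy sketch. It is not part of the paper: the function names, the encoding of sections as zero-based index lists, and the tolerance `1e-10` are assumptions made for illustration. Two sections are treated as fully overlapped when $[e, T_{ij}^T]$ has full column rank $d + 1$, as in the proof of Theorem 5.7.

```python
import numpy as np

def fully_overlapped(points, a, b):
    """True if [e, T_ab^T] built from the shared points has full column rank d + 1."""
    common = sorted(set(a) & set(b))
    if not common:
        return False
    d = points.shape[0]
    basis = np.column_stack([np.ones(len(common)), points[:, common].T])
    return np.linalg.matrix_rank(basis) == d + 1

def coarsest_graph_size(points, sections):
    """Repeat the coarsening step; return |G*| (hypothetical helper)."""
    current = [set(s) for s in sections]
    while len(current) > 1:
        parent = list(range(len(current)))  # union-find over the vertices

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(len(current)):
            for j in range(i + 1, len(current)):
                if fully_overlapped(points, current[i], current[j]):
                    parent[find(i)] = find(j)
        groups = {}
        for i, sec in enumerate(current):
            groups.setdefault(find(i), set()).update(sec)
        if len(groups) == len(current):  # no edges: every vertex is isolated
            break
        current = list(groups.values())  # each component becomes a super-section
    return len(current)

def null_space_dim(points, sections, tol=1e-10):
    """Dimension of N(Phi) for Phi = sum_i S_i Phi_i S_i^T."""
    n = points.shape[1]
    phi = np.zeros((n, n))
    for idx in sections:
        idx = np.asarray(sorted(idx))
        basis = np.column_stack([np.ones(len(idx)), points[:, idx].T])
        phi[np.ix_(idx, idx)] += np.eye(len(idx)) - basis @ np.linalg.pinv(basis)
    return n - np.linalg.matrix_rank(phi, tol=tol)

points = np.array([[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]])
example3 = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [1, 3, 5]]
example4 = [[0, 1, 2, 6], [2, 3, 4], [4, 5, 6], [1, 3, 5]]

size3 = coarsest_graph_size(points, example3)
size4 = coarsest_graph_size(points, example4)
dim3 = null_space_dim(points, example3)
dim4 = null_space_dim(points, example4)
print(size3, dim3)
print(size4, dim4)
```

Both collections have a coarsest graph with four isolated vertices, yet only the collection of Example 4 has a null space of dimension $d + 1 = 2$, which is the situation described in the text.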
5.5. Merging sections. In the last subsection, we discussed a coarsening procedure that involves merging connected components, i.e., merging the sections in a connected component into a super-section. This kind of coarsening procedure preserves the null space (cf. Theorem 5.8). In this subsection, we further discuss the merging process with regard to: 1) merging components that are not necessarily connected; and 2) merging sections that do not form a connected component but whose corresponding sub-collection is affinely rigid. We will show that for 1) the null space does not grow, while for 2) the null space remains unchanged.
Theorem 5.10. Let $\Phi$ and $\widetilde{\Phi}$ be the two alignment matrices of $\{T_1,\ldots,T_s\}$ and $\{T_{J_1},\ldots,T_{J_t}\}$, respectively, where $T_{J_j}$ is the super-section merging sections $\{T_i, i \in J_j\}$, $j = 1,\ldots,t$. Then $N(\widetilde{\Phi}) \subseteq N(\Phi)$.
Proof. Let $\hat{w} = \{w_{J_1},\ldots,w_{J_t}\}$ be a certificate for the collection $\{T_{J_1},\ldots,T_{J_t}\}$. It can be split into a certificate $w = \{w_1,\ldots,w_s\}$ of $\{T_1,\ldots,T_s\}$ with $w_k = w_{J_j}$ for $k \in J_j$, $j = 1,\ldots,t$. By definition, $\phi(w) = \hat{\phi}(\hat{w})$ with the isomorphic mappings $\phi$ and $\hat{\phi}$ as defined in (5.4). So each null vector $\hat{\phi}(\hat{w})$ of $\widetilde{\Phi}$ is also a null vector of $\Phi$, i.e., $N(\widetilde{\Phi}) \subseteq N(\Phi)$.
Theorem 5.10 suggests that one can modify the alignment matrix $\Phi$ by merging sections in order to push the null space to the desired subspace $\mathrm{span}([e,T^T])$. For example, if we merge any two sections given in Example 3, the graph of the resulting sections can be recursively coarsened to a connected graph, and hence $N(\widetilde{\Phi}) = \mathrm{span}([e,T^T])$ holds for the modified $\widetilde{\Phi}$ by Theorem 5.9.
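The effect of merging on Example 3 can be checked directly. The sketch below is an illustration under assumptions rather than the authors' code: `alignment_matrix` is a hypothetical helper, sections are zero-based index lists, and ranks are computed with an assumed tolerance of `1e-10`. For each of the six pairs of sections it merges the pair into one super-section, rebuilds the alignment matrix, and records the dimension of its null space.

```python
import itertools
import numpy as np

def alignment_matrix(points, sections):
    """Phi = sum_i S_i Phi_i S_i^T with N(Phi_i) = span([e, T_i^T])."""
    n = points.shape[1]
    phi = np.zeros((n, n))
    for idx in sections:
        idx = np.asarray(idx)
        basis = np.column_stack([np.ones(len(idx)), points[:, idx].T])  # [e, T_i^T]
        phi[np.ix_(idx, idx)] += np.eye(len(idx)) - basis @ np.linalg.pinv(basis)
    return phi

points = np.array([[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]])
sections = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [1, 3, 5]]          # Example 3
n = points.shape[1]
affine = np.column_stack([np.ones(n), points.T])                  # [e, T^T]

null_dims = {}
affine_ok = True
for a, b in itertools.combinations(range(len(sections)), 2):
    merged = sorted(set(sections[a]) | set(sections[b]))          # super-section
    rest = [s for k, s in enumerate(sections) if k not in (a, b)]
    phi_merged = alignment_matrix(points, [merged] + rest)
    null_dims[(a + 1, b + 1)] = n - np.linalg.matrix_rank(phi_merged, tol=1e-10)
    # span([e, T^T]) must stay inside the null space after merging.
    affine_ok = affine_ok and np.allclose(phi_merged @ affine, 0.0)
print(null_dims, affine_ok)
```

A null-space dimension of $d + 1 = 2$ for every pair, together with `affine_ok` being true, corresponds to $N(\widetilde{\Phi}) = \mathrm{span}([e,T^T])$.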
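The following theorem shows that merging affinely rigid sub-collections cannot change the null space of the alignment matrix. It generalizes Theorem 5.8 slightly and has a similar proof, which will not be repeated here.
Theorem 5.11. Let $\Phi$ and $\widetilde{\Phi}$ be the two alignment matrices of $\{T_1,\ldots,T_s\}$ and $\{T_{J_1},\ldots,T_{J_t}\}$, respectively. If each sub-collection $\{T_i, i \in J_j\}$ is affinely rigid for $j = 1,\ldots,t$, then $N(\widetilde{\Phi}) = N(\Phi)$.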
6. The Smallest Nonzero Eigenvalue(s) of the Alignment Matrix. How well $N(\Phi)$ can be determined numerically depends on the magnitude of the smallest nonzero eigenvalue(s) of $\Phi$. This has significant ramifications when we need to use eigenvectors of an approximation of $\Phi$ corresponding to small eigenvalues to recover the parameter vectors $\{\tau_i\}$. Theorem 4.2 gives further elaboration in this regard. The objective of this section is to establish bounds for the smallest nonzero eigenvalue.
We first give a characterization of the smallest nonzero eigenvalue $\lambda_{\min}^+(\Phi)$ of $\Phi$.
Theorem 6.1. Let $\Phi_i = Q_iQ_i^T$ be the orthogonal projections such that $N(\Phi_i) = \mathrm{span}([e,T_i^T])$, with $Q_i$ orthonormal. Let $H = (H_{ij})$ be a block matrix with blocks $H_{ij} = (S_iQ_i)^T(S_jQ_j)$, where $S_i$ is the selection matrix for $T_i$. Then $\lambda_{\min}^+(\Phi) = \lambda_{\min}^+(H)$. Furthermore, if $s = 2$, then
$$\lambda_{\min}^+(\Phi) = 1 - \max\{\sigma : \sigma \in \sigma(H_{12}),\ \sigma < 1\},$$
where $\sigma(\cdot)$ denotes the set of singular values of a matrix.
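Proof. Let $R = [S_1Q_1,\ldots,S_sQ_s]$. By definition, $\Phi = RR^T$ and $H = R^TR$. It is well known that $\Phi$ and $H$ have the same nonzero eigenvalues, since 1) the eigenvalue equation $RR^Tx = \lambda x$ implies $R^TRy = \lambda y$ with $y = R^Tx \ne 0$, while $R^TRy = \lambda y$ yields $RR^Tz = \lambda z$ with $z = Ry$, and 2) the condition $\lambda \ne 0$ guarantees that $x$, $y$, and $z$ are nonzero simultaneously. So $\lambda_{\min}^+(\Phi) = \lambda_{\min}^+(H)$.
For the case when $s = 2$,
$$H = I + \begin{bmatrix} 0 & H_{12} \\ H_{12}^T & 0 \end{bmatrix}.$$
Notice that the eigenvalues of $\begin{bmatrix} 0 & H_{12} \\ H_{12}^T & 0 \end{bmatrix}$ are given by $\{\sigma_1,\ldots,\sigma_{d+1},-\sigma_1,\ldots,-\sigma_{d+1}\}$, where $\sigma_1 \ge \cdots \ge \sigma_{d+1}$ are the singular values of $H_{12}$ [6]. Assume that $1 = \sigma_1 = \cdots = \sigma_\ell > \sigma_{\ell+1} \ge \cdots \ge \sigma_{d+1}$; then $\lambda_{\min}^+(H) = 1 - \sigma_{\ell+1}$.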
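Both statements of Theorem 6.1 can be sanity-checked on a small case with $s = 2$. The sketch below is an illustration only: the six points on a line, the two overlapping index lists, the helper names, and the threshold `1e-10` used to separate zero from nonzero values are all assumptions. Each $Q_i$ is taken from a complete QR factorization of $[e, T_i^T]$, which is assumed to have full column rank.

```python
import numpy as np

def complement_basis(points, idx):
    """Orthonormal Q_i such that Q_i Q_i^T has null space span([e, T_i^T])."""
    d = points.shape[0]
    basis = np.column_stack([np.ones(len(idx)), points[:, idx].T])
    q_full, _ = np.linalg.qr(basis, mode="complete")
    return q_full[:, d + 1:]          # columns orthogonal to range([e, T_i^T])

def smallest_nonzero(values, tol=1e-10):
    return min(v for v in values if v > tol)

points = np.array([[0.0, 1.0, 2.5, 3.0, 4.2, 6.0]])   # d = 1, six points
sections = [[0, 1, 2, 3], [2, 3, 4, 5]]               # two sections sharing two points
n = points.shape[1]

blocks = []
for idx in sections:
    q_i = complement_basis(points, idx)
    s_i = np.eye(n)[:, idx]                            # selection matrix S_i
    blocks.append(s_i @ q_i)
r = np.hstack(blocks)                                  # R = [S_1 Q_1, S_2 Q_2]

phi = r @ r.T                                          # alignment matrix
h = r.T @ r                                            # block matrix H
lam_phi = smallest_nonzero(np.linalg.eigvalsh(phi))
lam_h = smallest_nonzero(np.linalg.eigvalsh(h))

h12 = blocks[0].T @ blocks[1]
sigma = np.linalg.svd(h12, compute_uv=False)
lam_s2 = 1.0 - max(s for s in sigma if s < 1.0 - 1e-10)
print(lam_phi, lam_h, lam_s2)
```

The three printed numbers should coincide up to rounding, the first two by the general statement of the theorem and the third by its special case $s = 2$.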
6.1. Submatrices $H_{ij}$ of $H$. Now we focus on the matrix $H$, and proceed to derive an expression of its submatrices $H_{ij} = (S_iQ_i)^T(S_jQ_j) = Q_i^T S_i^T S_j Q_j$. Denote by $T_{ij}^c$ the remainder of $T_i$ after deleting $T_{ij}$. Without loss of generality, we write $T_i = [T_{ij}^c, T_{ij}]$.
We first derive an expression for $Q_i$ which will allow us to relate the centered matrix $T_{ij}^c - t_{ij}e^T$ to $H_{ij}$; here $t_{ij}$ is the mean of the columns in $T_{ij}$. To this end, we partition
$$[e,(T_i - t_{ij}e^T)^T] = \begin{bmatrix} [e,(T_{ij}^c - t_{ij}e^T)^T] \\ [e,(T_{ij} - t_{ij}e^T)^T] \end{bmatrix} \equiv \begin{bmatrix} B_{ij}^c \\ B_{ij} \end{bmatrix}$$
and split it as $[e,(T_i - t_{ij}e^T)^T] = A_1 + A_2$, where
$$A_1 = [e,(T_i - t_{ij}e^T)^T]\,B_{ij}^\dagger B_{ij} = \begin{bmatrix} B_{ij}^c B_{ij}^\dagger B_{ij} \\ B_{ij} \end{bmatrix},$$
$$A_2 = [e,(T_i - t_{ij}e^T)^T]\,(I - B_{ij}^\dagger B_{ij}) = \begin{bmatrix} B_{ij}^c (I - B_{ij}^\dagger B_{ij}) \\ 0 \end{bmatrix}.$$
It is known that $Q_i$ is orthogonal to $[e,T_i^T]$ if and only if $Q_i$ is orthogonal to $[e,(T_i - t_{ij}e^T)^T]$, or equivalently, $Q_i$ is orthogonal to both $A_1$ and $A_2$, since $\mathrm{span}([e,(T_i - t_{ij}e^T)^T]) = \mathrm{span}(A_1) + \mathrm{span}(A_2)$. Because of the structures of $A_1$ and $A_2$, one can construct such a $Q_i$ as follows: Let $V_{ij}$ be an orthonormal basis matrix of the orthogonal complement space of $B_{ij}$, and $V_i$ an orthonormal basis matrix of the subspace orthogonal to $B_{ij}^c(I - B_{ij}^\dagger B_{ij})$. Then the two matrices
$$C_1 = \begin{bmatrix} 0 \\ V_{ij} \end{bmatrix} \quad\text{and}\quad C_2 = \begin{bmatrix} -V_i \\ (B_{ij}^c B_{ij}^\dagger)^T V_i \end{bmatrix}$$
are orthogonal to both $A_1$ and $A_2$. Note that $C_1$ and $C_2$ are orthogonal to each other since the columns of $(B_{ij}^\dagger)^T$ are still in the range space of $B_{ij}$. $C_1$ is also orthonormal, and $C_2$ can be normalized by multiplying it with $D_i = (I + V_i^T (B_{ij}^c B_{ij}^\dagger)(B_{ij}^c B_{ij}^\dagger)^T V_i)^{-1/2}$ from the right. So we can set $Q_i$ to be the following orthonormal matrix,
$$Q_i = [C_1, C_2D_i] = \begin{bmatrix} 0 & -V_iD_i \\ V_{ij} & (B_{ij}^c B_{ij}^\dagger)^T V_iD_i \end{bmatrix}.$$
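The construction of $Q_i$ can be verified numerically on a small instance. The following NumPy sketch is an illustration under assumptions: the two-dimensional points, the helper names `orth_complement` and `inv_sqrt`, and the tolerance `1e-12` are not from the paper. The shared part $T_{ij}$ is chosen with only two points, so that $B_{ij}$ is rank deficient and the general case with $I - B_{ij}^\dagger B_{ij} \ne 0$ is exercised.

```python
import numpy as np

def orth_complement(m, tol=1e-12):
    """Orthonormal basis of the orthogonal complement of range(m)."""
    u, s, _ = np.linalg.svd(m, full_matrices=True)
    rank = int(np.sum(s > tol))
    return u[:, rank:]

def inv_sqrt(sym):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(sym)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

t_c = np.array([[0.0, 1.0, 0.0, 2.0], [0.0, 0.0, 1.0, 1.5]])   # T_ij^c
t_ij = np.array([[1.0, 3.0], [1.0, 2.0]])                       # T_ij
t_mean = t_ij.mean(axis=1, keepdims=True)                       # t_ij

b_c = np.column_stack([np.ones(t_c.shape[1]), (t_c - t_mean).T])   # B_ij^c
b = np.column_stack([np.ones(t_ij.shape[1]), (t_ij - t_mean).T])   # B_ij
b_pinv = np.linalg.pinv(b)

v_ij = orth_complement(b)
v_i = orth_complement(b_c @ (np.eye(b.shape[1]) - b_pinv @ b))
g = (b_c @ b_pinv).T @ v_i                        # (B_ij^c B_ij^+)^T V_i
d_i = inv_sqrt(np.eye(v_i.shape[1]) + g.T @ g)    # D_i

c1 = np.vstack([np.zeros((t_c.shape[1], v_ij.shape[1])), v_ij])
c2 = np.vstack([-v_i, g])
q_i = np.hstack([c1, c2 @ d_i])                   # Q_i = [C_1, C_2 D_i]

full = np.vstack([b_c, b])                        # [e, (T_i - t_ij e^T)^T]
orth_err = np.linalg.norm(q_i.T @ full)
norm_err = np.linalg.norm(q_i.T @ q_i - np.eye(q_i.shape[1]))
print(q_i.shape, orth_err, norm_err)
```

Here $T_i$ has six points in the plane, so $Q_i$ is expected to have $6 - 3 = 3$ orthonormal columns, all orthogonal to $[e,(T_i - t_{ij}e^T)^T]$ and hence to $[e,T_i^T]$.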
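It follows that $Q_{ij} = [V_{ij},\ (B_{ij}^c B_{ij}^\dagger)^T V_iD_i]$. Similarly, $Q_{ji} = [V_{ij},\ (B_{ji}^c B_{ij}^\dagger)^T V_jD_j]$, where $B_{ji}^c = [e,(T_{ji}^c - t_{ij}e^T)^T]$, $V_j$ is the basis matrix of the subspace orthogonal to $B_{ji}^c(I - B_{ij}^\dagger B_{ij})$, and $D_j = (I + V_j^T (B_{ji}^c B_{ij}^\dagger)(B_{ji}^c B_{ij}^\dagger)^T V_j)^{-1/2}$. Now we can represent $H_{ij} = Q_{ij}^T Q_{ji}$ as
$$H_{ij} = \begin{bmatrix} I_{ij} & \\ & (V_iD_i)^T B_{ij}^c B_{ij}^\dagger (B_{ji}^c B_{ij}^\dagger)^T V_jD_j \end{bmatrix} \equiv \begin{bmatrix} I_{ij} & \\ & P_{ij} \end{bmatrix}.$$
Note that
$$\|P_{ij}\|_2^2 \le \frac{\|B_{ij}^c B_{ij}^\dagger\|_2^2\,\|B_{ji}^c B_{ij}^\dagger\|_2^2}{(1 + \|B_{ij}^c B_{ij}^\dagger\|_2^2)(1 + \|B_{ji}^c B_{ij}^\dagger\|_2^2)} < 1. \tag{6.1}$$
Therefore, the singular values of $H_{ij}$ less than one consist of the singular values of $P_{ij}$.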
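6.2. Estimation of the singular values of $P_{ij}$. The matrix $B_{ij}^\dagger$ is given by
$$B_{ij}^\dagger = \begin{bmatrix} e^\dagger \\ ((T_{ij} - t_{ij}e^T)^T)^\dagger \end{bmatrix},$$
since the first column of $B_{ij} = [e,(T_{ij} - t_{ij}e^T)^T]$ is orthogonal to the other columns. It follows that $B_{ij}^c B_{ij}^\dagger = ee^\dagger + \big((T_{ij} - t_{ij}e^T)^\dagger (T_{ij}^c - t_{ij}e^T)\big)^T$. Define $\eta_{ij} = \sigma_{\min}(T_{ij} - t_{ij}e^T)$, the smallest nonzero singular value of $T_{ij} - t_{ij}e^T$. We obtain
$$\begin{aligned} \|B_{ij}^c B_{ij}^\dagger\|_2^2 &\le 1 + \|(T_{ij} - t_{ij}e^T)^\dagger (T_{ij}^c - t_{ij}e^T)\|_2^2 \\ &\le 1 + \|(T_{ij} - t_{ij}e^T)^\dagger\|_2^2\,\|T_{ij}^c - t_{ij}e^T\|_2^2 \\ &= 1 + \|T_{ij}^c - t_{ij}e^T\|_2^2 / \eta_{ij}^2. \end{aligned}$$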
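The block formula for $B_{ij}^\dagger$ and the resulting expression for $B_{ij}^c B_{ij}^\dagger$ rest on the centered columns being orthogonal to $e$. A minimal NumPy check is sketched below; the sample points and variable names are assumptions chosen for illustration, and only the two identities are tested, not the norm bound.

```python
import numpy as np

t_ij = np.array([[0.0, 1.0, 3.0, 4.0], [1.0, 0.0, 2.0, 5.0]])   # T_ij (d = 2, four shared points)
t_c = np.array([[2.0, -1.0], [0.5, 3.0]])                        # T_ij^c
t_mean = t_ij.mean(axis=1, keepdims=True)                        # t_ij

e_ij = np.ones((t_ij.shape[1], 1))
e_c = np.ones((t_c.shape[1], 1))
x_ij = t_ij - t_mean              # T_ij - t_ij e^T
x_c = t_c - t_mean                # T_ij^c - t_ij e^T

b = np.hstack([e_ij, x_ij.T])     # B_ij
b_c = np.hstack([e_c, x_c.T])     # B_ij^c

# Block formula: B_ij^+ = [e^+ ; ((T_ij - t_ij e^T)^T)^+].
stacked = np.vstack([np.linalg.pinv(e_ij), np.linalg.pinv(x_ij.T)])
pinv_err = np.linalg.norm(np.linalg.pinv(b) - stacked)

# Product: B_ij^c B_ij^+ = e e^+ + ((T_ij - t_ij e^T)^+ (T_ij^c - t_ij e^T))^T.
product = e_c @ np.linalg.pinv(e_ij) + (np.linalg.pinv(x_ij) @ x_c).T
product_err = np.linalg.norm(b_c @ np.linalg.pinv(b) - product)
print(pinv_err, product_err)
```

Note that the two vectors written $e$ in $ee^\dagger$ have different lengths in general, which the sketch makes explicit through `e_c` and `e_ij`.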