Joint Manifolds for Data Fusion
Mark A. Davenport, Student Member, IEEE, Chinmay Hegde, Student Member, IEEE
Marco F. Duarte, Member, IEEE, and Richard G. Baraniuk, Fellow, IEEE
Abstract
The emergence of low-cost sensing architectures for diverse modalities has made it possible to deploy
sensor networks that capture a single event from a large number of vantage points and in multiple
modalities. In many scenarios, these networks acquire large amounts of very high-dimensional data. For
example, even a relatively small network of cameras can generate massive amounts of high-dimensional
image and video data. One way to cope with such a data deluge is to develop low-dimensional data
models. Manifold models provide a particularly powerful theoretical and algorithmic framework for
capturing the structure of data governed by a low-dimensional set of parameters, as is often the case
in a sensor network. However, these models do not typically take into account dependencies among
multiple sensors. We thus propose a new joint manifold framework for data ensembles that exploits such
dependencies. We show that joint manifold structure can lead to improved performance for a variety of
signal processing algorithms for applications including classification and manifold learning. Additionally,
recent results concerning random projections of manifolds enable us to formulate a network-scalable
dimensionality reduction scheme that efficiently fuses the data from all sensors.
I. INTRODUCTION
The emergence of low-cost sensing devices has made it possible to deploy sensor networks that capture
a single event from a large number of vantage points and in multiple modalities. This can lead to a
veritable data deluge, fueling the need for efficient algorithms for processing and efficient protocols for
transmitting the data generated by such networks. In order to address these challenges, there is a clear
need for a theoretical framework for modeling the complex interdependencies among signals acquired by
these networks. This framework should support the development of efficient algorithms that can exploit
this structure and efficient protocols that can cope with the massive data volume.
MAD, CH, and RGB are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX. MFD
is with the Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ. This work was supported
by grants NSF CCF-0431150 and CCF-0728867, DARPA/ONR N66001-08-1-2065, ONR N00014-07-1-0936 and N00014-08-
1-1112, AFOSR FA9550-07-1-0301, ARO MURI W311NF-07-1-0185, and the TI Leadership Program. Thanks to J.P. Slavinsky
for his help in acquiring the data for the experimental results presented in this paper.
September 30, 2009 DRAFT
Consider, for example, a camera network consisting of J video acquisition devices each acquiring
N-pixel images of a scene simultaneously. Ideally, all cameras would send their raw recorded images
to a central processing unit, which could then holistically analyze all the data produced by the network.
This naïve approach would in general provide the best performance, since it exploits complete access
to all of the data. However, the amount of raw data generated by a camera network, on the order of
JN, becomes untenably large even for fairly small networks operating at moderate resolutions and frame
rates. In such settings, the amount of data can (and often does) overwhelm network resources such as
power and communication bandwidth. While the naïve approach could easily be improved by requiring
each camera to first compress the images using a compression algorithm such as JPEG or MPEG, this
modification still fails to exploit any interdependencies between the cameras. Hence, the total power and
bandwidth requirements of the network will still grow linearly with J.
Alternatively, exploiting the fact that in many cases the end goal is to solve some kind of inference
problem, each camera could independently reach a decision or extract some relevant features, and then
relay the result to the central processing unit which would then combine the results to provide the solution.
Unfortunately, this approach also has disadvantages. First, the cameras must be “smart” in that they must
possess some degree of sophistication so that they can execute nonlinear inference tasks. Such technology
is expensive and can place severe demands on the available power resources. Perhaps more importantly,
the total power and bandwidth requirement will still scale linearly with J.
In order to cope with such high-dimensional data, a common strategy is to develop appropriate models
for the acquired images. A powerful model is the geometric notion of a low-dimensional manifold. We
will provide a more formal definition of a manifold in Section II, but informally manifold models arise
in cases where (i) a K-dimensional parameter θ can be identified that carries the relevant information
about a signal and (ii) the signal $f(\theta) \in \mathbb{R}^N$ changes as a continuous (typically nonlinear) function of
these parameters. Typical examples include a one-dimensional (1-D) signal shifted by an unknown time
delay (parameterized by the translation variable), a recording of a speech signal (parameterized by the
underlying phonemes spoken by the speaker), and an image of a 3-D object at an unknown location
captured from an unknown viewing angle (parameterized by the 3-D coordinates of the object and its
roll, pitch, and yaw). In these and many other cases, the geometry of the signal class forms a nonlinear
K-dimensional manifold in $\mathbb{R}^N$,
$$M = \{f(\theta) : \theta \in \Theta\}, \qquad (1)$$
where Θ is the K-dimensional parameter space. In recent years, researchers in image processing have
become increasingly interested in manifold models due to the observation that a collection of images
obtained from different target locations/poses/illuminations and sensor viewpoints forms such a manifold [1]–[3]. As a result, manifold-based methods for image processing have attracted considerable
attention, particularly in the machine learning community, and can be applied to such diverse applications
as data visualization, classification, estimation, detection, control, clustering, and learning [3]–[5]. Low-dimensional manifolds have also been proposed as approximate models for a number of nonparametric
signal classes such as images of human faces and handwritten digits [6]–[8].
In sensor networks, multiple observations of the same event are often acquired simultaneously, resulting
in the acquisition of interdependent signals that share a common parameterization. Specifically, a camera
network might observe a single event from a variety of vantage points, where the underlying event is
described by a set of common global parameters (such as the location and orientation of an object of
interest). Similarly, when sensing a single phenomenon using multiple modalities, such as video and
audio, the underlying phenomenon may again be described by a single parameterization that spans all
modalities (such as when analyzing a video and audio recording of a person speaking, where both are
parameterized by the phonemes being spoken). In both examples, all of the acquired signals are functions
of the same set of parameters, i.e., we can write each signal as $f_j(\theta)$, where $\theta \in \Theta$ is the same for all $j$.
Our contention in this paper is that we can obtain a simple model that captures the correlation between
the sensor observations by matching the parameter values for the different manifolds observed by the
sensors. More precisely, we observe that by simply concatenating points that are indexed by the same parameter value θ from the different component manifolds, i.e., by forming $f(\theta) = [f_1(\theta), f_2(\theta), \ldots, f_J(\theta)]$,
we obtain a new manifold, which we dub the joint manifold, that encompasses all of the component
manifolds and shares the same parameterization. This structure captures the interdependencies between
the signals in a straightforward manner. We can then apply the same manifold-based processing techniques
that have been proposed for individual manifolds to the entire ensemble of component manifolds.
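To make the concatenation concrete, here is a minimal NumPy sketch under assumed, illustrative choices: J = 3 hypothetical 1-D sensors, each recording an N = 64 sample Gaussian pulse whose position is the shared parameter θ plus a sensor-specific offset (a K = 1 instance of the time-delay manifold mentioned above). The helper names, pulse width, and offsets are our own assumptions, not part of the original work.

```python
import numpy as np

def component_signal(theta, offset, n=64, width=3.0):
    """Hypothetical sensor j: n-sample Gaussian pulse centered at theta + offset."""
    t = np.arange(n)
    return np.exp(-((t - (theta + offset)) ** 2) / (2 * width**2))

def joint_signal(theta, offsets, n=64):
    """Point f(theta) = [f_1(theta), ..., f_J(theta)] on the joint manifold."""
    return np.concatenate([component_signal(theta, o, n) for o in offsets])

offsets = [0.0, 5.0, -7.0]            # assumed per-sensor shifts, J = 3
thetas = np.linspace(20.0, 40.0, 50)  # samples of the shared parameter theta
X = np.stack([joint_signal(th, offsets) for th in thetas])
print(X.shape)  # (50, 192): 50 joint points in R^{JN} with J*N = 3*64
```

Each row of X is one sample of the joint manifold: the ambient dimension grows to JN, while the intrinsic dimension stays K = 1.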
In this paper we conduct a careful examination of the topological and geometrical properties of joint
manifolds; in particular, we compare joint manifolds to their component manifolds to see how properties
like geodesic distances, curvature, branch separation, and condition number are affected. We then observe
that these properties lead to improved performance and noise-tolerance for a variety of signal processing
algorithms when they exploit the joint manifold structure. As a key advantage of our proposed model,
we illustrate how the joint manifold structure can be exploited via a simple and efficient data fusion
algorithm based on random projections. For the case of J cameras jointly acquiring N-pixel images of a
common scene characterized by K parameters, we demonstrate that the total power and communication
bandwidth required by our scheme is linear in the dimension K and only logarithmic in J and N. This
technique resembles the acquisition framework proposed in compressive sensing (CS) [9], [10]; in fact,
prototypes of inexpensive sensing hardware that directly acquires random projections of images have
already been built [11].
Related prior work has studied manifold alignment, where the goal is to discover maps between datasets
that are governed by the same underlying low-dimensional structure. Lafon et al. proposed an algorithm
to obtain a one-to-one matching between data points from several manifold-modeled classes [12]. The
algorithm first applies dimensionality reduction using diffusion maps to obtain data representations that
encode the intrinsic geometry of the class. Then, an affine function that matches a set of landmark
points is computed and applied to the remainder of the datasets. This concept was extended by Wang
and Mahadevan, who applied Procrustes analysis on the dimensionality-reduced datasets to obtain an
alignment function between a pair of manifolds [13]. Since an alignment function is provided instead of
a data point matching, the mapping obtained is applicable for the entire manifold rather than for the set
of sampled points. In our setting, we assume that either (i) the manifold alignment is implicitly present,
for example, via synchronization between the different sensors, or (ii) the manifolds have been aligned
using one of these approaches. Our main focus is to analyze the benefits of processing the
joint manifold rather than solving the task of interest separately on each component manifold. For concreteness,
but without loss of generality, we couch our analysis in the language of camera networks, although much
of our theory is sufficiently generic so as to apply to a variety of other scenarios.
This paper is organized as follows. Section II introduces and establishes some basic properties of
joint manifolds. Section III discusses practical examples of joint manifolds in the camera
network setting and describes how to use random projections to exploit the joint manifold structure in such
a setting. Sections IV and V then consider the application of joint manifolds to the tasks of classification
and manifold learning, providing both a theoretical analysis as well as extensive simulations. Section VI
concludes with a brief discussion.
II. JOINT MANIFOLDS: THEORY
In this section we develop a theoretical framework for ensembles of manifolds that are jointly param-
eterized by a small number of common degrees of freedom. Informally, we propose a data structure for
jointly modeling such ensembles, obtained simply by concatenating points from the different component manifolds
that are indexed by the same articulation parameter to obtain a single point in a higher-dimensional space.
First we recall the definition of a general topological manifold. Specifically, a K-dimensional manifold
M is a topological space that is (i) second countable, (ii) Hausdorff, and (iii) locally homeomorphic to
$\mathbb{R}^K$. The first two conditions are rather technical, but they are always satisfied when M is embedded
in $\mathbb{R}^N$. The heart of the definition is the third requirement, which essentially says that any sufficiently
small region on the manifold behaves like $\mathbb{R}^K$. A comprehensive introduction to topological manifolds
can be found in [14].
We now define the joint manifold for the setting of general topological manifolds. In order to simplify
our notation, we will let $\mathbb{M} = M_1 \times M_2 \times \cdots \times M_J$ denote the product manifold. Furthermore, we
will use the notation $p = (p_1, p_2, \ldots, p_J)$ to denote a J-tuple of points, or concatenation of J points,
which lies in the Cartesian product of J sets (e.g., $\mathbb{M}$).
Definition 1. Let $\{M_j\}_{j=1}^J$ be an ensemble of J topological manifolds of equal dimension K. Suppose
that the manifolds are homeomorphic to each other, in which case there exists a homeomorphism $\psi_j$
between $M_1$ and $M_j$ for each j. For a particular set of $\{\psi_j\}_{j=2}^J$, we define the joint manifold as
$$M^* = \{p \in \mathbb{M} : p_j = \psi_j(p_1),\ 2 \le j \le J\}.$$
Furthermore, we say that $\{M_j\}_{j=1}^J$ are the corresponding component manifolds.
Note that M1 serves as a common parameter space for all the component manifolds. Since the
component manifolds are homeomorphic to each other, this choice is ultimately arbitrary. In practice
it may be more natural to think of each component manifold as being homeomorphic to some fixed
K-dimensional parameter space Θ. However, in this case one could still define $M^*$ as is done above by
defining $\psi_j$ as the composition of the homeomorphic mappings from $M_1$ to Θ and from Θ to $M_j$.
As an example, consider the one-dimensional manifolds in Figure 1. Figures 1(a) and (b) show two
homeomorphic manifolds, where $M_1 = (0, 2\pi)$ is an open interval, and $M_2 = \{\psi_2(\theta) : \theta \in M_1\}$, where
$\psi_2(\theta) = (\cos\theta, \sin\theta)$, i.e., $M_2 = S^1 \setminus \{(1,0)\}$ is a circle with one point removed (so that it remains
homeomorphic to a line segment). In this case the joint manifold $M^* = \{(\theta, \cos\theta, \sin\theta) : \theta \in (0, 2\pi)\}$,
illustrated in Figure 1(c), is a helix. Notice that there exist other possible homeomorphic mappings from
$M_1$ to $M_2$, and that the precise structure of the joint manifold as a submanifold of $\mathbb{R}^3$ is heavily
dependent on the choice of this mapping.
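The dependence on the choice of homeomorphism can be checked numerically. The sketch below samples the helix built from ψ2(θ) = (cos θ, sin θ) and a second joint manifold built from an alternative homeomorphism of our own choosing, ψ2'(θ) = (cos(θ²/(2π)), sin(θ²/(2π))), which also maps (0, 2π) onto the same punctured circle. Both curves project onto the same M2, yet they are different submanifolds of R³, as their lengths show. The helper names are illustrative assumptions.

```python
import numpy as np

def joint_curve(theta, psi2):
    """Sample the joint manifold {(theta, psi2(theta))} as rows in R^3."""
    return np.column_stack([theta, psi2(theta)])

def psi_circle(theta):
    """psi_2 from the text: theta -> (cos theta, sin theta)."""
    return np.column_stack([np.cos(theta), np.sin(theta)])

def psi_warped(theta):
    """Alternative homeomorphism (illustrative): reparameterize, then wrap."""
    s = theta**2 / (2 * np.pi)  # increasing bijection of (0, 2*pi) onto itself
    return np.column_stack([np.cos(s), np.sin(s)])

def polyline_length(points):
    """Length of the piecewise-linear curve through consecutive rows."""
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

theta = np.linspace(1e-6, 2 * np.pi - 1e-6, 20001)  # samples of the open interval M_1
helix = joint_curve(theta, psi_circle)
warped = joint_curve(theta, psi_warped)

# Both joint manifolds project onto the same punctured unit circle M_2 ...
print(np.allclose(np.linalg.norm(warped[:, 1:], axis=1), 1.0))  # True
# ... but they are different curves in R^3.
print(polyline_length(helix), polyline_length(warped))
```

The helix has length close to 2π√2 ≈ 8.886 (it moves at constant speed √2), while the warped joint manifold is longer, about 9.29.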
Returning to the definition of $M^*$, observe that although we have called $M^*$ the joint manifold, we
have not shown that it actually forms a topological manifold. To prove that $M^*$ is indeed a manifold,
we will make use of the fact that the joint manifold is a subset of the product manifold $\mathbb{M}$. One can
show that $\mathbb{M}$ forms a JK-dimensional manifold using the product topology [14]. By comparison, we
now show that $M^*$ has dimension only K.
Fig. 1. A pair of homeomorphic manifolds $M_1$ and $M_2$, and the resulting joint manifold $M^*$. (a) $M_1 \subseteq \mathbb{R}$: line segment. (b) $M_2 \subseteq \mathbb{R}^2$: circle segment. (c) $M^* \subseteq \mathbb{R}^3$: helix segment.
Proposition 1. $M^*$ is a K-dimensional submanifold of $\mathbb{M}$.
Proof: We first observe that since $M^* \subset \mathbb{M}$, we automatically have that $M^*$ is a second countable
Hausdorff topological space. Thus, all that remains is to show that $M^*$ is locally homeomorphic to $\mathbb{R}^K$.
Suppose $p \in M^*$. Since $p_1 \in M_1$, we have a pair $(U_1, \phi_1)$ such that $U_1 \subset M_1$ is an open set containing
$p_1$ and $\phi_1 : U_1 \to V$ is a homeomorphism, where V is an open set in $\mathbb{R}^K$. We now define, for $2 \le j \le J$,
$U_j = \psi_j(U_1)$ and $\phi_j = \phi_1 \circ \psi_j^{-1} : U_j \to V$. Note that for each j, $U_j$ is an open set and $\phi_j$ is a
homeomorphism (since $\psi_j$ is a homeomorphism).
Now set $U = U_1 \times U_2 \times \cdots \times U_J$ and define $U^* = U \cap M^*$. Observe that $U^*$ is an open set and
that $p \in U^*$. Furthermore, let q be any element of $U^*$. Then $\phi_j(q_j) = \phi_1 \circ \psi_j^{-1}(q_j) = \phi_1(q_1)$ for each
$2 \le j \le J$. Thus, since the image of each $q_j \in U_j$ in V under its corresponding $\phi_j$ is the same, we
can form a single homeomorphism $\phi^* : U^* \to V$ by assigning $\phi^*(q) = \phi_1(q_1)$. This shows that $M^*$ is
locally homeomorphic to $\mathbb{R}^K$, as desired.
Since $M^*$ is a submanifold of $\mathbb{M}$, it also inherits some desirable properties from $\{M_j\}_{j=1}^J$.
Proposition 2. Suppose that $\{M_j\}_{j=1}^J$ are homeomorphic topological manifolds and $M^*$ is defined as above.
1) If $\{M_j\}_{j=1}^J$ are Riemannian, then $M^*$ is Riemannian.
2) If $\{M_j\}_{j=1}^J$ are compact, then $M^*$ is compact.
Proof: The proofs of these facts are straightforward and follow from the fact that if the component
manifolds are Riemannian or compact, then $\mathbb{M}$ will be as well. $M^*$ then inherits these properties as a
submanifold of $\mathbb{M}$ [14].
Up to this point we have considered general topological manifolds. In particular, we have not assumed
that the component manifolds are embedded in any particular space. If each component manifold $M_j$
is embedded in $\mathbb{R}^{N_j}$, the joint manifold is naturally embedded in $\mathbb{R}^{N^*}$, where $N^* = \sum_{j=1}^J N_j$. Hence,
the joint manifold can be viewed as a model for sets of data with varying ambient dimension linked
by a common parametrization. In the sequel, we assume that each manifold $M_j$ is embedded in $\mathbb{R}^N$,
which implies that $M^* \subset \mathbb{R}^{JN}$. Observe that while the intrinsic dimension of the joint manifold remains
constant at K, the ambient dimension increases by a factor of J. We now examine how a number of
geometric properties of the joint manifold compare to those of the component manifolds.
We begin with the following simple observation that Euclidean distances¹ between points on the joint
manifold are at least as large as the corresponding distances on the component manifolds. The result follows directly from the
definition of the Euclidean norm, so we omit the proof.
Proposition 3. Let $p, q \in M^*$ be given. Then
$$\|p - q\| = \sqrt{\sum_{j=1}^J \|p_j - q_j\|^2}.$$
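Proposition 3 is simply the Euclidean norm evaluated block by block. The sketch below checks the identity, and the resulting fact that ‖p − q‖ ≥ ‖pj − qj‖ for every j, on random vectors standing in for the J components of p and q; the identity itself does not require the points to lie on a manifold, and J, N, and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
J, N = 4, 10
p = rng.standard_normal((J, N))  # row j holds the component p_j
q = rng.standard_normal((J, N))  # row j holds the component q_j

joint = np.linalg.norm(p.ravel() - q.ravel())  # ||p - q|| in R^{JN}
per_component = np.linalg.norm(p - q, axis=1)   # ||p_j - q_j|| for each j
print(np.isclose(joint, np.sqrt(np.sum(per_component**2))))  # True
print(np.all(joint >= per_component))                         # True
```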
While Euclidean distances are important (especially when noise is introduced), the natural measure of
distance between a pair of points on a Riemannian manifold is not Euclidean distance, but rather the
geodesic distance. The geodesic distance between points $p, q \in M$ is defined as
$$d_M(p,q) = \inf\{L(\gamma) : \gamma(0) = p,\ \gamma(1) = q\}, \qquad (2)$$
where $\gamma : [0,1] \to M$ is a $C^1$-smooth curve joining p and q, and $L(\gamma)$ is the length of γ as measured by
$$L(\gamma) = \int_0^1 \|\dot\gamma(t)\|\,dt. \qquad (3)$$
In order to see how geodesic distances on $M^*$ compare to geodesic distances on the component manifolds,
we will make use of the following lemma.
Lemma 1. Suppose that $\{M_j\}_{j=1}^J$ are Riemannian manifolds, and let $\gamma : [0,1] \to M^*$ be a $C^1$-smooth curve on the joint manifold. Denote by $\gamma_j$ the restriction of γ to the ambient dimensions of $M^*$
corresponding to $M_j$. Then each $\gamma_j : [0,1] \to M_j$ is a $C^1$-smooth curve on $M_j$, and
$$\frac{1}{\sqrt{J}} \sum_{j=1}^J L(\gamma_j) \le L(\gamma) \le \sum_{j=1}^J L(\gamma_j).$$
¹In the remainder of this paper, whenever we use the notation $\|\cdot\|$ we mean $\|\cdot\|_{\ell_2}$, i.e., the $\ell_2$ (Euclidean) norm on $\mathbb{R}^N$.
When we wish to differentiate this from other $\ell_p$ norms, we will be explicit.
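Proof: We begin by observing that
$$L(\gamma) = \int_0^1 \|\dot\gamma(t)\|\,dt = \int_0^1 \sqrt{\sum_{j=1}^J \|\dot\gamma_j(t)\|^2}\,dt. \qquad (4)$$
For a fixed t, let $x_j = \|\dot\gamma_j(t)\|$, and observe that $x = (x_1, x_2, \ldots, x_J)$ is a vector in $\mathbb{R}^J$. Thus we may
apply the standard norm inequalities
$$\frac{1}{\sqrt{J}}\|x\|_{\ell_1} \le \|x\|_{\ell_2} \le \|x\|_{\ell_1} \qquad (5)$$
to obtain
$$\frac{1}{\sqrt{J}} \sum_{j=1}^J \|\dot\gamma_j(t)\| \le \sqrt{\sum_{j=1}^J \|\dot\gamma_j(t)\|^2} \le \sum_{j=1}^J \|\dot\gamma_j(t)\|. \qquad (6)$$
Combining the right-hand side of (6) with (4) we obtain
$$L(\gamma) \le \int_0^1 \sum_{j=1}^J \|\dot\gamma_j(t)\|\,dt = \sum_{j=1}^J \int_0^1 \|\dot\gamma_j(t)\|\,dt = \sum_{j=1}^J L(\gamma_j).$$
Similarly, from the left-hand side of (6) we obtain
$$L(\gamma) \ge \int_0^1 \frac{1}{\sqrt{J}} \sum_{j=1}^J \|\dot\gamma_j(t)\|\,dt = \frac{1}{\sqrt{J}} \sum_{j=1}^J \int_0^1 \|\dot\gamma_j(t)\|\,dt = \frac{1}{\sqrt{J}} \sum_{j=1}^J L(\gamma_j).$$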
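A quick numerical check of Lemma 1 (and of the length formula (3)) uses the joint curve θ ↦ (θ, 2 cos θ, 2 sin θ), a variant of the Figure 1 example in which M2 is assumed to be a circle of radius 2, so that the two components move at different speeds and neither bound is tight. Lengths are approximated by fine polylines; the parameter range and the radius are illustrative choices.

```python
import numpy as np

def curve_length(points):
    """Approximate L(gamma) = int_0^1 ||gamma'(t)|| dt by a fine polyline."""
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

t = np.linspace(0.0, 1.0, 10001)
theta = 0.5 + 5.0 * t                                            # theta in [0.5, 5.5]
gamma_1 = theta[:, None]                                         # curve on M_1, a subset of R
gamma_2 = 2.0 * np.column_stack([np.cos(theta), np.sin(theta)])  # curve on a radius-2 circle
gamma = np.hstack([gamma_1, gamma_2])                            # joint curve in R^3

L = curve_length(gamma)
L_parts = [curve_length(gamma_1), curve_length(gamma_2)]
J = len(L_parts)
lower, upper = sum(L_parts) / np.sqrt(J), sum(L_parts)
print(lower, L, upper)  # approximately 10.61, 11.18, 15.0
```

Here L(γ1) = 5 and L(γ2) = 10, so the joint length 5√5 ≈ 11.18 sits strictly between 15/√2 and 15.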
We are now in a position to compare geodesic distances on $M^*$ to those on the component manifolds.
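Theorem 1. Suppose that $\{M_j\}_{j=1}^J$ are Riemannian manifolds. Let $p, q \in M^*$ be given. Then
$$d_{M^*}(p,q) \ge \frac{1}{\sqrt{J}} \sum_{j=1}^J d_{M_j}(p_j, q_j). \qquad (7)$$
If the mappings $\psi_2, \psi_3, \ldots, \psi_J$ are isometries, i.e., $d_{M_1}(p_1, q_1) = d_{M_j}(\psi_j(p_1), \psi_j(q_1))$ for any j and
for any pair of points (p, q), then
$$d_{M^*}(p,q) = \frac{1}{\sqrt{J}} \sum_{j=1}^J d_{M_j}(p_j, q_j) = \sqrt{J} \cdot d_{M_1}(p_1, q_1). \qquad (8)$$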
Proof: If γ is a geodesic path between p and q, then from Lemma 1,
$$d_{M^*}(p,q) = L(\gamma) \ge \frac{1}{\sqrt{J}} \sum_{j=1}^J L(\gamma_j).$$
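By definition $L(\gamma_j) \ge d_{M_j}(p_j, q_j)$; hence, this establishes (7).
Now observe that the lower bound in Lemma 1 is derived from the lower inequality of (5). This inequality
is attained with equality if and only if all entries of x are equal, i.e., $\|\dot\gamma_j(t)\| = \|\dot\gamma_k(t)\|$ for all j, k, and t.
This is the case when $\psi_2, \psi_3, \ldots, \psi_J$ are isometries, since each $\gamma_j = \psi_j \circ \gamma_1$ then traverses $M_j$ at the same speed as $\gamma_1$ traverses $M_1$. Thus we obtain
$$d_{M^*}(p,q) = L(\gamma) = \frac{1}{\sqrt{J}} \sum_{j=1}^J L(\gamma_j) = \sqrt{J}\, L(\gamma_1).$$
We now conclude that $L(\gamma_1) = d_{M_1}(p_1, q_1)$: if we could obtain a shorter path $\tilde\gamma_1$ from $p_1$ to $q_1$, then the curve $(\tilde\gamma_1, \psi_2 \circ \tilde\gamma_1, \ldots, \psi_J \circ \tilde\gamma_1)$ on $M^*$ would have length $\sqrt{J}\, L(\tilde\gamma_1) < L(\gamma)$, which
would contradict the assumption that γ is a geodesic on $M^*$. This establishes (8).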
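The isometric case (8) can be checked on a J = 3 extension of Figure 1: M1 = (0, 2π) together with two unit circles parameterized by arc length (each with one point removed), which makes every ψj an isometry. Because M* is one-dimensional here, its geodesic distance is simply the arc length along the curve between the two points. The phase offsets and endpoints are illustrative assumptions.

```python
import numpy as np

def curve_length(points):
    """Approximate arc length by a fine polyline through consecutive rows."""
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

def joint_points(theta, phases):
    """Rows (theta, psi_2(theta), ..., psi_J(theta)); each psi_j is an arc-length
    parameterization of a unit circle minus one point, hence an isometry of (0, 2*pi)."""
    parts = [theta[:, None]]
    for phase in phases:
        parts.append(np.column_stack([np.cos(theta + phase), np.sin(theta + phase)]))
    return np.hstack(parts)

theta_a, theta_b = 1.0, 4.0  # p_1 and q_1 in M_1 = (0, 2*pi)
theta = np.linspace(theta_a, theta_b, 30001)
phases = [0.0, 0.7]          # assumed phase offsets, giving J = 3
J = 1 + len(phases)

d_joint = curve_length(joint_points(theta, phases))  # M* is 1-D: geodesic = arc length
d_1 = theta_b - theta_a                               # d_{M_1}(p_1, q_1)
print(d_joint, np.sqrt(J) * d_1)  # both approximately 5.196
```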
Next, we study local smoothness and global self-avoidance properties of the joint manifold.
Definition 2. [15] Let M be a Riemannian submanifold of $\mathbb{R}^N$. The condition number is defined as
$1/\tau$, where τ is the largest number satisfying the following: the open normal bundle about M of radius
r is embedded in $\mathbb{R}^N$ for all $r < \tau$.
The condition number controls both local smoothness properties and global properties of the manifold;
as 1/τ becomes smaller, the manifold becomes smoother and more self-avoiding, as observed in [15].
Lemma 2. [15] Suppose M has condition number $1/\tau$. Let $p, q \in M$ be two distinct points on M,
and let $\gamma(t)$ denote a unit speed parameterization of the geodesic path joining p and q. Then
$$\max_t \|\ddot\gamma(t)\| \le \frac{1}{\tau}.$$
Lemma 3. [15] Suppose M has condition number $1/\tau$. Let $p, q \in M$ be two points on M such that
$\|p - q\| = d$. If $d \le \tau/2$, then the geodesic distance $d_M(p,q)$ is bounded by
$$d_M(p,q) \le \tau\left(1 - \sqrt{1 - 2d/\tau}\right).$$
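As a sanity check on Lemma 3, consider a circle of radius τ, whose condition number is exactly 1/τ (its inward normals first meet at the center). For two points separated by central angle α, the chord is d = 2τ sin(α/2) and the geodesic distance is τα; the sketch confirms the bound over a range of α with d ≤ τ/2. The unit-speed geodesic on this circle has ‖γ̈‖ = 1/τ, so Lemma 2 holds with equality in this case. The radius τ = 2 is an arbitrary choice.

```python
import numpy as np

tau = 2.0                                      # circle of radius tau: condition number 1/tau
alpha = np.linspace(1e-3, 0.5, 500)            # central angles; chord d stays below tau/2
d = 2 * tau * np.sin(alpha / 2)                # Euclidean distance ||p - q||
geodesic = tau * alpha                         # arc length d_M(p, q)
bound = tau * (1 - np.sqrt(1 - 2 * d / tau))   # right-hand side of Lemma 3
assert np.all(d <= tau / 2)                    # hypothesis of Lemma 3
print(np.all(geodesic <= bound))               # True
```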
We wish to show that if the component manifolds are smooth and self-avoiding, the joint manifold
is as well. It is not easy to prove this in the most general case, where the only assumption is that there
exists a homeomorphism (i.e., a continuous bijective map ψ with a continuous inverse) between every pair of manifolds. However,
suppose the manifolds are diffeomorphic, i.e., there exists a smooth bijective map with a smooth inverse between every pair of manifolds;
such a map also induces a bijective linear map between the tangent spaces at corresponding points. In that case, we make the following assertion.
Theorem 2. Suppose that $\{M_j\}_{j=1}^J$ are Riemannian submanifolds of $\mathbb{R}^N$, and let $1/\tau_j$ denote the
condition number of $M_j$. Suppose also that the $\{\psi_j\}_{j=2}^J$ that define the corresponding joint manifold
$M^*$ are diffeomorphisms. If $1/\tau^*$ is the condition number of $M^*$, then
$$\frac{1}{\tau^*} \le \max_{1 \le j \le J} \frac{1}{\tau_j}.$$
Proof: Let $p \in M^*$. Since the $\{\psi_j\}_{j=2}^J$ are diffeomorphisms, we may view $M^*$ as being diffeomorphic to $M_1$; i.e., we can build a diffeomorphic map from $M_1$ to $M^*$ as
$$p = \psi^*(p_1) := (p_1, \psi_2(p_1), \ldots, \psi_J(p_1)).$$
We also know that given any two manifolds linked by a diffeomorphism $\psi_j : M_1 \to M_j$, each vector
$v_1$ in the tangent space $T_1(p_1)$ of the manifold $M_1$ at the point $p_1$ is uniquely mapped to a tangent vector
$v_j := \phi_j(v_1)$ in the tangent space $T_j(p_j)$ of the manifold $M_j$ at the point $p_j = \psi_j(p_1)$ through the map
$\phi_j := \mathcal{J}\psi_j(p_1)$, where $\mathcal{J}$ denotes the Jacobian operator.
Consider the application of this property to the diffeomorphic manifolds $M_1$ and $M^*$. In this case,
the tangent vector $v_1 \in T_1(p_1)$ to the manifold $M_1$ can be uniquely identified with a tangent vector
$v = \phi^*(v_1) \in T^*(p)$ to the manifold $M^*$. Since the Jacobian operates componentwise, this mapping is expressed as
$$\phi^*(v_1) = \mathcal{J}\psi^*(p_1)\, v_1 = (v_1, \mathcal{J}\psi_2(p_1)\, v_1, \ldots, \mathcal{J}\psi_J(p_1)\, v_1).$$
Therefore, the tangent vector v can be written as
$$v = \phi^*(v_1) = (v_1, \phi_2(v_1), \ldots, \phi_J(v_1)).$$
In other words, a tangent vector to the joint manifold can be decomposed into J component vectors,
each of which is tangent to the corresponding component manifold.
Using this fact, we now show that a vector η that is normal to $M^*$ can also be broken down into
sub-vectors that are normal to the component manifolds. Consider $p \in M^*$, and denote by $T^*(p)^\perp$ the
normal space at p. Suppose $\eta \in T^*(p)^\perp$. Decompose each $\eta_j$ as a projection onto the component tangent
and normal spaces, i.e., for $j = 1, \ldots, J$,
$$\eta_j = x_j + y_j, \qquad x_j \in T_j(p_j),\ y_j \in T_j(p_j)^\perp,$$
such that $\langle x_j, y_j \rangle = 0$ for each j. Then $\eta = x + y$, and since x is tangent to the joint manifold $M^*$,
we have $\langle \eta, x \rangle = \langle x + y, x \rangle = 0$, and thus $\langle y, x \rangle = -\|x\|^2$. But $\langle y, x \rangle = \sum_{j=1}^J \langle y_j, x_j \rangle = 0$. Hence
$x = 0$, i.e., each $\eta_j$ is normal to $M_j$.
Armed with this last fact, our goal now is to show that if $r < \min_{1 \le j \le J} \tau_j$, then the normal bundle
of radius r is embedded in $\mathbb{R}^{JN}$, or equivalently, for any distinct $p, q \in M^*$, that $p + \eta \ne q + \nu$ provided that
$\|\eta\|, \|\nu\| \le r$. Indeed, suppose $\|\eta\|, \|\nu\| \le r < \min_{1 \le j \le J} \tau_j$. Since $\|\eta_j\| \le \|\eta\|$ and $\|\nu_j\| \le \|\nu\|$ for all
$1 \le j \le J$, we have that $\|\eta_j\|, \|\nu_j\| < \min_{1 \le i \le J} \tau_i \le \tau_j$. Since we have proved that $\eta_j, \nu_j$ are vectors in
the normal bundle of $M_j$ and their magnitudes are less than $\tau_j$, it follows that $p_j + \eta_j \ne q_j + \nu_j$ by the definition
of condition number. Thus $p + \eta \ne q + \nu$ and the result follows.
Fig. 2. Point at which the normal bundle for the helix manifold from Figure 1(c) intersects itself. Note that the helix
has been slightly rotated.
This result states that for general manifolds, the most we can say is that the condition number of the
joint manifold is guaranteed to be no larger than that of the worst-conditioned component manifold. However, in practice this worst case is not
likely to occur. As an example, Figure 2 illustrates the point at which the normal bundle intersects itself
for the case of the joint manifold from Figure 1(c). In this case we obtain $\tau^* = \sqrt{\pi^2/2 + 1}$. Note that
the manifolds $M_1$ and $M_2$ generating $M^*$ have $\tau_1 = \infty$ and $\tau_2 = 1$, i.e., condition numbers 0 and 1.
Thus, while the condition number in this case is not as good as that of the best manifold, it is still notably better
than that of the worst manifold. In general, even this example may be somewhat pessimistic, and it is possible
that in many cases the joint manifold may be better conditioned than even the best manifold.
III. JOINT MANIFOLDS: PRACTICE
As noted in the Introduction, a number of algorithms exploit manifold models for signal processing
tasks such as pattern classification, estimation, detection, control, clustering, and learning [3]–[5]. The
performance of these algorithms often depends on geometric properties of the manifold model, such as its
condition number and geodesic distances along its surface. The theory developed in Section II suggests
that the joint manifold preserves or improves these properties. In Sections IV and V we consider two
possible applications and observe that when noise is introduced, it can be extremely beneficial to use
algorithms specifically designed to exploit the joint manifold structure. However, before we address these
particular applications, we must first address some key practical concerns.
A. Examples of joint manifolds
Many of our results assume that the component manifolds are isometric to each other. This may seem
an undue burden, but in fact this requirement is fulfilled by manifolds that are isometric to the parameter
space that governs them, a class of manifolds that has been studied in [2]. Many examples from this
class correspond to common image articulations that occur in vision applications, including:
• articulations of radially symmetric images, which are parameterized by a 2-D offset;
• articulations of four-fold symmetric images with smooth boundaries, such as discs, $\ell_p$ balls, etc.;
• pivoting of images containing smooth boundaries, which are parameterized by the pivoting angle;
• articulations of $K/2$ discs over distinct non-overlapping regions, with $K/2 > 1$, producing a K-dimensional manifold.
These examples can be extended to objects with piecewise smooth boundaries as well as to video
sequences corresponding to different paths for the articulation/pivoting.
B. Acceptable deviations from theory
While manifolds are a natural way to model the structure of a set of images governed by a small
number of parameters, the results in Section II make a number of assumptions concerning the structure
of the component manifolds. In the most general case, we assume that the component manifolds are
homeomorphic to each other. This means that between any pair of component manifolds there should
exist a bijective mapping φ such that both φ and φ⁻¹ are continuous. This assumption assures us that
the joint manifold is indeed a topological manifold. Unfortunately, this excludes some scenarios that can
occur in practice. For example, this assumption might not hold for a camera network featuring
non-overlapping fields of view. In such camera networks, there are cases in which only some cameras are
sensitive to small changes in the parameter values. Strictly speaking, our theory may not apply in these
cases (since the joint “manifold” as we have defined it is not necessarily even a topological manifold).
We discuss this issue further in Section V-B below, but for now we simply note that in
Sections IV and V we conduct extensive experiments using both synthetic and experimental datasets and
observe that in practice the joint manifold-based processing techniques still exhibit significantly better
performance than techniques that operate on each component manifold separately. Thus, this violation of
our theory seems to be more technical than substantive in nature.
Additionally, in our results concerning the condition number, we assume that the component manifolds
are smooth, but the manifolds induced by the motion of an object where there are sharp edges or
occlusions are nondifferentiable [3]. This problem can easily be addressed by applying a smoothing
kernel to the image, which induces a smooth manifold. In fact, there exists a sequence of smooth
(regularized) manifolds (with decaying regularization parameter) that converge to any non-differentiable
image manifold [3]. More generally, if the cameras have limited processing capabilities, then it may
be possible to perform some simple processing tasks such as segmentation, background subtraction, and
illumination compensation that will make the manifold assumption more rigorously supported in practice.
C. Efficient data fusion via joint manifolds using random projections
Observe that when the number J and ambient dimension N of the manifolds become large, the ambient
dimension JN of the joint manifold may be so large that it becomes impossible to perform any
meaningful computations. Furthermore, it might appear that in order to exploit the joint manifold structure,
we must collect all the data, which we earlier claimed was a potentially impossible task.
Fortunately, we can transform the data into a more amenable form by obtaining random projections.
Specifically, it has been shown that the essential structure of a K-dimensional manifold with condition
number $1/\tau$ residing in $\mathbb{R}^N$ is approximately preserved under an orthogonal projection into a random
subspace of dimension $O(K \log(N/\tau)) \ll N$ [16]. This result has been leveraged in the design of
efficient algorithms for inference applications, such as classification using multiscale navigation [17],
intrinsic dimension estimation [18], and manifold learning [18].
In a camera network, for example, an immediate option would be to apply this result individually for
each camera. In particular, if the ambient dimension N of each of the J component manifolds is large,
then the above result would suggest that we should project the image acquired by each camera onto a
lower-dimensional subspace. By collecting this data at a central location, we would obtain J vectors,
each of dimension $O(K \log N)$, so that we would have to collect $O(JK \log N)$ total measurements.
This approach, however, essentially ignores the joint manifold structure present in the data. If we instead
view the data as arising from a K-dimensional joint manifold in $\mathbb{R}^{JN}$ with bounded condition number
as given by Theorem 2, then we can project the joint data into a subspace whose dimension is only logarithmic
in J and in the largest condition number among the components, and still approximately preserve
the manifold structure. This is formalized in the following theorem, which follows directly from [16].
Theorem 3. Let $\mathcal{M}^*$ be a compact, smooth, Riemannian joint manifold in a JN-dimensional space
with condition number $1/\tau^*$. Let $\Phi$ denote an orthogonal linear mapping from $\mathcal{M}^*$ into a random
M-dimensional subspace of $\mathbb{R}^{JN}$. Let $M = O(K \log(JN/\tau^*)/\epsilon^2)$. Then, with high probability, the geodesic
and Euclidean distances between any pair of points on $\mathcal{M}^*$ are preserved up to distortion $\epsilon$ under $\Phi$.
Thus, we obtain a faithful approximation of manifold-modeled data via a representation of dimension
only $O(K \log(JN))$. This represents a significant improvement over the $O(JK \log N)$-dimensional
representation obtained by performing separate dimensionality reduction on each component manifold and
a massive improvement over the original JN-dimensional representation consisting of all uncompressed
images. Note that if we were interested in compressing only a finite number P of images, we could
apply the Johnson-Lindenstrauss lemma [19] to show that $M = O(\log(P)/\epsilon^2)$ would be sufficient to obtain
the result in Theorem 3. However, the M required in Theorem 3 is independent of the number of images
P, and therefore provides scalability to extremely large datasets.
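To see how the three representation sizes compare, the sketch below evaluates the growth rates quoted above. It sets every hidden constant, as well as $\tau$ and $\epsilon$, to 1. That is an assumption made purely for illustration: the results show scaling only and are not a prescription for choosing M. The function name is hypothetical.

```python
import math


def measurement_counts(J, N, K):
    """Growth rates of the three representations, with all hidden constants set to 1."""
    separate = J * K * math.log(N)  # O(JK log N): compress each camera's image separately
    joint = K * math.log(J * N)     # O(K log(JN)): one random projection of the joint data
    raw = J * N                     # JN: transmit every uncompressed image
    return separate, joint, raw
```

Consider J = 16 cameras with 64 × 64 images (N = 4096) and K = 2. The call `measurement_counts(16, 4096, 2)` gives roughly 266 for separate projections, 22 for the joint projection, and 65536 for the raw data. Doubling J doubles the first and last quantities but adds only K log 2 to the joint one.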
Furthermore, the linear nature of Φ can be utilized to perform dimensionality reduction in a distributed
manner, which is particularly useful in applications when data transmission is expensive. Specifically,
given a network of J cameras, let $x_j \in \mathbb{R}^N$, $1 \le j \le J$, denote the image acquired by camera j, and
denote the concatenation of the images by $x = [x_1^T\ x_2^T\ \cdots\ x_J^T]^T$. Since the required random projections are
linear, we can take local random projections of the images acquired by each camera and still calculate
the global projections of x in a distributed fashion. Let each camera calculate $y_j = \Phi_j x_j$, with the
matrices $\Phi_j \in \mathbb{R}^{M \times N}$, $1 \le j \le J$. Then, by defining the $M \times JN$ matrix $\Phi = [\Phi_1\ \Phi_2\ \cdots\ \Phi_J]$, the
global projections $y = \Phi x$ can be obtained by
$$y = \Phi x = [\Phi_1\ \Phi_2\ \cdots\ \Phi_J]\,[x_1^T\ x_2^T\ \cdots\ x_J^T]^T = \Phi_1 x_1 + \Phi_2 x_2 + \cdots + \Phi_J x_J.$$
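The block-matrix identity can be checked numerically with a short sketch. Each hypothetical camera applies its own $M \times N$ block $\Phi_j$ locally, and the central unit sums the results.

This sketch assumes i.i.d. Gaussian entries scaled by $1/\sqrt{M}$ as a stand-in for the random projection. The identity itself holds for any choice of blocks. The function names are illustrative only.

```python
import numpy as np


def local_projections(images, M, rng):
    """Each camera j draws its own M x N matrix Phi_j and computes y_j = Phi_j x_j.

    Assumption: i.i.d. Gaussian entries scaled by 1/sqrt(M) stand in for the
    random projection; the identity Phi x = sum_j Phi_j x_j holds for any blocks.
    """
    Phis = [rng.standard_normal((M, x.size)) / np.sqrt(M) for x in images]
    ys = [Phi_j @ x_j for Phi_j, x_j in zip(Phis, images)]
    return Phis, ys


def fuse(ys):
    """Central unit: the global measurement vector is the sum of the local ones."""
    return np.sum(ys, axis=0)
```

Stacking the blocks with `np.hstack(Phis)` and the images with `np.concatenate(images)` reproduces the same vector as `fuse(ys)`. No camera ever has to transmit its N-dimensional image.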
Thus, the final measurement vector can be obtained by simply adding independent random projections of
the images acquired by the individual cameras. This gives rise to a novel scheme for a compressive data
fusion² protocol, as illustrated in Figure 3. Suppose the individual cameras are associated with the nodes
of a binary tree of size J, where the edges represent communication links between nodes. Let the root
of the tree denote the final destination of the fused data (the central unit). Then the fusion is represented
by the flow of data from the leaves to the root, with a binary addition occurring at every parent node.
The dimensionality of the data is $M = O(K \log(JN))$ and the depth of the tree is $R = O(\log J)$; hence
the total communication bandwidth requirement is given by $R \times M \le O(K \log^2(JN)) \ll JN$.
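The tree-based fusion can be sketched as a pairwise reduction. This is a minimal sketch under two assumptions:

- Cameras sit at the leaves of the tree, and each internal node adds the two vectors it receives. The text places cameras at all tree nodes; that changes the bookkeeping but not the logarithmic depth.
- Measurement vectors are plain Python lists.

The function `tree_fuse` is hypothetical. It returns the fused vector together with the number of addition rounds. That count equals ⌈log₂ J⌉ and plays the role of the depth R, while every link carries a single M-dimensional vector.

```python
import math


def tree_fuse(partials):
    """Fuse per-camera measurement vectors by pairwise addition up a binary tree.

    Returns (fused_vector, depth), where depth is the number of addition rounds.
    """
    level = [list(p) for p in partials]
    depth = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # A parent node adds the two M-dimensional vectors sent by its children.
                nxt.append([a + b for a, b in zip(level[i], level[i + 1])])
            else:
                # An unpaired node forwards its vector unchanged to the next level.
                nxt.append(level[i])
        level = nxt
        depth += 1
    return level[0], depth
```

Take J = 7 cameras with two-dimensional measurement vectors `[j, 2 * j]`. Then `tree_fuse` returns `([28, 56], 3)`: the sum of all local vectors after three rounds, since ⌈log₂ 7⌉ = 3.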
² These methods provide approximate data fusion for geometric models like manifolds, in the sense that we tolerate some
distortion in pairwise and geodesic distances.
[Figure: Cameras 1, 2, ..., J each apply random projections to their images x_1, x_2, ..., x_J; the projections y_1, y_2, ..., y_J are fused into y at the Central Processing Unit.]
Fig. 3. Distributed data fusion using $O(K \log(JN))$ random projections in a camera network.
The number of random measurements prescribed by Theorem 3 is a small factor larger than the
intrinsic dimensionality K of the data observed; however, it is far lower than its ambient dimensionality
N. In addition, the bandwidth required by this scheme is only logarithmic in the number of cameras
J; this offers a significant improvement over previous data fusion methods that necessarily require
the communication bandwidth to scale linearly with the number of cameras. The joint manifold model
integrates the network transmission and data fusion steps, in a similar fashion to the protocols discussed
in randomized gossiping [20] and compressive wireless sensing [21].
IV. JOINT MANIFOLD CLASSIFICATION
We now examine the application of joint manifold-based techniques to some common inference
problems. In this section we will consider the problem of binary classification when the two classes
correspond to different manifolds. As an example, we will consider the scenario where a camera network
acquires images of an unknown vehicle, and we wish to classify between two vehicle types. Since the
location of the vehicle is unknown, each class forms a distinct low-dimensional manifold in the image
space. The performance of a classifier in this setting will depend partially on topological quantities of the
joint manifold described in Section II, which in particular provide the basis for the random projection-
based version of our algorithms. However, the most significant factor determining the performance of
the joint manifold-based classifier is of a slightly different flavor. Specifically, the probability of error is
determined by the distance between the manifolds. Thus, we also provide additional theoretical analysis
of how distances between the joint manifolds compare to those between the component manifolds.
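The sketch below illustrates the role of inter-manifold distance with a hypothetical toy model.

- **Classes.** Gaussian pulses of two different widths stand in for the two vehicle types.
- **Cameras.** Each of J = 3 cameras sees the same unknown position θ, shifted by a known, camera-specific offset.
- **Classifier.** A nearest-manifold rule assigns a test observation to the class whose sampled manifold lies closest. This is an assumed, generic classifier and not necessarily the one analyzed below.

One elementary consequence of concatenation can be checked directly. The squared joint distance is a sum of squared component distances taken at the same θ, and a minimum of sums is at least the sum of minima. Hence the squared distance between the joint manifolds is never smaller than the sum of the squared distances between corresponding component manifolds.

```python
import numpy as np


def render(theta, width, N=128, offset=0.0):
    """Hypothetical 1-D camera image: a Gaussian pulse of the given width at theta + offset."""
    grid = np.arange(N)
    return np.exp(-((grid - theta - offset) ** 2) / (2.0 * width**2))


def sample_joint_manifold(width, thetas, offsets, N=128):
    """Rows are joint points [x_1(theta), ..., x_J(theta)]; all cameras share theta."""
    return np.array(
        [np.concatenate([render(t, width, N, o) for o in offsets]) for t in thetas]
    )


def manifold_distance(A, B):
    """Minimum Euclidean distance between two sampled manifolds (rows of A and B)."""
    diffs = A[:, None, :] - B[None, :, :]
    return float(np.sqrt((diffs**2).sum(axis=2).min()))


def classify(x, classes):
    """Nearest-manifold rule: index of the class whose samples lie closest to x."""
    return int(np.argmin([np.linalg.norm(C - x, axis=1).min() for C in classes]))
```

Slicing the columns of a joint sample matrix into blocks of N gives the component manifolds seen by each camera. Comparing `manifold_distance` on the joint and component samples shows that the joint classes are at least as well separated as the best single camera.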
Available from Marco Duarte · 11 Mar 2013
Available from psu.edu