Generalized Latent
Multi-View Subspace Clustering
Changqing Zhang, Huazhu Fu, Qinghua Hu, Xiaochun Cao,
Yuan Xie, Dacheng Tao, Fellow, IEEE, and Dong Xu, Fellow, IEEE
Abstract—Subspace clustering is an effective method that has been successfully applied to many applications. Here we propose a
novel subspace clustering model for multi-view data using a latent representation termed Latent Multi-View Subspace Clustering
(LMSC). Unlike most existing single-view subspace clustering methods, which directly reconstruct data points using original features,
our method explores underlying complementary information from multiple views and simultaneously seeks the underlying latent
representation. Using the complementarity of multiple views, the latent representation depicts data more comprehensively than each
individual view, accordingly making subspace representation more accurate and robust. We propose two LMSC formulations: linear
LMSC (lLMSC), based on linear correlations between latent representation and each view, and generalized LMSC (gLMSC), based on
neural networks to handle general relationships. The proposed method can be efficiently optimized under the Augmented Lagrangian
Multiplier with Alternating Direction Minimization (ALM-ADM) framework. Extensive experiments on diverse datasets demonstrate the
effectiveness of the proposed method.
Index Terms—Multi-view clustering, subspace clustering, latent representation, neural networks.
1 INTRODUCTION
SUBSPACE clustering has been successfully used in nu-
merous applications, especially those involving high-
dimensional data [1], [2]. Existing subspace clustering ap-
proaches can be categorized into iterative methods [3], [4],
algebraic approaches [5], [6], statistical methods and spec-
tral clustering-based methods [7], [8]. Recently proposed
subspace clustering methods [9], [10], [11], [12], [13], [14]
are based on the assumption that data points are drawn
from multiple subspaces corresponding to different clusters,
where each data point can be expressed by a linear combina-
tion of the data points themselves. The general formulation
of existing subspace clustering methods is
$$\min_{\mathbf{Z}} \mathcal{L}(\mathbf{X}, \mathbf{XZ}) + \lambda\,\Omega(\mathbf{Z}), \qquad (1)$$
where $\mathbf{X} = [\mathbf{x}_1, \cdots, \mathbf{x}_n]$ is the $d \times n$ feature matrix whose columns are the samples, and $\lambda > 0$ is the tradeoff factor.
C. Zhang and Q. Hu are with the College of Intelligence and
Computing, Tianjin University, Tianjin 300072, China (e-mail:
zhangchangqing@tju.edu.cn; huqinghua@tju.edu.cn).
X. Cao is with the State Key Laboratory of Information Security, Insti-
tute of Information Engineering, Chinese Academy of Sciences, Beijing
100093, China (e-mail: caoxiaochun@iie.ac.cn).
H. Fu is with the Inception Institute of Artificial Intelligence, Abu Dhabi,
United Arab Emirates (e-mail: huazhufu@gmail.com).
Y. Xie is with the Research Center of Precision Sensing and Control,
Institute of Automation, Chinese Academy of Sciences, Beijing, 100190,
China (e-mail: yuan.xie@ia.ac.cn).
D. Tao is with the UBTECH Sydney Artificial Intelligence Centre and
the School of Information Technologies, the Faculty of Engineering and
Information Technologies, the University of Sydney, 6 Cleveland St,
Darlington, NSW 2008, Australia (email: dacheng.tao@sydney.edu.au).
D. Xu is with the School of Electrical and Information Engineer-
ing, University of Sydney, Sydney, NSW 2006, Australia (e-mail:
dong.xu@sydney.edu.au).
*The authors contributed equally to this work (Corresponding author:
Qinghua Hu).
The loss function $\mathcal{L}(\cdot,\cdot)$ and regularization term $\Omega(\cdot)$ are usually defined under specific assumptions. The representative approach, Sparse Subspace Clustering (SSC) [9], focuses on searching for the sparsest representation among an infinite number of possible representations based on the $\ell_1$-norm. Unlike SSC, which separately constructs the sparsest representation for each data point, Low-Rank Representation (LRR) [10] tries to find the lowest-rank representation of all data jointly by using the structured sparsity loss. Constrained by graph regularization, SMooth Representation clustering (SMR) [11] theoretically investigates the grouping effect for self-representation based approaches. With the reconstruction coefficient matrix $\mathbf{Z}$, the affinity matrix is obtained by $\mathbf{S} = \mathrm{abs}(\mathbf{Z}) + \mathrm{abs}(\mathbf{Z}^T)$, where $\mathrm{abs}(\cdot)$ is the element-wise absolute operator. Finally, with the affinity matrix $\mathbf{S}$ as the input, the final clustering result is obtained by conducting standard spectral clustering [7].
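For illustration, a minimal sketch of this generic post-processing pipeline is given below (function and variable names are ours, not from the original implementations): given a learned coefficient matrix Z, the affinity S = abs(Z) + abs(Z^T) is formed and fed to standard spectral clustering.

```python
# Minimal sketch (not from the original papers): given a learned coefficient
# matrix Z, form the affinity S = abs(Z) + abs(Z^T) and run spectral clustering.
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_coefficients(Z, n_clusters):
    S = np.abs(Z) + np.abs(Z.T)            # symmetric, non-negative affinity
    model = SpectralClustering(n_clusters=n_clusters, affinity='precomputed',
                               random_state=0)
    return model.fit_predict(S)

# Toy usage: a block-diagonal Z corresponding to two clusters of 3 points each.
Z_toy = np.block([[0.3 * np.ones((3, 3)), np.zeros((3, 3))],
                  [np.zeros((3, 3)), 0.3 * np.ones((3, 3))]])
print(cluster_from_coefficients(Z_toy, n_clusters=2))
```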
Although these subspace clustering approaches are ef-
fective, they tend to be heavily influenced by the original
features, especially when the observations are insufficient
and/or grossly corrupted. Fortunately, multi-view subspace
clustering methods [15], [16], [17] have been proposed to
overcome this issue, in which multiple views describe each
data point. The complementary information from multiple
views can benefit clustering, and the effectiveness has been
empirically proven under different multi-view constraints.
Existing multi-view subspace clustering methods usually
reconstruct the data points on the original view directly and
generate individual, view-specific subspace representations,
and generally share the following formulation:
$$\min_{\{\mathbf{Z}^{(v)}\}_{v=1}^{V}} \mathcal{L}\big(\{(\mathbf{X}^{(v)}, \mathbf{X}^{(v)}\mathbf{Z}^{(v)})\}_{v=1}^{V}\big) + \lambda\,\Omega\big(\{\mathbf{Z}^{(v)}\}_{v=1}^{V}\big), \qquad (2)$$
where $\mathbf{X}^{(v)}$ and $\mathbf{Z}^{(v)}$ denote the feature matrix and subspace representation of the $v$-th view, respectively. Using the above formulation, existing methods employ different loss functions $\mathcal{L}(\cdot,\cdot)$ and impose different assumptions (with different regularization terms $\Omega(\cdot)$) to explore relationships between the subspace representations of multiple views. Although these methods have achieved promising results, they insufficiently describe the data within each view, making reconstruction using only information from one view risky. Moreover, noise, which is ubiquitous, further increases the difficulty of reconstruction in the original feature space.
Here we propose using a latent representation for mul-
tiple views to explore the relationships between data points
and effectively deal with noise. As discussed in [18], [19],
the underlying assumption is that multiple views originate
from one underlying latent representation, which depicts
the essence of the data and reveals the common under-
lying structure shared by different views. Based on this
assumption, we propose a novel method that we call Latent
Multi-view Subspace Clustering (LMSC). Our approach learns
a latent representation to encode complementary informa-
tion from multi-view features and produces a common
subspace representation for all views rather than that of
each individual view. More importantly, and expanding on
the linear correlation used in our previous work [20], we
further generalize our model for non-linear correlation, and
accordingly propose generalized Latent Multi-view Subspace
Clustering (gLMSC). Our method jointly learns the latent representation and the multi-view subspace representation
within a unified framework, which can be effectively op-
timized using the Augmented Lagrangian Multiplier with
Alternating Direction Minimization (ALM-ADM) strategy.
We conduct extensive experiments to compare our method
with the current state-of-the-art to demonstrate our model’s
performance.
The main contributions of this paper are as follows:
• Based on self-representation-based subspace clustering, we propose a novel multi-view subspace clustering method called Latent Multi-view Subspace Clustering (LMSC), which integrates multiple views into a comprehensive latent representation.
• The automatically learned latent representation encodes complementary information from different views and satisfies the self-expressiveness property, so it reflects the underlying clustering structure well.
• In addition to exploring linear correlations between the latent representation and each view, we further introduce neural networks to explore more general relationships and propose the generalized Latent Multi-view Subspace Clustering (gLMSC) method.
• Finally, our formulation is effectively solved using Alternating Direction Minimization (ADM), and our optimization algorithm empirically converges.
The remainder of the paper is organized as follows.
Related works, including multi-view learning, multi-view
subspace-based clustering, and latent representation-based
clustering methods are briefly reviewed in Section 2. Details
of our proposed approach are presented in Section 3. In Sec-
tion 4, we present experimental results that demonstrate our
model’s performance using a variety of real-world datasets.
Conclusions are drawn in Section 5.
2 RELATED WORK
Based on the ubiquitous multi-view data, multi-view learn-
ing [21], [22], [23], [24], [25] has shown remarkable success
in a wide range of real-world applications. Most existing
multi-view clustering methods are graph-based models. One
of the early methods presented in [26] focuses on handling
two-view data. Under a matrix factorization framework, some
methods [27], [28] attempt to uncover a common represen-
tation to link different views for clustering. The multi-view
subspace clustering methods [15], [16], [17] relate different
data points in a self-representing manner on the original
view and simultaneously constrain these subspace represen-
tations of different views to exploit complementary informa-
tion. Based on spectral clustering, [29], [30] co-regularize the
clustering hypothesis of different views to enforce consis-
tence. For large-scale data, a robust, large-scale, multi-view
k-means clustering method [31] can be parallelized and
run on multi-core processors for large-scale data clustering.
Multiple Kernel Learning (MKL) can be considered a nature
way to integrate multiple views. As a result, the method
in [32] directly combines multiple kernels corresponding
to different views and validated the approach’s effective-
ness. Based on MKL, [33] further proposes to automatically
weight different views. There are some multi-view methods
focusing on other topics, e.g., dimensionality reduction [34]
and feature selection [35].
Two groups of multi-view subspace clustering methods
are most related to ours. The first employs CCA to project
multiple views onto a low-dimensional subspace and then
uses the learned representation for clustering [36], [37].
The second group comprises the self-representation-based methods [15], [16], [17]. Diversity-induced Multi-view Subspace
Clustering (DiMSC) [15] explores complementary informa-
tion with Hilbert-Schmidt Independence Criterion (HSIC)
under the self-representation subspace clustering frame-
work. Low-Rank Tensor Constrained Multi-view Subspace
Clustering (LT-MSC) [16] explores the high-order correla-
tion among these subspace representations. The method in
[17] unifies different views with a common indicator ma-
trix rather than a common subspace representation. These
methods reconstruct data points within each single view.
Instead, our method constructs a unified similarity matrix for multiple views by using a latent representation and thus effectively utilizes the complementarity across different views for subspace clustering.
Under the self-representation-based subspace clustering
framework, some methods [9], [10] introduce latent repre-
sentations. Latent Space Sparse Subspace Clustering (LS3C)
[38] jointly performs dimensionality reduction and sparse
coding on sparse subspace clustering [9]. Latent Low-Rank
Representation (LatLRR) [39] is based on LRR [10] and
constructs a dictionary by jointly using observed and hid-
den data. Our methodology is quite different from these works: (1) our algorithm performs subspace clustering
with the learned common latent representation, while these
methods conduct data self-representation within each single
view. (2) The correlations among different views are linear
for these methods, while our algorithm gLMSC explores
more general correlations by neural networks. There are also
some algorithms for multi-view representation learning.
Some approaches [18], [19], [40] explicitly learn a common
representation for multiple views as a joint optimization
problem with a common subspace representation matrix.
Generalized Multiview Analysis (GMA) [41] is an extension
of Canonical Correlational Analysis (CCA), which is de-
signed for cross-view classification and retrieval. Multiview
LSA [42] is an algorithm that can efficiently approximate
Generalized Canonical Correlational Analysis (GCCA). Beyond the kernel technique, Deep Canonical Correlation Anal-
ysis (DCCA) [43] explores nonlinear correlation between
views with neural networks. Some recent approaches [44],
[45] aim to learn a new representation based on auto-
encoders. In contrast to these methods, which learn the
latent representation by linearly [38] or non-linearly [44],
[45] mapping the original single-view data, our method
jointly recovers the latent multi-view representation and the
mappings corresponding to different views to encode the
intrinsic complementary information.
3 LATENT MULTI-VIEW SUBSPACE CLUSTERING
In our method, subspace clustering is performed based on
the latent representation encoding complementary infor-
mation in multiple views. Specifically, given $n$ multi-view observations $\{[\mathbf{x}_i^{(1)}; \ldots; \mathbf{x}_i^{(V)}]\}_{i=1}^{n}$ consisting of $V$ different views, our model aims to seek a shared multi-view latent representation $\mathbf{h}_i$ for each data point. The underlying assumption is that these different views originate from one underlying latent representation. On the one hand, the information from different views should be encoded into the learned representation; on the other hand, the learned latent representation should meet the specific task (task-oriented goal), e.g., self-representation or subspace reconstruction.
Therefore, we consider the general objective function
$$\underbrace{\mathcal{I}\big(\{\mathbf{X}^{(v)}\}_{v=1}^{V}, \mathbf{H}; \Theta_1\big)}_{\text{information preservation}} + \lambda\,\underbrace{\mathcal{S}(\mathbf{H}; \Theta_2)}_{\text{task-oriented goal}}, \qquad (3)$$
where $\mathbf{H} = [\mathbf{h}_1, \cdots, \mathbf{h}_n] \in \mathbb{R}^{k \times n}$ is the latent representation matrix. The first term $\mathcal{I}(\cdot,\cdot)$ ensures that the latent representation encodes information from the original views, thus avoiding bias of the latent representation towards the specific task. The second term $\mathcal{S}(\cdot,\cdot)$ is the task-oriented term. $\lambda > 0$ balances the two terms. $\Theta_1$ and $\Theta_2$ are the parameters corresponding to $\mathcal{I}(\cdot,\cdot)$ and $\mathcal{S}(\cdot,\cdot)$, respectively.
Specifically, for latent multi-view subspace clustering,
which aims to explore the subspace structure based on the
latent representation, we have the following formulation:
$$\min_{\theta_v, \mathbf{H}, \mathbf{Z}} \mathcal{L}_S(\mathbf{H}, \mathbf{HZ}) + \sum_{v=1}^{V} \alpha_v\, \mathcal{L}_V\big(\mathcal{F}_v(\mathbf{H};\theta_v), \mathbf{X}^{(v)}\big) + \lambda\,\Omega(\mathbf{Z}), \qquad (4)$$
where $\mathcal{L}_S(\mathbf{H},\mathbf{HZ})$ is the loss function for the subspace representation. $\mathcal{L}_V(\mathcal{F}_v(\mathbf{H};\theta_v), \mathbf{X}^{(v)})$ and $\mathcal{F}_v(\mathbf{H};\theta_v)$ are the reconstruction loss and the underlying mapping from the latent representation $\mathbf{H}$ to the observations of the $v$-th view, respectively. The tradeoff factors $\alpha_v > 0$ and $\lambda > 0$ control the influence of the $v$-th view and the regularization degree of the subspace representation, respectively. With objective function (4), we can learn the latent multi-view representation, which benefits from the complementarity of all views and is therefore beneficial to subspace clustering.

Fig. 1: Illustration of the multi-view latent representation. Observations $\{\mathbf{X}^{(v)}\}_{v=1}^{V}$ ($V \geq 2$) corresponding to different views are projected by $\{\mathbf{P}^{(v)}\}_{v=1}^{V}$ from one underlying latent representation $\mathbf{H}$.
In our work, we propose two latent multi-view subspace
clustering (LMSC) methods: linear (l)LMSC and generalized
(g)LMSC.
3.1 Linear Latent Multi-view Subspace Clustering
We first model the correlation between the latent represen-
tation and each view by using a linear model, termed linear
Latent Multi-view Subspace Clustering (lLMSC). As shown
in Fig. 1, observations corresponding to different views can be linearly recovered with their respective models $\{\mathbf{P}^{(1)}, \ldots, \mathbf{P}^{(V)}\}$ based on the shared latent representation $\mathbf{h}_i$, i.e., $\mathbf{x}_i^{(v)} = \mathbf{P}^{(v)}\mathbf{h}_i$. Considering noise in the observations, we have
$$\mathbf{x}_i^{(v)} = \mathbf{P}^{(v)}\mathbf{h}_i + \mathbf{e}_i^{(v)}, \qquad (5)$$
where $\mathbf{e}_i^{(v)}$ is the noise of the $i$-th sample in the $v$-th view. To infer the multi-view latent representation, the objective function becomes
$$\min_{\mathbf{P},\mathbf{H}} \mathcal{L}_V(\mathbf{X}, \mathbf{PH}), \quad \text{with } \mathbf{X} = \begin{bmatrix} \mathbf{X}^{(1)} \\ \vdots \\ \mathbf{X}^{(V)} \end{bmatrix} \text{ and } \mathbf{P} = \begin{bmatrix} \mathbf{P}^{(1)} \\ \vdots \\ \mathbf{P}^{(V)} \end{bmatrix}, \qquad (6)$$
where $\mathbf{X}$ and $\mathbf{P}$ are the observations and reconstruction models, respectively, concatenated and aligned according to the multiple views. The loss function $\mathcal{L}_V(\cdot,\cdot)$ is associated with the reconstruction from the latent (hidden) representation to the different views. In this way, complementary information from multiple views is automatically encoded into the latent representation $\mathbf{H}$, making it more comprehensive than each single view individually.
For the task-oriented goal (the second term in Eq. (3)),
our aim is to perform subspace clustering as in Eq. (1).
Therefore, the objective function based on the latent representation $\mathbf{H}$ is reformulated as
$$\min_{\mathbf{Z}} \mathcal{L}_S(\mathbf{H},\mathbf{HZ}) + \lambda\,\Omega(\mathbf{Z}), \qquad (7)$$
where the loss function $\mathcal{L}_S(\mathbf{H},\mathbf{HZ})$ is defined based on the self-representation reconstruction error, and the reconstruction coefficient matrix $\mathbf{Z}$ is regularized with $\Omega(\mathbf{Z})$.
For multi-view subspace clustering, we jointly conduct
latent representation learning in Eq. (6) and subspace clus-
tering in Eq. (7) within one unified objective function
$$\min_{\mathbf{P},\mathbf{H},\mathbf{Z}} \mathcal{L}_V(\mathbf{X},\mathbf{PH}) + \lambda_1 \mathcal{L}_S(\mathbf{H},\mathbf{HZ}) + \lambda_2 \Omega(\mathbf{Z}), \qquad (8)$$
where $\lambda_1 > 0$ and $\lambda_2 > 0$ are the tradeoff parameters used to balance the three terms. Generally, the quality of subspace clustering is improved by a comprehensive latent representation, while the quality of the latent representation is ensured by the complementary information from multiple views and the identification of the clustering structure. Considering outliers, the objective function of lLMSC is
$$\min_{\mathbf{P},\mathbf{H},\mathbf{Z},\mathbf{E}_V,\mathbf{E}_S} \|\mathbf{E}_V\|_{2,1} + \lambda_1 \|\mathbf{E}_S\|_{2,1} + \lambda_2 \|\mathbf{Z}\|_* \quad \text{s.t. } \mathbf{X} = \mathbf{PH} + \mathbf{E}_V,\ \mathbf{H} = \mathbf{HZ} + \mathbf{E}_S \text{ and } \mathbf{PP}^T = \mathbf{I}, \qquad (9)$$
where $\mathbf{E}_V$ and $\mathbf{E}_S$ denote the errors corresponding to the reconstruction from the latent representation to each view and to the subspace representation, respectively. The subspace representation is ensured to be low-rank with the matrix nuclear norm $\|\cdot\|_*$. The $\ell_{2,1}$-norm $\|\cdot\|_{2,1}$ enforces columns of a matrix to be zero [10]. The definition of the $\ell_{2,1}$-norm for a matrix $\mathbf{A} \in \mathbb{R}^{p \times q}$ is $\|\mathbf{A}\|_{2,1} = \sum_{j=1}^{q} \sqrt{\sum_{i=1}^{p} \mathbf{A}_{ij}^2}$. It
is robust to outliers due to its underlying assumption that the corruptions are sample-specific. The projection matrix $\mathbf{P}$ is constrained to prevent $\mathbf{H}$ from being pushed arbitrarily close to zero, since rescaling $\mathbf{H}/s$ and $\mathbf{P}s$ ($s > 0$) preserves the same loss. For our objective function, the first term ensures that the latent representation $\mathbf{H}$ is comprehensive, while the second term relates data points through the subspace representation. The last term finds the lowest-rank subspace representation and prevents a trivial solution. Note that the robustness of our model stems from two aspects: (1) complementary information in different views enhances robustness compared to each single view, subsequently improving clustering; (2) the structured sparsity regularization with the $\ell_{2,1}$-norm on the error handles outliers better than the Frobenius norm.
To ensure that the outliers are consistent across the errors $\mathbf{E}_S$ and $\mathbf{E}_V$, we concatenate them vertically along the column direction. This enforces $\mathbf{E}_S$ and $\mathbf{E}_V$ to share the same pattern of column-wise sparsity [46]. Accordingly, the final objective function of our lLMSC is formulated as
$$\begin{aligned} \min_{\mathbf{P},\mathbf{H},\mathbf{Z},\mathbf{E}_V,\mathbf{E}_S} \ & \|\mathbf{E}\|_{2,1} + \lambda \|\mathbf{Z}\|_* \\ \text{s.t. } & \mathbf{X} = \mathbf{PH} + \mathbf{E}_V,\ \mathbf{H} = \mathbf{HZ} + \mathbf{E}_S, \\ & \mathbf{E} = [\mathbf{E}_V; \mathbf{E}_S] \text{ and } \mathbf{PP}^T = \mathbf{I}. \end{aligned} \qquad (10)$$
In our model, only one parameter $\lambda > 0$ is involved to balance the reconstruction error and the regularization on the subspace representation.
3.1.1 lLMSC Optimization
According to the objective function of our lLMSC in Eq. (10),
we simultaneously seek the effective latent representations
from different views and obtain the affinity matrix based
on the latent representations. Since it is not jointly convex
for all the variables, we divide our objective function into
subproblems that can be efficiently solved. We employ the
Augmented Lagrange Multiplier (ALM) with Alternating
Direction Minimization (ADM) strategy [47] for our opti-
mization. To adopt the ADM strategy, the objective function
should be separable. Hence, an auxiliary variable $\mathbf{J}$ is introduced to replace $\mathbf{Z}$. Accordingly, the following problem, which is equivalent to Eq. (10), is proposed:
$$\begin{aligned} \min_{\mathbf{P},\mathbf{H},\mathbf{Z},\mathbf{E}_V,\mathbf{E}_S,\mathbf{J}} \ & \|\mathbf{E}\|_{2,1} + \lambda \|\mathbf{J}\|_* \\ \text{s.t. } & \mathbf{X} = \mathbf{PH} + \mathbf{E}_V,\ \mathbf{H} = \mathbf{HZ} + \mathbf{E}_S, \\ & \mathbf{E} = [\mathbf{E}_V; \mathbf{E}_S],\ \mathbf{PP}^T = \mathbf{I} \text{ and } \mathbf{J} = \mathbf{Z}. \end{aligned} \qquad (11)$$
To solve the above objective function, we minimize the
following ALM problem:
$$\begin{aligned} \mathcal{L}(\mathbf{P},\mathbf{H},\mathbf{Z},\mathbf{E}_V,\mathbf{E}_S,\mathbf{J}) = {} & \|\mathbf{E}\|_{2,1} + \lambda \|\mathbf{J}\|_* + \Phi(\mathbf{Y}_1, \mathbf{X} - \mathbf{PH} - \mathbf{E}_V) \\ & + \Phi(\mathbf{Y}_2, \mathbf{H} - \mathbf{HZ} - \mathbf{E}_S) + \Phi(\mathbf{Y}_3, \mathbf{J} - \mathbf{Z}) \\ & \text{s.t. } \mathbf{E} = [\mathbf{E}_V; \mathbf{E}_S];\ \mathbf{PP}^T = \mathbf{I}. \end{aligned} \qquad (12)$$
Note that, for better presentation, we define $\Phi(\mathbf{C},\mathbf{D}) = \frac{\mu}{2}\|\mathbf{D}\|_F^2 + \langle\mathbf{C},\mathbf{D}\rangle$, where $\langle\cdot,\cdot\rangle$ is the Frobenius inner product defined by $\langle\mathbf{A},\mathbf{B}\rangle = \mathrm{tr}(\mathbf{A}^T\mathbf{B})$. $\mu > 0$ is the penalty scalar and $\mathbf{C}$ is the Lagrangian
multiplier. According to the Alternating Direction Mini-
mization (ADM) strategy [47], we separate our objective
into subproblems that can be efficiently optimized. Then,
the optimization is cycled over all variables while keeping
the previously updated variables fixed. Specifically, each
subproblem is solved as follows:
1. P-subproblem: With the other variables fixed, we optimize the following problem to update $\mathbf{P}$:
$$\mathbf{P}^* = \arg\min_{\mathbf{P}} \Phi(\mathbf{Y}_1, \mathbf{X} - \mathbf{PH} - \mathbf{E}_V) \quad \text{s.t. } \mathbf{PP}^T = \mathbf{I}. \qquad (13)$$
To efficiently solve the above problem, we introduce Theorem 1 [48], which addresses "Wahba's problem", i.e., seeking an orthogonal rotation matrix between two coordinate systems given a set of observations.
Theorem 1. Given the objective function $\min_{\mathbf{R}} \|\mathbf{Q} - \mathbf{GR}\|_F^2$ s.t. $\mathbf{R}^T\mathbf{R} = \mathbf{RR}^T = \mathbf{I}$, the optimal solution is $\mathbf{R}^* = \mathbf{UV}^T$, where $\mathbf{U}$ and $\mathbf{V}$ are the left and right singular vectors of the SVD of $\mathbf{G}^T\mathbf{Q}$.
We can show that $\mathbf{P}^T = \mathbf{UV}^T$ is the optimal solution for the P-subproblem, with $\mathbf{U}$ ($\mathbf{V}$) being the left (right) singular vectors of $\mathbf{H}(\mathbf{X} + \mathbf{Y}_1/\mu - \mathbf{E}_V)^T$. Specifically, we have
$$\begin{aligned} \mathbf{P}^* &= \arg\min_{\mathbf{P}} \Phi(\mathbf{Y}_1, \mathbf{X} - \mathbf{PH} - \mathbf{E}_V) \\ &= \arg\min_{\mathbf{P}} \frac{\mu}{2}\|\mathbf{X} - \mathbf{PH} - \mathbf{E}_V + \mathbf{Y}_1/\mu\|_F^2 \\ &= \arg\min_{\mathbf{P}} \frac{\mu}{2}\|(\mathbf{X} + \mathbf{Y}_1/\mu - \mathbf{E}_V) - \mathbf{PH}\|_F^2 \\ &= \arg\min_{\mathbf{P}} \frac{\mu}{2}\|(\mathbf{X} + \mathbf{Y}_1/\mu - \mathbf{E}_V)^T - \mathbf{H}^T\mathbf{P}^T\|_F^2. \end{aligned}$$
According to Theorem 1, if $\mathbf{P}$ is constrained to be orthogonal (i.e., $\mathbf{PP}^T = \mathbf{P}^T\mathbf{P} = \mathbf{I}$), $\mathbf{P}^T = \mathbf{UV}^T$ is the optimal solution. In practice, the constraint on $\mathbf{P}$ can be relaxed (i.e., $\mathbf{PP}^T = \mathbf{I}$, where $\mathbf{P} \in \mathbb{R}^{k \times d}$, $k \leq d$). Promising performance and convergence results validate this relaxation.
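For concreteness, a minimal numerical sketch of this P-subproblem update is given below. It assumes the convention $\mathbf{X} \approx \mathbf{PH}$ with $\mathbf{P}$ of size $d \times k$; variable names and toy sizes are illustrative rather than taken from the original implementation.

```python
# Sketch of the P-subproblem update (Eq. (13)) via Theorem 1: take the SVD of
# H (X + Y1/mu - EV)^T and set P^T = U V^T. Here P is taken as d x k so that
# X is approximated by P H; names and toy sizes are illustrative.
import numpy as np

def update_P(X, H, EV, Y1, mu):
    M = H @ (X + Y1 / mu - EV).T           # plays the role of G^T Q in Theorem 1
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return (U @ Vt).T                      # P = (P^T)^T = (U V^T)^T, shape d x k

# Toy check: with noiseless data and zero multipliers, P H recovers X.
rng = np.random.default_rng(0)
k, d, n = 5, 20, 50
H = rng.standard_normal((k, n))
P_true, _ = np.linalg.qr(rng.standard_normal((d, k)))
X = P_true @ H
P = update_P(X, H, EV=np.zeros((d, n)), Y1=np.zeros((d, n)), mu=1.0)
print(np.linalg.norm(X - P @ H) / np.linalg.norm(X))   # close to 0
```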
TABLE 1: Main notations used throughout the paper.

Notation | Meaning
$\mathbf{X}^{(v)} \in \mathbb{R}^{d_v \times n}$ | The feature matrix of the $v$-th view
$\mathbf{H} \in \mathbb{R}^{k \times n}$ | The learned latent representation matrix
$\mathbf{Z} \in \mathbb{R}^{n \times n}$ | The subspace representation matrix
$\mathbf{P} \in \mathbb{R}^{k \times d}$, $d = \sum_v d_v$ | The projection from the latent representation to all views
$\mathbf{E}_S \in \mathbb{R}^{k \times n}$, $\mathbf{E}_V \in \mathbb{R}^{d \times n}$ | The reconstruction errors
$\mathbf{Y}_1, \mathbf{Y}_2, \mathbf{Y}_3$ | Lagrangian multipliers for the constraints
$\mathbf{W}^{(1,v)} \in \mathbb{R}^{d_{(1,v)} \times k}$ | Neural network parameters
$\mathbf{W}^{(2,v)} \in \mathbb{R}^{d_{(2,v)} \times d_{(1,v)}}$ | Neural network parameters
2. H-subproblem: To update $\mathbf{H}$, the following objective should be optimized:
$$\mathbf{H}^* = \arg\min_{\mathbf{H}} \Phi(\mathbf{Y}_1, \mathbf{X} - \mathbf{PH} - \mathbf{E}_V) + \Phi(\mathbf{Y}_2, \mathbf{H} - \mathbf{HZ} - \mathbf{E}_S). \qquad (14)$$
Differentiating the objective function with respect to $\mathbf{H}$ and setting the derivative to zero, we obtain the following equation:
$$\begin{aligned} &\mathbf{AH} + \mathbf{HB} = \mathbf{C} \\ \text{with } &\mathbf{A} = \mu\mathbf{P}^T\mathbf{P}, \quad \mathbf{B} = \mu(\mathbf{ZZ}^T - \mathbf{Z} - \mathbf{Z}^T + \mathbf{I}), \\ &\mathbf{C} = \big(\mathbf{P}^T\mathbf{Y}_1 + \mathbf{Y}_2(\mathbf{Z}^T - \mathbf{I})\big) + \mu\big(\mathbf{P}^T\mathbf{X} + \mathbf{E}_S - \mathbf{P}^T\mathbf{E}_V - \mathbf{E}_S\mathbf{Z}^T\big). \end{aligned} \qquad (15)$$
Equation (15) is a Sylvester equation [49]. For stability, the matrix $\mathbf{A}$ is enforced to be strictly positive-definite with $\hat{\mathbf{A}} = \mathbf{A} + \epsilon\mathbf{I}$, where $\mathbf{I}$ is an identity matrix and $\epsilon$ is a small positive scalar, i.e., $0 < \epsilon \ll 1$.
Proposition 1. The Sylvester equation (15) has a unique solution.
Proof. The Sylvester equation $\hat{\mathbf{A}}\mathbf{H} + \mathbf{HB} = \mathbf{C}$ has a unique solution with respect to $\mathbf{H}$ if $\hat{\mathbf{A}}$ and $-\mathbf{B}$ have no common eigenvalue [49]. The matrix $\hat{\mathbf{A}}$ is positive-definite, so all eigenvalues of $\hat{\mathbf{A}}$ are positive, i.e., $\alpha_i > 0$. The matrix $\mathbf{B}$ is positive semi-definite, so all eigenvalues of $\mathbf{B}$ are non-negative, i.e., $\beta_i \geq 0$. Therefore, $\alpha_i + \beta_j > 0$ holds for any eigenvalues of $\hat{\mathbf{A}}$ and $\mathbf{B}$. Accordingly, there is a unique solution for the Sylvester equation (15).
Remark. We employ the Bartels-Stewart algorithm [49] to solve the Sylvester equation. In this algorithm, the coefficient matrices are transformed into Schur form by QR decomposition before back-substitution is employed to solve the resulting triangular system. Note that the proposed model can be solved exactly under the condition $\mathbf{PP}^T = \mathbf{P}^T\mathbf{P} = \mathbf{I}$, that is, when $\mathbf{A} = \mu\mathbf{P}^T\mathbf{P}$ is a positive-definite matrix and $\mathbf{P}$ is orthogonal.
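A minimal sketch of this H-subproblem step, assuming the reconstruction of Eq. (15) above and SciPy's Bartels-Stewart solver, is:

```python
# Sketch of the H-subproblem (Eq. (15)): build A-hat, B and C and solve the
# Sylvester equation with SciPy's Bartels-Stewart solver. Names are illustrative.
import numpy as np
from scipy.linalg import solve_sylvester

def update_H(X, P, Z, EV, ES, Y1, Y2, mu, eps=1e-6):
    k, n = P.shape[1], X.shape[1]
    A = mu * (P.T @ P) + eps * np.eye(k)            # A-hat = A + eps * I (strictly PD)
    B = mu * (Z @ Z.T - Z - Z.T + np.eye(n))
    C = (P.T @ Y1 + Y2 @ (Z.T - np.eye(n))
         + mu * (P.T @ X + ES - P.T @ EV - ES @ Z.T))
    return solve_sylvester(A, B, C)                  # H of shape k x n

# Shape check with random inputs.
rng = np.random.default_rng(0)
d, k, n = 12, 4, 30
P = rng.standard_normal((d, k)); X = rng.standard_normal((d, n))
Z = 0.1 * rng.standard_normal((n, n))
H = update_H(X, P, Z, np.zeros((d, n)), np.zeros((k, n)),
             np.zeros((d, n)), np.zeros((k, n)), mu=1.0)
print(H.shape)
```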
3. Z-subproblem: With the other variables fixed, the subspace representation matrix $\mathbf{Z}$ can be updated by optimizing the following objective function:
$$\mathbf{Z}^* = \arg\min_{\mathbf{Z}} \Phi(\mathbf{Y}_3, \mathbf{J} - \mathbf{Z}) + \Phi(\mathbf{Y}_2, \mathbf{H} - \mathbf{HZ} - \mathbf{E}_S). \qquad (16)$$
Accordingly, the following update rule is obtained:
$$\mathbf{Z}^* = (\mathbf{H}^T\mathbf{H} + \mathbf{I})^{-1}\big[(\mathbf{J} + \mathbf{H}^T\mathbf{H} - \mathbf{H}^T\mathbf{E}_S) + (\mathbf{Y}_3 + \mathbf{H}^T\mathbf{Y}_2)/\mu\big]. \qquad (17)$$
4. E-subproblem: To update the reconstruction error $\mathbf{E}$, we need to solve the following problem:
$$\begin{aligned} \mathbf{E}^* &= \arg\min_{\mathbf{E}} \|\mathbf{E}\|_{2,1} + \Phi(\mathbf{Y}_1, \mathbf{X} - \mathbf{PH} - \mathbf{E}_V) + \Phi(\mathbf{Y}_2, \mathbf{H} - \mathbf{HZ} - \mathbf{E}_S) \\ &= \arg\min_{\mathbf{E}} \frac{1}{\mu}\|\mathbf{E}\|_{2,1} + \frac{1}{2}\|\mathbf{E} - \mathbf{G}\|_F^2, \end{aligned} \qquad (18)$$
where the matrix $\mathbf{G}$ is constructed by vertically concatenating $\mathbf{X} - \mathbf{PH} + \mathbf{Y}_1/\mu$ and $\mathbf{H} - \mathbf{HZ} + \mathbf{Y}_2/\mu$. The optimal solution can be obtained by Lemma 3.2 in [10].
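A minimal sketch of this E-subproblem step is given below; the column-wise shrinkage follows the standard closed-form solution of the $\ell_{2,1}$ proximal problem (Lemma 3.2 in [10]), and the helper names are ours.

```python
# Sketch of the E-subproblem (Eq. (18)): the l2,1 proximal operator shrinks each
# column of G, zeroing columns whose norm falls below the threshold (Lemma 3.2
# in [10]). Helper names are ours.
import numpy as np

def prox_l21(G, tau):
    """Solve min_E tau * ||E||_{2,1} + 0.5 * ||E - G||_F^2 column by column."""
    norms = np.linalg.norm(G, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return G * scale

def update_E(X, P, H, Z, Y1, Y2, mu):
    # Stack the two residuals vertically so EV and ES share the same
    # column-wise sparsity pattern.
    G = np.vstack([X - P @ H + Y1 / mu,
                   H - H @ Z + Y2 / mu])
    E = prox_l21(G, 1.0 / mu)
    d = X.shape[0]
    return E[:d], E[d:]                      # EV, ES
```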
5. J-subproblem: With the other variables fixed, we obtain the following objective function with respect to $\mathbf{J}$:
$$\begin{aligned} \mathbf{J}^* &= \arg\min_{\mathbf{J}} \lambda\|\mathbf{J}\|_* + \Phi(\mathbf{Y}_3, \mathbf{J} - \mathbf{Z}) \\ &= \arg\min_{\mathbf{J}} \frac{\lambda}{\mu}\|\mathbf{J}\|_* + \frac{1}{2}\|\mathbf{J} - (\mathbf{Z} - \mathbf{Y}_3/\mu)\|_F^2. \end{aligned} \qquad (19)$$
This low-rank approximation problem can be solved with the singular value thresholding (SVT) algorithm [50].
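A minimal sketch of SVT applied to this J-subproblem (with illustrative names) is:

```python
# Sketch of the J-subproblem (Eq. (19)) via singular value thresholding (SVT):
# soft-threshold the singular values of the target matrix.
import numpy as np

def svt(T, tau):
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def update_J(Z, Y3, lam, mu):
    return svt(Z - Y3 / mu, lam / mu)
```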
6. Updating multipliers: The multipliers are updated with the following rules:
$$\begin{aligned} \mathbf{Y}_1 &= \mathbf{Y}_1 + \mu(\mathbf{X} - \mathbf{PH} - \mathbf{E}_V), \\ \mathbf{Y}_2 &= \mathbf{Y}_2 + \mu(\mathbf{H} - \mathbf{HZ} - \mathbf{E}_S), \\ \mathbf{Y}_3 &= \mathbf{Y}_3 + \mu(\mathbf{J} - \mathbf{Z}). \end{aligned} \qquad (20)$$
The complete algorithm of lLMSC is shown in Algorithm 1.
Algorithm 1: Optimization algorithm for lLMSC
Input: Multi-view matrices $\{\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(V)}\}$, hyperparameter $\lambda$, and the dimension $k$ of the latent representation $\mathbf{H}$.
Initialize: $\mathbf{P} = \mathbf{0}$, $\mathbf{E}_V = \mathbf{0}$, $\mathbf{E}_S = \mathbf{0}$, $\mathbf{J} = \mathbf{Z} = \mathbf{0}$, $\mathbf{Y}_1 = \mathbf{Y}_2 = \mathbf{Y}_3 = \mathbf{0}$, $\mu = 10^{-6}$, $\rho = 1.2$, $\epsilon = 10^{-4}$, $\mu_{\max} = 10^{6}$; initialize $\mathbf{H}$ with random values.
while not converged do
    Update the variables $\mathbf{P}, \mathbf{H}, \mathbf{Z}, \mathbf{E}_V, \mathbf{E}_S, \mathbf{J}$ according to subproblems 1-5;
    Update the multipliers $\mathbf{Y}_1, \mathbf{Y}_2, \mathbf{Y}_3$ according to subproblem 6;
    Update the parameter $\mu$ by $\mu = \min(\rho\mu, \mu_{\max})$;
    Check the convergence conditions: $\|\mathbf{X} - \mathbf{PH} - \mathbf{E}_V\| < \epsilon$, $\|\mathbf{H} - \mathbf{HZ} - \mathbf{E}_S\| < \epsilon$ and $\|\mathbf{J} - \mathbf{Z}\| < \epsilon$.
end
Output: $\mathbf{Z}$, $\mathbf{H}$, $\mathbf{P}$ and $\mathbf{E}$.
Fig. 2: Illustration of the proposed generalized Latent Multi-view Subspace Clustering (gLMSC). The latent representation non-linearly encodes the information from multiple views with neural networks for uncovering the data distribution in subspaces; the framework couples multi-view information preservation (reconstruction of each view from $\mathbf{H}$) with subspace structure preservation (the subspace representation). Our model can also be considered an unsupervised multi-view representation learning method, where the learned representation could be used for other potential applications. For comparison, the dashed lines indicate the linear LMSC (lLMSC) described in subsection 3.1.
Remark. Several details of our algorithm must be clarified.
(1) We employ a linear projection, which is effective and easy to solve; non-linear correlation is addressed in the next subsection. (2) For the P-subproblem optimization, although the orthogonality condition is needed for strict correctness, promising performance and stable convergence are achieved with the low-dimensional projection in practice. Moreover, with other constraints on $\mathbf{P}$ (e.g., $\|\mathbf{P}(:, j)\|_2 \leq 1$), the subproblem can be solved with the ADMM algorithm [51]. Although this yields similar performance, the inner iteration required by ADMM makes the algorithm considerably more expensive. (3) It is not appropriate to initialize $\mathbf{H}$ with a zero value. In this case, the optimal solution of the H-subproblem will be zero, and the subsequent optimizations of the other subproblems (e.g., the Z-subproblem in Eq. (16)) will have trivial solutions. Therefore, we initialize $\mathbf{H}$ randomly in our implementation; $\mathbf{H}$ can also be initialized with other preprocessing (e.g., PCA) to address the instability issue.
3.2 Generalized Latent Multi-view Subspace Clustering
lLMSC assumes a linear relationship between the latent
representation and the features from each view. Accord-
ingly, relationships between different views are also linear.
Nevertheless, in real-world applications, relationships are
usually much more complex and non-linear. The kernel trick
is regularly adopted to implicitly address the non-linearity
problem by mapping data points onto a high-dimensional
space and then solving the learning algorithms in that space.
However, the kernel is usually selected in an ad hoc man-
ner and hence suffers from the generalization problem. Neural
network-based methods [52], [53] can flexibly learn highly
non-linear mappings, so here we employ neural networks
to address complex relationships between the latent repre-
sentation and the features from individual views, and the
non-linear interactions among multiple views. Accordingly,
we propose the generalized Latent Multi-view Subspace
Clustering (gLMSC) method shown in Fig. 2.
The objective function of gLMSC is formulated as fol-
lows:
$$\begin{aligned} \min_{\{\theta_v\}_{v=1}^{V}, \mathbf{H}, \mathbf{Z}} \ & \ell(\mathbf{H}, \mathbf{HZ}) + \sum_{v=1}^{V} \alpha_v\, d_v\big(\mathbf{X}^{(v)}, g_{\theta_v}(\mathbf{H})\big) + \lambda\,\Omega(\mathbf{Z}) \\ \text{with } & g_{\theta_v}(\mathbf{H}) = \mathbf{W}^{(k,v)} f\big(\mathbf{W}^{(k-1,v)} \cdots f(\mathbf{W}^{(1,v)}\mathbf{H})\big), \end{aligned} \qquad (21)$$
where $\ell(\cdot,\cdot)$ (corresponding to $\mathcal{L}_S(\cdot,\cdot)$ in (4)) is the loss for the subspace representation, and $d_v(\cdot,\cdot)$ (corresponding to $\mathcal{L}_V(\cdot,\cdot)$ in (4)) measures the distortion of the reconstruction from the latent representation to the observation in the $v$-th view. The neural network $g_{\theta_v}(\mathbf{H})$ accounts for the non-linear mapping, with $f(\cdot)$ being the activation function and $\mathbf{W}^{(k,v)}$ being the weight matrix between the $k$-th and $(k+1)$-th layers for the $v$-th view. The tradeoff factor $\alpha_v$ is used to control the fusion portion from the $v$-th view, which encodes the influence of the $v$-th view on the latent representation. By using a three-layer network, we propose the following objective function for gLMSC under the low-rank constraint on the subspace representation:
$$\min_{\{\theta_v\}_{v=1}^{V}, \mathbf{H}, \mathbf{Z}} \frac{1}{2}\|\mathbf{H} - \mathbf{HZ}\|_F^2 + \sum_{v=1}^{V} \frac{\alpha_v}{2}\big\|\mathbf{X}^{(v)} - \mathbf{W}^{(2,v)} f(\mathbf{W}^{(1,v)}\mathbf{H})\big\|_F^2 + \lambda\|\mathbf{Z}\|_*, \qquad (22)$$
where the activation function used in our model is the tanh function, defined as
$$f(a) = \tanh(a) = \frac{1 - e^{-2a}}{1 + e^{-2a}}. \qquad (23)$$
Accordingly, the corresponding derivative can be calculated as
$$f'(a) = \tanh'(a) = 1 - \tanh^2(a). \qquad (24)$$
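For illustration, a small sketch that evaluates the three-layer reconstruction $g_{\theta_v}(\mathbf{H}) = \mathbf{W}^{(2,v)}\tanh(\mathbf{W}^{(1,v)}\mathbf{H})$ and the objective value of Eq. (22) is given below; all names and shapes are illustrative.

```python
# Sketch evaluating the three-layer reconstruction g_theta_v(H) = W2 tanh(W1 H)
# and the gLMSC objective of Eq. (22); all names and shapes are illustrative.
import numpy as np

def reconstruct_view(W1, W2, H):
    return W2 @ np.tanh(W1 @ H)

def glmsc_objective(Xs, Ws, H, Z, alphas, lam):
    val = 0.5 * np.linalg.norm(H - H @ Z, 'fro') ** 2
    for X_v, (W1, W2), a_v in zip(Xs, Ws, alphas):
        val += 0.5 * a_v * np.linalg.norm(X_v - reconstruct_view(W1, W2, H), 'fro') ** 2
    # Nuclear norm of Z = sum of its singular values.
    val += lam * np.sum(np.linalg.svd(Z, compute_uv=False))
    return val

# Toy usage with two views of different dimensionality.
rng = np.random.default_rng(0)
k, n = 4, 20
H = rng.standard_normal((k, n)); Z = np.zeros((n, n))
Xs = [rng.standard_normal((10, n)), rng.standard_normal((8, n))]
Ws = [(rng.standard_normal((6, k)), rng.standard_normal((10, 6))),
      (rng.standard_normal((6, k)), rng.standard_normal((8, 6)))]
print(glmsc_objective(Xs, Ws, H, Z, alphas=[0.5, 0.5], lam=0.1))
```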
To summarize, gLMSC has the following merits. (1) Our model focuses on seeking a comprehensive common representation of multiple views, based on which (instead of each single view) subspace clustering is performed. (2) Since subspace clustering is designed for high-dimensional data, existing methods (e.g., [15], [16]) require that the data do not have low-dimensional views; in contrast, our method is free of this restriction due to the latent representation. (3) Inter-view correlations are implicitly encoded by the network, which non-linearly maps the latent representation to reconstruct each view. (4) Our framework is flexible due to the use of different components, i.e., both the network and the regularization terms are replaceable (for example with low-rank/sparse/graph regularization). (5) Although our work focuses on subspace clustering, gLMSC can be considered a general multi-view representation learning framework.
3.2.1 gLMSC Optimization
The objective function in Eq. (22) can be solved as follows:
Update the network parameters, i.e., $\mathbf{W}^{(1,v)}$ and $\mathbf{W}^{(2,v)}$. Letting $\mathbf{M}_v = \tanh(\mathbf{W}^{(1,v)}\mathbf{H})$ and imposing regularization on $\mathbf{W}^{(1,v)}$ and $\mathbf{W}^{(2,v)}$, for the $v$-th view we have
$$\mathcal{L}_W = \frac{\alpha_v}{2}\big\|\mathbf{X}^{(v)} - \mathbf{W}^{(2,v)} f(\mathbf{W}^{(1,v)}\mathbf{H})\big\|_F^2 + \gamma\,\Omega(\Theta), \qquad (25)$$
where $\Omega(\Theta) = \big\|\mathbf{W}^{(1,v)}\big\|_F^2 + \big\|\mathbf{W}^{(2,v)}\big\|_F^2$ and $\gamma > 0$ is the tradeoff parameter for the model regularization of the network. Then, we have
$$\mathbf{W}^{(2,v)} = \mathbf{X}^{(v)}\mathbf{M}_v^T\Big(\mathbf{M}_v\mathbf{M}_v^T + \frac{\gamma}{\alpha_v}\mathbf{I}\Big)^{-1} \qquad (26)$$
and
$$\frac{\partial \mathcal{L}_W}{\partial \mathbf{W}^{(1,v)}} = \alpha_v\Big[(\mathbf{1} - \mathbf{M}_v \odot \mathbf{M}_v) \odot \big(\mathbf{W}^{(2,v)T}\mathbf{W}^{(2,v)}\mathbf{M}_v - \mathbf{W}^{(2,v)T}\mathbf{X}^{(v)}\big)\Big]\mathbf{H}^T + \gamma\mathbf{W}^{(1,v)}, \qquad (27)$$
where $\odot$ denotes element-wise multiplication, $\mathbf{1}$ is a matrix whose elements are all ones, and $\mathbf{1} - \mathbf{M}_v \odot \mathbf{M}_v$ is the gradient of $\mathbf{M}_v = \tanh(\mathbf{W}^{(1,v)}\mathbf{H})$. We update $\mathbf{W}^{(1,v)}$ using the gradient descent (GD) algorithm. The optimization procedure of our neural networks is summarized in Algorithm 2.
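A minimal sketch of one such network update for a single view, combining the closed-form $\mathbf{W}^{(2,v)}$ of Eq. (26) with one gradient step on $\mathbf{W}^{(1,v)}$ from Eq. (27), is given below; the names, shapes, and learning-rate handling are illustrative.

```python
# Sketch of one network update for view v: M_v = tanh(W1 H), the ridge-style
# closed form for W2 (Eq. (26)), and one gradient step on W1 (Eq. (27)).
# Names, shapes, and the learning-rate handling are illustrative.
import numpy as np

def update_network_view(X_v, H, W1, alpha_v, gamma, eta):
    M = np.tanh(W1 @ H)
    W2 = X_v @ M.T @ np.linalg.inv(M @ M.T + (gamma / alpha_v) * np.eye(M.shape[0]))
    grad_W1 = alpha_v * ((1.0 - M * M) * (W2.T @ W2 @ M - W2.T @ X_v)) @ H.T + gamma * W1
    return W1 - eta * grad_W1, W2

# Toy usage.
rng = np.random.default_rng(0)
k, n, d1, dv = 4, 30, 6, 10
H = rng.standard_normal((k, n)); X_v = rng.standard_normal((dv, n))
W1 = 0.1 * rng.standard_normal((d1, k))
W1, W2 = update_network_view(X_v, H, W1, alpha_v=1.0, gamma=1e-3, eta=1e-3)
print(W1.shape, W2.shape)
```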
Update H. The update of $\mathbf{H}$ is similar to that of $\mathbf{W}^{(1,v)}$, as follows:
$$\begin{aligned} \frac{\partial \mathcal{L}_H}{\partial \mathbf{H}} ={}& \sum_{v=1}^{V} \alpha_v \mathbf{W}^{(1,v)T}\Big[(\mathbf{1} - \mathbf{M}_v \odot \mathbf{M}_v) \odot \big(\mathbf{W}^{(2,v)T}\mathbf{W}^{(2,v)}\mathbf{M}_v - \mathbf{W}^{(2,v)T}\mathbf{X}^{(v)}\big)\Big] + \mathbf{H}(\mathbf{I} - \mathbf{Z} - \mathbf{Z}^T + \mathbf{ZZ}^T) \\ \text{with } \mathcal{L}_H ={}& \frac{1}{2}\|\mathbf{H} - \mathbf{HZ}\|_F^2 + \sum_{v=1}^{V} \frac{\alpha_v}{2}\big\|\mathbf{X}^{(v)} - \mathbf{W}^{(2,v)} f(\mathbf{W}^{(1,v)}\mathbf{H})\big\|_F^2. \end{aligned} \qquad (28)$$
We update $\mathbf{H}$ using the gradient descent (GD) algorithm.
Update Z. To update $\mathbf{Z}$, we introduce an auxiliary variable $\mathbf{J}$ and iteratively update $\mathbf{Z}$, $\mathbf{J}$, and the multiplier $\mathbf{Y}$ with ADMM as follows:
$$\begin{aligned} \mathbf{Z} &= (\mathbf{H}^T\mathbf{H} + \mu\mathbf{I})^{-1}(\mu\mathbf{J} - \mathbf{Y} + \mathbf{H}^T\mathbf{H}), \\ \mathbf{J} &= \arg\min_{\mathbf{J}} \frac{\lambda}{\mu}\|\mathbf{J}\|_* + \frac{1}{2}\|\mathbf{J} - (\mathbf{Z} + \mathbf{Y}/\mu)\|_F^2, \\ \mathbf{Y} &= \mathbf{Y} + \mu(\mathbf{J} - \mathbf{Z}), \end{aligned} \qquad (29)$$
where the subproblem for updating $\mathbf{J}$ can be solved by singular value thresholding [50]. The optimization procedure is summarized in Algorithm 3.
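A compact sketch of this inner ADMM is given below. To keep the three steps mutually consistent in code, the sketch follows the multiplier convention used for lLMSC in Eqs. (17)-(20) (constraint $\mathbf{J} = \mathbf{Z}$ with the term $\Phi(\mathbf{Y}, \mathbf{J} - \mathbf{Z})$); all names are illustrative.

```python
# Compact sketch of the inner ADMM for Z given H (constraint J = Z). To keep the
# three steps mutually consistent in code, the multiplier convention of the
# lLMSC updates in Eqs. (17)-(20) is used; all names are illustrative.
import numpy as np

def svt(T, tau):
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def update_Z_admm(H, lam, mu=1.0, n_iters=100):
    n = H.shape[1]
    Z = np.zeros((n, n)); J = np.zeros((n, n)); Y = np.zeros((n, n))
    HtH = H.T @ H
    for _ in range(n_iters):
        Z = np.linalg.solve(HtH + mu * np.eye(n), HtH + mu * J + Y)   # Z-step
        J = svt(Z - Y / mu, lam / mu)                                  # J-step (SVT)
        Y = Y + mu * (J - Z)                                           # multiplier step
    return Z

# Toy usage on a small random latent representation.
H = np.random.default_rng(0).standard_normal((5, 30))
print(update_Z_admm(H, lam=0.1).shape)
```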
Algorithm 2: Update the networks with the GD algorithm
Input: Multi-view data $\{\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(V)}\}$, latent representation $\mathbf{H}$, hyperparameter $\lambda$, learning rate $\eta$, dimensionality $k$ of the latent representation $\mathbf{H}$, and maximal iteration number $T$.
Initialization: Initialize $\mathbf{W}^{(1,v)}$ randomly and set $t = 1$.
while $t < T$ and not converged do
    for $v = 1, \ldots, V$ do
        Update $\mathbf{M}_v$ by $\mathbf{M}_v = \tanh(\mathbf{W}^{(1,v)}\mathbf{H})$;
        Update $\mathbf{W}^{(2,v)}$ according to (26);
        Update $\mathbf{W}^{(1,v)}$ by $\mathbf{W}^{(1,v)} = \mathbf{W}^{(1,v)} - \eta\,\partial\mathcal{L}_W/\partial\mathbf{W}^{(1,v)}$;
    end
    Check the convergence condition: $\sum_{v=1}^{V} \alpha_v \|\mathbf{X}^{(v)} - g_{\theta_v}(\mathbf{H})\|_F^2 < \epsilon$.
    $t = t + 1$;
end
Output: $\{\mathbf{W}^{(1,v)}, \mathbf{W}^{(2,v)}\}_{v=1}^{V}$.
Algorithm 3: Optimization algorithm for gLMSC
Input: Multi-view matrices $\{\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(V)}\}$, hyperparameter $\lambda$, and the dimension $k$ of the latent representation $\mathbf{H}$.
Initialization: $\mu = 10^{-6}$, $\rho = 1.2$, $\epsilon = 10^{-4}$, $\mu_{\max} = 10^{6}$; randomly initialize $\mathbf{H}$ and $\mathbf{W}^{(1,v)}$.
while not converged do
    Update the networks by using Alg. 2;
    Update the latent representation $\mathbf{H}$ according to (28);
    Update the subspace representation $\mathbf{Z}$, $\mathbf{J}$ and $\mathbf{Y}$ according to (29);
    Update the parameter $\mu$ by $\mu = \min(\rho\mu, \mu_{\max})$;
    Check the convergence condition: $\|\mathbf{J} - \mathbf{Z}\| < \epsilon$.
end
Output: $\mathbf{Z}$ and $\mathbf{H}$.
3.3 Complexity and Convergence
The optimization of lLMSC comprises six sub-problems. For
clarification, we define $k$, $d$, and $n$ as the dimensionality of the latent representation, the sum of the dimensionalities of the multiple views, and the size of the data, respectively. Then, the complexities of the six sub-problems are as follows. For updating $\mathbf{P}$ and $\mathbf{J}$ (the nuclear norm proximal operator), the complexities are $O(k^2d + d^3)$ and $O(n^3)$, respectively. The complexity of updating $\mathbf{H}$ with the Bartels-Stewart algorithm [49] is $O(k^3)$. The main computational cost of updating $\mathbf{Z}$ is the matrix inversion, with complexity $O(n^3)$. The complexity of updating $\mathbf{E}$ and the multipliers is $O(dkn + kn^2)$ due to the matrix multiplications. The overall complexity of lLMSC is therefore $O(k^2d + d^3 + k^3 + n^3 + dkn + kn^2)$. Since the dimension of the latent representation is usually much lower than that of the original views, i.e., $k \ll d$, the complexity is basically $O(d^3 + n^3)$. For the complexity of gLMSC, the main computational cost arises from three sub-problems. For the meanings of $d_{(1,v)}$ and $d_{(2,v)}$, please refer to Table 1. The complexities are $O(d_{(1,v)}kn + d_{(1,v)}^2 d_{(2,v)} + d_{(1,v)}^2 n)$, $O(d_{(1,v)}kn)$, and $O(d_{(1,v)}^2 k + d_{(1,v)}^3)$ for updating $\mathbf{M}$, $\mathbf{W}^{(1,v)}$, and $\mathbf{W}^{(2,v)}$, respectively. For updating $\mathbf{H}$ and $\mathbf{Z}$, the complexities are $O(d_{(2,v)}d_{(1,v)}n + d_{(1,v)}^2 d_{(2,v)} + d_{(1,v)}^2 n + n^3 + kn^2)$ and $O(n^3)$, respectively. Similarly, under the conditions $d_1 = \max(\{d_{(1,v)}\}_{v=1}^{V})$, $d_2 = \max(\{d_{(2,v)}\}_{v=1}^{V})$, and $k \ll \min(d_1, d_2)$, the total complexity of gLMSC is $O(d_1^3 + n^3 + d_1^2 n + d_1^2 d_2 + d_1 d_2 n)$. It is difficult to provide a general proof of convergence for our algorithm. Fortunately, comprehensive results on both synthetic and real data empirically demonstrate that the proposed algorithm has very strong and stable convergence, even with random initialization of $\mathbf{H}$.
4 EXPERIMENTS
4.1 Experimental Setting
To comprehensively evaluate our model, both synthetic and
real-world benchmark datasets are employed. We conduct
experiments on synthetic data to test the effectiveness of
using multiple views compared with a single view. We also
employ datasets from diverse applications including general
images, medical images, text, and community networks.
Specifically, we use the following datasets. ADNI¹ consists of 360 samples with Magnetic Resonance (MR) and Positron Emission Tomography (PET) images, where 93 ROI-based neuroimaging features are extracted for each neuroimage (i.e., MRI or PET). The multilingual dataset Reuters [54] consists of 2000 samples in 5 languages, with the documents represented as bags of words using a TF-IDF-based weighting scheme. Football² is a collection of 248 English Premier League football players and clubs active on Twitter; the disjoint ground-truth communities correspond to the 20 clubs in the league. Politicsie³ is a collection of Irish politicians and political organizations assigned to seven disjoint ground-truth groups according to their affiliation. The two Twitter datasets are associated with 9 different views. MSRCV1 [55] consists of 210 images from 7 classes, with 6 types of extracted features: CENT, CMT, GIST, HOG, LBP, and SIFT. BBCSport⁴ consists of documents of sports news corresponding to 5 topics, where two different types of features are extracted for each document [56]. The dataset Animals with Attributes [57] consists of 30475 images from 50 animal classes. We sampled 1/3 of the data points from each class at equal intervals to generate a subset with 10158 samples.
1. http://adni.loni.usc.edu/
2. http://mlg.ucd.ie/aggregation/
3. http://mlg.ucd.ie/aggregation/
4. http://mlg.ucd.ie/datasets/
Fig. 3: Experiments to evaluate the robustness of multi-view and single-view methods on synthetic data. (a) Performance (NMI) under different degrees of noise α for the single-view and multi-view settings. (b) Visualization of similarity matrices corresponding to single views (left and middle) and multiple views (right).
Two types of deep features (i.e., extracted with DECAF [53] and VGG19 [58]) are used. We also extract two types of deep features (with DECAF and VGG19) for Caltech-101, which contains 8677 images from 101 classes.
We conduct experiments on multiple benchmark datasets to compare the following methods:
(1) LRRBestSV [10] performs subspace clustering with the
low-rank constraint for each single view with the best
performance reported.
(2) RMSC [56] recovers a shared low-rank transition proba-
bility matrix as the input to the standard Markov chain.
(3) DiMSC [15] enforces subspace representations of dif-
ferent views to be diverse to reduce redundancy and then
integrates them all into an affinity matrix.
(4) LT-MSC [16] employs a low-rank tensor to enforce consistency in a high-order manner and thereby make use of the complementary information of multiple views.
(5) t-SVD-MSC [59] imposes a new type of low-rank tensor
constraint on the rotated tensor to capture the complemen-
tary information from multiple views.
(6) DSSC [60] is a deep extension of Sparse Subspace Clustering, termed Deep Sparse Subspace Clustering (DSSC). We employ PCA to reduce the number of dimensions for each view and then concatenate all views together.
(7) MLAP [61] performs multi-view subspace clustering by
concatenating subspace representations of different views
together and imposing low-rank constraint to explore the
complementarity.
(8) MSSC [62] exploits the complementarity by using a
common representation across different modalities.
(9-10) lLMSC/gLMSC are the proposed linear/generalized
Latent Multi-View Subspace Clustering algorithms.
For clustering measures, NMI (normalized mutual information), ACC (accuracy), F-measure, and RI (Rand index) are employed to conduct a comprehensive evaluation. Note that a higher value indicates better performance for each
TABLE 2: Performance comparison of different clustering methods.
Datasets Methods NMI ACC F-measure RI
ADNI
LRRBestSV 6.28 ±0.19 42.28 ±0.21 39.90 ±0.47 55.67 ±0.16
RMSC 6.81 ±0.30 42.78 ±0.46 38.34 ±0.63 55.65 ±0.12
DiMSC 5.84 ±0.12 39.17 ±0.36 40.12 ±0.33 50.88 ±0.23
LT-MSC 8.63 ±0.03 42.78 ±0.05 39.40 ±0.13 56.57 ±0.00
t-SVD-MSC 4.37 ±0.43 42.38 ±0.59 37.76 ±0.23 55.47 ±0.07
DSSC 6.98 ±0.53 44.17 ±0.56 39.82 ±1.20 55.50 ±0.49
MLAP 9.68 ±0.81 45.27 ±0.67 39.30 ±0.18 56.61 ±0.02
MSSC 5.89 ±0.45 44.45 ±0.60 38.47 ±0.21 55.44 ±0.02
lLMSC 8.20 ±0.19 45.56 ±0.21 40.78 ±0.40 55.50 ±0.16
gLMSC 10.98 ±0.15 46.67 ±0.23 41.91 ±0.20 57.20 ±0.11
Reuters
LRRBestSV 20.69 ±0.62 39.90 ±0.31 32.55 ±0.48 68.11 ±0.07
RMSC 19.00 ±0.75 39.46 ±1.29 31.86 ±1.40 68.05 ±0.92
DiMSC 18.21 ±0.33 40.00 ±1.13 28.68 ±0.39 67.49 ±0.28
LT-MSC 17.93 ±1.32 36.20 ±1.46 28.29 ±0.95 68.16 ±0.53
t-SVD-MSC 24.88 ±0.03 43.40 ±0.68 33.17 ±0.04 69.54 ±0.02
DSSC 12.86 ±1.25 42.78 ±2.03 35.61 ±2.19 66.90 ±0.78
MLAP 17.04 ±2.24 38.40 ±1.63 32.15 ±1.83 63.69 ±0.52
MSSC 20.56 ±0.63 44.50 ±1.04 37.23 ±1.12 62.09 ±0.36
lLMSC 27.99 ±0.79 47.90 ±0.64 40.15 ±0.50 70.08 ±0.39
gLMSC 23.00 ±1.00 42.70 ±0.99 34.76 ±1.21 65.37 ±0.63
Football
LRRBestSV 81.07 ±1.56 75.40 ±2.36 66.36 ±2.57 96.66 ±0.15
RMSC 84.34 ±2.04 78.55 ±3.84 70.97 ±4.01 97.08 ±0.44
DiMSC 82.16 ±1.45 75.40 ±2.26 67.13 ±1.19 96.74 ±0.59
LT-MSC 84.22 ±1.17 79.03 ±2.01 71.32 ±1.37 97.19 ±0.55
t-SVD-MSC 85.65 ±0.73 80.15 ±0.88 73.04 ±0.40 97.34 ±0.22
DSSC 78.16 ±1.38 76.81 ±1.25 48.44 ±2.14 92.52 ±0.63
MLAP 85.19 ±1.89 80.64 ±2.36 73.35 ±2.04 97.36 ±0.34
MSSC 84.27 ±0.93 84.65 ±1.37 74.78 ±2.16 97.50 ±0.43
lLMSC 83.96 ±2.08 80.24 ±2.18 70.82 ±1.09 97.14 ±0.82
gLMSC 89.31 ±2.22 86.25 ±1.45 79.40 ±1.40 97.97 ±0.73
Politicsie
LRRBestSV 72.94 ±3.37 64.94 ±4.58 64.59 ±3.06 85.36 ±2.06
RMSC 70.88 ±3.22 63.30 ±4.17 60.61 ±3.38 84.09 ±1.56
DiMSC 76.63 ±4.16 80.46 ±3.21 77.57 ±2.19 89.97 ±1.19
LT-MSC 68.61 ±1.22 64.08 ±1.56 62.69 ±1.53 84.59 ±0.90
t-SVD-MSC 76.86 ±1.55 78.86 ±2.10 75.58 ±1.60 89.39 ±0.77
DSSC 75.79 ±3.87 70.52 ±3.99 70.05 ±2.50 87.69 ±1.35
MLAP 78.10 ±2.01 71.26 ±2.37 72.26 ±1.34 88.66 ±1.72
MSSC 69.27 ±2.53 66.38 ±2.06 63.05 ±1.49 84.86 ±1.01
lLMSC 81.46 ±0.89 83.33 ±0.94 80.66 ±0.69 91.42 ±0.19
gLMSC 78.65 ±1.16 82.18 ±1.71 78.42 ±0.91 90.48 ±0.22
MSRCV1
LRRBestSV 56.47 ±2.09 66.19 ±2.73 51.72 ±3.56 68.34 ±1.28
RMSC 64.99 ±2.21 75.00 ±4.81 62.78 ±2.34 89.42 ±0.69
DiMSC 62.87 ±2.18 68.57 ±3.92 57.92 ±2.44 89.72 ±1.10
LT-MSC 70.04 ±0.13 80.00 ±0.09 68.48 ±0.03 91.12 ±0.00
t-SVD-MSC 96.03 ±0.03 98.10 ±0.01 96.16 ±0.03 98.93 ±0.00
DSSC 63.34 ±0.24 71.01 ±0.10 63.29 ±0.35 86.91 ±0.25
MLAP 66.71 ±0.52 72.86 ±0.76 64.45 ±0.38 89.98 ±0.08
MSSC 63.10 ±0.16 70.99 ±0.22 62.87 ±0.19 86.54 ±0.07
lLMSC 65.34 ±1.17 80.55 ±1.41 65.17 ±1.62 90.40 ±0.20
gLMSC 75.25 ±1.03 84.81 ±1.27 73.80 ±1.79 92.51 ±0.23
BBCSport
LRRBestSV 69.02 ±0.19 78.72 ±0.26 76.98 ±0.23 87.35 ±0.13
RMSC 60.84 ±0.75 73.72 ±0.37 65.51 ±0.20 92.29 ±0.33
DiMSC 85.11 ±0.13 95.10 ±2.17 91.02 ±0.14 95.72 ±0.10
LT-MSC 77.54 ±0.46 90.26 ±0.73 80.16 ±0.59 90.36 ±0.27
t-SVD-MSC 91.82 ±0.08 97.61 ±0.21 94.90 ±0.06 97.57 ±0.11
DSSC 72.56 ±0.32 89.43 ±0.13 81.19 ±0.26 92.91 ±0.01
MLAP 71.23 ±0.36 85.29 ±0.15 73.53 ±0.19 85.27 ±0.02
MSSC 69.96 ±0.39 79.78 ±0.92 76.13 ±0.51 87.27 ±0.34
lLMSC 82.59 ±0.81 91.07 ±0.59 88.65 ±0.77 94.53 ±0.15
gLMSC 88.66 ±0.46 96.32 ±0.78 92.54 ±0.26 96.49 ±0.11
metric. Since there are different accuracy definitions in clustering, we specify the definition used in our experiments. Given a sample $\mathbf{x}_i$, we denote the cluster and class labels as $\omega_i$ and $c_i$, respectively, giving
$$\mathrm{ACC} = \frac{\sum_{i=1}^{n} \delta(c_i, \mathrm{map}(\omega_i))}{n}, \qquad (30)$$
where $\delta(a, b) = 1$ when $a = b$ and $\delta(a, b) = 0$ otherwise. $\mathrm{map}(\omega_i)$ is the permutation mapping function, which maps cluster labels to class labels, and $n$ is the number of samples. The best mapping can be obtained by the Kuhn-Munkres algorithm.
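For illustration, a minimal sketch of this accuracy computation, using the Hungarian (Kuhn-Munkres) algorithm from SciPy to find the best mapping, is:

```python
# Sketch of the ACC metric in Eq. (30): match cluster labels to class labels
# with the Kuhn-Munkres (Hungarian) algorithm from SciPy.
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, cluster_labels):
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    classes, clusters = np.unique(true_labels), np.unique(cluster_labels)
    # Contingency table: rows are clusters, columns are classes.
    counts = np.zeros((len(clusters), len(classes)), dtype=int)
    for i, w in enumerate(clusters):
        for j, c in enumerate(classes):
            counts[i, j] = np.sum((cluster_labels == w) & (true_labels == c))
    row_ind, col_ind = linear_sum_assignment(-counts)   # maximize matched samples
    return counts[row_ind, col_ind].sum() / len(true_labels)

# Toy usage: a label permutation still yields ACC = 1.0.
print(clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))
```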
For our algorithm, we tune the tradeoff parameter $\lambda$ from the set $\{0.01, 0.1, 1, 10, 100\}$. For simplicity, we set $\alpha_1 = \ldots = \alpha_V = \alpha$ and tune $\alpha$ from $\{0.1, 0.2, \cdots, 1.0\}$ on all datasets. The network parameter $\gamma$ (for regularization) is
TABLE 3: Performance comparison of different clustering methods.
Datasets Methods NMI ACC F-measure RI
ANIMAL
LRRBestSV 34.59 ±0.60 28.83 ±0.33 16.99 ±0.47 96.36 ±0.31
RMSC 70.46 ±1.84 61.58 ±4.50 54.30 ±4.16 97.95 ±0.35
DiMSC 44.62 ±0.89 32.61 ±1.81 20.66 ±1.10 96.30 ±0.23
LT-MSC 41.29 ±0.40 33.65 ±0.67 21.65 ±0.49 96.53 ±0.16
t-SVD-MSC 70.66 ±0.19 63.44 ±0.23 54.40 ±0.26 97.91 ±0.01
DSSC – – – –
MLAP 69.98 ±0.03 63.32 ±0.06 52.61 ±0.11 97.88 ±0.18
MSSC 66.93 ±0.35 59.24 ±0.32 50.12 ±0.15 97.22 ±0.02
lLMSC 70.11 ±0.25 59.86 ±0.29 51.90 ±0.64 97.86 ±0.01
gLMSC 72.66 ±0.35 64.47 ±0.44 54.54 ±0.37 97.97 ±0.08
CALTECH
LRRBestSV 77.59 ±1.23 52.58 ±2.00 36.86 ±1.82 97.48 ±0.70
RMSC 81.41 ±1.57 56.02 ±2.10 27.35 ±2.63 97.58 ±0.56
DiMSC 63.72 ±0.99 37.09 ±1.81 25.47 ±2.11 97.02 ±0.34
LT-MSC 80.38 ±1.58 56.02 ±1.11 39.86 ±1.26 97.59 ±0.73
t-SVD-MSC 81.51 ±1.40 56.60 ±0.79 40.43 ±1.10 97.58 ±0.46
DSSC – – – –
MLAP 82.03 ±1.09 57.62 ±1.57 42.30 ±0.77 97.57 ±0.36
MSSC 78.14 ±0.45 55.90 ±0.66 42.11 ±0.29 97.02 ±0.07
lLMSC 76.26 ±1.11 52.84 ±1.30 37.72 ±0.96 97.48 ±0.22
gLMSC 81.63 ±1.10 59.68 ±0.60 41.90 ±0.41 97.68 ±0.28
TABLE 4: Performance comparison between single view and the learned latent representation.
Datasets Methods NMI ACC F-measure RI
ADNI
View1 10.06 ±0.20 48.33 ±0.34 40.42 ±0.09 55.32 ±0.15
View2 2.33 ±0.16 41.50 ±0.70 38.28 ±0.38 54.12 ±0.06
GCCA 1.49 ±0.21 41.94 ±0.37 41.43 ±0.14 52.45 ±0.06
DCCA 3.93 ±0.32 37.33 ±0.64 41.37 ±0.25 54.08 ±0.13
Latent(lLMSC) 10.21 ±0.11 44.72 ±0.51 46.58 ±0.35 54.85 ±0.40
Latent(gLMSC) 11.15 ±0.39 45.00 ±0.29 46.38 ±0.70 56.31 ±0.17
Reuters
View1 19.89 ±6.56 41.74 ±7.00 39.26 ±4.12 51.05 ±6.77
View2 16.64 ±7.35 40.94 ±6.65 34.19 ±2.56 51.03 ±3.21
View3 21.18 ±8.58 42.22 ±3.85 36.05 ±4.04 58.56 ±5.92
GCCA 28.18 ±5.26 37.43 ±4.00 33.96 ±2.73 69.58 ±3.68
DCCA 17.40 ±3.57 42.67 ±5.39 36.79 ±3.86 54.37 ±5.11
Latent(lLMSC) 31.21 ±0.65 38.98 ±2.77 39.32 ±1.20 60.71 ±4.02
Latent(gLMSC) 31.23 ±4.82 42.94 ±3.63 39.79 ±2.92 68.76 ±4.11
Football
View1 64.13 ±2.31 52.82 ±2.42 29.61 ±2.66 85.31 ±1.38
View2 67.21 ±2.35 62.10 ±2.24 38.17 ±3.32 89.04 ±1.07
View8 62.65 ±2.33 50.81 ±2.36 25.72 ±2.45 85.93 ±1.03
GCCA 39.87 ±1.42 25.81 ±2.07 12.25 ±2.67 63.41 ±1.44
DCCA 79.56 ±1.99 64.19 ±2.14 54.08 ±2.46 94.35 ±0.73
Latent(lLMSC) 70.61 ±2.40 61.69 ±3.71 44.81 ±2.19 93.56 ±1.17
Latent(gLMSC) 83.29 ±1.95 70.56 ±1.32 66.69 ±1.72 95.56 ±0.89
Politicsie
View1 56.47 ±1.86 45.11 ±1.91 48.76 ±1.90 77.61 ±0.76
View2 44.04 ±2.13 43.97 ±1.53 36.22 ±2.86 68.10 ±1.22
View8 18.01 ±1.69 39.37 ±2.05 34.33 ±3.45 62.04 ±1.01
GCCA 20.65 ±2.67 52.30 ±1.88 41.29 ±2.09 42.60 ±1.43
DCCA 56.19 ±1.07 60.06 ±1.55 49.31 ±2.62 78.42 ±1.23
Latent(lLMSC) 72.36 ±2.17 67.58 ±1.99 60.28 ±2.00 83.81 ±1.46
Latent(gLMSC) 74.10 ±2.81 68.10 ±2.72 64.86 ±2.58 85.26 ±1.20
MSRCV1
View1 51.95 ±3.12 54.00 ±5.94 47.91 ±4.30 83.80 ±0.41
View3 62.03 ±0.72 70.42 ±0.51 58.95 ±0.85 88.39 ±0.13
View4 53.45 ±1.41 60.63 ±1.69 49.79 ±2.13 85.57 ±1.46
GCCA 62.51 ±2.12 69.05 ±1.57 58.48 ±1.64 86.89 ±0.85
DCCA 41.20 ±0.16 54.29 ±0.70 38.32 ±0.38 81.63 ±0.06
Latent(lLMSC) 71.67 ±1.31 80.76 ±1.27 68.92 ±1.76 90.64 ±0.96
Latent(gLMSC) 72.99 ±1.36 82.36 ±1.41 70.15 ±1.66 91.37 ±0.55
BBCSport
View1 59.64 ±17.04 64.17 ±15.26 62.43 ±13.46 73.73 ±15.41
View2 23.17 ±16.86 44.85 ±8.47 44.47 ±7.80 42.69 ±12.36
GCCA 59.59 ±7.78 75.92 ±3.89 70.50 ±5.57 84.92 ±6.65
DCCA 35.52 ±12.63 64.52 ±6.98 48.57 ±6.63 76.81 ±11.54
Latent(lLMSC) 62.18 ±12.45 66.66 ±12.15 63.96 ±12.18 76.72 ±12.30
Latent(gLMSC) 76.13 ±13.21 77.21 ±13.68 73.32 ±12.98 87.02 ±11.55
fixed to 0.001. For the baseline approaches, we tune all parameters according to the authors' suggestions and report their best performances. The dimensionality of the latent representation is relatively robust; hence we set $k = 100$ for all datasets, which results in promising performance. Due to randomness, we run all algorithms 30 times and report the mean values and standard deviations.
4.2 Results on Synthetic Data
Firstly, we evaluate our algorithm's ability to exploit multiple views on synthetic data. In our experiment, the randomly generated matrices are produced by independently sampling elements from a uniform distribution within the range $[0, 1]$. The synthetic data are drawn from 6 subspaces/clusters, with the sample numbers of these subspaces being {25, 30, 35, 40, 45, 50}, respectively. First, the latent representation matrix $\mathbf{H} \in \mathbb{R}^{k \times n}$ is generated randomly, with the number of dimensions $k = 90$ and the number of data points $n = 225$. These subspaces have 10, 12, 14, 16, 18, and 20 disjoint features, respectively. Then, based on the latent representation matrix $\mathbf{H}$, two views are produced with $\mathbf{X}^{(v)} = \mathbf{P}^{(v)}\mathbf{H} + \mathbf{E}^{(v)}$. Two types of noise are considered for $\mathbf{E}^{(v)}$: $\mathbf{E}^{(v)} = \mathbf{E}_s^{(v)} + \alpha\mathbf{E}_g^{(v)}$, where $\mathbf{E}_g^{(v)}$ and $\mathbf{E}_s^{(v)}$ are global and sample-specific noise, respectively. For $\mathbf{E}_s^{(v)}$, we randomly select a subset of columns (20 in our experiments) and set the other columns to zero. For $\mathbf{E}_g^{(v)}$, we multiply it by a scalar $0 < \alpha < 1$ to tune the noise degree.
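A minimal sketch of this data-generation protocol is given below; the per-view dimensionality (d_v = 100) and the block-wise assignment of disjoint latent features to clusters are assumptions made for illustration.

```python
# Sketch of the synthetic-data protocol above: random latent matrix H with
# cluster-wise disjoint features, two views X(v) = P(v) H + E(v), and noise
# E(v) = E_s(v) + alpha * E_g(v). The per-view dimensionality (d_v = 100) and
# the block-wise assignment of latent features are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
sizes = [25, 30, 35, 40, 45, 50]             # samples per subspace/cluster (n = 225)
feats = [10, 12, 14, 16, 18, 20]             # disjoint latent features per cluster (k = 90)
k, n, d_v, alpha = 90, sum(sizes), 100, 0.5

H = np.zeros((k, n))
row = col = 0
for f, s in zip(feats, sizes):
    H[row:row + f, col:col + s] = rng.uniform(0, 1, size=(f, s))
    row += f; col += s

views = []
for _ in range(2):
    P = rng.uniform(0, 1, size=(d_v, k))
    E_g = rng.uniform(0, 1, size=(d_v, n))                 # global noise
    E_s = np.zeros((d_v, n))
    corrupted = rng.choice(n, size=20, replace=False)      # 20 corrupted columns
    E_s[:, corrupted] = rng.uniform(0, 1, size=(d_v, 20))
    views.append(P @ H + E_s + alpha * E_g)
print(views[0].shape, views[1].shape)
```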
In Fig. 3(a), benefiting from the complementarity of multiple views, our approach obtains much better performance than using a single view of features under different degrees of noise. In Fig. 3(b), we provide a visualization of the affinity matrices for both the single-view and multi-view settings with $\alpha = 0.5$. The affinity matrix of multiple views reveals the underlying cluster structure much better than that of a single view.
4.3 Results on Real Datasets
We next test our model on diverse real-world applications
including medical image/general image clustering, commu-
nity detection, and text clustering. Tables 2-3 present the
clustering results of different clustering approaches. From
Tables 2-3, the following observations can be made: (1)
overall, lLMSC achieves very competitive and stable per-
formance compared to most baselines. Taking the datasets
Reuters and Politicsie for example, lLMSC outperforms
all the traditional methods; (2) by exploring the general
correlation with a neural network, gLMSC significantly im-
proves lLMSC on 6 out of 8 datasets. For example, the NMI
improvements of gLMSC over lLMSC are about 5.3% and
6.1% on Football and BBCSport, respectively. The potential reasons why gLMSC does not always outperform lLMSC may be: first, in some cases (e.g., Reuters, where each document is associated with multiple languages), a linear model suffices to model the correlations among different views; second, although gLMSC is more general than lLMSC, no globally optimal solution is guaranteed for either gLMSC or lLMSC; (3) although the performance of our method is not always the best, it is rather robust across different datasets, while the performance of some methods is unpredictable and variable. For example, MLAP achieves promising performance on ADNI and CALTECH; however, on Reuters and BBCSport, MLAP does not perform very well; (4) we also compared our algorithm with Deep Sparse Subspace Clustering (DSSC) [60]
and t-SVD-MSC [59]; the performance of gLMSC is consistently better than both. For example, the performance improvements over t-SVD-MSC are about 6.1% and 3.3% in terms of accuracy on the two community datasets, i.e., Football and Politicsie. The method t-SVD-MSC emphasizes the consistency over different views due to the low-rank constraint, but balancing consistency and complementarity is a challenge for it, whereas our algorithm can handle this issue due to the flexible encoding of the intrinsic information from different views; (5) the performance of single-view methods with the best view is generally worse than that of multi-view methods, confirming that it is useful to incorporate multiple views.

Fig. 5: Results of our method when using different parameters: λ (top row) and k, the dimensionality of the latent representation (bottom row), for (a) LMSC and (b) gLMSC; the curves report NMI and ACC.
Is the latent representation good? To investigate the
improvement gains of our approach, we compare the la-
tent representation of our algorithm, Generalized Canonical
Correlational Analysis (GCCA) [63], Deep Canonical Cor-
relation Analysis (DCCA) [43] and features of each single
view by conducting k-means over them. As shown in Table
4, the performance using our latent representation is gen-
erally better than those using single-view features. This is
empirical proof of the added value of the latent representa-
tion compared to the original features. Although nonlinear correlations are involved in the CCA-based algorithms, i.e., DCCA and GCCA, their performances are not promising compared with ours. One possible main reason is that representation learning and clustering are separated in these algorithms; thus the learned representations are not guaranteed to be suitable for clustering.
visualize the features of each view and the latent representa-
tion using t-Distributed Stochastic Neighbor Embedding (t-
SNE) [64] on MSRCV1. As shown in Fig. 4, the visualization
is consistent with the clustering results shown in Table
4. Specifically, Fig. 4(c)(corresponding to view3) and (d)
(corresponding to view4) more clearly reveal the underlying
cluster structure, and the corresponding clustering perfor-
mances are also much better than other views. Fig. 4(g) and
(h) (corresponding to latent representation) further validate
the advantage of our model, since the clusters are more
compact and separable than those of the original features
corresponding to different views.
Fig. 4: Visualization of different views and the latent representation with t-SNE on the MSRCV1 dataset: (a) View1: CENTRIST, (b) View2: CMT, (c) View3: GIST, (d) View4: HOG, (e) View5: LBP, (f) View6: SIFT, (g) latent representation (lLMSC), (h) latent representation (gLMSC).

Parameter tuning and convergence. Fig. 5 shows the results of our method using different parameters (taking BBCSport as an example). The performances of our lin-
ear and generalized models are both relatively stable and
promising, as shown by the results achieved by setting λ in
a relatively large range. The bottom of Fig. 5 presents model
performance with respect to dimensionality (k) of the la-
tent representation. Promising performance can be expected
with relatively low dimensionality. Moreover, while gLMSC
needs a latent representation of higher dimensionality than
that of lLMSC, its performance is generally better because
the more general correlation is addressed. Fig. 6 empirically
shows that our algorithms converge within a small number
of iterations.
Fig. 6: Convergence of our method. For a better view, the plots are normalized into the range [0, 1]. (a) Linear LMSC: stopping criteria for X = PH + EV, H = HZ + ES, and J = Z versus iteration. (b) Generalized LMSC: stopping criterion for J = Z versus iteration.
5 CONCLUSIONS AND DISCUSSION
Here we have introduced the latent representation into multi-view subspace clustering. Our model effectively encodes the complementarity of multiple views for subspace clustering under the assumption that each single feature view originates from one comprehensive latent representation. This is essentially different from existing multi-view subspace clustering approaches, which perform self-representation directly within each single view or simply project each view of features to a common space. The latent representation
and the self-representation-based clustering complement
each other. More importantly, by using a neural network-
based approach to learn non-linear mappings, our model
can handle more general correlations between the latent
representation and each feature view. Experiments on both
synthetic and benchmark datasets verify the clear advan-
tages of the learned latent representation for multi-view
subspace clustering compared to the state-of-the-art multi-
view clustering methods.
Our model is able to flexibly explore the complementari-
ty among multiple views for subspace clustering. However,
there are several issues that require further clarification and
possible future investigation. First, existing subspace clustering
methods involve a graph of size n × n, which leads to
computationally costly matrix operations; the time complexities
of these subspace-based clustering methods are therefore
generally of the same order. Specifically, our method employs
singular value decomposition (SVD) and matrix inversion,
which makes the algorithm computationally expensive. In the
future, sampling techniques and binary representations [65]
will be considered to accelerate clustering. Second, the quality
differences among views are not considered, so performance
could degrade when low-quality views dominate.
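To make the cost concrete, low-rank subspace models of this kind typically solve their nuclear-norm subproblem by singular value thresholding [50], whose per-iteration cost is dominated by an SVD of the n × n representation matrix; the sketch below illustrates that operator (a generic sketch, not our exact update).

```python
import numpy as np

def singular_value_thresholding(A, tau):
    """Proximal operator of tau * ||.||_*; the full SVD makes this O(n^3) for an n x n input."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

# For n in the tens of thousands, even a single n x n SVD becomes prohibitive,
# which is why sampling and binary representations [65] are attractive for acceleration.
```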
ACKNOWLEDGMENT
This work was partly supported by National Natural
Science Foundation of China (No. 61602337, 61732011,
61432011, U1435212, U1636214 and 61733007), NIH Projects
(No. CA206100 and MH100217) and Australian Research
Council Projects (No. FL-170100117 and DP-180103424).
REFERENCES
[1] L. Parsons, E. Haque, and H. Liu, “Subspace clustering for high
dimensional data: a review,” Acm Sigkdd Explorations Newsletter,
vol. 6, no. 1, pp. 90–105, 2004.
[2] R. Vidal, “Subspace clustering,” IEEE Signal Processing Magazine,
vol. 28, no. 2, pp. 52–68, 2011.
[3] P. S. Bradley and O. L. Mangasarian, “K-plane clustering,” Journal
of Global Optimization, vol. 16, no. 1, pp. 23–32, 2000.
[4] L. Lu and R. Vidal, “Combined central and subspace clustering for
computer vision applications,” in ICML. ACM, 2006, pp. 593–600.
[5] R. Vidal, Y. Ma, and S. Sastry, “Generalized principal component
analysis (gpca),” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 27, no. 12, pp. 1945–1959, 2005.
[6] J. P. Costeira and T. Kanade, “A multibody factorization method
for independently moving objects,” International Journal of Comput-
er Vision, vol. 29, no. 3, pp. 159–179, 1998.
[7] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering:
Analysis and an algorithm,” in NIPS, 2002, pp. 849–856.
[8] J. Shi and J. Malik, “Normalized cuts and image segmentation,”
IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 22, no. 8, pp. 888–905, 2000.
[9] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm,
theory, and applications,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 35, no. 11, pp. 2765–2781, 2013.
[10] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery
of subspace structures by low-rank representation,” IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp.
171–184, 2013.
[11] H. Hu, Z. Lin, J. Feng, and J. Zhou, “Smooth representation
clustering,” in CVPR, 2014, pp. 3834–3841.
[12] J. Feng, Z. Lin, H. Xu, and S. Yan, “Robust subspace segmentation
with block-diagonal prior,” in CVPR, 2014, pp. 3818–3825.
[13] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-
rank representation,” in ICML, 2010, pp. 663–670.
[14] M. Yin, Y. Guo, J. Gao, Z. He, and S. Xie, “Kernel sparse subspace
clustering on symmetric positive definite manifolds,” in CVPR,
2016, pp. 5157–5164.
[15] X. Cao, C. Zhang, H. Fu, S. Liu, and H. Zhang, “Diversity-induced
multi-view subspace clustering,” in CVPR, 2015, pp. 586–594.
[16] C. Zhang, H. Fu, S. Liu, G. Liu, and X. Cao, “Low-rank tensor
constrained multiview subspace clustering,” in ICCV, 2015, pp.
1582–1590.
[17] H. Gao, F. Nie, X. Li, and H. Huang, “Multi-view subspace
clustering,” in ICCV, 2015, pp. 4238–4246.
[18] Y. Guo, “Convex subspace representation learning from multi-
view data,” in AAAI, 2013.
[19] M. White, X. Zhang, D. Schuurmans, and Y.-l. Yu, “Convex multi-
view subspace learning,” in NIPS, 2012, pp. 1673–1681.
[20] C. Zhang, Q. Hu, H. Fu, P. Zhu, and X. Cao, “Latent multi-view
subspace clustering,” in CVPR, 2017, pp. 4279–4287.
[21] C. Xu, D. Tao, and C. Xu, "Large-margin multi-view information
bottleneck,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 36, no. 8, pp. 1559–1572, 2014.
[22] W. Wang, R. Arora, K. Livescu, and J. Bilmes, “On deep multi-view
representation learning,” in ICML, 2015, pp. 1083–1092.
[23] R. Arora and K. Livescu, “Multi-view learning with supervision
for transformed bottleneck features,” in ICASSP, 2014, pp. 2499–
2503.
[24] Y. Luo, D. Tao, K. Ramamohanarao, C. Xu, and Y. Wen, “Tensor
canonical correlation analysis for multi-view dimension reduc-
tion,” IEEE Transactions on Knowledge and Data Engineering, vol. 27,
no. 11, pp. 3111–3124, 2015.
[25] J. He and R. Lawrence, “A graph-based framework for multi-task
multi-view learning,” in ICML, 2011, pp. 25–32.
[26] V. R. de Sa, “Spectral clustering with two views,” in ICML workshop
on learning with multiple views, 2005, pp. 20–27.
[27] J. Liu, C. Wang, J. Gao, and J. Han, “Multi-view clustering via joint
nonnegative matrix factorization,” in SDM, vol. 13. SIAM, 2013,
pp. 252–260.
[28] W. Tang, Z. Lu, and I. S. Dhillon, “Clustering with multiple
graphs,” in ICDM, 2009, pp. 1016–1021.
[29] A. Kumar, P. Rai, and H. Daume, “Co-regularized multi-view
spectral clustering,” in NIPS, 2011, pp. 1413–1421.
[30] A. Kumar and H. Daumé, "A co-training approach for multi-view
spectral clustering," in ICML, 2011, pp. 393–400.
[31] X. Cai, F. Nie, and H. Huang, “Multi-view k-means clustering on
big data,” in IJCAI, 2013, pp. 2598–2604.
[32] C. Cortes, M. Mohri, and A. Rostamizadeh, “Learning non-linear
combinations of kernels,” in NIPS, 2009, pp. 396–404.
[33] G. Tzortzis and A. Likas, “Kernel-based weighted multi-view
clustering,” in ICDM, 2012, pp. 675–684.
[34] C. Zhang, H. Fu, Q. Hu, P. Zhu, and X. Cao, “Flexible multi-
view dimensionality co-reduction,” IEEE Transactions on Image
Processing, vol. 26, no. 2, pp. 648–659, 2017.
[35] J. Tang, X. Hu, H. Gao, and H. Liu, “Unsupervised feature se-
lection for multi-view data in social media,” in SDM, 2013, pp.
270–278.
[36] K. Chaudhuri, S. M. Kakade, K. Livescu, and K. Sridharan, "Multi-view
clustering via canonical correlation analysis," in ICML, 2009, pp. 129–136.
[37] K. Chaudhuri, S. M. Kakade, K. Livescu, and K. Sridharan, “Multi-
view clustering via canonical correlation analysis,” in ICML, 2009,
pp. 129–136.
[38] V. M. Patel, H. Van Nguyen, and R. Vidal, “Latent space sparse
subspace clustering,” in ICCV, 2013, pp. 225–232.
[39] G. Liu and S. Yan, “Latent low-rank representation for subspace
segmentation and feature extraction,” in ICCV, 2011, pp. 1615–
1622.
[40] M. Abavisani and V. Patel, “Domain adaptive subspace cluster-
ing.” in BMVC, 2016.
[41] A. Sharma, A. Kumar, H. Daume, and D. W. Jacobs, “Generalized
multiview analysis: A discriminative latent space,” in CVPR, 2012,
pp. 2160–2167.
[42] P. Rastogi, B. Van Durme, and R. Arora, “Multiview lsa: Rep-
resentation learning via generalized cca,” in Proceedings of the
2015 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, 2015, pp.
556–566.
[43] G. Andrew, R. Arora, J. Bilmes, and K. Livescu, “Deep canonical
correlation analysis,” in ICML, 2013, pp. 1247–1255.
[44] X. Peng, S. Xiao, J. Feng, W.-Y. Yau, and Z. Yi, “Deep subspace
clustering with sparsity prior.” in IJCAI, 2016, pp. 1925–1931.
[45] P. Ji, T. Zhang, H. Li, M. Salzmann, and I. Reid, “Deep subspace
clustering networks,” in NIPS, 2017.
[46] C. Lang, G. Liu, J. Yu, and S. Yan, “Saliency detection by multitask
sparsity pursuit,” IEEE Transactions on Image Processing, vol. 21,
no. 3, pp. 1327–1338, 2012.
[47] Z. Lin, R. Liu, and Z. Su, “Linearized alternating direction method
with adaptive penalty for low-rank representation,” in NIPS, 2011,
pp. 612–620.
[48] G. Wahba, “A least squares estimate of satellite attitude,” SIAM
review, vol. 7, no. 3, pp. 409–409, 1965.
[49] R. H. Bartels and G. Stewart, "Solution of the matrix equation AX +
XB = C," Communications of the ACM, vol. 15, no. 9, pp. 820–826,
1972.
[50] J.-F. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding
algorithm for matrix completion,” SIAM Journal on Optimization,
vol. 20, no. 4, pp. 1956–1982, 2010.
[51] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Projective dictionary pair
learning for pattern classification,” in NIPS, 2014.
[52] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly,
A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., “Deep
neural networks for acoustic modeling in speech recognition: The
shared views of four research groups,” IEEE Signal Processing
Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[53] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classifi-
cation with deep convolutional neural networks,” in NIPS, 2012,
pp. 1097–1105.
[54] M.-R. Amini, N. Usunier, and C. Goutte, “Learning from multiple
partially observed views - an application to multilingual text
categorization,” in NIPS, 2009.
[55] J. Xu, J. Han, and F. Nie, “Discriminatively embedded k-means for
multi-view clustering,” in CVPR, 2016, pp. 5356–5364.
[56] R. Xia, Y. Pan, L. Du, and J. Yin, “Robust multi-view spectral
clustering via low-rank and sparse decomposition,” in AAAI, 2014,
pp. 2149–2155.
[57] C. H. Lampert, H. Nickisch, and S. Harmeling, “Attribute-based
classification for zero-shot visual object categorization," IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 36, no. 3, pp. 453–465, 2014.
[58] K. Simonyan and A. Zisserman, “Very deep convolutional net-
works for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[59] Y. Xie, D. Tao, W. Zhang, L. Zhang, Y. Liu, and Y. Qu, “On unifying
multi-view self-representations for clustering by tensor multi-rank
minimization,” International Journal of Computer Vision, 2018.
[60] X. Peng, J. Feng, S. Xiao, J. Lu, Z. Yi, and S. Yan, “Deep sparse
subspace clustering,” arXiv preprint arXiv:1709.08374, 2017.
[61] B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan, “Multi-task low-
rank affinity pursuit for image segmentation,” in ICCV, 2011, pp.
2439–2446.
[62] M. Abavisani and V. M. Patel, “Multimodal sparse and low-rank
subspace clustering,” Information Fusion, vol. 39, pp. 168–177, 2018.
[63] J. R. Kettenring, “Canonical analysis of several sets of variables,”
Biometrika, vol. 58, no. 3, pp. 433–451, 1971.
[64] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE,"
Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605,
2008.
[65] Z. Zhang, L. Liu, F. Shen, H. T. Shen, and L. Shao, “Binary multi-
view clustering,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2018.
Changqing Zhang received his B.S. and M.S.
degrees from the College of Computer Science,
Sichuan University, Chengdu, China, in 2005
and 2008, respectively, and the Ph.D. degree in
Computer Science from Tianjin University, Chi-
na, in 2016. He is an assistant professor at the
School of Computer Science and Technology,
Tianjin University. His current research interests
include machine learning, computer vision and
medical image analysis.
Huazhu Fu received the B.S. degree in Math-
ematical Sciences from Nankai University in
2006, the M.E. degree in Mechatronics Engi-
neering from Tianjin University of Technology in
2010, and the Ph.D. degree in Computer Sci-
ence from Tianjin University, China, in 2013.
He was a research fellow with Nanyang Tech-
nological University (NTU), Singapore, for two
years. Currently, he is a Research Scientist with
the Institute for Infocomm Research, Agency for
Science, Technology and Research, Singapore.
His research interests include computer vision, image processing, and
medical image analysis. He is the Associate Editor of BMC Medical
Imaging.
Qinghua Hu (SM’13) received the B.S., M.S.,
and Ph.D. degrees from the Harbin Institute of
Technology, Harbin, China, in 1999, 2002, and
2008, respectively. He was a Post-Doctoral Fel-
low with the Department of Computing, Hong
Kong Polytechnic University, Hong Kong, from
2009 to 2011. He is currently a Full Professor
with the School of Computer Science and Technology,
Tianjin University, Tianjin, China. He has authored over 150 journal
and conference papers in the areas of granular
computing-based machine learning, reasoning
with uncertainty, pattern recognition, and fault
diagnosis. His current research interests include multi-modality learning,
metric learning, uncertainty modeling and reasoning with fuzzy sets,
rough sets and probability theory. Prof. Hu was the Program Committee
Co-Chair of the International Conference on Rough Sets and Current
Trends in Computing in 2010, the Chinese Rough Set and Soft Com-
puting Society in 2012 and 2014, and the International Conference on
Rough Sets and Knowledge Technology and the International Confer-
ence on Machine Learning and Cybernetics in 2014, and the General
Co-Chair of IJCRS 2015.
Xiaochun Cao (SM’14) received the B.E. and
M.E. degrees in computer science from Beihang
University, Beijing, China, and the Ph.D. degree
in computer science from the University of Cen-
tral Florida, Orlando, FL, USA. He has been a
Professor with the Institute of Information Engi-
neering, Chinese Academy of Sciences, Beijing,
China, since 2012. After graduation, he spent
about three years at ObjectVideo Inc. as a Re-
search Scientist. From 2008 to 2012, he was a
Professor with Tianjin University, Tianjin, China.
He has authored and coauthored more than 120 journal and conference
papers. Prof. Cao is a Fellow of the IET. He is on the Editorial Board of
the IEEE TRANSACTIONS ON IMAGE PROCESSING. His dissertation
was nominated for the University of Central Florida's university-level
Outstanding Dissertation Award. In 2004 and 2010, he was the recipient
of the Piero Zamperoni Best Student Paper Award at the International
Conference on Pattern Recognition.
Yuan Xie (M’12) received the Ph.D. degree
in Pattern Recognition and Intelligent Systems
from the Institute of Automation, Chinese Acade-
my of Sciences (CAS), in 2013. He is currently
an associate professor with the Research Cen-
ter of Precision Sensing and Control, Institute
of Automation, CAS. His research interests in-
clude image processing, computer vision, ma-
chine learning and pattern recognition. He has
published around 30 papers in major internation-
al journals including the IJCV, IEEE TIP, TNNLS,
TCYB, TCSVT, TGRS, TMM, etc. He also has served as a reviewer for
more than 15 journals and conferences. Dr. Xie received the Hong Kong
Scholar Award from the Society of Hong Kong Scholars and the China
National Postdoctoral Council in 2014.
Dacheng Tao (F’15) is Professor of Computer
Science and ARC Laureate Fellow in the School
of Information Technologies and the Faculty of
Engineering and Information Technologies, and
the Inaugural Director of the UBTECH Sydney
Artificial Intelligence Centre, at the University of
Sydney. He mainly applies statistics and math-
ematics to Artificial Intelligence and Data Sci-
ence. His research results have been expounded in
one monograph and 200+ publications in presti-
gious journals and prominent conferences, such
as IEEE T-PAMI, T-IP, T-NNLS, IJCV, JMLR, NIPS, ICML, CVPR, ICCV,
ECCV, ICDM; and ACM SIGKDD, with several best paper awards, such
as the best theory/algorithm paper runner up award in IEEE ICDM’07,
the best student paper award in IEEE ICDM’13, the distinguished paper
award in the 2018 IJCAI, the 2014 ICDM 10-year highest-impact paper
award, and the 2017 IEEE Signal Processing Society Best Paper Award.
He is a Fellow of the Australian Academy of Science, AAAS, IEEE, IAPR,
OSA and SPIE.
Dong Xu (F’17) received the B.E. and Ph.D.
degrees from University of Science and Tech-
nology of China, in 2001 and 2005, respectively.
While pursuing his Ph.D., he was with Microsoft
Research Asia, Beijing, China, and the Chinese
University of Hong Kong, Shatin, Hong Kong, for
more than two years. He was a PostDoctoral Re-
search Scientist with Columbia University, New
York, NY, for one year. He worked as a faculty
member with Nanyang Technological University,
Singapore. Currently, he is a Professor and Chair
in Computer Engineering with the School of Electrical and Information
Engineering, the University of Sydney, Australia. His current research
interests include computer vision, statistical learning, and multimedia
content analysis. Dr. Xu was the co-author of a paper that won the Best
Student Paper Award in the IEEE International Conference on Computer
Vision and Pattern Recognition (CVPR) in 2010, and a paper that won
the Prize Paper Award in IEEE Transactions on Multimedia (TMM) in
2014.