Lecture Notes on Machine Learning
Kernel k-Means Clustering (Part 3)
Christian Bauckhage
B-IT, University of Bonn
In this note, we study yet another theoretical aspect of kernel k-means
clustering and show that the underlying problem can be formulated
in terms of matrix expressions. One of these expressions will later lead
to a rather simple algorithm for kernel k-means.
Setting the Stage
Interested in using kernel k-means clustering to partition a data set
$X = \{x_1, \dots, x_n\}$ into $k < n$ disjoint clusters $C_i$, we already saw¹ that
the basic problem is to solve

¹ C. Bauckhage. Lecture Notes on Machine Learning: Kernel k-Means Clustering (Part 2). B-IT, University of Bonn, 2019a.
$$
z_{ij}^{*} \;=\; \operatorname{argmin}_{\{z_{ij}\}} \; -\sum_{i=1}^{k} \frac{1}{n_i} \sum_{p=1}^{n} \sum_{q=1}^{n} z_{ip}\, z_{iq}\, K(x_p, x_q)
\qquad \text{s.t.} \quad z_{ij} \in \{0, 1\}, \quad \sum_{i=1}^{k} z_{ij} = 1
\tag{1}
$$
where the optimization variables $z_{ij}$ are binary cluster membership
indicators such that

$$
z_{ij} = \begin{cases} 1, & \text{if } x_j \in C_i \\ 0, & \text{otherwise} \end{cases}
\tag{2}
$$

and

$$
n_i = \sum_{j=1}^{n} z_{ij}
\tag{3}
$$

represents the size of cluster $C_i$.
Our goal in this note is to rewrite the kernel k-means problem in
(1) in a more compact form. In particular, we show how to rewrite
the minimization objective

$$
-\sum_{i=1}^{k} \frac{1}{n_i} \sum_{p=1}^{n} \sum_{q=1}^{n} z_{ip}\, K(x_p, x_q)\, z_{iq}
\tag{4}
$$

in terms of matrix expressions. Throughout, we will use terminology
and notation introduced earlier².

² C. Bauckhage and D. Speicher. Lecture Notes on Machine Learning: Matrix Inner Products, Norms, and Traces. B-IT, University of Bonn, 2019.
Rewriting the Kernel k-Means Objective
First of all, we note that we can gather the binary indicator variables
$z_{ij}$ in a binary matrix $Z \in \{0, 1\}^{k \times n}$ such that

$$
(Z)_{ij} = z_{ij}.
\tag{5}
$$
© C. Bauckhage
licensed under Creative Commons License CC BY-NC
Second of all, we introduce the shorthand $k_{pq} = K(x_p, x_q)$ and
gather the kernel evaluations in (4) in a matrix $K \in \mathbb{R}^{n \times n}$ where

$$
(K)_{pq} = k_{pq}.
\tag{6}
$$
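To make the two matrices concrete, here is a small NumPy sketch; the cluster assignments, the toy data, and the choice of a Gaussian kernel are illustrative assumptions, not part of the derivation:

```python
import numpy as np

# hypothetical cluster assignments for n = 6 points and k = 2 clusters
labels = np.array([0, 0, 1, 1, 1, 0])
n, k = labels.size, 2

# binary indicator matrix Z in {0,1}^{k x n}: (Z)_{ij} = 1 iff x_j in C_i
Z = np.zeros((k, n))
Z[labels, np.arange(n)] = 1.0

# kernel matrix K for some toy 1-d data, here using a Gaussian kernel
# (any Mercer kernel would do)
x = np.linspace(0.0, 1.0, n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2)

# every column of Z sums to one, i.e. each point belongs to one cluster
assert np.allclose(Z.sum(axis=0), 1.0)
```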
Using these two matrices, the kernel k-means objective can then
also be written as

$$
-\sum_{i=1}^{k} \frac{1}{n_i} \sum_{p=1}^{n} \sum_{q=1}^{n} (Z)_{ip}\, (K)_{pq}\, \bigl(Z^\top\bigr)_{qi}
\;=\;
-\sum_{i=1}^{k} \frac{1}{n_i} \bigl( Z K Z^\top \bigr)_{ii}.
\tag{7}
$$
Next, we replace the factors $\frac{1}{n_i}$ by a matrix expression. To this end,
we observe that

$$
n_i = \sum_{j=1}^{n} (Z)_{ij}
\tag{8}
$$

and exploit the crucial fact that the entries of $Z$ are binary³. Only
because they are, we actually have the following identities

$$
n_i = \sum_{j=1}^{n} (Z)_{ij} = \sum_{j=1}^{n} (Z)_{ij}^2 = \sum_{j=1}^{n} (Z)_{ij} \bigl(Z^\top\bigr)_{ji} = \bigl( Z Z^\top \bigr)_{ii}.
\tag{9}
$$

³ Consider $z \in \{0, 1\}$. If $z = 0$, then $z^2 = 0$. If $z = 1$, then $z^2 = 1$.

In other words, the $k$ diagonal entries of the $k \times k$ matrix $Z Z^\top$
correspond to the cluster sizes $n_1, n_2, \dots, n_k$.
But what about the off-diagonal entries of $Z Z^\top$? To answer this,
we consider $i \neq l$ and compute

$$
\bigl( Z Z^\top \bigr)_{il} = \sum_{j=1}^{n} (Z)_{ij} \bigl(Z^\top\bigr)_{jl} = \sum_{j=1}^{n} (Z)_{ij} (Z)_{lj} = \sum_{j=1}^{n} 0 = 0.
\tag{10}
$$

This holds true because of the constraint

$$
\sum_{i=1}^{k} (Z)_{ij} = 1
\tag{11}
$$

which implies that, in any column $j$ of $Z$, no two entries can be 1.
Hence, for $i \neq l$, $(Z)_{ij}$ and $(Z)_{lj}$ are either both 0 or one of them is 0
and the other one is 1. In either case, their product $(Z)_{ij} (Z)_{lj}$ is 0.
In short, what we just found is that the matrix $Z Z^\top$ is a diagonal
matrix

$$
Z Z^\top =
\begin{bmatrix}
n_1 & & & \\
& n_2 & & \\
& & \ddots & \\
& & & n_k
\end{bmatrix}
\tag{12}
$$

which is to say that its inverse simply amounts to

$$
\bigl( Z Z^\top \bigr)^{-1} =
\begin{bmatrix}
\frac{1}{n_1} & & & \\
& \frac{1}{n_2} & & \\
& & \ddots & \\
& & & \frac{1}{n_k}
\end{bmatrix}.
\tag{13}
$$
Using this new insight, we can write the objective in (7) exclusively
in terms of diagonal elements of matrices

$$
-\sum_{i=1}^{k} \frac{1}{n_i} \bigl( Z K Z^\top \bigr)_{ii}
=
-\sum_{i=1}^{k} \Bigl( \bigl( Z Z^\top \bigr)^{-1} \Bigr)_{ii} \bigl( Z K Z^\top \bigr)_{ii}.
\tag{14}
$$

Moreover, since $\bigl( Z Z^\top \bigr)^{-1}$ is diagonal, the above expression can be
written even more compactly, namely

$$
-\sum_{i=1}^{k} \Bigl( \bigl( Z Z^\top \bigr)^{-1} \Bigr)_{ii} \bigl( Z K Z^\top \bigr)_{ii}
=
-\sum_{i=1}^{k} \Bigl( \bigl( Z Z^\top \bigr)^{-1} Z K Z^\top \Bigr)_{ii}
\tag{15}
$$
$$
= -\operatorname{tr}\Bigl[ \bigl( Z Z^\top \bigr)^{-1} Z K Z^\top \Bigr].
\tag{16}
$$
At this point, we have succeeded with our goal of rewriting the
kernel k-means objective in terms of matrices only. However, let us
keep going and try to gain even further insights into the nature of
the kernel k-means problem.
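As a sanity check, one can verify numerically that the trace expression in (16) agrees with the triple sum in (4); the assignments below and the random positive semi-definite kernel matrix are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical assignments and a random symmetric PSD kernel matrix
labels = np.array([0, 0, 1, 2, 1, 2, 0, 1])
k, n = 3, labels.size
Z = np.zeros((k, n))
Z[labels, np.arange(n)] = 1.0
A = rng.standard_normal((n, n))
K = A @ A.T                      # K = A A^T is symmetric PSD

# objective as the explicit triple sum in (4)
sizes = Z.sum(axis=1)
obj_sum = -sum((1.0 / sizes[i]) * sum(Z[i, p] * K[p, q] * Z[i, q]
               for p in range(n) for q in range(n)) for i in range(k))

# objective as the trace expression -tr[(Z Z^T)^{-1} Z K Z^T] from (16)
obj_tr = -np.trace(np.linalg.inv(Z @ Z.T) @ Z @ K @ Z.T)

assert np.isclose(obj_sum, obj_tr)
```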
Using the cyclic permutation invariance of traces, we find

$$
-\operatorname{tr}\Bigl[ \bigl( Z Z^\top \bigr)^{-1} Z K Z^\top \Bigr]
=
-\operatorname{tr}\Bigl[ K Z^\top \bigl( Z Z^\top \bigr)^{-1} Z \Bigr]
\tag{17}
$$

and recognize this as an expression we know from when we studied
conventional k-means clustering as a matrix factorization problem⁴.

⁴ C. Bauckhage. k-Means Clustering via the Frank-Wolfe Algorithm. In Proc. KDML-LWDA, 2016; and C. Bauckhage. Lecture Notes on Machine Learning: k-Means Clustering is Matrix Factorization (Part 2). B-IT, University of Bonn, 2019b.
Written in terms of (17), the kernel k-means clustering problem
becomes

$$
Z^{*} = \operatorname{argmin}_{Z} \; -\operatorname{tr}\Bigl[ K Z^\top \bigl( Z Z^\top \bigr)^{-1} Z \Bigr]
\qquad \text{s.t.} \quad Z \in \{0, 1\}^{k \times n}, \quad \sum_{i=1}^{k} (Z)_{ij} = 1.
\tag{18}
$$
Another very useful form of the problem results if we express
(the diagonal matrix) $\bigl( Z Z^\top \bigr)^{-1}$ as a product of its (diagonal) roots

$$
\bigl( Z Z^\top \bigr)^{-1} = \bigl( Z Z^\top \bigr)^{-\frac{1}{2}} \bigl( Z Z^\top \bigr)^{-\frac{1}{2}}.
\tag{19}
$$

Using the cyclic permutation invariance of traces once again, we then
find that

$$
-\operatorname{tr}\Bigl[ \bigl( Z Z^\top \bigr)^{-1} Z K Z^\top \Bigr]
=
-\operatorname{tr}\Bigl[ \bigl( Z Z^\top \bigr)^{-\frac{1}{2}} Z K Z^\top \bigl( Z Z^\top \bigr)^{-\frac{1}{2}} \Bigr].
\tag{20}
$$

In order to reduce notational clutter on the right hand side of (20),
we next introduce the following substitutions⁵

$$
H = Z^\top \bigl( Z Z^\top \bigr)^{-\frac{1}{2}}
\tag{21}
$$
$$
\Leftrightarrow \quad H^\top = \bigl( Z Z^\top \bigr)^{-\frac{1}{2}} Z
\tag{22}
$$

⁵ Note again that $\bigl( Z Z^\top \bigr)^{-\frac{1}{2}}$ is diagonal and therefore equals its transpose.

which allow us to write

$$
-\operatorname{tr}\Bigl[ \bigl( Z Z^\top \bigr)^{-\frac{1}{2}} Z K Z^\top \bigl( Z Z^\top \bigr)^{-\frac{1}{2}} \Bigr]
=
-\operatorname{tr}\bigl[ H^\top K H \bigr].
\tag{23}
$$
Looking at how we defined $H$ and $H^\top$, we furthermore note that

$$
H^\top H = \bigl( Z Z^\top \bigr)^{-\frac{1}{2}} Z Z^\top \bigl( Z Z^\top \bigr)^{-\frac{1}{2}}
\tag{24}
$$
$$
= \bigl( Z Z^\top \bigr)^{-\frac{1}{2}} \bigl( Z Z^\top \bigr)^{\frac{1}{2}} \bigl( Z Z^\top \bigr)^{\frac{1}{2}} \bigl( Z Z^\top \bigr)^{-\frac{1}{2}}
\tag{25}
$$
$$
= I
\tag{26}
$$

where $I$ denotes the $k \times k$ identity matrix. Together with (23) this
allows us to formalize the kernel k-means clustering problem as

$$
H^{*} = \operatorname{argmin}_{H} \; -\operatorname{tr}\bigl[ H^\top K H \bigr]
\qquad \text{s.t.} \quad H \in \mathbb{R}^{n \times k}, \quad H^\top H = I.
\tag{27}
$$
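Because $Z Z^\top$ is diagonal, its inverse square root is cheap to form, and the orthonormality constraint in (27) can be checked directly; the assignments below are hypothetical:

```python
import numpy as np

labels = np.array([0, 1, 0, 2, 2, 1])   # hypothetical assignments
k, n = 3, labels.size
Z = np.zeros((k, n))
Z[labels, np.arange(n)] = 1.0

# (Z Z^T)^{-1/2} is easy to form because Z Z^T is diagonal
sizes = Z.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(sizes))

H = Z.T @ D_inv_sqrt             # H = Z^T (Z Z^T)^{-1/2}, shape n x k

# H has orthonormal columns: H^T H = I
assert np.allclose(H.T @ H, np.eye(k))
```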
Finally, we note that the constrained trace minimization problem
in (27) is quadratic in the optimization variable $H$. Since minimizing
$-\operatorname{tr}\bigl[ H^\top K H \bigr]$ is equivalent to maximizing $\operatorname{tr}\bigl[ H^\top K H \bigr]$, the problem can
also be cast as a trace maximization problem, namely

$$
H^{*} = \operatorname{argmax}_{H} \; \operatorname{tr}\bigl[ H^\top K H \bigr]
\qquad \text{s.t.} \quad H \in \mathbb{R}^{n \times k}, \quad H^\top H = I.
\tag{28}
$$
Summary and Outlook
In this note, we rewrote the kernel k-means objective in terms of
traces of matrix products. Among other things, we found that the
problem of kernel k-means clustering can also be expressed as the
following constrained maximization problem

$$
H^{*} = \operatorname{argmax}_{H \in \mathbb{R}^{n \times k}} \; \operatorname{tr}\bigl[ H^\top K H \bigr]
\qquad \text{s.t.} \quad H^\top H = I.
\tag{29}
$$
Savvy readers may have already spotted where this is leading.
Indeed, (29) suggests that solving the kernel k-means clustering problem
boils down to computing a truncated spectral decomposition of
the kernel matrix $K$; we will discuss further details later on.
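To hint at that connection, the following sketch (with a random symmetric stand-in for $K$) illustrates the Ky Fan result that, over all orthonormal $n \times k$ matrices $H$, the trace in (29) is maximized by the top-$k$ eigenvectors of $K$, with maximum value equal to the sum of the $k$ largest eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
A = rng.standard_normal((n, n))
K = A @ A.T                      # symmetric PSD stand-in for a kernel matrix

# eigendecomposition of K; eigh returns eigenvalues in ascending order
vals, vecs = np.linalg.eigh(K)
H = vecs[:, ::-1][:, :k]         # top-k eigenvectors, so H^T H = I

# tr[H^T K H] attains the sum of the k largest eigenvalues of K,
# which by the Ky Fan theorem is the maximum over all orthonormal H
assert np.isclose(np.trace(H.T @ K @ H), np.sort(vals)[::-1][:k].sum())
```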
Acknowledgments
This material was prepared within project P3ML which is funded by
the Ministry of Education and Research of Germany (BMBF) under
grant number 01/S17064. The authors gratefully acknowledge this
support.
References
C. Bauckhage. k-Means Clustering via the Frank-Wolfe Algorithm.
In Proc. KDML-LWDA, 2016.

C. Bauckhage. Lecture Notes on Machine Learning: Kernel k-Means
Clustering (Part 2). B-IT, University of Bonn, 2019a.

C. Bauckhage. Lecture Notes on Machine Learning: k-Means Clustering
is Matrix Factorization (Part 2). B-IT, University of Bonn, 2019b.

C. Bauckhage and D. Speicher. Lecture Notes on Machine Learning:
Matrix Inner Products, Norms, and Traces. B-IT, University of Bonn,
2019.