Lecture Notes on Machine Learning: Kernel k-Means Clustering (Part 1)

Author: Christian Bauckhage, B-IT, University of Bonn
Abstract

In this note, we show that the objective function for k-means clustering can be cast in a form which allows for invoking the kernel trick.
Setting the Stage
Previously,¹ we said that the kernel trick is 1) to rewrite an algorithm for data analysis in such a way that input data only appear in the form of inner products with other input data, and 2) to replace any occurrence of such inner products by kernel evaluations.

¹ C. Bauckhage. Lecture Notes on Machine Learning: The Kernel Trick. B-IT, University of Bonn, 2019.

In the following, we demonstrate how to apply this trick to the problem of k-means clustering.
Recall² that k-means clustering aims at partitioning a given set of n data points x_j ∈ R^m into k distinct clusters C_i which are defined in terms of prototypes µ_i.

² C. Bauckhage and O. Cremers. Lecture Notes on Machine Learning: k-Means Clustering. B-IT, University of Bonn, 2019.
The basic problem, therefore, is finding optimal cluster prototypes, and most k-means algorithms try to accomplish this by minimizing the k-means objective
E = \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij} \, \| x_j - \mu_i \|^2    (1)
with respect to the µ_i, where the z_ij are binary indicator variables³ defined as

z_{ij} = \begin{cases} 1 & \text{if } x_j \in C_i \\ 0 & \text{otherwise} \end{cases}    (2)

³ C. Bauckhage and D. Speicher. Lecture Notes on Machine Learning: Rewriting the k-Means Objective. B-IT, University of Bonn, 2019.
Regarding our goal of kernelizing k-means clustering, all of this
is to say that “all we have to do” is to kernelize the minimization
objective in (1). Next, we walk through the steps this involves.
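To make the starting point concrete, here is a minimal NumPy sketch of (1) and (2); the toy data and all names (e.g. `kmeans_objective`) are ours and purely illustrative:

```python
import numpy as np

def kmeans_objective(X, mu, labels):
    """E = sum_i sum_j z_ij ||x_j - mu_i||^2, cf. (1); the hard
    assignments in `labels` encode the indicators z_ij of (2)."""
    return sum(np.sum((X[labels == i] - mu[i]) ** 2)
               for i in range(len(mu)))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                # n = 6 points in R^2
labels = np.array([0, 0, 1, 1, 2, 2])      # cluster index of each point
mu = np.stack([X[labels == i].mean(axis=0) for i in range(3)])  # prototypes
print(kmeans_objective(X, mu, labels))
```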
Kernelizing the k-Means Objective Function
The data that enter k-means clustering are the n points x_j ∈ R^m we want to cluster. We therefore have to rewrite the objective function in (1) such that the x_j only occur in the form of inner products.
To begin with, we recall the following elementary identity for the
Euclidean norm
\| x_j - \mu_i \|^2 = (x_j - \mu_i)^\top (x_j - \mu_i) = x_j^\top x_j - 2\, x_j^\top \mu_i + \mu_i^\top \mu_i.    (3)
This immediately allows us to cast the k-means objective in (1) as⁴

E = \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij} \left( x_j^\top x_j - 2\, \mu_i^\top x_j + \mu_i^\top \mu_i \right)    (4)

where all the x_j now appear as factors of inner products.

⁴ Observe that we use the symmetry of the inner product to write x_j^\top \mu_i as \mu_i^\top x_j.
However, we are not quite there yet. This is because not all inner
products involving data points are inner products of data points only.
Some of them involve the cluster means
\mu_i = \frac{1}{n_i} \sum_{x_j \in C_i} x_j    (5)

where n_i = |C_i| denotes the size of cluster C_i.
We therefore recall that the mean µ_i and the size n_i of cluster C_i can also be expressed in terms of the indicator variables z_ij ∈ {0, 1} defined in (2), namely

\mu_i = \frac{1}{n_i} \sum_{j=1}^{n} z_{ij}\, x_j    (6)
as well as
n_i = \sum_{j=1}^{n} z_{ij}.    (7)
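In code, (6) and (7) amount to two lines once the z_ij are stored in a k × n indicator matrix. The following sketch (setup and names are ours) also confirms agreement with the set-based definition in (5):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                  # n = 6 points in R^2
labels = np.array([0, 0, 1, 1, 2, 2])        # k = 3 clusters
Z = np.zeros((3, 6))
Z[labels, np.arange(6)] = 1.0                # Z[i, j] = z_ij as in (2)

n_i = Z.sum(axis=1)                          # eq. (7): n_i = sum_j z_ij
mu = (Z @ X) / n_i[:, None]                  # eq. (6): mu_i = (1/n_i) sum_j z_ij x_j

assert np.allclose(mu[1], X[labels == 1].mean(axis=0))  # matches eq. (5)
```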
Hence, those inner products in (4) that involve cluster means µ_i can also be written as

\mu_i^\top x_j = \frac{1}{n_i} \sum_{p=1}^{n} z_{ip}\, x_p^\top x_j    (8)
as well as
\mu_i^\top \mu_i = \frac{1}{n_i^2} \sum_{p=1}^{n} \sum_{q=1}^{n} z_{ip} z_{iq}\, x_p^\top x_q,    (9)
where we had to introduce additional summation indices pand qto
correctly expand the inner products into (double) sums.
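The identities (8) and (9) are easy to verify numerically. Below is a small sanity check in the same illustrative setup as above, where the Gram matrix G = XX^⊤ collects all inner products x_p^⊤ x_q:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
labels = np.array([0, 0, 1, 1, 2, 2])
Z = np.zeros((3, 6))
Z[labels, np.arange(6)] = 1.0
n_i = Z.sum(axis=1)
mu = (Z @ X) / n_i[:, None]
G = X @ X.T                                   # G[p, q] = x_p^T x_q

i, j = 1, 3                                   # an arbitrary cluster and point
assert np.isclose(mu[i] @ X[j],               # mu_i^T x_j ...
                  Z[i] @ G[:, j] / n_i[i])    # ... via eq. (8)
assert np.isclose(mu[i] @ mu[i],              # mu_i^T mu_i ...
                  Z[i] @ G @ Z[i] / n_i[i] ** 2)  # ... via eq. (9)
```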
Putting together (4), (8), and (9), we therefore find that the k-means objective in (1) can equivalently be expressed exclusively in terms of inner products between the input data:
E = \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij} \left( x_j^\top x_j - \frac{2}{n_i} \sum_{p=1}^{n} z_{ip}\, x_p^\top x_j + \frac{1}{n_i^2} \sum_{p=1}^{n} \sum_{q=1}^{n} z_{ip} z_{iq}\, x_p^\top x_q \right).    (10)
At this point, we could conclude our discussion. Looking at (10),
we recognize that the minimization objective for k-means clustering
can be written entirely in terms of inner products between data. This
immediately allows us to invoke the second step of the kernel trick,
where we replace these inner products by kernel functions. However,
before we do so, we will further simplify the result in (10).
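Before simplifying, we can already convince ourselves that (10) is correct: the sketch below (same illustrative setup as before) evaluates (10) from the Gram matrix alone and checks it against the direct objective in (1):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
labels = np.array([0, 0, 1, 1, 2, 2])
Z = np.zeros((3, 6))
Z[labels, np.arange(6)] = 1.0
n_i = Z.sum(axis=1)
mu = (Z @ X) / n_i[:, None]
G = X @ X.T

# eq. (10): the objective evaluated from inner products only
E_gram = sum(Z[i, j] * (G[j, j]
                        - 2.0 / n_i[i] * (Z[i] @ G[:, j])
                        + (Z[i] @ G @ Z[i]) / n_i[i] ** 2)
             for i in range(3) for j in range(6))

# eq. (1): the direct objective, for comparison
E_direct = sum(np.sum((X[labels == i] - mu[i]) ** 2) for i in range(3))
assert np.isclose(E_gram, E_direct)
```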
Upon closer inspection, we realize that (10) is a sum over three
terms, namely
T_1 = \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij}\, x_j^\top x_j    (11)

T_2 = -2 \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij} \frac{1}{n_i} \sum_{p=1}^{n} z_{ip}\, x_p^\top x_j    (12)

T_3 = \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij} \frac{1}{n_i^2} \sum_{p=1}^{n} \sum_{q=1}^{n} z_{ip} z_{iq}\, x_p^\top x_q.    (13)
With respect to the first term T_1, we note that we can rearrange it as follows:

T_1 = \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij}\, x_j^\top x_j    (14)
    = \sum_{j=1}^{n} x_j^\top x_j \sum_{i=1}^{k} z_{ij}    (15)
    = \sum_{j=1}^{n} x_j^\top x_j    (16)
where, in the step from (15) to (16), we made use of a crucial property of (hard) k-means clustering. Since each data point x_j is assigned to exactly one cluster C_i, the indicator variables z_ij ∈ {0, 1} obey
\sum_{i=1}^{k} z_{ij} = 1.    (17)
For the second term T_2, it will come in handy to slightly rearrange it so that it reads

T_2 = -2 \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij} \frac{1}{n_i} \sum_{p=1}^{n} z_{ip}\, x_p^\top x_j    (18)
    = -2 \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{n} \sum_{p=1}^{n} z_{ij} z_{ip}\, x_p^\top x_j.    (19)
For the third term T_3, we observe that it can also be written as

T_3 = \sum_{i=1}^{k} \left( \sum_{j=1}^{n} z_{ij} \right) \frac{1}{n_i^2} \sum_{p=1}^{n} \sum_{q=1}^{n} z_{ip} z_{iq}\, x_p^\top x_q    (20)
    = \sum_{i=1}^{k} n_i\, \frac{1}{n_i^2} \sum_{p=1}^{n} \sum_{q=1}^{n} z_{ip} z_{iq}\, x_p^\top x_q    (21)
    = \sum_{i=1}^{k} \frac{1}{n_i} \sum_{p=1}^{n} \sum_{q=1}^{n} z_{ip} z_{iq}\, x_p^\top x_q    (22)

where the step from (20) to (21) made use of (7). And, comparing the result in (22) to our previous one in (19), we find T_3 = -\frac{1}{2} T_2.
Summing (16), (19), and (22) back together, we therefore obtain the objective in (10) in a much shorter form:

E = \sum_{j=1}^{n} x_j^\top x_j - \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{n} \sum_{p=1}^{n} z_{ij} z_{ip}\, x_p^\top x_j.    (23)
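As a sanity check on this bookkeeping, the following sketch (same toy setup as before) computes T_1, T_2, and T_3 via (16), (19), and (22) and confirms that their sum, i.e. the compact form (23), reproduces the direct objective:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
labels = np.array([0, 0, 1, 1, 2, 2])
Z = np.zeros((3, 6))
Z[labels, np.arange(6)] = 1.0
n_i = Z.sum(axis=1)
mu = (Z @ X) / n_i[:, None]
G = X @ X.T

quad = (Z @ G * Z).sum(axis=1)   # quad[i] = sum_{p,q} z_ip z_iq x_p^T x_q
T1 = np.trace(G)                 # eq. (16), using (17)
T2 = -2.0 * np.sum(quad / n_i)   # eq. (19)
T3 = np.sum(quad / n_i)          # eq. (22), so that T3 = -T2 / 2

E_compact = T1 + T2 + T3         # eq. (23)
E_direct = sum(np.sum((X[labels == i] - mu[i]) ** 2) for i in range(3))
assert np.isclose(E_compact, E_direct)
```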
Given this compact form of the rewritten k-means objective, it is finally worth our while to proceed to step two of the kernel trick, namely to replace inner products by kernel functions. This way, we obtain the minimization objective for kernel k-means clustering:
E_K = \sum_{j=1}^{n} K(x_j, x_j) - \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{n} \sum_{p=1}^{n} z_{ij} z_{ip}\, K(x_p, x_j).    (24)
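To close the loop, here is a hedged sketch of the kernel k-means objective (24); the Gaussian kernel is merely one example choice, and all names are ours. With the linear kernel K = XX^⊤, E_K coincides with the compact form in (23):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel K[p, q] = exp(-gamma * ||x_p - x_q||^2)."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def kernel_kmeans_objective(K, Z):
    """E_K as in eq. (24), given a kernel matrix K and indicators Z."""
    n_i = Z.sum(axis=1)
    return np.trace(K) - np.sum((Z @ K * Z).sum(axis=1) / n_i)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
labels = np.array([0, 0, 1, 1, 2, 2])
Z = np.zeros((3, 6))
Z[labels, np.arange(6)] = 1.0

print(kernel_kmeans_objective(X @ X.T, Z))              # linear kernel, eq. (23)
print(kernel_kmeans_objective(rbf_kernel(X, 0.5), Z))   # Gaussian kernel
```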
Summary and Outlook
In this note, we saw that k-means clustering allows for invoking the
kernel trick. In particular, we demonstrated that the minimization
objective
E = \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij} \, \| x_j - \mu_i \|^2    (25)
considered in conventional k-means clustering can be kernelized to
become
E_K = \sum_{j=1}^{n} K(x_j, x_j) - \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{n} \sum_{p=1}^{n} z_{ij} z_{ip}\, K(x_p, x_j).    (26)
However, while conventional k-means clustering is (typically taken to be) tantamount to the problem of minimizing E with respect to the cluster means µ_i, we must point out that there are no means left in E_K. This is a consequence of invoking the kernel trick and raises the following question:

Given the kernelized objective function in (26), which minimization problem do we have to solve in kernel k-means clustering?
This crucial question as well as the equally crucial question of how
to practically solve the kernel k-means problem will be answered in
later notes.
Acknowledgments
This material was prepared within project P3ML which is funded by
the Ministry of Education and Research of Germany (BMBF) under
grant number 01/S17064. The authors gratefully acknowledge this
support.
References
C. Bauckhage. Lecture Notes on Machine Learning: The Kernel
Trick. B-IT, University of Bonn, 2019.
C. Bauckhage and O. Cremers. Lecture Notes on Machine Learning:
k-Means Clustering. B-IT, University of Bonn, 2019.
C. Bauckhage and D. Speicher. Lecture Notes on Machine Learning:
Rewriting the k-Means Objective. B-IT, University of Bonn, 2019.