Adiabatic Quantum Computing
for Kernel k = 2 Means Clustering
Christian Bauckhage, Cesar Ojeda, Rafet Sifa, and Stefan Wrobel
Fraunhofer Center for Machine Learning, Sankt Augustin, Germany
Fraunhofer IAIS, Sankt Augustin, Germany
B-IT, University of Bonn, Bonn, Germany
Abstract. Adiabatic quantum computers are tailored towards finding
minimum energy states of Ising models. The quest for implementations
of machine learning algorithms on such devices is thus the quest for
Ising model (re-)formulations of their underlying objective functions. In
this paper, we discuss how to accomplish this for the problem of kernel
binary clustering. We then discuss how our models can be solved on an
adiabatic quantum computing device. Finally, in simulation experiments,
we numerically solve the respective Schrödinger equations and observe
that our approaches yield convincing results.
1 Introduction
Quantum computing exploits quantum mechanical phenomena for information
processing and is now becoming practical. Working quantum computers are on
the market [1], industry invests increasing efforts [2,3,4,5,6,7], and further rapid
progress is expected [8]. This will likely impact artificial intelligence and machine
learning because quantum computing promises efficient solutions to many of the
search- or optimization problems encountered in these fields [9,10,11,12,13,14].
Here, we extend our earlier work on quantum computing for unsupervised
learning [15] towards the problem of kernel k = 2 means clustering and discuss
how to solve it via adiabatic quantum computing.
Note that adiabatic quantum computers solve a kind of optimization problem
not unfamiliar in machine learning. Devices such as those produced by D-Wave
Systems [1,16] determine low energy states of Ising models. While they were
originally conceived to describe spin glass systems [17], Ising models occur in other
settings, too. Examples include Boolean satisfiability or graph cutting problems
[18,19,20] as well as neurocomputing models known as Hopfield networks [21].
If a problem can be formulated as an Ising energy minimization problem,
there are standard procedures for preparing systems of quantum bits (qubits)
and energy operators (Hamiltonians) for processing [18,19]. The solution process
itself relies on the adiabatic theorem [22] which states that if a quantum system
starts in a low energy configuration (ground state) of a Hamiltonian which then
gradually changes, the system will end up in the ground state of the resulting
Hamiltonian. To harness this for problem solving, one prepares a qubit system
in the ground state of a problem independent Hamiltonian and adiabatically
evolves it to a Hamiltonian whose ground state represents a solution to the
problem at hand.
In this paper, we discuss these ideas in detail. First, we elaborate on the
notion of Ising models and their role in adiabatic quantum computing. We then
propose two Ising models for kernel k = 2 means clustering. Given these models,
we discuss how to set them up for computing and review the required quantum
mechanical concepts. Finally, we present several simulation experiments in which
we numerically solve the Schrödinger equations which govern the corresponding
quantum mechanical processes; these experiments demonstrate the feasibility of
our approach and illustrate how appropriately prepared systems of qubits evolve
towards a clustering solution.
2 Ising Models
Existing adiabatic quantum computers are designed to find low energy states of
Ising models. In other words, they solve
$$ s^* \;=\; \operatorname*{argmin}_{s \in \{-1,+1\}^n} \; s^\top Q\, s \;+\; s^\top q \tag{1} $$

where the 2^n vectors s are possible global states of a system of n entities each
of which can be in one of two local states (+1 or −1). The coupling matrix
Q ∈ R^{n×n} models interactions within the system and the vector q ∈ R^n models
external influences.
Since Ising models are concerned with bipolar state vectors s ∈ {−1,+1}^n,
they appear suited to formalize bi-partitioning problems such as binary clustering
of n data points. This is because, for suitable, problem dependent choices
of Q and q, the entries s*_i = ±1 of the solution to (1) can be thought of as
membership indicators for two distinct clusters. In section 3, we therefore devise
Ising models particularly for this purpose.
Note, however, that the problem in (1) is a quadratic unconstrained binary
optimization problem (QUBO) and therefore generally NP-hard. For instance, a
naïve approach to k = 2 means clustering would be to exhaustively evaluate (1)
for each of the 2^n possible assignments of n data points to 2 clusters. For large n,
this of course becomes impractical on a digital computer. On an adiabatic quantum
computer, on the other hand, we could prepare a system of n qubits that
is in a quantum mechanical superposition of all the 2^n possible solutions. Here,
the challenge is thus to manipulate the system to evolve towards a state that
corresponds to a desired partition. In section 4, we discuss how to accomplish
this.
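To make the 2^n scaling of exhaustive search concrete, a brute-force minimizer of (1) can be sketched in a few lines of Python (our own illustration; helper names such as `ising_energy` are ours, not the paper's):

```python
# Brute-force minimization of the Ising energy (1) for a tiny system.
# Enumerating all bipolar states takes 2^n evaluations, which is exactly
# why this becomes impractical on a digital computer for large n.
import itertools
import numpy as np

def ising_energy(s, Q, q):
    """Energy s^T Q s + s^T q of a bipolar state vector s."""
    return s @ Q @ s + s @ q

def brute_force_ground_state(Q, q):
    """Enumerate all 2^n bipolar states and return a minimizer."""
    n = len(q)
    best_s, best_e = None, np.inf
    for bits in itertools.product([-1, +1], repeat=n):
        s = np.array(bits)
        e = ising_energy(s, Q, q)
        if e < best_e:
            best_s, best_e = s, e
    return best_s, best_e

# toy example: ferromagnetic couplings favor fully aligned states
Q = -np.ones((3, 3))
q = np.zeros(3)
s_star, e_star = brute_force_ground_state(Q, q)
```

For this toy coupling matrix, the minimizers are the two fully aligned states, which already hints at the sign ambiguity discussed in section 3.2.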
3 Ising Models for Kernel k = 2 Means Clustering
In this section, we devise Ising models for the problem of clustering a sample X
of n data points x ∈ R^m into two disjoint clusters X_1 and X_2 where |X_1| = n_1,
|X_2| = n_2, and n_1 + n_2 = n. Without loss of generality, we assume that the data
in X are normalized to zero mean so that we have

$$ \mu \;=\; \frac{1}{n} \sum_{x \in X} x \;=\; \frac{1}{n} \sum_{i=1}^{2} \sum_{x \in X_i} x \;=\; \frac{1}{n} \bigl( n_1 \mu_1 + n_2 \mu_2 \bigr) \;=\; 0. \tag{2} $$

This implies n_1 µ_1 = −n_2 µ_2 which is to say that the two cluster means µ_1 and
µ_2 will be of opposite sign.
Regarding the idea of k = 2 means clustering, our problem would typically
be formalized as having to determine the two minimizers µ_1 and µ_2 of the within
cluster scatter

$$ S_W \;=\; \sum_{i=1}^{2} \sum_{x \in X_i} \bigl\| x - \mu_i \bigr\|^2. \tag{3} $$

Indeed, most of the well known k-means algorithms such as those of Lloyd [23],
Hartigan [24], or MacQueen [25] consider this objective.
However, in this paper, we follow a different route and observe that the
problem of minimizing the within cluster scatter is equivalent to the problem of
maximizing the between cluster scatter

$$ S_B \;=\; \sum_{i,j=1}^{2} n_i n_j \bigl\| \mu_i - \mu_j \bigr\|^2. \tag{4} $$

This actually holds true for any k ≥ 1 and follows from Fisher's analysis of
variance [26,27]. It establishes that the total scatter can be written as

$$ S_T \;=\; \sum_{x \in X} \bigl\| x - \mu \bigr\|^2 \;=\; S_W + \frac{1}{2n} S_B \tag{5} $$

which, since S_T and n are positive constants, implies that any decrease of S_W
entails an increase of S_B.
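The decomposition in (5) is easy to verify numerically; the following sketch (our own, with an arbitrarily chosen sample and partition) checks it for zero mean data and an unequally sized 2-partition:

```python
# Numerical sanity check of the scatter decomposition (5),
# S_T = S_W + S_B / (2n), for a zero mean sample and an arbitrary
# 2-partition with unequal cluster sizes.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 16))
X -= X.mean(axis=1, keepdims=True)      # enforce zero mean as in (2)

labels = np.array([0] * 5 + [1] * 11)   # n1 = 5, n2 = 11
C1, C2 = X[:, labels == 0], X[:, labels == 1]
n1, n2 = C1.shape[1], C2.shape[1]
n = n1 + n2
mu1, mu2 = C1.mean(axis=1), C2.mean(axis=1)

# total, within, and between cluster scatter, cf. (3), (4), and (5)
S_T = np.sum(X ** 2)                    # the overall mean is zero
S_W = np.sum((C1 - mu1[:, None]) ** 2) + np.sum((C2 - mu2[:, None]) ** 2)
S_B = sum(ni * nj * np.sum((mi - mj) ** 2)
          for ni, mi in [(n1, mu1), (n2, mu2)]
          for nj, mj in [(n1, mu1), (n2, mu2)])
```

The identity holds for any labeling, not only the optimal one, which is what makes the equivalence of minimizing S_W and maximizing S_B exact.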
Looking at the two equivalent objective functions in (3) and (4), we remark
that their optimization proves to be NP-hard in general [28] because they both
constitute integer programming problems in disguise [29]. Algorithms such as
those in [23,24,25] are therefore mere heuristics for which there is no guarantee
that they will find the optimal solution. In this sense it appears acceptable that
the Ising models we next derive from (4) involve a heuristic assumption, too.
3.1 An Ambiguous Ising Model
For our problem of k = 2 means clustering, the maximization objective in (4) is
expressed in an overly complicated manner and it is easy to see that it can be
simplified to

$$ S_B \;=\; 2\, n_1 n_2 \bigl\| \mu_1 - \mu_2 \bigr\|^2. \tag{6} $$

Interestingly, this simplification now provides an intuition as to why k-means
clustering is agnostic of cluster shapes and distances and often produces clusters
of about equal size [30]. In order for S_B in (6) to be large, both the distance
‖µ_1 − µ_2‖ between the two cluster centers and the product n_1 n_2 of the two
cluster sizes have to be large. However, since the sum n_1 + n_2 = n is fixed, the
product of the sizes will be maximal if n_1 = n_2 = n/2.
This observation provides us with a heuristic argument for how to rewrite
the objective in (6) which in turn will allow us to set up an Ising model.
If we assume that, at a solution, we will likely have n_1 ≈ n_2 ≈ n/2, we may
consider the approximation

$$ 2\, n_1 n_2 \bigl\| \mu_1 - \mu_2 \bigr\|^2 \;\approx\; \frac{2 n^2}{4} \bigl\| \mu_1 - \mu_2 \bigr\|^2 \;=\; 2\, \Bigl\| \tfrac{n}{2} \mu_1 - \tfrac{n}{2} \mu_2 \Bigr\|^2 \;\approx\; 2\, \bigl\| n_1 \mu_1 - n_2 \mu_2 \bigr\|^2 \tag{7} $$

which turns the k = 2 means clustering problem into the problem of having to
solve

$$ \mu_1^*, \mu_2^* \;=\; \operatorname*{argmax}_{\mu_1, \mu_2} \; \bigl\| n_1 \mu_1 - n_2 \mu_2 \bigr\|^2. \tag{8} $$
Next, we observe that the norm in (8) can be expressed in a form that does
not explicitly depend on the unknown cluster means µ_i. To this end, we gather
the given data in a data matrix X = [x_1, …, x_n] ∈ R^{m×n} and introduce two
binary indicator vectors z_1, z_2 ∈ {0,1}^n which indicate cluster memberships in
the sense that entry l of z_i equals 1 if x_l ∈ X_i and 0 otherwise. This way, we
can write n_1 µ_1 = X z_1 as well as n_2 µ_2 = X z_2 and therefore

$$ \bigl\| n_1 \mu_1 - n_2 \mu_2 \bigr\|^2 \;=\; \bigl\| X (z_1 - z_2) \bigr\|^2 \;=\; \bigl\| X s \bigr\|^2. \tag{9} $$

Note that s introduced in (9) is guaranteed to be a bipolar vector because,
in k-means clustering, every given data point is assigned to one and only one
cluster so that

$$ z_1 - z_2 \;=\; z_1 - (1 - z_1) \;=\; 2 z_1 - 1 \;=\; s \;\in\; \{-1,1\}^n. \tag{10} $$
This, however, establishes that there is an Ising model for the problem of
k = 2 means clustering of zero mean data.
On the one hand, since ‖Xs‖² = s^⊤ X^⊤ X s is convex in s, the maximization
problem in (8) is equivalent to the following minimization problem

$$ s^* \;=\; \operatorname*{argmin}_{s \in \{-1,1\}^n} \; - s^\top X^\top X\, s \tag{11} $$
$$ \phantom{s^*} \;=\; \operatorname*{argmin}_{s \in \{-1,1\}^n} \; - s^\top Q\, s. \tag{12} $$

On the other hand, because of (2), this will necessarily yield a solution vector s
whose entries are not all equal and thus induce a clustering of the data in X.
Looking at (12), we next observe that the coupling matrix Q = X^⊤ X is
a Gram matrix. The criterion we just derived therefore allows for invoking the
kernel trick and thus leads to kernel k = 2 means clustering. In particular, if we
choose the coupling matrix for the Ising model in (12) to be a centered kernel
matrix

$$ Q_{ij} \;=\; k(x_i, x_j) \;-\; \frac{1}{n} \sum_{l} k(x_i, x_l) \;-\; \frac{1}{n} \sum_{k} k(x_k, x_j) \;+\; \frac{1}{n^2} \sum_{k,l} k(x_k, x_l) \tag{13} $$

where k(x_i, x_j) is an appropriate Mercer kernel, our assumption of zero mean
data remains valid in the feature space as well.
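The double centering in (13) can be written compactly as Q = H K H with the centering matrix H = I − (1/n)11^⊤; a small Python sketch (our own, using a Gaussian kernel as in the experiments below) illustrates this:

```python
# Computing the centered kernel matrix of (13) as Q = H K H, where
# H = I - (1/n) 11^T is the usual centering matrix.
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for the columns x_i of X."""
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    return np.exp(-d2 / (2 * sigma ** 2))

def centered_kernel_matrix(K):
    """Apply the double centering of (13)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

rng = np.random.default_rng(1)
X = rng.normal(size=(2, 16))            # 16 points in R^2, one per column
Q = centered_kernel_matrix(gaussian_kernel_matrix(X))
```

Since H annihilates the all-ones vector, every row and column of Q sums to zero, which is the feature-space analogue of the zero mean assumption in (2).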
3.2 An Unambiguous Ising Model
Looking at (12), we furthermore observe that there is a form of symmetry. This
is because, if s* solves this optimization problem, then so does −s* since we did
not specify whether an entry of, say, +1 is supposed to indicate membership to
cluster one or two.
To remove this ambiguity, we may remove a degree of freedom from our
model. W.l.o.g. we can, for instance, fix s_n = +1 and solve (12) for the remaining
n − 1 entries of s. This way, the problem becomes to solve

$$ s^* \;=\; \operatorname*{argmin}_{s \in \{-1,1\}^{n-1}} \; - \sum_{i,j=1}^{n-1} Q_{ij}\, s_i s_j \;-\; 2 \sum_{j=1}^{n-1} Q_{nj}\, s_j \;-\; Q_{nn} \tag{14} $$
$$ \phantom{s^*} \;=\; \operatorname*{argmin}_{s \in \{-1,1\}^{n-1}} \; - s^\top Q' s \;-\; s^\top q' \tag{15} $$

which we recognize as yet another Ising energy minimization problem.
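Concretely, the reduction from (12) to (14)/(15) amounts to fixing the last spin and absorbing the corresponding row and column of Q into a linear term; the helper below is a hypothetical sketch of ours, not code from the paper:

```python
# Reduction behind (14)/(15): fix s_n = +1 and split the quadratic form
# s^T Q s into a smaller quadratic part, a linear part, and a constant.
import numpy as np

def fix_last_spin(Q):
    """For symmetric Q and s_n = +1, return (Q', q', const) such that
    s^T Q s = s'^T Q' s' + s'^T q' + const for the free spins s'."""
    return Q[:-1, :-1], 2 * Q[:-1, -1], Q[-1, -1]

# consistency check on a random symmetric coupling matrix
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
Q = (A + A.T) / 2
Qp, qp, c = fix_last_spin(Q)
s = np.array([1., -1., 1.])       # arbitrary state of the n-1 free spins
full = np.append(s, 1.)           # full state with s_n fixed to +1
```

The constant Q_nn does not affect the argmin, which is why (15) can drop it.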
4 Adiabatic Quantum Kernel k = 2 Means Clustering
To perform adiabatic quantum kernel k = 2 means clustering of n data points, we
consider a time-dependent system of n entangled qubits that is in a superposition
of 2^n basis states. Using the Dirac notation, this is written as

$$ |\psi(t)\rangle \;=\; \sum_{i=0}^{2^n - 1} a_i(t)\, |\psi_i\rangle \tag{16} $$

where the time dependent amplitudes a_i ∈ C obey Σ_i |a_i|² = 1. We understand
each of the different basis states

$$ |\psi_0\rangle = |000 \ldots 000\rangle \tag{17} $$
$$ |\psi_1\rangle = |000 \ldots 001\rangle \tag{18} $$
$$ |\psi_2\rangle = |000 \ldots 010\rangle \tag{19} $$
$$ |\psi_3\rangle = |000 \ldots 011\rangle \tag{20} $$
$$ \vdots $$

as an indicator vector which represents one of the 2^n possible assignments of
n data points to 2 distinct clusters and use the common shorthand to express
tensor products, for instance

$$ |\psi_1\rangle \;=\; |000 \ldots 001\rangle \;=\; |0\rangle \otimes |0\rangle \otimes \ldots \otimes |1\rangle. \tag{21} $$
If a quantum system such as the one in (16) evolves under the influence of a
time-dependent Hamiltonian H(t), its behavior is governed by the Schrödinger
equation

$$ \frac{\partial}{\partial t} |\psi(t)\rangle \;=\; -i\, H(t)\, |\psi(t)\rangle \tag{22} $$

where we have set ħ = 1. In adiabatic quantum computing, we consider periods
ranging from t = 0 to t = T and assume the Hamiltonian at time t to be given
as a convex combination of two static Hamiltonians, namely

$$ H(t) \;=\; \Bigl( 1 - \frac{t}{T} \Bigr) H_B \;+\; \frac{t}{T}\, H_P. \tag{23} $$

H_B is called the beginning Hamiltonian whose ground state is easy to construct
and H_P is the problem Hamiltonian whose ground state encodes the solution to
the problem at hand.
For Ising models such as the ones in (12) and (15), there are by now standard
suggestions for how to set up a suitable problem Hamiltonian [18,19]. In
particular, we may define

$$ H_P \;=\; \sum_{i,j=1}^{n} Q_{ij}\, \sigma_z^i \sigma_z^j \;+\; \sum_{i=1}^{n} q_i\, \sigma_z^i \tag{24} $$

where σ_z^i denotes the Pauli spin matrix σ_z acting on the ith qubit, that is

$$ \sigma_z^i \;=\; \underbrace{I \otimes I \otimes \ldots \otimes I}_{i-1 \text{ terms}} \;\otimes\; \sigma_z \;\otimes\; \underbrace{I \otimes I \otimes \ldots \otimes I}_{n-i \text{ terms}}. \tag{25} $$

The beginning Hamiltonian is then typically chosen to be orthogonal to the
problem Hamiltonian, for instance

$$ H_B \;=\; - \sum_{i=1}^{n} \sigma_x^i \tag{26} $$

where σ_x^i is defined as above, this time with respect to the Pauli spin matrix σ_x.
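For small n, the operators in (24) and (26) can be built explicitly as 2^n × 2^n matrices via Kronecker products as in (25); the following Python sketch (our own naming and structure) does exactly that:

```python
# Explicit construction of the Hamiltonians (24) and (26) via Kronecker
# products, cf. (25). Only feasible for small n, since the matrices have
# size 2^n x 2^n.
import numpy as np
from functools import reduce

sigma_z = np.array([[1., 0.], [0., -1.]])
sigma_x = np.array([[0., 1.], [1., 0.]])
I2 = np.eye(2)

def one_site(op, i, n):
    """I ⊗ ... ⊗ op ⊗ ... ⊗ I with op acting on qubit i, cf. (25)."""
    ops = [I2] * n
    ops[i] = op
    return reduce(np.kron, ops)

def problem_hamiltonian(Q, q):
    """H_P = sum_ij Q_ij sigma_z^i sigma_z^j + sum_i q_i sigma_z^i, cf. (24)."""
    n = len(q)
    H = sum(Q[i, j] * one_site(sigma_z, i, n) @ one_site(sigma_z, j, n)
            for i in range(n) for j in range(n))
    H += sum(q[i] * one_site(sigma_z, i, n) for i in range(n))
    return H

def beginning_hamiltonian(n):
    """H_B = -sum_i sigma_x^i, cf. (26)."""
    return -sum(one_site(sigma_x, i, n) for i in range(n))

# tiny 2-qubit example with an off-diagonal coupling
Q = np.array([[0., 1.], [1., 0.]])
q = np.zeros(2)
HP = problem_hamiltonian(Q, q)      # = 2 sigma_z ⊗ sigma_z
HB = beginning_hamiltonian(2)
```

Note that H_P is diagonal in the computational basis, so its ground states directly correspond to minimum energy spin configurations of (1).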
To compute a clustering, we then let |ψ(t)⟩ evolve from |ψ(0)⟩ to |ψ(T)⟩
where |ψ(0)⟩ is chosen to be the ground state of H_B. That is, if λ denotes the
smallest eigenvalue of H_B, the initial state |ψ(0)⟩ of the system corresponds to
the solution of

$$ H_B\, |\psi(0)\rangle \;=\; \lambda\, |\psi(0)\rangle. \tag{27} $$

Finally, at time t = T, a measurement is performed on the n qubit system.
This will cause the wave function |ψ(T)⟩ to collapse to a particular basis state
and the probability for this state to be |ψ_i⟩ is given by the squared amplitude |a_i(T)|².
Yet, since the adiabatic evolution was steered towards the problem Hamiltonian
H_P, basis states that correspond to ground states of H_P are more likely to be
found.
On an adiabatic quantum computer, this algorithm is carried out physically.
On a digital computer, we may simulate it by numerically solving

$$ |\psi(T)\rangle \;=\; |\psi(0)\rangle \;-\; i \int_0^T H(t)\, |\psi(t)\rangle \, dt \tag{28} $$

which is the approach we adhere to in the next section.
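As an illustration of this kind of simulation (a numpy-only simplification of ours, not the paper's QuTiP setup), one can integrate (22) step by step by applying the matrix exponential of H(t) over small time intervals to a toy 2-qubit instance:

```python
# Minimal simulation of the adiabatic evolution (22)/(23): start in the
# ground state of H_B and repeatedly apply exp(-i H(t) dt).
import numpy as np

sigma_x = np.array([[0., 1.], [1., 0.]])
I2 = np.eye(2)

def evolve_step(H, psi, dt):
    """One step of (22): psi -> exp(-i H dt) psi for Hermitian H."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T @ psi

def adiabatic_evolve(HB, HP, T, steps):
    """Evolve the ground state of HB under H(t) of (23) from t = 0 to T."""
    _, evecs = np.linalg.eigh(HB)
    psi = evecs[:, 0].astype(complex)     # ground state of HB, cf. (27)
    dt = T / steps
    for k in range(steps):
        t = (k + 0.5) * dt
        H = (1 - t / T) * HB + (t / T) * HP
        psi = evolve_step(H, psi, dt)
    return psi

# toy 2-qubit instance: Q = [[0,1],[1,0]] favors anti-aligned spins, so the
# ground states of H_P are the basis states |01> and |10>
HB = -(np.kron(sigma_x, I2) + np.kron(I2, sigma_x))
HP = np.diag([2., -2., -2., 2.])
psi_T = adiabatic_evolve(HB, HP, T=20.0, steps=1000)
probs = np.abs(psi_T) ** 2                # probabilities |a_i(T)|^2
```

If the evolution is slow enough, almost all probability mass ends up on the two degenerate ground states of H_P, mirroring the two-peaked amplitude plots in the experiments below.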
5 Practical Examples
In this section, we present several examples which demonstrate the feasibility
of quantum computing for kernel k = 2 means clustering. Our examples are
of didactic nature and first and foremost intended to illustrate the adiabatic
evolution of n qubit systems. In each experiment, we therefore restrict ourselves
to n = 16 data points x_i ∈ R² which form two clusters. In other words, we
consider data matrices X = [X_1, X_2] ∈ R^{2×16} whose 16 column vectors form
two clusters of size n_1 and n_2, respectively. These simple settings allow us to
comprehensibly visualize the data and the evolution of the amplitudes of
the 2^16 or 2^15 basis states of qubit systems which implement the ambiguous and
unambiguous model derived above.
In each experiment, we simulate quantum adiabatic evolutions on a digital
computer. To this end, we set up the corresponding problem Hamiltonian H_P,
the beginning Hamiltonian H_B, and its ground state |ψ(0)⟩ as discussed above
and use the Python quantum computing toolbox QuTiP [31] to numerically
solve (28) for t ∈ [0, T = 75] where 75 ∈ O(√(2^n)).
Experiment 1: In our first experiment, we consider the data in Fig. 1(a) and
minimize the energy of the ambiguous Ising model in (12) where the coupling
matrix Q results from computing (13) with a Gaussian kernel

$$ k(x_i, x_j) \;=\; \exp \Bigl( - \frac{1}{2 \sigma^2} \bigl\| x_i - x_j \bigr\|^2 \Bigr). \tag{29} $$

Figure 1(c) illustrates the temporal evolution of the amplitudes |a_i(t)|² of the
2^16 = 65536 basis states |ψ_i⟩ the corresponding 16 qubit quantum system |ψ(t)⟩
can be in. At t = 0, all states are equally likely but over time their amplitudes
begin to increase or decrease. At t = T, two of the basis states have an amplitude
considerably higher than the others so that a measurement will likely cause the
system to collapse to either of these two equally probable states. These
two basis states are |0000000011111111⟩ and |1111111100000000⟩ which, when
understood as cluster indicator vectors, both induce the result in Fig. 1(b).
Looking at this result, we can conclude that our approach can cluster the
data in a manner a human observer would expect and deem appropriate.
Fig. 1: Example of adiabatic quantum kernel k = 2 means clustering using the
ambiguous Ising model in (12). (a) sample of n = 16 data points in R² forming two
half-moons; (b) corresponding clustering result; (c) adiabatic evolution of a system
of 16 qubits. During its evolution over time t, the system is in a superposition
of 2^16 = 65536 basis states |ψ_i⟩, each representing a possible binary clustering.
Initially, it is equally likely to find the system in any of these states. At the
end, two basis states have noticeably higher amplitudes |a_i|² than the others
and are therefore more likely to be measured; these are |0000000011111111⟩ and
|1111111100000000⟩ and they both induce the result in (b).
Experiment 2: In our second experiment, we consider the same data as above
where, this time, one of the data points has been manually preassigned to a
cluster (see Fig. 2(a)). This allows for the use of the unambiguous Ising model
in (15). We compute the coupling matrix as above; however, since only 15 data
points still need to be assigned to a cluster, we consider a 15 qubit system |ψ(t)⟩
which is in a superposition of 2^15 = 32768 basis states.
Figure 2(c) visualizes the adiabatic quantum evolution of this system. Again,
at time t = 0, each basis state |ψ_i⟩ is equally likely to be measured but the
corresponding amplitudes |a_i|² soon begin to increase or to decrease. Since we
are considering the unambiguous Ising model, the process reaches a configuration
at t = T where only one basis state has a much higher amplitude than the others.
This one is |000000001111111⟩ and induces the result in Fig. 2(b). Just as in
our first experiment, the result obtained from the unambiguous model considered
here is reasonable and convincing.
Experiment 3: In our third experiment, we investigate whether or not practical
success of our approach critically hinges on the heuristic assumption in (7),
namely, that clusters are of about equal size. The n= 16 data points in Fig. 3(a)
Fig. 2: Example of adiabatic quantum kernel k = 2 means clustering using the
unambiguous Ising model in (15). (a) sample of n = 16 data points in R² where one
data point has been manually preassigned to a cluster; (b) corresponding clustering
result; (c) adiabatic evolution of a system of 15 qubits. Throughout, the system is
in a superposition of 2^15 = 32768 basis states |ψ_i⟩. Upon termination of its adiabatic
evolution, the single most likely basis state for the system to be found in is
|000000001111111⟩ which induces the assignment of points to clusters in (b).
were sampled from two bi-variate Gaussians where n_1 = 11 and n_2 = 5 and thus
form two clusters where one is more than twice as big as the other.
Since one of the data points in Fig. 3(a) has been preassigned a cluster label, we
again consider the unambiguous Ising model in (15) using a kernelized coupling
matrix as described above. The evolution of the corresponding 15 qubit system
is shown in Fig. 3(c) and it leads to a configuration where the single most likely
basis state |0000000000001111⟩ induces the clustering shown in Fig. 3(b).
As the result in Fig. 3(b) certainly appears reasonable, this experiment
shows that the minimum energy configurations of our Ising model(s) for quantum
clustering do not necessarily have to correspond to equally sized partitions of
a given set of data. This is of course desirable and shows resilience against the
simple heuristic we applied in (7). Of course, the result may have looked less
convincing to the human eye if the two clusters were closer together; but this
caveat would apply to conventional (kernel) k-means clustering, too [30].
6 Summary
After decades of mainly theoretical research, quantum computing is now about
to become practical. Companies such as Google, IBM, Intel, or Microsoft invest
Fig. 3: Example of adiabatic quantum kernel k = 2 means clustering applied to
a set of data that consists of two clusters of unequal sizes. (a) sample of n = 16
data points in R², one of which has been preassigned to a cluster; (b) clustering
result; (c) evolution of the amplitudes of the basis states |ψ_i⟩. Using
the unambiguous Ising model in (15), the corresponding 15 qubit system evolves
to a configuration where the single most likely basis state |0000000000001111⟩
partitions the data into two groups a human observer would deem reasonable.
increasing resources into corresponding research and development and further
rapid technological progress is expected. These developments will likely impact
supervised and unsupervised machine learning, because working quantum com-
puters promise fast solutions to the kind of search- or optimization procedures
that are at the heart of many algorithms in these areas.
In this paper, we were thus concerned with the general feasibility of quantum
computing for machine learning and considered adiabatic quantum computing
for the problem of kernel k = 2 means clustering. We discussed that, from an
abstract point of view, the problem of setting up machine learning algorithms
for adiabatic quantum computing can be seen as the problem of expressing their
objective functions in terms of Ising energy minimization problems because adi-
abatic quantum computers are tailored towards minimizing Ising energies.
We therefore devised Ising models for (kernel) k = 2 means clustering of n
data points. The first model was straightforward to derive from an alternative,
less well known objective for k-means clustering but suffers from ambiguities
because if state s* minimizes the Ising energy, then so does state −s*.
We addressed this issue and devised a second, slightly more involved Ising model
with n − 1 rather than n degrees of freedom.
In order for this paper to be as self-contained as possible, we then discussed
how to prepare systems of n or n − 1 qubits whose adiabatic evolution according
to an appropriate time-dependent Hamiltonian would lead to a solution of our
Ising energy minimization problems and thus to an assignment of data points to
clusters.
Finally, we presented several simulation experiments where we numerically
solved the Schrödinger equations governing the dynamics of the corresponding
qubit systems. Our examples demonstrated that adiabatic quantum computing
can indeed perform kernel k = 2 means clustering.
References
1. D-Wave press release: D-Wave announces D-Wave 2000Q quantum computer and
first system order (Jan 2017)
2. Conover, E.: Google Moves toward Quantum Supremacy with 72-qubit Computer.
Science News 193(6) (2018) 13
3. Daimler press release: Daimler joins forces with Google to research the application
of quantum computers (Mar 2018)
4. IBM press release: IBM announces advances to IBM quantum systems & ecosystem
(Nov 2017)
5. Intel press release: Intel advances quantum and neuromorphic computing research
(Jan 2018)
6. Microsoft press release: With new Microsoft breakthroughs, general purpose quan-
tum computing moves closer to reality (2017)
7. VW press release: Volkswagen Group and Google work together on quantum
computers (Nov 2017)
8. Knight, W.: Serious quantum computers are finally here. What are we going to do
with them? MIT Technology Review (February 2018)
9. Aïmeur, E., Brassard, G., Gambs, S.: Quantum Speed-up for Unsupervised
Learning. Machine Learning 90(2) (2013)
10. Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum Algorithms for Supervised and
Unsupervised Machine Learning. arXiv:1307.0411 [quant-ph] (2013)
11. Wittek, P.: Quantum Machine Learning. Academic Press (2014)
12. Schuld, M., Sinayskiy, I., Petruccione, F.: An Introduction to Quantum Machine
Learning. Contemporary Physics 56(2) (2014)
13. Wiebe, N., Kapoor, A., Svore, K.: Quantum Algorithms for Nearest-Neighbor
Methods for Supervised and Unsupervised Learning. Quantum Information &
Computation 15(3–4) (2015)
14. Dunjko, V., Taylor, J., Briegel, H.: Quantum-Enhanced Machine Learning.
Physical Review Letters 117(13) (2016)
15. Bauckhage, C., Brito, E., Cvejoski, K., Ojeda, C., Sifa, R., Wrobel, S.: Ising Models
for Binary Clustering via Adiabatic Quantum Computing. In: Proc. EMMCVPR.
Volume 10746 of LNCS., Springer (2017)
16. Johnson, M., et al.: Quantum Annealing with Manufactured Spins. Nature
473(7346) (2011)
17. Ising, E.: Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik 31(1)
(1925) 253–258
18. Farhi, E., Goldstone, J., Gutmann, S., Sipser, M.: Quantum Computation by
Adiabatic Evolution. arXiv:quant-ph/0001106 (2000)
19. Lucas, A.: Ising Formulations of Many NP Problems. Frontiers in Physics 2(5)
(2014)
20. Ushijima-Mwesigwa, H., Negre, C., Mniszewski, S.: Graph Partitioning Using
Quantum Annealing on the D-Wave System. In: Proc. Int. Workshop on Post
Moores Era Supercomputing, ACM (2017)
21. Hopfield, J.: Neural Networks and Physical Systems with Collective Computational
Abilities. PNAS 79(8) (1982)
22. Born, M., Fock, V.: Beweis des Adiabatensatzes. Zeitschrift für Physik 51(3–4)
(1928)
23. Lloyd, S.: Least Squares Quantization in PCM. IEEE Trans. Information Theory
28(2) (1982)
24. Hartigan, J., Wong, M.: Algorithm AS 136: A k-Means Clustering Algorithm. J.
of the Royal Statistical Society C 28(1) (1979)
25. MacQueen, J.: Some Methods for Classification and Analysis of Multivariate Ob-
servations. In: Proc. Berkeley Symp. on Mathematical Statistics and Probability.
(1967)
26. Fisher, R.: On the Probable Error of a Coefficient of Correlation Deduced from a
Small Sample. Metron 1 (1921)
27. Bauckhage, C.: k-Means and Fisher’s Analysis of Variance. Technical report,
researchgate (May 2018)
28. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-Hardness of Euclidean
Sum-of-Squares Clustering. Machine Learning 75(2) (2009)
29. Bauckhage, C.: k-Means Clustering via the Frank-Wolfe Algorithm. In: Proceed-
ings KDML-LWA. (2016)
30. MacKay, D.: Information Theory, Inference, and Learning Algorithms. Cambridge
University Press (2003)
31. Johansson, J., Nation, P., Nori, F.: QuTiP 2: A Python Framework for the Dy-
namics of Open Quantum Systems. Computer Physics Communications 184(4)
(2013)