Algorithm for Data Clustering in Pattern Recognition Problems Based on Quantum Mechanics
David Horn and Assaf Gottlieb
School of Physics and Astronomy, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University,
Tel Aviv 69978, Israel
(Received 16 July 2001; published 20 December 2001)
We propose a novel clustering method that is based on physical intuition derived from quantum me-
chanics. Starting with given data points, we construct a scale-space probability function. Viewing the
latter as the lowest eigenstate of a Schrödinger equation, we use simple analytic operations to derive a
potential function whose minima determine cluster centers. The method has one parameter, determin-
ing the scale over which cluster structures are searched. We demonstrate it on data analyzed in two
dimensions (chosen from the eigenvectors of the correlation matrix). The method is applicable in higher
dimensions by limiting the evaluation of the Schrödinger potential to the locations of data points.
DOI: 10.1103/PhysRevLett.88.018702 PACS numbers: 89.75.Kd, 02.70.-c, 03.65.Ge, 03.67.Lx
Clustering of data is a well-known problem of pattern
recognition, covered in textbooks such as [1–3]. The problem
we are looking at is defining clusters of data solely by
the proximity of data points to one another. This problem is
one of unsupervised learning, and is in general ill defined.
Solutions to such problems can be based on intuition de-
rived from physics. A good example of the latter is the
algorithm by [4] that is based on associating points with
Potts spins and formulating an appropriate model of sta-
tistical mechanics. We propose an alternative that is also
based on physical intuition, this one being derived from
quantum mechanics.
As an introduction to our approach we start with the
scale-space algorithm by [5] who uses a Parzen-window
estimator [3] of the probability distribution leading to the
data at hand. The estimator is constructed by associating
a Gaussian with each of the N data points in a Euclidean
space of dimension d and summing over all of them. This
can be represented, up to an overall normalization, by

    ψ(x) = Σ_i e^{−(x − x_i)²/2σ²},     (1)

where x_i are the data points. Roberts [5] views the maxima
of this function as determining the locations of cluster centers.
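As a numerical illustration of Eq. (1), the Parzen-window estimator can be evaluated directly; this is a minimal sketch, with function name and array conventions of our own choosing rather than anything prescribed by the text.

```python
import numpy as np

def psi(x, data, sigma):
    """Eq. (1): sum of Gaussians of width sigma centered on the data points x_i."""
    d2 = np.sum((data - x) ** 2, axis=1)   # squared distances |x - x_i|^2
    return float(np.sum(np.exp(-d2 / (2.0 * sigma ** 2))))
```

For a single data point the sum reduces to one Gaussian, which is the case used below to make contact with the harmonic oscillator.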
An alternative, and somewhat related, method is support
vector clustering (SVC) [6] that is based on a Hilbert-space
analysis. In SVC, one defines a transformation from data
space to vectors in an abstract Hilbert space. SVC pro-
ceeds to search for the minimal sphere surrounding these
states in Hilbert space. We will also associate data points
with states in Hilbert space. Such states may be represented by Gaussian wave functions, whose sum is ψ(x).
This is the starting point of our quantum clustering (QC)
method. We will search for the Schrödinger potential for
which ψ(x) is a ground state. The minima of the potential
define our cluster centers.
The Schrödinger potential.— We wish to view ψ as an
eigenstate of the Schrödinger equation

    Hψ ≡ (−(σ²/2)∇² + V(x)) ψ(x) = E ψ(x).     (2)

Here we rescaled H and V of the conventional quantum
mechanical equation to leave only one free parameter, σ.
For comparison, the case of a single point at x₁ corresponds
to Eq. (2) with V = (1/2σ²)(x − x₁)² and E = d/2,
thus coinciding with the ground state of the harmonic oscillator in quantum mechanics.
Given ψ for any set of data points we can solve Eq. (2)
for V(x):

    V(x) = E + (σ²/2) ∇²ψ/ψ
         = E − d/2 + (1/2σ²ψ) Σ_i (x − x_i)² e^{−(x−x_i)²/2σ²}.     (3)

Let us furthermore require that min V = 0. This sets the
value of

    E = −min [(σ²/2) ∇²ψ/ψ],     (4)

and determines V(x) uniquely. E has to be positive since
V is a non-negative function. Moreover, since the last term
in Eq. (3) is positive definite, it follows that

    0 < E ≤ d/2.     (5)
We note that ψ is positive definite. Hence, being an eigenfunction of the operator H in Eq. (2), its eigenvalue E is the
lowest eigenvalue of H, i.e., it describes the ground state.
All higher eigenfunctions have nodes whose numbers increase as their energy eigenvalues increase. (In quantum
mechanics, where one interprets |ψ|² as the probability distribution, all eigenfunctions of H have physical meaning.
Although this approach could be adopted, we have chosen
ψ as the probability distribution because of the simplicity
of algebraic manipulations.)
Given a set of points defined within some region of
space, we expect V(x) to grow quadratically outside this
018702-1 0031-9007/02/88(1)/018702(4)$15.00 © 2001 The American Physical Society 018702-1
region, and to exhibit one or several local minima within
the region. We identify these minima with cluster centers,
which seems natural in view of the opposite roles of the
two terms in Eq. (2): Given a potential function, it attracts
the data distribution function ψ to its minima, while the
Laplacian drives it away. The diffused character of the
distribution is the balance of the two effects.
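To make the construction concrete, here is a small numerical sketch of Eq. (3): for the Gaussian-sum ψ, the term (σ²/2)∇²ψ/ψ works out to −d/2 plus a Gaussian-weighted mean of squared distances, and E is then fixed by demanding min V = 0 over the evaluated points. Function names are illustrative only, not part of the original method.

```python
import numpy as np

def potential(points, data, sigma):
    """V(x) of Eq. (3) at each row of `points`, shifted so min V = 0."""
    d = data.shape[1]
    # pairwise squared distances: one row per evaluation point
    d2 = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    g = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian weights
    v = (d2 * g).sum(axis=1) / ((2.0 * sigma ** 2) * g.sum(axis=1)) - d / 2.0
    return v - v.min()                            # enforce min V = 0
```

For a single data point this reproduces the harmonic-oscillator potential quoted after Eq. (2), up to the additive shift that sets its minimum to zero.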
As an example we display results for the crab data set
taken from Ripley’s book [7]. These data, given in a
five-dimensional parameter space, show nice separation
of the four classes contained in them when displayed in
two dimensions spanned by the second and third principal
components [8] (eigenvectors) of the correlation matrix of
the data. The information supplied to the clustering algo-
rithm contains only the coordinates of the data points. We
display the correct classification to allow for visual com-
parison of the clustering method with the data. Starting
with σ = 1/√2 we see in Fig. 1 that the Parzen probability distribution, or the wave function ψ, has only a
single maximum. Nonetheless, the potential, displayed in
Fig. 2, already shows four minima at the relevant locations.
The overlap of the topographic map of the potential with
the true classification is quite amazing. The minima are the
centers of attraction of the potential, and they are clearly
evident although the wave function does not display local
maxima at these points. The fact that V(x) = E lies above
the range where all valleys merge explains why ψ(x) is
smoothly distributed over the whole domain.
As σ is being decreased, more minima will appear in
V(x). For the crab data, we find two new minima as σ
is decreased to one-half. Nonetheless, the previous
minima become deeper and still dominate the scene. The
new minima are insignificant, in the sense that they lie
at high values (of order E). Classifying data points to
clusters according to their topographic location on the
FIG. 1. Ripley’s crab data [7] displayed on a plot of their second and third principal components with a superimposed topographic map of Roberts’ probability distribution for σ = 1/√2.
surface of V(x), roughly the same clustering assignment
is expected for a range of σ values. One important
advantage of quantum clustering is that E sets the scale
on which minima are observed. Thus, we learn from
Fig. 2 that the cores of all four clusters can be found at V
values below 0.4E. In comparison, the additional maxima
of ψ, which start to appear at lower values of σ, may lie
much lower than the leading maximum and may be hard
to locate numerically.
Principal component analysis (PCA).— In our example,
data were given in some high-dimensional space and we
analyzed them after defining a projection and a metric,
using the PCA approach. The latter defines a metric that is
intrinsic to the data, determined by second order statistics.
But, even then, several possibilities exist, leading to non-
equivalent results.
Principal component decomposition can be applied both
to the correlation matrix C_ab = ⟨x_a x_b⟩ and to the covariance matrix

    C̃_ab = ⟨(x_a − ⟨x_a⟩)(x_b − ⟨x_b⟩)⟩ = C_ab − ⟨x_a⟩⟨x_b⟩.

In both cases averaging is performed over all data points,
and the indices indicate spatial coordinates from 1 to d.
The principal components are the eigenvectors of these
matrices. Thus we have two natural bases in which to
represent the data. Moreover, one often renormalizes the
eigenvector projections, dividing them by the square roots
of their eigenvalues. This procedure is known as “whiten-
ing,” leading to a renormalized correlation or covariance
matrix of unity. This is a scale-free representation that
would naturally lead one to start with σ = 1 in the search
for (higher order) structure of the data.
FIG. 2. A topographic map of the potential for the crab data with σ = 1/√2, displaying four minima (denoted by crossed circles) that are interpreted as cluster centers. The contours of the topographic map are set at values of V(x)/E = 0.2, 0.4, 0.6, 0.8, 1.
The PCA approach that we have used in our example
was based on whitened correlation matrix projections.
Had we used the covariance matrix instead, we would get
similar, but slightly worse, separation of the crab data.
Our example is meant to convince the reader that once a
good metric is found, QC conveys the correct information.
Hence we allowed ourselves to search first for the best
geometric representation, and then apply QC.
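The whitening step described above can be sketched as follows, assuming projection onto eigenvectors of the correlation matrix with each projection divided by the square root of its eigenvalue; the helper name is ours.

```python
import numpy as np

def whiten(data):
    """Project onto eigenvectors of the correlation matrix <x_a x_b> and
    rescale each projection by the square root of its eigenvalue."""
    corr = data.T @ data / len(data)        # correlation matrix
    evals, evecs = np.linalg.eigh(corr)     # principal components
    return (data @ evecs) / np.sqrt(evals)  # whitened coordinates
```

In the whitened coordinates the renormalized correlation matrix is the identity, which is what motivates starting the search at σ of order 1.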
QC in higher dimensions.— Increasing dimensionality
means higher computational complexity, often limiting the
applicability of a numerical method. Nonetheless, here we
can overcome this “curse of dimensionality” by limiting
ourselves to evaluating Vat locations of data points only.
Since we are interested in where the minima lie, and since
invariably they lie near data points, no harm is done by
this limitation. The results are depicted in Fig. 3. Here
we analyzed the crab problem in a three-dimensional (3D)
space, spanned by the first three PCs. Shown in this figure are V/E values as functions of the serial number of the
data, using the same symbols as in Fig. 2 to allow for comparison. Using all data with V < 0.3E, one obtains cluster
cores that are well separated in space, corresponding to the
four classes that exist in the data. Only 9 of the 129 points
that obey V < 0.3E are misclassified by this procedure.
Adding higher PCs, first component 4 and then component
5, leads to deterioration in clustering quality. In particular,
lower cutoffs in V/E, including lower fractions of data,
are required to define cluster cores that are well separated
in their relevant spaces.
One may locate the cluster centers, and deduce the clus-
tering allocation of the data, by following the dynamics of
gradient descent into the potential minima. By defining
y_i(0) = x_i, one follows the steps of

    y_i(t + Δt) = y_i(t) − η(t) ∇V(y_i(t)),

letting the points y_i reach an asymptotic fixed value coinciding with a cluster center.

FIG. 3. Values of V(x)/E are depicted in the crab problem
with three leading PCs for σ = 1/2. They are presented as a
function of the serial number of the data, using the same symbols
of data employed previously. One observes low-lying data of all
four classes.

More sophisticated minimum search algorithms (see, e.g., chapter 10
in [9]) can be applied to reach the fixed points faster. The
results of a gradient-descent procedure, applied to the 3D
analysis of the crab data shown in Fig. 3, are that the three
classes of data points 51 to 200 are clustered correctly with
only five misclassifications. The first class, data points
1– 50, has 31 points forming a new cluster, with most of the
rest joining the cluster of the second class. Only 3 points
of the first class fall outside the 4 clusters.
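A minimal sketch of this gradient-descent dynamics is given below, with the gradient taken numerically; the step size, iteration count, and function names are illustrative choices of ours, not values prescribed by the text.

```python
import numpy as np

def descend(data, sigma, eta=0.05, steps=300, h=1e-5):
    """Roll a copy of every data point downhill on V; points that settle at
    the same potential minimum belong to the same cluster."""
    def v(x):  # V up to the constant E, which does not affect gradients
        d2 = np.sum((data - x) ** 2, axis=1)
        g = np.exp(-d2 / (2.0 * sigma ** 2))
        return np.sum(d2 * g) / (2.0 * sigma ** 2 * np.sum(g))
    y = data.astype(float).copy()
    eye = np.eye(data.shape[1])
    for _ in range(steps):
        for i in range(len(y)):
            grad = np.array([(v(y[i] + h * e) - v(y[i] - h * e)) / (2.0 * h)
                             for e in eye])  # central-difference gradient
            y[i] -= eta * grad
    return y
```

On two well-separated pairs of points, each pair collapses onto its own minimum while the pairs stay apart, which is the clustering assignment one reads off the fixed points.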
We also ran our method on the iris data set [10], which is
a standard benchmark obtainable from the UC Irvine (UCI)
repository [11]. The data set contains 150 instances, each
composed of four measurements of an iris flower. There
are three types of flowers, represented by 50 instances each.
Clustering of these data in the space of the first two princi-
pal components, using σ = 1/4, has the amazing result of
misclassification of 3 instances only. Quantum clustering
can be applied to the raw data in four dimensions, leading
to misclassifications of the order of 15 instances, similar
to the clustering quality of [4].
Distance-based QC formulation.— Gradient descent
calls for the calculation of V both on the original data
points as well as on the trajectories they follow. An alter-
native approach can be to restrict oneself to the original
values of V, as in the example displayed in Fig. 3, and
follow a hybrid algorithm to be described below. Before
turning to such an algorithm let us note that, in this case,
we evaluate V on a discrete set of points, V(x_i) = V_i.
We can then express V in terms of the distance matrix
D_ij = |x_i − x_j| as

    V_i = E − d/2 + (1/2σ²) Σ_j D²_ij e^{−D²_ij/2σ²} / Σ_j e^{−D²_ij/2σ²},

with E chosen appropriately so that min V_i = 0. This type
of formulation is of particular importance if the original
information is given in terms of distances between data
points rather than their locations in space. In this case we
have to proceed with distance information only.
By applying QC we can reach results such as in Fig. 3
without invoking any explicit spatial distribution of the
points in question. One may then analyze the results by
choosing a cutoff, e.g., V < 0.2E, such that a fraction
(e.g., one-third) of the data will be included. On this sub-
set we select groups of points whose distances from one
another are smaller than, e.g., 2s, thus defining cores of
clusters. Then we continue with higher values of V, e.g.,
0.2E < V < 0.4E, allocating points to previous clusters
or forming new cores. Since the choice of distance cutoff
in cluster allocation is quite arbitrary, this method cannot be guaranteed to work as well as the gradient-descent approach.
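The distance-based evaluation can be sketched directly from the formula for V_i above; the final shift implements min V_i = 0, and the function name is ours.

```python
import numpy as np

def potential_from_distances(D, d, sigma):
    """V_i from the distance matrix D_ij = |x_i - x_j| alone, shifted so
    that min V_i = 0; d is the dimension of the underlying space."""
    g = np.exp(-D ** 2 / (2.0 * sigma ** 2))
    v = (D ** 2 * g).sum(axis=1) / ((2.0 * sigma ** 2) * g.sum(axis=1)) - d / 2.0
    return v - v.min()
```

Because only D enters, this form applies when the input is supplied as pairwise distances rather than coordinates, and its cost is of order N² regardless of dimension.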
Generalization.— Our method can be easily generalized
to allow for different weighting of different points, as in

    ψ(x) = Σ_i c_i e^{−(x − x_i)²/2σ²},

with c_i ≥ 0. This is important if we have some prior information or some other means for emphasizing or deemphasizing the influence of data points. An example of the
latter is using QC in conjunction with SVC [6]. SVC has
the possibility of labeling points as outliers. This is done
by applying quadratic maximization to the Lagrangian
over the space of all 0 ≤ β_i ≤ 1/(pN), subject to the constraint Σ_i β_i = 1. The points for which the upper bound of β_i is
reached are labeled as outliers. Their number is regulated
by p, being limited by pN. Using for the QC analysis
the choice c_i = 1/(pN) − β_i will eliminate the outliers of
SVC and emphasize the role of the points expected to lie
within the clusters.
Discussion.— QC constructs a potential function V(x)
on the basis of data points, using one parameter, σ, that
controls the width of the structures that we search for. The
advantage of the potential V over the scale-space probability distribution is that the minima of the former are
better defined (deep and robust) than the maxima of the
latter. However, both of these methods put the empha-
sis on cluster centers, rather than, e.g., cluster boundaries.
Since the equipotentials of V may take arbitrary shapes,
the clusters need not be spherical, as in the k-means approach.
Nonetheless, spherical clusters appear more naturally than,
e.g., ring-shaped or toroidal clusters, even if the data would
accommodate them. If some global symmetry is to be ex-
pected, e.g., global spherical symmetry, it can be incorpo-
rated into the original Schrödinger equation defining the
potential function.
QC can be applied in high dimensions by limiting the
evaluation of the potential, given as an explicit analytic
expression of Gaussian terms, to locations of data points
only. Thus the complexity of evaluating V_i is of order N²,
independent of dimensionality.
Our algorithm has one free parameter, the scale σ. In all
examples we confined ourselves to scales that are of order
1, because we have worked within whitened PCA spaces.
If our method is applied to a different data space, the range
of scales to be searched for could be determined by some
other prior information.
Since the strength of our algorithm lies in the easy se-
lection of cluster cores, it can be used as a first stage of
a hybrid approach employing other techniques after the
identification of cluster centers. The fact that we do not
have to take care of feeble minima, but consider only ro-
bust deep minima, turns the identification of a core into an
easy problem. Thus, an approach that derives its rationale
from physical intuition in quantum mechanics can lead to
interesting results in the field of pattern classification.
We thank B. Reznik for a helpful discussion.
[1] A. K. Jain and R. C. Dubes, Algorithms for Clustering
Data (Prentice-Hall, Englewood Cliffs, NJ, 1988).
[2] K. Fukunaga, Introduction to Statistical Pattern Recogni-
tion (Academic Press, San Diego, CA, 1990).
[3] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classifica-
tion (Wiley-Interscience, New York, 2001), 2nd ed.
[4] M. Blatt, S. Wiseman, and E. Domany, Phys. Rev. Lett. 76, 3251 (1996).
[5] S. J. Roberts, Pattern Recognit. 30, 261 (1997).
[6] A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik,
in Proceedings of the Conference on Advances in Neural
Information Processing Systems 13, 2000, edited by Todd
K. Leen, Thomas G. Dietterich, and Volker Tresp (MIT
Press, Cambridge, MA, 2001), p. 367.
[7] B. D. Ripley, Pattern Recognition and Neural Networks
(Cambridge University Press, Cambridge, UK, 1996).
[8] I. T. Jolliffe, Principal Component Analysis (Springer-
Verlag, New York, 1986).
[9] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing (Cambridge University Press, Cambridge, UK, 1992), 2nd ed.
[10] R. A. Fisher, Ann. Eugenics 7, 179 (1936).
[11] C. L. Blake and C. J. Merz, UCI repository of machine
learning databases, 1998.
Much work has been published on methods for assessing the probable number of clusters or structures within unknown data sets. This paper aims to look in more detail at two methods, a broad parametric method, based around the assumption of Gaussian clusters and the other a non-parametric method which utilises methods of scale-space filtering to extract robust structures within a data set. It is shown that, whilst both methods are capable of determining cluster validity for data sets in which clusters tend towards a multivariate Gaussian distribution, the parametric method inevitably fails for clusters which have a non-Gaussian structure whilst the scale-space method is more robust.