Available via license: CC BY 4.0
Detection and Evaluation of
Clusters within Sequential Data
Alexander Van Werde, Albert Senen–Cerda, Gianluca Kosmella, Jaron Sanders
Abstract
Motivated by theoretical advancements in dimensionality
reduction techniques, we use a recent model, called Block Markov
Chains, to conduct a practical study of clustering in real-world sequential
data. Clustering algorithms for Block Markov Chains possess theoretical
optimality guarantees and can be deployed in sparse data regimes. Despite
these favorable theoretical properties, a thorough evaluation of these
algorithms in realistic settings has been lacking.
We address this issue and investigate the suitability of these clustering
algorithms in exploratory data analysis of real-world sequential data. In
particular, our sequential data is derived from human DNA, written text,
animal movement data and financial markets. In order to evaluate the
determined clusters, and the associated Block Markov Chain model,
we further develop a set of evaluation tools. These tools include
benchmarking, spectral noise analysis and statistical model selection
tools. An efficient implementation of the clustering algorithm and the
new evaluation tools is made available together with this paper.
Practical challenges associated with real-world data are encountered
and discussed. It is ultimately found that the Block Markov Chain model
assumption, together with the tools developed here, can indeed produce
meaningful insights in exploratory data analyses despite the complexity
and sparsity of real-world data.
1 Introduction
Modern data often consists of observations that were obtained from some complex process and that became available sequentially. The specific order in which the observations occur often matters: future observations typically correlate with past observations. By identifying a relation between subsequent observations within sequential data, one may hope to gain insight into the underlying complex process. The high-dimensional nature of modern data, however, can make understanding the sequential structure difficult. Many algorithms that one may want to apply to the data slow down to an infeasible degree in a high-dimensional regime. Further, human interpretation of the data may become tedious. To avoid these issues, it is desirable to identify a latent structure which respects the sequential structure but has reduced dimensions.
In this paper we are concerned with uncovering lower-dimensional structures within sequential data when this latent structure is hidden from the observer but still detectable from the specific sequence of observations. Practical tools for the discovery, evaluation, and presentation of the lower-dimensional structures from real-world sequential data are developed and applied. The term “real-world data” is used here in contrast to synthetic data. For synthetic data one often has prior knowledge concerning a ground-truth latent structure, which facilitates evaluation of the latent structures that are output by an algorithm. Such prior knowledge is not available for real-world data, complicating the evaluation.
The authors are with the Faculty of Mathematics and Computer Science, Eindhoven University of Technology, Groene Loper 5, 5612 AZ Eindhoven, The Netherlands.
E-mail: a.van.werde@tue.nl, a.senen.cerda@tue.nl, g.k.kosmella@tue.nl, jaron.sanders@tue.nl
We focus on a popular class of methods for discovering
latent structure in datasets: clustering algorithms. Clustering
algorithms work by clustering together data points from a
dataset that are “similar” in some sense. If for instance data
has a geometric structure for which a notion of distance
is applicable then one may call two points similar if their
distance is small. This distance-based notion of similarity
can be leveraged with the well-known K-means algorithm
for clustering point clouds. If instead the data has a graph
structure then it is natural to call two vertices of the graph
similar if they connect to other vertices in similar ways. This
connection-based notion is made rigorous in the Stochastic
Block Model for random graphs.
A natural notion of similarity for sequential observations,
whose order matters, may similarly be given by the following
informal criterion: “two observations are similar if and only
if they transition into other observations in similar ways.”
This transition-based notion is made formal in Block Markov
Chains (BMCs). Specifically, the BMC-model assumes that the
observations are the states of a Markov Chain (MC) in which
the state space can be partitioned in such a manner that the
transition rate between two states only depends on the parts
of the partition (clusters) in which these two states lie.
The problem of clustering the observations in a single sequence of observations of a BMC was recently investigated theoretically [1], [2]. For example, given a sample path generated by a BMC, the authors of [2] provide information-theoretic thresholds below which exact clustering is impossible because insufficient data is available. The authors also provide a clustering algorithm which can provably recover the underlying clusters whenever these conditions are satisfied. The clustering algorithm consists of two steps. First, an initial guess for the underlying cluster structure is found by means of a spectral initialization. Here the term spectral refers to the fact that singular vectors of a random matrix associated with the sample path are employed. These singular vectors are used to construct a low-rank approximation of the random matrix, after which a K-means algorithm is applied. Second, the initial guess for the cluster structure is refined by means of an improvement algorithm that reconsiders the sequence of observations and performs a greedy, local maximization of the log-likelihood function of the BMC model.
arXiv:2210.01679v1 [cs.LG] 4 Oct 2022
A broad study on the performance of this clustering algorithm for sequential data obtained from real-world processes was, however, not provided. We tackle this problem by investigating the clustering algorithm under a variety of scenarios, thus demonstrating its applicability to real-world data. This is the main contribution of the present paper. Evaluating the actual performance of such a clustering algorithm in a real-world scenario is, however, a nontrivial task since the latent cluster structure is unknown, as opposed to a scenario where data is generated synthetically. Hence, we also propose a set of tools for evaluating the quality of the clusters and the BMC model. We therefore go a step beyond clustering benchmark evaluation and use statistical tools to assess the validity of the BMC assumption in the data.
Our goal is here not to compare the performance of the
BMC clustering algorithm relative to other algorithms. A
comparison would namely only be fair if both algorithms have
access to the same amount of training data. Now remark
that the BMC algorithm is explicitly designed to manage in
sparse regimes where the amount of data is small. On the
other hand most model-free algorithms, such as those based
on deep learning, perform optimally in a regime where one has
access to large amounts of training data. A comparison would
consequently be vacuous; its outcome mainly being determined
by the choice of the amount of training data. Our goal is rather
to supplement the theoretical understanding of the BMC-based
algorithm with a practical viewpoint.
In order to practically evaluate the clustering algorithm for
BMCs, we investigate the following questions:
• How can the BMC model practically aid in data exploration of nonsynthetic sequential data obtained from real-world data?
• Can the quality of the found clusters be evaluated when no ground-truth clustering is known?
• How can one statistically decide whether the BMC model is an appropriate model for the sequence of observations? How can it be detected that either a simpler model than a BMC would suffice, or a richer model is required?
• Can the algorithm be expected to give meaningful results despite the sparsity and complexity of real-world data? Are the clustering algorithms robust to model violations?
To obtain insight into these questions, we develop practical
tools for evaluating the merit of a detected clustering, which
we then apply to several data sets. These evaluation tools
are designed to cope with the aforementioned fact that
ground-truth clusters are not known to us. The datasets
which we consider are diverse as they come from the fields
of microbiology, natural language processing, ethology, and
finance. Specifically, we investigate sequences of:
a. Codons in human Deoxyribonucleic Acid (DNA).
b. Words in Wikipedia articles.
c. Global Positioning System (GPS) data describing the movement of bison in a field.
d. Companies in the Standard and Poor’s 500 (S&P500) with the highest daily return.
On each dataset we apply the BMC clustering algorithm to
uncover underlying clusters, and we evaluate these clusters
using the collection of tools which are developed in the present
paper. It is here found that the BMC model indeed aids in
exploratory data analyses of real-world sequential data.
For example, in DNA the algorithm leads us to rediscover phenomena which are known in the biological community as codon–pair bias and dinucleotide bias. In the text-based sequential data we find that the BMC-based improvement algorithm improves performance on downstream natural language processing tasks. The model evaluation tools of the present paper are found to be informative here: they uncover some model violations which are suggestive for future methodological expansion. Our findings in the GPS data are particularly striking. There, a scatter plot of the data gives rise to a picture which is difficult to interpret. After clustering, a picture can be displayed which provides significantly more insight; compare Figure 3 and Figure 6. It is further notable that the sequential structure of the data gives rise to geographical features which would have been difficult, if not impossible, to extract based solely on a distance-based notion of similarity as is employed in K-means. The S&P500 dataset gives the least clear conclusions out of the four datasets but is a good illustrative example for our evaluation tools in a difficult setting. The difficulty of this dataset is due to the combination of sparsity and a nuisance factor. With these different examples, along with the evaluation tools developed, we answer the questions posed above positively, and we conclude that the BMC-based clustering can be successfully applied to real-world sequential data.
Let us finally contemplate the practical implication of our findings as it pertains to using clustering within sequential data as a method to speed up a subsequent optimization procedure. Consider that canonical machine learning algorithms that one would like to apply to sequential data, such as Q-learning or SARSA-learning, slow down to an impractical degree if the observations are from a high-dimensional state space. By solving such learning problems on an accurate lower-dimensional representation of the state space, the numerical complexity can be reduced dramatically [3], [4]. The present paper suggests that future integration of clustering in sequential data applications such as natural language processing and machine learning is indeed practically feasible.
Structure of this paper
We introduce the problem of clustering in sequential data in
Section 2. We describe the BMC as well as other models that
appear in our experiments in Section 3, and briefly discuss
the advantages of a model-based approach. Next, we give an
overview of related literature in Section 4, and we introduce
the clustering algorithm in Section 5. There we also describe our C++ implementation of this clustering algorithm, which we made publicly available as a Python library. Section 6 describes practical tools to evaluate clusters found in datasets in the absence of knowledge on the underlying ground truth. Section 7 introduces the datasets and explains our preprocessing procedures; Sections 8 and 9 then extensively evaluate the clusters. Finally, Section 10 concludes with a brief summary.
2 Problem formulation
We suppose that we have obtained an ordered sequence of $\ell \in \mathbb{N}_+$ discrete observations
$$X_{1:\ell} := X_1 \to X_2 \to \cdots \to X_\ell \tag{1}$$
from some complex process. The observations can be real numbers (one- or higher-dimensional) or abstract system states, as long as the observations come from a finite set. We assume specifically that there exists a number $n \in \mathbb{N}_+$ such that the sequence satisfies $X_t \in [n] := \{1, \ldots, n\}$ for $t \in [\ell]$. Here, $n$ can be interpreted as the number of distinct, discrete observations that are possible.
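Given such an ordered sequence of observations, we wonder whether there exists a map $\sigma_n : [n] \to [K]$, with $1 \leq K \leq n$ an integer, such that the ordered sequence
$$\sigma_n(X_{1:\ell}) := \sigma_n(X_1) \to \sigma_n(X_2) \to \cdots \to \sigma_n(X_\ell) \tag{2}$$
captures dynamics of the underlying complex process. Observe that the map $\sigma_n$ defines $K$ clusters:
$$\mathcal{V}_k := \{ i \in [n] \mid \sigma_n(i) = k \} \tag{3}$$
for $k \in [K]$. Furthermore, $\mathcal{V}_k \cap \mathcal{V}_l = \emptyset$ whenever $k \neq l$ and $\cup_{k=1}^{K} \mathcal{V}_k = [n]$.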
The clusters $\mathcal{V}_1, \ldots, \mathcal{V}_K$ are particularly interesting when $K \ll n$. In such a case the clustered process $\{\sigma_n(X_t)\}_t$ lives in a much smaller observation space than the original process $\{X_t\}_t$. The reduction may then prove to be beneficial for computational tasks since the time complexity of some algorithms depends on the size of the observation space. If (2) furthermore indeed captures the dynamics of the complex process, then it is not unreasonable to expect that the clusters $\mathcal{V}_k$ could themselves be meaningful, thus allowing for human interpretation of the data.
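To make the notation of (2) and (3) concrete, the following minimal Python sketch builds the clusters from a cluster assignment map and applies the map to a sequence. The function names and the toy map `sigma` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def clusters_from_map(sigma):
    """Return {k: set of states i with sigma[i] == k}, i.e. the clusters V_k of (3)."""
    clusters = defaultdict(set)
    for state, k in sigma.items():
        clusters[k].add(state)
    return dict(clusters)


def reduce_sequence(xs, sigma):
    """Apply the cluster assignment map to every observation, as in (2)."""
    return [sigma[x] for x in xs]


# Hypothetical example with n = 5 states and K = 2 clusters.
sigma = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
xs = [1, 3, 2, 5, 4, 1]

clusters = clusters_from_map(sigma)   # clusters V_1 and V_2
reduced = reduce_sequence(xs, sigma)  # clustered sequence on [K]
```

The reduced sequence takes values in [K] only, which is the sense in which the clustered process lives in a smaller observation space.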
3 Models
3.1 Main model: BMC
The main model in this paper is given by BMCs. Formally, a 1st-order BMC is a discrete-time stochastic process $\{X_t\}_{t \geq 0}$ on a state space $\mathcal{V} := [n]$ that satisfies not only the MC property
$$\mathbb{P}[X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0] = \mathbb{P}[X_{t+1} = j \mid X_t = i] \quad \forall j, i, i_{t-1}, \ldots, i_0 \in [n]; \tag{4}$$
but also that there exists a cluster assignment map $\sigma_n : [n] \to [K]$ such that there exists a stochastic matrix $p \in \mathbb{R}^{K \times K}$ with
$$P_{i,j} := \mathbb{P}[X_{t+1} = j \mid X_t = i] = \frac{p_{\sigma_n(i), \sigma_n(j)}}{\#\mathcal{V}_{\sigma_n(j)}}. \tag{5}$$
Here $\mathcal{V}_k$ is defined as in (3). The $p_{k,l} \in [0, 1]$ are called the cluster transition probabilities and satisfy $\sum_{l=1}^{K} p_{k,l} = 1$ for $k \in [K]$. The matrix $(P_{i,j})_{i=1,j=1}^{n}$ is called the state transition matrix. Figure 1 depicts a BMC on $K = 3$ clusters.
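As an illustration of (5), the following NumPy sketch assembles the state transition matrix from a cluster transition matrix and a cluster assignment. It is a minimal sketch and not the BMCToolkit implementation: the function name is hypothetical, states and clusters are indexed from 0, every cluster is assumed to be nonempty, and the assignment `sigma` is an arbitrary choice for illustration. The matrix `p` is the one shown in Figure 1.

```python
import numpy as np


def bmc_transition_matrix(p, sigma):
    """State transition matrix P of (5): P[i, j] = p[sigma[i], sigma[j]] / #V_{sigma[j]}."""
    p = np.asarray(p, dtype=float)
    sigma = np.asarray(sigma)
    sizes = np.bincount(sigma, minlength=p.shape[0])  # cluster sizes #V_k
    return p[np.ix_(sigma, sigma)] / sizes[sigma][np.newaxis, :]


# Cluster transition matrix from Figure 1 and a hypothetical assignment of n = 6 states.
p = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.1, 0.9],
              [0.3, 0.7, 0.0]])
sigma = np.array([0, 0, 1, 1, 1, 2])

P = bmc_transition_matrix(p, sigma)
```

Each row of `P` sums to one because the mass `p[k, l]` is spread uniformly over the states of cluster `l`.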
The BMC-model can be viewed as an ideal case for the setup of (2). The reduced process $\{\sigma_n(X_t)\}_t$ not only captures some part of the dynamics of the true process but rather all of the order-dependent dynamics. Indeed, observe that for any $t > 1$ it holds that, conditional on $\sigma_n(X_t) = k$, the observation $X_t$ is chosen uniformly at random in the cluster $\mathcal{V}_k$. The previous state $X_{t-1}$ hence influences the next cluster $\sigma_n(X_t)$ but does not provide any further information about the precise element in $\mathcal{V}_{\sigma_n(X_t)}$.
If the MC associated with $p$ is ergodic, then the BMC has a unique state equilibrium distribution $\Pi \in [0, 1]^n$. This state equilibrium distribution moreover has the symmetry property that $\Pi_j$ only depends on the cluster assignment $\sigma_n(j)$ for any $j \in [n]$. That is, for any initial value $i_0 \in [n]$,
$$\Pi_j := \lim_{t \to \infty} \mathbb{P}[X_t = j \mid X_0 = i_0] = \frac{1}{\#\mathcal{V}_{\sigma_n(j)}} \lim_{t \to \infty} \mathbb{P}[\sigma_n(X_t) = \sigma_n(j) \mid \sigma_n(X_0) = \sigma_n(i_0)] =: \frac{\pi_{\sigma_n(j)}}{\#\mathcal{V}_{\sigma_n(j)}}. \tag{6}$$
Note that $\pi \in [0, 1]^K$ is here the cluster equilibrium distribution of the MC on $[K]$ which is associated with the cluster transition matrix $p$. One can characterize $\pi$ as the unique column vector satisfying $\pi^{\mathrm{T}} p = \pi^{\mathrm{T}}$ and $\sum_{k=1}^{K} \pi_k = 1$.
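The characterization of the cluster equilibrium distribution and the relation (6) can be checked numerically. The sketch below is one possible way to do so under our own assumptions: it solves the stationarity equations by least squares, uses 0-based indices, and reuses the matrix of Figure 1 with a hypothetical assignment `sigma`. The function names are not part of BMCToolkit.

```python
import numpy as np


def cluster_equilibrium(p):
    """Solve pi^T p = pi^T together with sum(pi) = 1 (assumes the cluster chain is ergodic)."""
    p = np.asarray(p, dtype=float)
    K = p.shape[0]
    A = np.vstack([p.T - np.eye(K), np.ones((1, K))])
    b = np.zeros(K + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def state_equilibrium(pi, sigma):
    """State equilibrium distribution of (6): Pi_j = pi_{sigma(j)} / #V_{sigma(j)}."""
    sigma = np.asarray(sigma)
    sizes = np.bincount(sigma, minlength=len(pi))
    return pi[sigma] / sizes[sigma]


p = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.1, 0.9],
              [0.3, 0.7, 0.0]])  # cluster transition matrix of Figure 1
sigma = np.array([0, 0, 1, 1, 1, 2])  # hypothetical assignment

pi = cluster_equilibrium(p)
Pi = state_equilibrium(pi, sigma)
```

For this particular `p` the stationarity equations can also be solved by hand, which gives $\pi = (27, 10, 9)/46$.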
Fig. 1. A visualization of a BMC with $K = 3$ clusters and $p = [[0.9, 0.1, 0], [0, 0.1, 0.9], [0.3, 0.7, 0]]$. The thick arrows visualize the cluster transition probabilities $p_{k,l}$, while the thin arrows visualize the transitions of a sample path $\{X_t\}_t$. Figure courtesy of [5].
3.2 Other models for experimentation
Recall that one of our goals is to develop tools which aid in
evaluating whether the BMC model is an appropriate model.
In this setting it is oftentimes useful to have some alternative
models to compare with. The models that we have used are
collected here for easy reference.
3.2.1 0th-order BMCs
Let $K \in [n]$ and consider an arbitrary probability distribution $\eta : [K] \to [0, 1]$. A 0th-order BMC is then a BMC with cluster transition matrix $p_{k,l} := \eta_l$ for all $k, l \in [K]$. The 0th-order BMC will serve as a benchmark to assert whether the structures we find are actually due to the sequential nature of the process and do not admit a simpler explanation.
Namely, observe that in a 0th-order BMC each next sample $X_{t+1}$ is independent of the previous sample $X_t$. A 0th-order BMC therefore generates sequences of independent and identically distributed random variables. This is contrary to a 1st-order BMC, which generates a sequence of dependent random variables. The probability of a specific observation does depend on the cluster of the observation, and specifically is identical for every observation within that cluster.
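The difference between a 0th-order and a 1st-order BMC is easy to see in simulation. The following sketch samples a path by first drawing the next cluster and then drawing a state uniformly within that cluster. It is only an illustration: the function name, the uniform choice of the initial state, the distribution `eta`, and the assignment `sigma` are our own assumptions, and indices are 0-based.

```python
import numpy as np


def sample_bmc_path(p, sigma, length, rng):
    """Sample a BMC path: draw the next cluster from p, then a uniform state in that cluster."""
    p = np.asarray(p, dtype=float)
    sigma = np.asarray(sigma)
    K = p.shape[0]
    members = [np.flatnonzero(sigma == k) for k in range(K)]
    path = np.empty(length, dtype=int)
    path[0] = rng.integers(len(sigma))  # assumption: uniformly random initial state
    for t in range(1, length):
        next_cluster = rng.choice(K, p=p[sigma[path[t - 1]]])
        path[t] = rng.choice(members[next_cluster])
    return path


# 0th-order BMC: every row of the cluster transition matrix equals eta.
eta = np.array([0.5, 0.3, 0.2])
p0 = np.tile(eta, (3, 1))
sigma = np.array([0, 0, 1, 1, 1, 2])

rng = np.random.default_rng(0)
path = sample_bmc_path(p0, sigma, 1000, rng)
```

Passing a cluster transition matrix with distinct rows to the same function yields a 1st-order BMC path instead.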
3.2.2 rth-order MCs
Conversely, one can also consider models with higher-order
dependencies than the 1st-order BMC has.
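Consider a discrete-time stochastic process $\{Y_t\}_{t=1}^{\ell}$ (not necessarily a MC) that satisfies $Y_t \in [n]$ for some $n \in \mathbb{N}_+$. We say that $\{Y_t\}_{t \geq 1}$ is an $r$th-order MC if and only if for all $t \in [\ell - r]$, all $\mathbf{i}_r = (i_1, \ldots, i_r) \in [n]^r$ and $j \in [n]$,
$$\begin{aligned}
&\mathbb{P}[Y_{t+1} = j \mid Y_t = i_r, Y_{t-1} = i_{r-1}, \ldots, Y_{t-r+1} = i_1, Y_{t-r} = s_{t-r}, \ldots, Y_1 = s_1] \\
&\quad = \mathbb{P}[Y_{t+1} = j \mid Y_t = i_r, Y_{t-1} = i_{r-1}, \ldots, Y_{t-r+1} = i_1] =: P^r_{\mathbf{i}_r, j}
\end{aligned} \tag{7}$$
for some transition matrix $P^r \in [0, 1]^{n^r \times n}$.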
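To illustrate the object $P^r$ in (7), the following sketch estimates the transition probabilities of an $r$th-order MC from a sequence by counting how often each length-$r$ context is followed by each state. This empirical-frequency estimator and the function name are our own illustrative choices; contexts that never occur in the sequence are simply absent from the result.

```python
from collections import Counter, defaultdict


def estimate_rth_order(ys, r):
    """Empirical estimate of P^r: maps each context (i_1, ..., i_r) to {j: frequency}."""
    counts = defaultdict(Counter)
    for t in range(r, len(ys)):
        context = tuple(ys[t - r:t])  # the r most recent states, oldest first
        counts[context][ys[t]] += 1
    return {
        context: {j: c / sum(nxt.values()) for j, c in nxt.items()}
        for context, nxt in counts.items()
    }


ys = [1, 2, 1, 2, 1, 1, 2, 1, 2, 2]
P2 = estimate_rth_order(ys, r=2)
```

The number of possible contexts is $n^r$, so the amount of data needed for a reliable estimate grows quickly with the order $r$.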
3.2.3 Perturbed BMCs
Finally, we consider an alternative model which concerns the scenario where a BMC captures the dynamics only partially. Specifically, a perturbed BMC mixes a 1st-order BMC on $[n]$ that has transition matrix $P_{\mathrm{BMC}}$ with a generic 1st-order MC on $[n]$ that has transition matrix $\Delta$ by consideration of the MC with transition matrix
$$P_{\mathrm{Perturbed}} := (1 - \varepsilon) P_{\mathrm{BMC}} + \varepsilon \Delta. \tag{8}$$
The parameter $\varepsilon \in [0, 1]$ measures how many transitions are affected by the non-BMC part $\Delta$.
Concretely: let $\{B_t\}_{t \geq 0}$ denote a sequence of independent, identically distributed Bernoulli random variables, each taking the value 1 with probability $\varepsilon$. The perturbed BMC corresponds to the MC $\{X^{\varepsilon}_t\}_{t \geq 0}$ whose conditional transition probabilities are given by
$$\mathbb{P}[X^{\varepsilon}_{t+1} = j \mid X^{\varepsilon}_t = i, B_t = b] = \begin{cases} P_{\mathrm{BMC},ij} & \text{if } b = 0, \\ \Delta_{ij} & \text{otherwise.} \end{cases} \tag{9}$$
In other words, a sequence $X^{\varepsilon}_0 \to \cdots \to X^{\varepsilon}_{\ell}$ from the perturbed BMC is generated by randomly selecting either the transition matrix $P_{\mathrm{BMC}}$ of a 1st-order BMC, or the transition matrix $\Delta$ of some other 1st-order MC, for each transition. Whenever we use a perturbed BMC, we specify $\Delta$ on the spot.
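The two equivalent descriptions (8) and (9) can be sketched as follows. The example is illustrative only: the function names, the small BMC kernel, and the uniform perturbation `Delta` are our own hypothetical choices, and indices are 0-based.

```python
import numpy as np


def perturbed_kernel(P_bmc, Delta, eps):
    """Mixture kernel of (8)."""
    return (1.0 - eps) * np.asarray(P_bmc) + eps * np.asarray(Delta)


def sample_perturbed_path(P_bmc, Delta, eps, length, rng):
    """Sample as in (9): a Bernoulli(eps) coin picks the kernel used for each transition."""
    n = P_bmc.shape[0]
    path = np.empty(length, dtype=int)
    path[0] = rng.integers(n)  # assumption: uniformly random initial state
    for t in range(1, length):
        kernel = Delta if rng.random() < eps else P_bmc
        path[t] = rng.choice(n, p=kernel[path[t - 1]])
    return path


# BMC kernel for p = [[0.2, 0.8], [0.6, 0.4]] with clusters {0, 1} and {2, 3}.
P_bmc = np.array([[0.1, 0.1, 0.4, 0.4],
                  [0.1, 0.1, 0.4, 0.4],
                  [0.3, 0.3, 0.2, 0.2],
                  [0.3, 0.3, 0.2, 0.2]])
Delta = np.full((4, 4), 0.25)  # hypothetical perturbation: uniform transitions

P_eps = perturbed_kernel(P_bmc, Delta, eps=0.1)
path = sample_perturbed_path(P_bmc, Delta, 0.1, 200, np.random.default_rng(1))
```

Averaging over the coin flips in `sample_perturbed_path` gives exactly the one-step kernel returned by `perturbed_kernel`.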
3.3 Concerning model misspecification
It is unlikely that the complex process $\{X_t\}_t$ in (1) is exactly a BMC. One may consequently wonder about the dangers of model misspecification:
(a) Is the clustering algorithm robust to violations of the model assumption?
(b) When concerned with executing a downstream task on $X_{1:\ell}$, does the BMC model assumption provide any benefit when compared to models with fewer assumptions?
In this regard we would like to point out that the data which we consider is not only complex but oftentimes also sparse. Let us illustrate the principle by a numerical experiment whose precise setup may be found in Supplement 12.
To model a violation of the model assumptions while retaining a sensible notion of ground-truth communities, we considered the perturbed BMC model as defined in Section 3.2.3.
Concerning (a), we find that for small perturbation levels $\varepsilon$ it is still possible to exactly recover the underlying clusters based on the construction of the algorithm; see Figure 2(a).
Concerning (b), we consider the scenario where the goal is to estimate the underlying transition kernel $P$ of a Markovian process based on a sample path of length $\ell$; see Figure 2(b). We find that clustering worsens performance when $\ell$ is large because of a lack of expressivity: the true kernel $P$ is not exactly a BMC-kernel. On the other hand, when $\ell$ is small, the clustering improves performance. This is because the reduction in the number of parameters makes the algorithm less prone to overfitting. The answer to (b) is thus that it can be advantageous to rely on the BMC model assumption when data is sparse.
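The trade-off in (b) is between an estimator with $n^2$ free entries and one with only $K^2$ free entries. The following sketch contrasts the two on a matrix of transition counts. It is a minimal illustration under our own assumptions: the function names are hypothetical, the clustering `sigma` is taken as given, unvisited rows of the empirical estimator are set to uniform by convention, and every cluster is assumed to be nonempty and visited.

```python
import numpy as np


def empirical_estimator(N):
    """Row-normalized transition counts; unvisited rows are set to uniform (a convention)."""
    N = np.asarray(N, dtype=float)
    n = N.shape[0]
    row = N.sum(axis=1, keepdims=True)
    return np.where(row > 0, N / np.where(row > 0, row, 1.0), 1.0 / n)


def bmc_estimator(N, sigma, K):
    """Pool counts per cluster pair, then spread each cluster's mass uniformly as in (5)."""
    N = np.asarray(N, dtype=float)
    sigma = np.asarray(sigma)
    Z = np.eye(K)[sigma]                 # n x K membership matrix
    sizes = Z.sum(axis=0)                # cluster sizes
    M = Z.T @ N @ Z                      # cluster-level transition counts
    p_hat = M / M.sum(axis=1, keepdims=True)
    return p_hat[np.ix_(sigma, sigma)] / sizes[sigma][np.newaxis, :]


# Idealized counts with an exact block structure on two clusters of three states each.
N = np.array([[9, 9, 9, 1, 1, 1]] * 3 + [[2, 2, 2, 8, 8, 8]] * 3)
sigma = np.array([0, 0, 0, 1, 1, 1])

P_emp = empirical_estimator(N)
P_bmc_hat = bmc_estimator(N, sigma, K=2)
```

On these idealized counts both estimators coincide. On a short, noisy sample path the pooled estimator averages over many more transitions per parameter, which is the reduction in overfitting referred to above.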
4 Related literature
This section provides references that theoretically study the
problem of clustering in BMCs and MCs, which have taken
inspiration from references on the problem of community
detection in random graphs. We also provide related references
on clustering and time-series, as well as on state space reduction
in decision theoretical problems. Finally, we give references on
statistical tools that we employ.
Clustering in BMCs and MCs
Cluster detection in BMCs was studied theoretically in [1], [2]. This research has yielded an algorithm that provably detects clusters from the shortest sample paths whenever possible [2]. As evident from the proofs, studying the spectrum of a BMC can yield sharp bounds for recoverability [2]. Spectral properties of the random matrices constructed from sample paths of BMCs were investigated further in [6], [5]. The more recent paper [5] proves convergence of a bulk of singular values to a limiting distribution in the dense regime $\ell = \Theta(n^2)$; a result that we use and refine in our experiments (see Section 6.4).
Other related clustering algorithms that also use spectral decompositions to learn low-rank structures from trajectories of MCs are studied in [7], [8]. Finally, for scenarios in which observations of a dynamical system are switched by a MC with low-rank structure, [9] provides a method for the problem of recovering a latent transition model. A similar objective is considered in [10] for long sample paths.
Community detection in random graphs
Community detection in random graphs, such as those produced by the Stochastic Block Model, is an active area of research. The distinction with clustering in BMCs is that the vertices within a single observation of a random graph are clustered, instead of the observations within sequential data. We refer the reader to [11] for an extensive overview on cluster recovery within the context of the Stochastic Block Model, and to [12] for an overview on community detection in graphs.
Other clustering of sequential data
In the reviews [13], [14], research that relates to both clustering and time-series/sequential data is divided into three categories. Clustering between different time-series is called “whole-time-series clustering.” Survey papers include [14], [15]. Another category is clustering of subsequences of a time-series, where individual time-series are extracted via a sliding window. In 2003 it was shown in [16] that the algorithms available at that time extracted essentially random and therefore meaningless clusters, because they relied on assumptions unlikely to be met by non-synthetic data. The papers [17], [18] contribute to obtaining meaningful subsequence time-series clustering. Finally, there is clustering of states within a time-series, which is called “time-point clustering.” This category includes, for example, problems like segmenting an $n$-element sequence into $k$ segments which can come from $h$ different sources; see e.g. [19]. Other examples referenced in this category are [20], [21]. This category is closest to the clustering algorithm that we employ.
[Figure 2, two panels. (a) Expected misclassification ratio E[E] versus perturbation level ε; legend: Stochastic Uniform, Degree 0, Heavy-Tailed, Sparse. (b) Expected estimation error R∗(ℓ) versus sample path length ℓ, on logarithmic axes; legend: Empirical, BMC, Uniform.]
Fig. 2. (a) The expected proportion of states which are misclassified as a function of the perturbation level $\varepsilon$ for four different perturbation models $\Delta$ and a sample path of length $\ell_n = \lfloor 30 n \ln(n) \rfloor$ with a state space of size $n = 500$. (b) Estimated expected estimation error $R_*(\ell) := \mathbb{E}[\| P - \hat{P}_*(\ell) \|]$ for three different estimators. The ground-truth model in this experiment was a perturbed BMC with a heavy-tailed perturbation of strength $\varepsilon = 0.05$ on a state space of size $n = 1000$. In red: the empirical estimator $\hat{P}_{\mathrm{Empirical}}$, which is the maximum likelihood estimator for a Markov chain with no additional assumptions. In blue: the BMC estimator $\hat{P}_{\mathrm{BMC}}$. In green: the trivial estimator $\hat{P}_{\mathrm{Uniform},ij} := 1/n$, which does not even use the data. Let us emphasize that the precise values in (a) and (b) depend on the precise parameters of the BMC which was perturbed. The general shape of the curves is expected to generalize but the precise numbers, e.g. that a perturbation up to $\varepsilon \approx 0.1$ is tolerable, will depend on the specifics.
State space reduction in decision theoretical problems
Studying clustering in MCs is motivated by the necessity for effective state space reduction techniques in decision theoretical problems, for example in Reinforcement Learning, Markov Decision Processes, and Multi-Armed Bandit problems. It is for example known that learning a latent space reduces regret in Multi-Armed Bandit problems [22], [23]. State aggregation and low-rank approximation methods have been studied for Markov Decision Processes as well as Reinforcement Learning, see [24] and [25], [26], [27], respectively. The idea to cluster states in Reinforcement Learning based on the process’ trajectory was first explored in the seminal papers [3], [4].
A few related experiments from the fields of microbiology, natural
language processing, ethology, and finance
The first application of a MC model was by Andrei Markov, as he investigated a sequence of 20000 letters in A. S. Pushkin’s poem “Eugeny Onegin” [28]. Since then, MCs and hidden Markov models have been used in natural language processing [29].
Clustering in DNA, specifically clustering the sequence of nucleotides or the sequence of codons as a MC, has been demonstrated [30], [31], [32]; the current paper is the first time that a BMC was used for this task.
Using similar means as for the animal movement data in this paper, GPS coordinate sequences for New York City taxi trips are investigated in [1], [8], [5]. For both examples the low-dimensional representation reveals insight into taxi customer and animal movement behavior, respectively. The taxi trip data is however of quite a different nature compared to the animal movement data, because in the taxi trip data far-apart pick-up and drop-off locations can constitute a transition, while in the animal movement data the transitions between states are only due to an animal moving from one area to another in time intervals of roughly not more than an hour.
In [33] the transitions between the Dow Jones closing prices are described as a MC close to equilibrium. Further references for MC models in finance include [34], [35]. Other Markovian models, like Hidden Markov Models, are also often used in the analysis of financial data; see for example [36].
Related statistical tools
Based on a likelihood ratio test statistic and a chi-square test statistic, [37] tests if two MCs on the same state space have the same transition matrix. The same problem is considered in [38] by considering a divergence test statistic. Further results are discussed in [39, §3.4.2].
Testing for the order of a MC can be done using chi-square and likelihood ratio test statistics [37] and using $\phi$-divergence test statistics [40]. Further hypothesis tests for Markovian models with random time observations are considered in [41], and specifically a goodness-of-fit test on the distribution is described. For selection of the order of a MC, general information criteria may also be used [42], [43], [44].
5 Clustering algorithm
In this section we describe the clustering algorithm from [2] which was designed to infer the map $\sigma_n$ from the sample path of a BMC. The reason we use this particular clustering algorithm is that it has a mathematical guarantee that it can recover the clusters of BMCs accurately even if the number of observations $\ell$ is small compared to the number of possible transitions $n^2$. This is useful for our purposes because observations are generally noisy and few in practice.
The clustering algorithm in [2] first constructs an empirical frequency matrix $\hat{N}$ element-wise from the sequence of observations $X_{1:\ell}$: for $i, j \in [n]$,
$$\hat{N}_{ij} := \sum_{t=1}^{\ell-1} \mathbb{1}[X_t = i, X_{t+1} = j]. \tag{10}$$
Depending on the sparsity of the frequency matrix characterized by the ratio $\ell/n^2$, regularization is applied by trimming: all entries of rows and columns of $\hat{N}$ corresponding to a desired number of states with the largest degrees, which we denote by $\Gamma$, are set to zero. The clustering algorithm then executes two steps on the resulting trimmed frequency matrix $\hat{N}_{\Gamma}$:
Step 1. Initialize with a spectral clustering algorithm.
Step 2. Iterate with a cluster improvement algorithm.
We provide pseudocode for these algorithms in Supplement 11.
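The construction of the frequency matrix (10), the trimming, and the spectral initialization of Step 1 can be sketched in NumPy as follows. This is a simplified illustration and not the algorithm of [2] or the BMCToolkit code: the function names are hypothetical, the degree of a state is assumed to be its total in- plus out-count, the K-means routine uses a plain farthest-point initialization, and clustering the rows and columns of the rank-K approximation jointly is our own choice.

```python
import numpy as np


def frequency_matrix(xs, n):
    """Empirical frequency matrix of (10) for a path xs with states in {0, ..., n-1}."""
    xs = np.asarray(xs)
    N = np.zeros((n, n), dtype=int)
    np.add.at(N, (xs[:-1], xs[1:]), 1)
    return N


def trim(N, num_trim):
    """Zero the rows and columns of the num_trim states with the largest degree.

    Assumption: the degree of a state is its row sum plus its column sum.
    """
    N_trim = N.copy()
    if num_trim > 0:
        degrees = N.sum(axis=0) + N.sum(axis=1)
        gamma = np.argsort(degrees)[-num_trim:]
        N_trim[gamma, :] = 0
        N_trim[:, gamma] = 0
    return N_trim


def low_rank_approximation(N, K):
    """Best rank-K approximation of N via the singular value decomposition."""
    U, s, Vt = np.linalg.svd(np.asarray(N, dtype=float), full_matrices=False)
    return (U[:, :K] * s[:K]) @ Vt[:K, :]


def kmeans(points, K, num_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, K):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(num_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels


def spectral_initialization(N, K):
    """Cluster the states using the rows and columns of the rank-K approximation."""
    R = low_rank_approximation(N, K)
    return kmeans(np.hstack([R, R.T]), K)


# Small path on n = 3 states.
N_small = frequency_matrix([0, 1, 0, 2, 0, 1, 2, 0], n=3)
N_small_trimmed = trim(N_small, num_trim=1)

# Idealized block-structured counts built from the matrix p of Figure 1.
p = np.array([[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.3, 0.7, 0.0]])
sigma_true = np.array([0, 0, 1, 1, 1, 2])
block_counts = 100 * p[np.ix_(sigma_true, sigma_true)]
labels = spectral_initialization(block_counts, K=3)
```

The returned labels are only defined up to a permutation of the cluster indices, which is why BMCToolkit offers a relabeling step.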
Given some initial guess, here provided by a spectral algorithm, the cluster improvement algorithm consists of local optimization of a log-likelihood function by a hill climbing procedure. The state space $[n]$ and the number of clusters $K$ are kept fixed, which means that the free parameters are the cluster transition matrix $p \in \{ q \in [0, 1]^{K \times K} : \forall k, \sum_{l} q_{k,l} = 1 \}$ and the cluster assignment map $\sigma_n : [n] \to [K]$. Given an observation sequence $X_{1:\ell}$, the log-likelihood of the BMC-model is given by
$$\hat{L}(X_{1:\ell} \mid p, \sigma_n) := \sum_{t=1}^{\ell-1} \ln \frac{p_{\sigma_n(X_t), \sigma_n(X_{t+1})}}{\#\mathcal{V}_{\sigma_n(X_{t+1})}}. \tag{11}$$
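The log-likelihood (11) is straightforward to evaluate for a given path, cluster transition matrix, and assignment. The sketch below is a minimal illustration with a hypothetical function name and 0-based indices; transitions with zero cluster transition probability contribute minus infinity.

```python
import numpy as np


def bmc_log_likelihood(xs, p, sigma):
    """Evaluate (11): sum over t of ln( p[sigma(X_t), sigma(X_{t+1})] / #V_{sigma(X_{t+1})} )."""
    xs = np.asarray(xs)
    sigma = np.asarray(sigma)
    p = np.asarray(p, dtype=float)
    sizes = np.bincount(sigma, minlength=p.shape[0])
    src = sigma[xs[:-1]]  # cluster of X_t
    dst = sigma[xs[1:]]   # cluster of X_{t+1}
    with np.errstate(divide="ignore"):
        terms = np.log(p[src, dst]) - np.log(sizes[dst])
    return float(terms.sum())


# Every transition below has probability 0.5 / 2 = 0.25 under the model.
p = np.array([[0.5, 0.5], [0.5, 0.5]])
sigma = np.array([0, 0, 1, 1])
xs = [0, 2, 1, 3]
ll = bmc_log_likelihood(xs, p, sigma)
```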
The reason that this two-step procedure is used instead of directly maximizing (11) is that finding the global maximizer of (11) is numerically infeasible. Indicative of this numerical complexity is the fact that the number of possible partitions $\sigma_n$, given by the partition function $\zeta(n)$, grows exponentially in $\sqrt{n}$ as $n$ increases:
$$\zeta(n) \sim (4 n \sqrt{3})^{-1} \exp\bigl(\pi \sqrt{2n/3}\bigr). \tag{12}$$
The fact that the hill climbing procedure, which is computationally tractable, succeeds at exactly (resp. accurately) recovering the true underlying parameters when initialized with a spectral clustering is formally established in [2] in the asymptotic regime where $\ell = \omega(n \log n)$ (resp. $\ell = \omega(n)$).
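The idea of a greedy, local maximization of (11) can be illustrated with a deliberately simple hill climbing routine. This sketch is not the cluster improvement algorithm of [2]: it moves one state at a time, re-estimates $p$ by maximum likelihood for every candidate move, and forbids empty clusters. All function names are hypothetical, and the brute-force evaluation is only suitable for small $n$.

```python
import numpy as np


def profile_log_likelihood(N, sigma, K):
    """Log-likelihood (11) with p replaced by its maximum likelihood estimate for fixed sigma."""
    Z = np.eye(K)[sigma]          # n x K membership matrix
    sizes = Z.sum(axis=0)
    if np.any(sizes == 0):
        return -np.inf            # empty clusters are not allowed in this sketch
    M = Z.T @ N @ Z               # cluster-level transition counts
    row = M.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        logp = np.where(M > 0, np.log(M / row), 0.0)
    return float((M * logp).sum() - (M.sum(axis=0) * np.log(sizes)).sum())


def improve_clusters(N, sigma, K, max_sweeps=10):
    """Greedy hill climbing: move single states whenever this increases the objective."""
    sigma = np.array(sigma, dtype=int)
    best = profile_log_likelihood(N, sigma, K)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(sigma)):
            best_label = sigma[i]
            for k in range(K):
                sigma[i] = k
                value = profile_log_likelihood(N, sigma, K)
                if value > best + 1e-12:
                    best, best_label, changed = value, k, True
            sigma[i] = best_label
        if not changed:
            break
    return sigma, best


# Idealized counts with two clusters {0, 1, 2} and {3, 4, 5}; state 2 starts misplaced.
N = np.array([[9, 9, 9, 1, 1, 1]] * 3 + [[2, 2, 2, 8, 8, 8]] * 3)
sigma_init = np.array([0, 0, 1, 1, 1, 1])

ll_init = profile_log_likelihood(N, sigma_init, 2)
sigma_out, ll_out = improve_clusters(N, sigma_init, K=2)
```

Because a move is only accepted when it strictly increases the objective, the procedure terminates in a local maximum, whose quality depends on the initial guess. This is why a good spectral initialization matters.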
5.1 BMCToolkit: A C++ library and Python module
We have programmed a Dynamic-link library (DLL) in C
++
,
called BMCToolkit, that can simulate and analyse trajectories
of BMCs. Among other functionalities, the DLL is able to
calculate both cluster and state variants of the equilibrium
distribution, frequency matrix, and transition matrix of a
BMC; to compute the difference between two clusters and the
spectral norm; to estimate the parameters of a BMC from
a sample path; to execute the spectral clustering algorithm
and the cluster improvement algorithm; to generate sample
paths and trimmed frequency matrices; and to relabel clusters
according to the size or the equilibrium probability of a cluster.
The DLL utilizes Eigen, a high-level DLL for linear algebra,
matrix, and vector operations; and the Sparse Eigenvalue
Computation Toolkit as a Redesigned ARPACK, a DLL for
large-scale eigenvalue problems built on top of Eigen. The
mathematical components of BMCToolkit were validated
through functional testing using Microsoft’s Native Unit Test
Framework. The performance of the numerical components
of BMCToolkit were finally benchmarked using Benchmark,
Google’s microbenchmark support library. Our source code
can be found at https://gitlab.tue.nl/acss/public/detection-
and-evaluation-of-clusters-within-sequential-data.
We also created a Python module called BMCToolkit, and
made it available at https://pypi.org/project/BMCToolkit/.
This Python module distributes the DLL mentioned above
and includes an easy-to-use Python interface. When compiling
BMCToolkit, we made sure to instruct the Microsoft Visual
C++ compiler to activate the OpenMP extension to parallelize
the simulation across Central Processing Units and so that
Eigen could parallelize matrix multiplications (/openmp);
to apply maximum optimization (/O2); to enable enhanced
Central Processing Unit instruction sets (/arch:AVX2); and to
explicitly target 64-bit x64 hardware.
This approach of interfacing with a DLL written in C++,
and careful parallelization and compilation, outperformed
earlier versions of the module written entirely in Python
considerably. Ultimately, this enables us to tackle larger
sequences of observations with more distinct observations.
6 Methods for evaluating clusters and models
We next discuss methods which can aid in evaluating clusters
and models for sequential data obtained from real-world
processes. These methods have to account for the fact that,
since we are dealing with real-world nonsynthetic data, we
do not know the true process which generated the data. In
particular, we do not have access to a ground-truth clustering.
6.1 Performance on a downstream task
One reason to cluster observations of sequential data is that
the clusters provide a tool for dimensionality reduction in
subsequent statistical analyses or optimization procedures.
For instance, the running time of a numerical method which
aims to execute some computational task on a sequence
of observations may grow considerably with the number of
distinct observations $n$. In such a case it is clear that one has
to reduce $n$ or otherwise use a different algorithm. Reducing $n$
can furthermore reduce issues associated with overfitting, and
aid in human interpretability.
On the other hand, clustering naturally removes some
information from the dataset. Thus, in a good clustering,
the data should retain as much useful information as is
possible. The meaning of “amount of useful information” is
here ambiguous and depends on the context.
There are cases, however, in which the amount of useful
information can be made concrete. Suppose for instance
that one has a measure of quality $Q_{\text{pre-reduction}} := Q(T)$,
evaluating performance of a downstream task $T := T(X_{1:\ell})$
applied to the sequence of observations. For example, if the
algorithm is estimating parameters of some parametric model,
then $Q_{\text{pre-reduction}}$ may be the accuracy of prediction on a
validation dataset. One can now use this measure of quality
$Q_{\text{pre-reduction}}$ as a proxy for the notion of useful information in
a clustering. Given a clustering $\sigma_n : [n] \to [K]$ that reduces
the number of distinct observations to some $1 \le K \ll n$, one
can apply the numerical solution method to obtain a solution
$\tilde{T} := T(\sigma_n(X_{1:\ell}))$. The quantity
$Q_{\text{reduced}} := Q(\tilde{T})$ then allows
us to determine the quality of the clusters.
Using $Q$ to determine the amount of useful information in
clusters can help compare the quality of a number of different
clusterings which are output by different clustering algorithms.
It can also happen that $Q_{\text{reduced}} > Q_{\text{pre-reduction}}$ due to the
reduction of noise within the sequence of grouped observations.
In fact, this effect may occur regardless of whether the task
is numerically challenging. In the scenario where the task is
numerically challenging, the dimension reduction (from $n$ to $K$)
by the map $\sigma_n$ still means that we can expect improved
performance over methods that do not cluster data when fixing
the computational budget.
In the following Sections 6.2 to 6.4 we discuss methods
which can also reveal whether the BMC model is appropriate,
and do not require some data-specific measure of quality.
6.2 Model selection with validation data
Section 6.1 mentioned that prediction of a validation dataset
can serve as a measure of quality $Q$. We now expand on this
idea.
6.2.1 Rescaled log-likelihood ratio
Assume we observe sequential data $X_{1:\ell}$ generated by some
ground-truth probability distribution $\mathbb{T}$ on $[n]^{\ell+1}$. The law
$\mathbb{T}$ can in principle be arbitrarily complex; for example, the
Markov property need not be satisfied. Note that, for nonsynthetic
data, we typically do not have access to the ground-truth
$\mathbb{T}$. Suppose however that we do have two candidate models
$\mathbb{P}$ and $\mathbb{Q}$ which are also defined on $[n]^{\ell+1}$. We then want to
determine whether $\mathbb{P}$ or $\mathbb{Q}$ is a better model based on the
observed sequential data $X_{1:\ell}$.
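For this purpose, we consider a log-likelihood ratio. Namely,
given $x_{1:\ell} \in [n]^{\ell+1}$, consider the quantity
$$\hat{D}(x_{1:\ell}; \mathbb{P}, \mathbb{Q}) := \frac{1}{\ell} \ln \frac{\mathbb{P}[X_{1:\ell} = x_{1:\ell}]}{\mathbb{Q}[X_{1:\ell} = x_{1:\ell}]} \tag{13}$$
and its expectation
$$D(\mathbb{T}; \mathbb{P}, \mathbb{Q}) := \mathbb{E}_{\mathbb{T}}\bigl[\hat{D}(X_{1:\ell}; \mathbb{P}, \mathbb{Q})\bigr]. \tag{14}$$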
Then, if $D(\mathbb{T}; \mathbb{P}, \mathbb{Q}) > 0$ we consider $\mathbb{P}$ to be a better
approximation of the ground truth $\mathbb{T}$, and if
$D(\mathbb{T}; \mathbb{P}, \mathbb{Q}) < 0$ we consider $\mathbb{Q}$ to be a better
approximation. In practice we cannot compute the expectation
$\mathbb{E}_{\mathbb{T}}$ and instead consider the sign
of the empirical estimator $\hat{D}(X_{1:\ell}; \mathbb{P}, \mathbb{Q})$.
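In our experiments it is often the case that $\mathbb{P}$ and $\mathbb{Q}$ are
MCs on $[n]$ whose transition matrices $P, Q \in [0, 1]^{n \times n}$ are
known. In this case one can alternatively express (13) as
$$\hat{D}(x_{1:\ell}; \mathbb{P}, \mathbb{Q}) = \frac{1}{\ell} \sum_{t=1}^{\ell-1} \ln \frac{P_{x_t, x_{t+1}}}{Q_{x_t, x_{t+1}}}. \tag{15}$$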
Confidence bounds for the estimation of $D(\mathbb{T}; \mathbb{P}, \mathbb{Q})$ by
$\hat{D}(X_{1:\ell}; \mathbb{P}, \mathbb{Q})$ in this MC-setting are provided in
Supplement 13. It is there additionally assumed that $\mathbb{T}$ is a, possibly
time-inhomogeneous, MC whose mixing time is known.
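As an illustration of (15), the following minimal sketch evaluates the
estimator for two known transition matrices. The function name
`kl_rate_difference` and the toy matrices are hypothetical, and the sketch
assumes that every observed transition has positive probability under both
models, since the logarithm is otherwise undefined.

```python
import math

def kl_rate_difference(x, P, Q):
    """Estimator (15): (1/ell) * sum over transitions of ln(P[a][b] / Q[a][b]).

    x is the observed state sequence (integers), P and Q are row-stochastic
    matrices given as nested lists. Assumes P[a][b] > 0 and Q[a][b] > 0 for
    every observed transition a -> b.
    """
    ell = len(x)
    total = 0.0
    for a, b in zip(x, x[1:]):
        total += math.log(P[a][b]) - math.log(Q[a][b])
    return total / ell

P = [[0.9, 0.1], [0.1, 0.9]]   # toy "sticky" chain
Q = [[0.5, 0.5], [0.5, 0.5]]   # toy chain with independent steps
x = [0, 0, 0, 1, 1, 1]
d_hat = kl_rate_difference(x, P, Q)   # positive: P explains x better than Q
```

Swapping the roles of the two models flips the sign of the estimate, which
is why only the sign is needed to pick a model.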
6.2.2 Information-theoretic interpretation for D(T;P,Q)
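Let us briefly note that (14) has an information-theoretic
interpretation. Namely, observe that
$$D(\mathbb{T}; \mathbb{P}, \mathbb{Q}) = \frac{1}{\ell}\bigl(\mathrm{KL}(\mathbb{T}; \mathbb{Q}) - \mathrm{KL}(\mathbb{T}; \mathbb{P})\bigr) \tag{16}$$
where KL denotes the Kullback–Leibler (KL) divergence
$$\mathrm{KL}(\mathbb{T}; \mathbb{P}) := \mathbb{E}_{Z_{1:\ell} \sim \mathbb{T}}\Bigl[\ln \frac{\mathbb{T}(X_{1:\ell} = Z_{1:\ell})}{\mathbb{P}(X_{1:\ell} = Z_{1:\ell})}\Bigr]. \tag{17}$$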
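For completeness, the identity (16) follows from (13), (14) and (17) by
inserting the ground-truth law into the numerator and the denominator of the
likelihood ratio. The short derivation below is a sketch which assumes that
all probabilities involved are positive, so that the logarithms are finite:

```latex
\begin{align*}
D(\mathbb{T};\mathbb{P},\mathbb{Q})
 &= \frac{1}{\ell}\,\mathbb{E}_{Z_{1:\ell}\sim\mathbb{T}}
    \Bigl[\ln\frac{\mathbb{P}(X_{1:\ell}=Z_{1:\ell})}{\mathbb{Q}(X_{1:\ell}=Z_{1:\ell})}\Bigr] \\
 &= \frac{1}{\ell}\,\mathbb{E}_{Z_{1:\ell}\sim\mathbb{T}}
    \Bigl[\ln\frac{\mathbb{T}(X_{1:\ell}=Z_{1:\ell})}{\mathbb{Q}(X_{1:\ell}=Z_{1:\ell})}
        - \ln\frac{\mathbb{T}(X_{1:\ell}=Z_{1:\ell})}{\mathbb{P}(X_{1:\ell}=Z_{1:\ell})}\Bigr] \\
 &= \frac{1}{\ell}\bigl(\mathrm{KL}(\mathbb{T};\mathbb{Q})-\mathrm{KL}(\mathbb{T};\mathbb{P})\bigr).
\end{align*}
```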
One can interpret the quantity $\mathrm{KL}(\mathbb{T}; \mathbb{P})$ as the expected
amount of discriminatory information revealing that $\mathbb{P}$ is not
quite the ground-truth probability distribution underlying the
sample path $X_{1:\ell}$; see [45]. In many cases, such as when the
ground-truth $\mathbb{T}$ is an ergodic Markov chain, it further holds
that (17) grows linearly in the amount of data $\ell$.
Correspondingly, by (16), one can view $D(\mathbb{T}; \mathbb{P}, \mathbb{Q})$ as
measuring the rate of growth for discriminatory information
revealing that $\mathbb{P}$ is a better approximation for the ground
truth $\mathbb{T}$ than $\mathbb{Q}$. To emphasize this perspective we will refer to
$\hat{D}(X_{1:\ell}; \mathbb{P}, \mathbb{Q})$ as the KL divergence rate difference estimator.
6.2.3 Estimation when the models are inferred from the data
Our experiments routinely determine two different candidate
models that we wish to compare from the single sample
sequence available to us. Let us emphasize this fact by referring
to these candidate models as
$$\hat{\mathbb{P}}_{X_{1:\ell}} \quad\text{and}\quad \hat{\mathbb{Q}}_{X_{1:\ell}}. \tag{18}$$
Observe now that these two candidate models are a function
of the observed data $X_{1:\ell}$. Substituting (18) into (13) could
consequently result in a biased estimator and typically favor
models with many parameters; the “optimal” model would be
the degenerate probability distribution assigning probability 1
to the observed $X_{1:\ell}$.
To reduce the bias, we will use a holdout method. Specifically,
we will split the trajectory into two parts: the first half
$X_{1:\lfloor \ell/2 \rfloor}$ will be used for training, and the second half
$X_{\lfloor \ell/2 \rfloor + 1:\ell}$ for validation. The quantity
$$\hat{D}\bigl(X_{\lfloor \ell/2 \rfloor + 1:\ell};\, \hat{\mathbb{P}}_{X_{1:\lfloor \ell/2 \rfloor}},\, \hat{\mathbb{Q}}_{X_{1:\lfloor \ell/2 \rfloor}}\bigr) \tag{19}$$
will likely still be a biased estimator of $D(\mathbb{T}; \mathbb{P}, \mathbb{Q})$ due to the
dependence in the sequential data. The amount of bias should,
however, be significantly reduced when compared to the
estimator obtained when substituting (18) into (13).
Note that (19) can be viewed as a measure of quality in the
language of Section 6.1. Indeed, (19) compares whether
$\hat{\mathbb{P}}_{X_{1:\lfloor \ell/2 \rfloor}}$ or $\hat{\mathbb{Q}}_{X_{1:\lfloor \ell/2 \rfloor}}$ better predicted the validation data.
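The holdout procedure can be sketched as follows. The two candidate models
here, a first-order MC and a zeroth-order (independent) model, are chosen only
for illustration. The helper names and the additive smoothing constant `alpha`
are assumptions of this sketch, introduced so that no validation transition
receives probability zero; they are not taken from the paper.

```python
import math
from collections import Counter

def fit_markov(train, n, alpha=1.0):
    """First-order transition matrix with additive smoothing (an assumption)."""
    counts = [[alpha] * n for _ in range(n)]
    for a, b in zip(train, train[1:]):
        counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]

def fit_iid(train, n, alpha=1.0):
    """Zeroth-order model, written as a transition matrix with identical rows."""
    freq = Counter(train)
    total = len(train) + alpha * n
    row = [(freq[s] + alpha) / total for s in range(n)]
    return [row[:] for _ in range(n)]

def holdout_estimate(x, n):
    """Train both models on the first half and evaluate (19) on the second half."""
    half = len(x) // 2
    train, valid = x[:half], x[half:]
    P, Q = fit_markov(train, n), fit_iid(train, n)
    s = sum(math.log(P[a][b] / Q[a][b]) for a, b in zip(valid, valid[1:]))
    return s / len(valid)

x = [0, 1, 2] * 200                  # toy sequence with strong first-order structure
estimate = holdout_estimate(x, n=3)  # positive: the MC predicts the validation half better
```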
6.3 Model selection with only training data
As discussed in Section 6.2, the KL-divergence rates can
provide a good rule of thumb for assessing which models are
most interesting, but are biased towards models with more
parameters if one does not split the data into training and
validation data. Splitting the data is however sometimes
undesirable. Namely, if the data is sparse, the estimated models
will become even less accurate. In order to overcome this
issue, we will use information criteria that compensate for the bias
incurred, and use them to assess the order of the cluster process.
6.3.1 Problem setting: order of a BMC
Suppose that a sequence $X_{1:\ell}$ was in fact generated by some
$r$th-order BMC, but that the order $r \in \{0, 1, \ldots\}$ is unknown.
We will use techniques for model selection to try and determine
$r$ from the cluster sequence $Y_{1:\ell} = \sigma_n(X_{1:\ell})$.
There are two reasons for using $Y_{1:\ell}$ instead of $X_{1:\ell}$. First,
the parametric models for higher order MCs without clusters
have a number of free parameters comparable to the sequence
length $\ell$ itself, so estimators for the order will behave poorly.
If we look at the cluster chain instead, the number of degrees
of freedom will depend on the number of clusters $K$ instead of
the number of states $n$, and fortunately $K \ll n$. Secondly, we
can also study the robustness of the model selection procedure
depending on the clustering algorithm.
6.3.2 Order selection by minimizing an information criterion
The parameter that determines the $r$th-order BMC model for
$Y_{1:\ell}$ is a transition matrix $Q^r$; recall (7). Note here that the
chain $Y^r_{1:\ell-r}$ will be constructed from the chain of clusters
$Y_{1:\ell} = \sigma_n(X_{1:\ell})$ for a fixed cluster assignment $\sigma_n$.
To estimate $Q^r$ one can consider the log-likelihood
$$L(Y_{1:\ell} \mid Q^r) := \sum_{t=r}^{\ell-r-1} \ln Q^r_{Y_{t-r+1:t},\, Y_{t+1}}. \tag{20}$$
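The maximum-likelihood estimator associated with (20) is given by
$$(\hat{Q}^{r,\mathrm{MLE}})_{i_r, j} := \begin{cases} \dfrac{\sum_{t=r}^{\ell-r-1} \mathbb{1}[Y_{t-r+1:t} = i_r,\, Y_{t+1} = j]}{\sum_{t=r}^{\ell-r-1} \mathbb{1}[Y_{t-r+1:t} = i_r]} & \text{if } \sum_{t=r}^{\ell-r-1} \mathbb{1}[Y_{t-r+1:t} = i_r] > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{21}$$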
Here $i_r$ and $j$ run over all possible sequences in $[K]^r$ and $[K]$
respectively. We write $\hat{\mathbb{Q}}^{r,\mathrm{MLE}}$ for the law of an $r$th-order MC
with $K$ states and transition matrix $\hat{Q}^{r,\mathrm{MLE}}$.
To determine what order $r$ is the true underlying order
of the data one would like to compare $\hat{\mathbb{Q}}^{r,\mathrm{MLE}}$ and
$\hat{\mathbb{Q}}^{s,\mathrm{MLE}}$ for some $s \neq r$. As has been remarked in Section 3.2, using
(13) for this purpose would give a biased estimator. Problems
with bias in model selection are well-known in the statistics
literature and to avoid this issue, the so-called information
criteria were developed [42], [43], [44], [46], where a penalty term
is added to a log-likelihood to correct the bias.
In our setting we need a penalty term that is sensitive to
sparse data and is also consistent. For this purpose, we have
chosen the Consistent Akaike Information Criterion (CAIC)
[43]: for model $\hat{\mathbb{Q}}^{r,\mathrm{MLE}}$,
$$\mathrm{CAIC}(\hat{\mathbb{Q}}^{r,\mathrm{MLE}}) := -2 \ln L(Y_{1:\ell} \mid \hat{Q}^{r,\mathrm{MLE}}) + 2\,\mathrm{DF}(K, r)\bigl(1 + \ln(\ell - r)\bigr). \tag{22}$$
Here, $\mathrm{DF}(K, r)$ denotes the degrees of freedom in an $r$th-order MC
constrained to have fixed parameters $K$ and $r$. Specifically,
$$\mathrm{DF}(K, r) = K^r (K - 1) \tag{23}$$
where the factor $(K - 1)$ is due to the fact that the rows of $Q^r$
are constrained to add up to one. We will utilize the CAIC to
select the right order as follows. From the collection of models
$\hat{\mathbb{Q}}^{0,\mathrm{MLE}}, \hat{\mathbb{Q}}^{1,\mathrm{MLE}}, \hat{\mathbb{Q}}^{2,\mathrm{MLE}}, \ldots$, we may determine the order
$r_{\mathrm{CAIC}}$ that minimizes the CAIC:
$$r_{\mathrm{CAIC}} := \operatorname{argmin}_{r \in \{0, 1, 2, \ldots\}} \mathrm{CAIC}(\hat{\mathbb{Q}}^{r,\mathrm{MLE}}). \tag{24}$$
Note that lower-dimensional models are favored since the
degrees of freedom $\mathrm{DF}(K, r)$, and thus the penalty terms in
(22), grow exponentially in $r$ and polynomially in $K$.
In order to evaluate how robust the CAIC criterion is, we
will estimate the over- and underfit error probabilities with
error models and draw conclusions on the selected orders.
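The order selection in (20) to (24) can be sketched as follows. The function
names are hypothetical and several choices are assumptions of this sketch: the
fit term is taken to be $-2$ times the maximised log-likelihood, the penalty is
$2\,\mathrm{DF}(K, r)(1 + \ln(\ell - r))$, the likelihood sums over every
position that has a full length-$r$ context (which may differ from the
summation limits in (20) by a few boundary terms), and the search stops at a
user-chosen `max_order`.

```python
import math
from collections import Counter

def log_likelihood_mle(y, r):
    """Maximised log-likelihood of an r-th order MC on the cluster sequence y."""
    ctx_counts, pair_counts = Counter(), Counter()
    for t in range(r, len(y)):
        ctx = tuple(y[t - r:t])              # the r preceding clusters
        ctx_counts[ctx] += 1
        pair_counts[(ctx, y[t])] += 1
    # Plugging the empirical transition frequencies into the log-likelihood.
    return sum(c * math.log(c / ctx_counts[ctx])
               for (ctx, _), c in pair_counts.items())

def caic(y, K, r):
    df = K ** r * (K - 1)                            # degrees of freedom, (23)
    penalty = 2 * df * (1 + math.log(len(y) - r))    # penalty term (assumed form)
    return -2 * log_likelihood_mle(y, r) + penalty

def select_order(y, K, max_order=3):
    """Order in {0, ..., max_order} with the smallest CAIC, as in (24)."""
    return min(range(max_order + 1), key=lambda r: caic(y, K, r))

y = [0, 1] * 500                  # toy cluster sequence that alternates deterministically
r_caic = select_order(y, K=2)     # a first-order model already explains y perfectly
```

In this toy example the first-order model has zero fit term, and every higher
order only adds penalty, so the selected order is 1.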
6.4 The shape of spectral noise for identification of alternative models
The methods in Sections 6.1 to 6.2 allow us to compare a BMC
to alternative models. The selection of a good alternative can
however be difficult when a more complex model than a BMC
is desirable. The method described here can aid in the selection
of an alternative model.
The method is based on a result from [5] which describes
the histogram of the singular values of $\hat{N}$ in the asymptotic
regime $n \to \infty$ under the condition that $\ell = \Theta(n^2)$. The results
in [6] can further be interpreted as the statement that the $K$
nonzero singular values of $\mathbb{E}[\hat{N}]$ correspond to the $K$ largest
singular values of $\hat{N}$. In other words, all singular values except
these leading few may be interpreted as being due to the noise
$\hat{N} - \mathbb{E}[\hat{N}]$. The histogram of the nonleading singular values
may thus be interpreted as the shape of the spectral noise.
These results and their interpretation can guide the selection
of a good model. One can namely identify clusters in
the data and visually compare the associated BMC-prediction
with the observed histogram. If there is a good match, then
this may indicate that a BMC suits the data well. If there
is a discrepancy, then the nature of the discrepancy can be
informative of the properties that the alternative model should
have. It will for instance be shown in Supplement 14.4 that
a long tail can sometimes be explained using a heavy-tailed
perturbation.
We have, however, found that a strongly inhomogeneous
equilibrium distribution in the data can dominate the spectral
noise in $\hat{N}$. So long as the clustering respects the equilibrium
distribution it then follows that the observations will indeed
resemble the theory. This is an issue since it follows that, in
the case of an inhomogeneous equilibrium distribution, the
spectral noise of $\hat{N}$ may not be particularly informative. In
such a case one can consider a different random matrix.
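The empirical normalized Laplacian $\hat{L}$ associated to the
observation sequence is element-wise given by
$$\hat{L}_{ij} := \begin{cases} \dfrac{\hat{N}_{ij}}{\sqrt{\sum_{k=1}^{n} \hat{N}_{ik}}\,\sqrt{\sum_{k=1}^{n} \hat{N}_{kj}}} & \text{if } \hat{N}_{ij} \neq 0, \\ 0 & \text{otherwise.} \end{cases} \tag{25}$$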
We argue in Supplement 14.3 that the variance of the entries of
$\hat{L}$ is approximately independent of the equilibrium distribution.
Consequently, we expect that the spectral noise of $\hat{L}$ will
not be dominated by a possibly inhomogeneous equilibrium
distribution. A proposition describing the limiting histogram
of singular values is proved in Supplement 14.3. The precise
statement is technical but a summary may be found in
Proposition 1.
Proposition 1. Let $X_{1:\ell}$ be a sample path of a BMC. If
$\ell = \Theta(n^2)$, then for almost every $a, b \in \mathbb{R}$ the fraction of singular
values in $[a, b]$, i.e., $n^{-1} \#\{i : s_i(\sqrt{n}\hat{L}) \in [a, b]\}$, converges in
probability as $n \to \infty$. The limit may be computed explicitly in
terms of the parameters of the BMC.
With Proposition 1, we can characterize the spectral noise
of $\hat{L}$ in a BMC and use the spectrum as a tool for data
exploration, presumably even in the presence of an inhomogeneous
equilibrium distribution.
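The normalization in (25) is straightforward to compute from a frequency
matrix. The following is a minimal pure-Python sketch; the function name
`normalized_laplacian` and the toy matrix are hypothetical.

```python
import math

def normalized_laplacian(N):
    """Empirical normalized Laplacian (25) of a square frequency matrix N.

    Entry (i, j) is N[i][j] divided by the square roots of the i-th row sum
    and the j-th column sum; entries with N[i][j] == 0 stay zero.
    """
    n = len(N)
    row = [sum(N[i]) for i in range(n)]
    col = [sum(N[k][j] for k in range(n)) for j in range(n)]
    return [[N[i][j] / math.sqrt(row[i] * col[j]) if N[i][j] != 0 else 0.0
             for j in range(n)]
            for i in range(n)]

N_hat = [[0, 3, 1],
         [2, 0, 2],
         [2, 1, 0]]            # toy transition counts
L_hat = normalized_laplacian(N_hat)
```

A nonzero count implies positive row and column sums, so no division by zero
can occur. If NumPy is available, the singular values of $\sqrt{n}\hat{L}$
that appear in Proposition 1 can then be obtained with
`numpy.linalg.svd(..., compute_uv=False)`.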
7 Data sets and preprocessing
We are going to apply the clustering algorithm described
in Section 5 to different data sets. First, however, we will
introduce the data sets, specify the sequence $X_{1:\ell}$, and describe
any preprocessing that we have conducted. The results of
clustering can be found in Section 8.
7.1 Sequence of codons in DNA
We will consider the OCA2 gene in human DNA, and source
the data from [47]. The OCA2 gene provides instructions for
making a protein located in melanocytes, which are specialized
cells that produce a pigment called melanin. It must be noted
that the clustering algorithms can be applied to any gene. We
expect similar results in other human DNA; the OCA2 gene
serves here merely as an example.
A typical string of DNA can be viewed as a sequence
composed of four possible nucleotides denoted by A, T, C and
G. These nucleotides are moreover naturally clustered together
in three-letter words called codons which are processed in
protein synthesis within a cell. For instance, the codon ACG
corresponds to the addition of the amino acid threonine as the
next building block of a protein. From a sequence of nucleotides
such as
TTT GTA GTT AGA TCT CCT CTA TCC et cetera,
we thus identify the sequence of codons
$X_1 = \mathrm{TTT} \to \cdots \to X_8 = \mathrm{TCC} \to$ et cetera.
The empirical frequency matrix in (10) is calculated, and found
to have $\ell = 16 \times 10^4$ transitions and a state space of size $n = 64$.
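The conversion from nucleotides to codons can be sketched as follows. The
function name `codon_sequence` is hypothetical, and the sketch assumes that
the reading frame starts at the first nucleotide and that an incomplete
trailing codon is discarded.

```python
def codon_sequence(dna):
    """Split a nucleotide string into consecutive non-overlapping codons."""
    letters = [c for c in dna.upper() if c in "ATCG"]   # drop spaces and other symbols
    usable = len(letters) - len(letters) % 3            # ignore an incomplete last codon
    return ["".join(letters[i:i + 3]) for i in range(0, usable, 3)]

codons = codon_sequence("TTT GTA GTT AGA TCT CCT CTA TCC")
# codons[0] is "TTT" (X_1) and codons[7] is "TCC" (X_8)
```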
7.2 Sequence of words in texts
A cleaned corpus, based on the Wikipedia datadump of October
2013, was downloaded from [48]. This cleaned corpus filters
on popularity to exclude a large amount of robot-generated
pages. Further preprocessing was standard: we removed all
punctuation and numbers, reduced all words to lower case and
reduced all words to a root word with the Natural Language
Toolkit’s PorterStemmer.stem() [49, Section 3.6]. The 100
most visited words were pruned from the data as well as all
words with fewer than 1000 visits. This results in a vocabulary
of $n = 16994$ words. For example, a paragraph such as
Clustering observations can be very useful!
is converted into the sequence
$X_1 = \text{cluster} \to X_2 = \text{observ} \to \cdots \to X_6 = \text{use}$.
Each $s$th Wikipedia page results in a sequence, $X^s_{1:\ell_s}$ say,
that is relatively short when compared to $n$. The corresponding
frequency matrix $\hat{N}^s$, recall (10), is consequently sparse. We
therefore compute and work instead with
$$\hat{N} := \sum_s \hat{N}^s. \tag{26}$$
The diagonal of the summed empirical frequency matrix in
(26) is set to zero. Self-transitions are namely common and not
particularly informative for the purposes of clustering. The
remaining total number of transitions is found to be $\ell \approx 2 \cdot 10^8$.
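The construction in (26) can be sketched with a sparse dictionary of counts.
The function name `summed_frequency_matrix` and the toy documents are
hypothetical. Skipping pairs of identical consecutive words while counting
yields the same matrix as summing first and zeroing the diagonal afterwards.

```python
from collections import Counter

def summed_frequency_matrix(documents):
    """Sparse version of (26): per-document transition counts, summed, zero diagonal."""
    N = Counter()
    for words in documents:
        for a, b in zip(words, words[1:]):   # transitions never cross document boundaries
            if a != b:                       # leaving out self-transitions zeroes the diagonal
                N[(a, b)] += 1
    return N

docs = [["cluster", "observ", "use"],
        ["cluster", "cluster", "observ"]]    # toy stemmed documents
N_hat = summed_frequency_matrix(docs)
ell = sum(N_hat.values())                    # remaining number of transitions
```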
7.3 Sequence of animal positional data
GPS animal movement data, at its core, is a set
$Y = \{y_1, \ldots, y_n\}$ of observation tuples $y_i = (y_{i1}, y_{i2}, y_{i3})$, in which
the elements $y_{i1}, y_{i2}, y_{i3}$ correspond to a latitude, a longitude,
and a timestamp, respectively. We presume that the latitude
and longitude are measured in decimal degrees.
If we were to assume that every distinct GPS coordinate
$(y_{i1}, y_{i2})$ of the observation tuple corresponds to a distinct
state of a BMC, then the clustering problem would turn out to
be infeasible. The reason is that there would be nearly as many
states as there are observations. We must therefore combine
GPS coordinates into states during preprocessing. Specifically,
we divide the two-dimensional region in which all GPS points
lie (a manifold) into a two-dimensional grid of squares with
width $x$ km. The number $x$ is to be chosen according to the
considered data, leading to a desired number of states. Details
on this step can be found in Supplement 15.2.
Fig. 3. A screenshot from movebank displaying the raw GPS data.
For this investigation we (arbitrarily) choose data from
the “Dunn Ranch Bison Tracking Project,” freely accessible
through [50, #8019591]. The data is displayed in Figure 3.
The area is approximately rectangular with length $l \approx 3.2$ km
and width $w \approx 1.7$ km. For example, the tracking data of one
animal starts with the following observations:
(40.4749, −94.1129, 21:00), (40.4748, −94.1130, 21:15),
(40.4749, −94.1129, 21:30), (40.4751, −94.1130, 21:45),
(40.4749, −94.1130, 22:00), (40.4749, −94.1129, 22:15),
et cetera. (27)
These observations for this particular animal were registered
on October 24, 2012. The study includes movement data from
24 animals, and we concatenate all of the movement data to
create a single long observation sequence.
Some registered GPS coordinates are outside the piece of
land that we consider. This was most likely caused by some
malfunction of the GPS tracking device. We therefore exclude
all observation tuples $y_i$ for which $y_{i1} \notin (40.47, 40.4861)$ or
$y_{i2} \notin (-94.1183, -94.079121)$. After this preprocessing, the
single observation sequence that we obtain is of length 337003.
After preprocessing with grid size $x = 0.04$ km (chosen by ad
hoc parameter tuning), we identify the start of the sequence:
$X_1 = \text{Bin 0} \to X_2 = \text{Bin 1} \to X_3 = \text{Bin 0} \to X_4 = \text{Bin 0} \to X_5 = \text{Bin 0} \to X_6 = \text{Bin 0} \to$ et cetera.
We finally eliminated self-jumps such that resting animals
do not disturb our findings. From this sequence of states,
the empirical frequency matrix defined in (10) is calculated.
Ultimately, we were working with $n = 3155$ states and an
observation sequence of length $\ell = 193134$ when the self-jumps
were excluded. This gives us quite a short path; $\ell / n^2 \approx 0.0194$.
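The binning and the removal of self-jumps can be sketched as follows. The
exact procedure is described in Supplement 15.2 and may differ from this
sketch: the function names are hypothetical, and the conversion from degrees
to kilometres uses a simple flat-earth approximation with roughly 111.32 km
per degree of latitude, which is an assumption made here for illustration.

```python
import math

KM_PER_DEG = 111.32   # approximate length of one degree of latitude (assumption)

def to_bins(points, lat0, lon0, x_km):
    """Map (lat, lon) pairs to grid cells of width x_km, anchored at (lat0, lon0)."""
    cells = []
    for lat, lon in points:
        north = (lat - lat0) * KM_PER_DEG
        east = (lon - lon0) * KM_PER_DEG * math.cos(math.radians(lat0))
        cells.append((int(north // x_km), int(east // x_km)))
    return cells

def drop_self_jumps(seq):
    """Collapse consecutive repeats so that resting does not create transitions."""
    return [s for i, s in enumerate(seq) if i == 0 or s != seq[i - 1]]

points = [(40.4749, -94.1129), (40.4748, -94.1130), (40.4749, -94.1129)]
cells = to_bins(points, lat0=40.47, lon0=-94.1183, x_km=0.04)
states = drop_self_jumps(cells)
```

Each distinct cell can afterwards be relabeled by an integer, for instance in
order of first appearance, to obtain state labels such as Bin 0 and Bin 1.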
7.4 Sequence of companies with the highest daily return
Using Alpha Vantage’s API [51], we downloaded 20 years of
daily pricing data for every company within the S&P500 index.
The S&P500 index is a gauge for the performance of large
market capitalization U.S. equities. The index includes 500
companies and covers approximately 80% of available market
capitalization. The top constituents by index weight currently
include, for example, AAPL (Apple), MSFT (Microsoft),
AMZN (Amazon), TSLA (Tesla), and GOOGL (Alphabet).
The pricing data of the constituents turned out not to span
the same time range. We therefore sorted the pricing data
files of the different constituents by file size and kept the data
for the 300 companies with the most complete data. A list of
these constituents can be found in Supplement 16.3. Next, we
determined for each constituent $i \in [300]$ when the first data
entry happened, $t^i_-$, and when the last data entry happened,
$t^i_+$. We finally restricted our analysis to the time range
$$\max_{i \in [300]} t^i_- =: t_0,\; t_0 + 1,\; \ldots,\; t_0 + \ell := \min_{i \in [300]} t^i_+ \tag{28}$$
for these 300 companies. It turned out that $t_0 \equiv$ 2001–07–26
and $t_0 + \ell \equiv$ 2021–10–22. Days without any pricing data were
ignored, which include the weekends when the stock market is
closed. Ultimately, for every remaining day with some pricing
data, the data was in fact complete: we have all daily opening
and closing prices for each of these 300 constituents.
For $t = t_0, \ldots, t_0 + \ell$, let $O^i_t$ denote the opening price of
company $i$’s stock on day $t$ and $C^i_t$ the closing price of company
$i$’s stock on day $t$. We then considered the daily return of
company $i$’s stock on day $t$,
$$R^i_t = \frac{C^i_t}{O^i_t} - 1 \quad\text{and let}\quad X_t \in \arg\max_{i \in [300]} R^i_t \tag{29}$$
be a company with the highest daily return on day $t$. Ties are
broken uniformly at random. For example, the preprocessing
output’s start looks like:
On 2001–07–26, maximizer ADI had return 0.12517.
On 2001–07–27, maximizer AES had return 0.08905.
On 2001–07–30, maximizer PVH had return 0.04478.
On 2001–07–31, maximizer HUM had return 0.09852.
On 2001–08–01, maximizer NTAP had return 0.15665.
Et cetera.
so that the sequence of observations starts with
$X_{t_0} = \text{ADI}$, $X_{t_0+1} = \text{AES}$, $X_{t_0+2} = \text{PVH}$,
$X_{t_0+3} = \text{HUM}$, $X_{t_0+4} = \text{NTAP}$, et cetera. (30)
We again eliminate self-jumps, this time to reduce the effect of
profitable runs by companies that temporarily dominated the
market. That is, we remove any consecutive repeated appearance of a
single constituent in the sequence. Note that this preprocessing
only gives us one sequence, which is furthermore relatively short
($\ell / n^2 \approx 0.027$). Finally, we calculate the empirical frequency
matrix in (10) and proceed with the clustering algorithm.
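The construction of the observation sequence from (29), followed by the
removal of self-jumps, can be sketched as follows. The function name
`top_return_sequence`, the data layout (one list of prices per day, indexed by
company) and the toy prices are assumptions of this sketch.

```python
import random

def top_return_sequence(opens, closes, seed=0):
    """opens[t][i], closes[t][i]: prices of company i on day t. Implements (29)."""
    rng = random.Random(seed)
    seq = []
    for o_t, c_t in zip(opens, closes):
        returns = [c / o - 1 for o, c in zip(o_t, c_t)]   # daily returns R_t^i
        best = max(returns)
        winners = [i for i, r in enumerate(returns) if r == best]
        seq.append(rng.choice(winners))                   # ties broken uniformly at random
    # Collapse consecutive repeats of the same company (self-jumps).
    return [s for k, s in enumerate(seq) if k == 0 or s != seq[k - 1]]

opens = [[10, 10], [10, 10], [10, 10]]     # toy data: 3 days, 2 companies
closes = [[11, 10], [12, 10], [10, 11]]
X = top_return_sequence(opens, closes)     # company 0 wins twice, then company 1
```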
8 Detected clusters within the data
In this section, we evaluate how well the BMC model can
approximate the structure of the different sequential data
introduced in Section 7 and if it can yield useful insights.
8.1 Sequence of codons in DNA
We consider the sequence of codons occurring in the gene
OCA2 in human DNA. Clustering on this sequence extracts
an interesting pattern displayed in Figure 4(a). The detected
clusters were:
$V_1$ = {AAA, AAG, TGT, AGT, CCT, TCT, ACT, CAG, ATT, ATG, CAT, TAT, AAT, TTG, CTT, TGA, CTG, CAA, TGG, ATA, TTA, AGG, TAA, ACA, TCA, CCA, AGA}
$V_2$ = {CAC, GCC, CCC, TCC, ACC, GTC, CTC, TTC, ATC, TGC, AGC, TAC, AAC, GGC, TAG, CTA, GAC}
$V_3$ = {GTG, GAG, GGT, GCA, GAA, GTA, GGA, GAT, GGG, GTT, GCT}
$V_4$ = {CGA, CGC, ACG, TCG, CCG, GCG, CGT, CGG}
$V_5$ = {TTT}
8.1.1 Possible detection of codon–pair bias
We observe that all rows and columns associated with the
second-to-last community $V_4$ have low density. This means
that community $V_4$ has a small equilibrium probability. More
interesting is the low-density block corresponding to the
transitions from the second community $V_2$ to the third
community $V_3$. It appears we have rediscovered
a phenomenon known as codon–pair bias in biology [52], [53],
[54]. A brief examination of the biology literature suggests that
there is a link between codon–pair bias and gene expression,
which has been used to engineer weakened viruses which could
potentially be used as a vaccine [55], [56], [53].
There is some evidence that codon–pair bias is nothing
more than a consequence of dinucleotide bias [54]. Here, the
term dinucleotide bias refers to the fact that the two-letter
pair CG is only used infrequently regardless of its position.
This dinucleotide bias can also explain the clusters observed
in Figure 4(a). Indeed, inspection of the clusters $V_1, \ldots, V_5$
reveals that nearly all codons in $V_2$ end with the nucleotide C
whereas all codons in community $V_3$ begin with nucleotide G.
There are a few exceptions, namely the codons TAG and
CTA in $V_2$, but visual inspection of the matrix $\hat{N}$ seems to
suggest that these two codons may have been misclassified.
Thus, transitions from $V_2$ to $V_3$ would give rise to the two
nucleotides CG on the interface. Also remark that the two
leftmost vertical low-density streaks in the block associated
with $V_2$ correspond to the codons GCC and GTC which
simultaneously begin with a G and end with a C. Finally, all
codons in community $V_4$ contain the two nucleotides CG. It
thus appears that all low-density regions in the figure could be
explained as being due to dinucleotide bias. We refer to [57]
and the references therein for further discussions concerning
codon–pair bias, dinucleotide bias and their applications.
8.1.2 Comparing the histogram of singular values to the limiting
distribution of singular values of the inferred BMC
It appears from the reasonable clusters in Figure 4(a) that a
BMC could be an appropriate model for this dataset. Let us
now additionally verify whether the shape of the spectral noise
is consistent with a BMC. Note that the matrices $\hat{N}$ and $\hat{L}$ are
only $64 \times 64$. Consequently, they only have 64 singular values.
To get a clearer picture we split the observation sequence into
ten equally sized pieces and for each subpath we compute
the singular values. The histogram averaged over these ten
observations is compared to the theoretical BMC-prediction
associated to the clusters in Figure 4(b).
We observe a good match between the theory and the bulk
of the distribution for both $\hat{N}$ and $\hat{L}$. Particularly interesting
are the peak near zero and the triangular tail in the interval
$[4, 5]$, where the theoretical prediction matches the observed
distribution for $\hat{N}/\sqrt{n}$. Such features would not be predicted in a simpler
model without communities such as a matrix with i.i.d. entries.
One would then instead expect a quarter-circular law whose
density is proportional to $\mathbb{1}[x \in (0, c)]\sqrt{c^2 - x^2}$ for some
constant $c > 0$. This quarter-circular law is observed in the
empirical Laplacian $\hat{L}$, suggesting that the main feature in the
spectral noise of $\hat{N}$ is due to the equilibrium distribution. There
are also some rescaled singular values which escape the support
of the limiting singular value distribution. These are most
likely associated to the signal $\mathbb{E}[\hat{N}]$ and should consequently
not be viewed as a part of the spectral noise.

Fig. 4. (a) The frequency matrix $\hat{N}$ when the codons are sorted by the five detected clusters. (b) Average density-based histogram of singular
values for $\sqrt{n}\hat{L}$ and $\hat{N}/\sqrt{n}$ for the DNA sequential data in blue bars and the theoretical predictions associated with the improved clustering as the
red line. Not displayed is that each observation of $\hat{N}/\sqrt{n}$ also has a single singular value near 40 and each observation of $\sqrt{n}\hat{L}$ has a single singular
value near 8. These extremal singular values are considered to be part of the signal, and consequently not relevant for measuring the spectral noise.
8.1.3 Conclusion
It appears that the clustering algorithm was able to detect the
phenomenon of dinucleotide bias in DNA. The spectral noise
is also consistent with that in a BMC. Moreover, a simpler
model generating a random matrix with independent and
identically distributed entries would not have sufficed to make
the prediction associated to $\hat{N}$’s distribution of singular values.
8.2 Sequence of words on Wikipedia
We consider the sequence of words obtained after the
preprocessing described in Section 7. The clustering algorithm
discussed in Section 5 was executed for $K = 50, 100, 200, 400$
clusters, both with and without the improvement algorithm.
Ten improvement iterations were done whenever we also used
the improvement algorithm. A complete list of the clusters for
$K = 200$ with improvement may be found in Supplement 16.2.
8.2.1 Subjective evaluation
At first glance, the clusters that the clustering algorithm
determines appear to be good. For instance, a small cluster
with six elements has a distinctly football-related theme: $V_{125}$
contains the words champion, cup, premier, coach, footbal and
championship. The medium-sized clusters $V_{50}$, $V_{51}$, and $V_{52}$
respectively contain words related to public professions, units,
and warfare. That is, $V_{50}$ includes stemmed words such as
founder, deputi, formeli, mayor, bishop, meanwhil, successor,
$V_{51}$ includes tonn, usd, capita, lb, and $V_{52}$ includes cavalri,
jet, helicoptr, rifl, warfar, battalion, and raid. The second-largest
cluster $V_2$ predominantly contains names; these include
alexandr, albrecht, gideon, and jarrett.
We further observe that the improvement algorithm yields
more balanced clusters: before the improvement algorithm the
largest three clusters have sizes 9192, 1279 and 1126 respectively,
while after improvement the sizes are 2848, 1943 and 1600.
8.2.2 Performance on a downstream task
To evaluate the quality of the clusters more objectively, we
investigate the performance achieved on a downstream task as
discussed in Section 6. We specifically consider the accuracy
achieved by a document classification algorithm.
The goal in document classification is to predict the label
$l(d)$ of a document $d$ given some training dataset $\{(d', l(d'))\}$.
The document datasets that we investigate are described in
Supplement 16.2.1. For instance, the AG News dataset contains
news articles with four possible labels: World, Sports, Business,
and Sci/Tech.
Given a clustering, one can translate each document into a
$K$-dimensional vector by counting the number of occurrences of
each cluster in the document; see Supplement 15.1. Thereafter,
a logistic regression model is trained to learn a mapping
from the $K$-dimensional vectors to the labels. Aside from
spectral and improvement clusters we also consider a random
clustering in which every word was assigned a cluster uniformly
at random. There were some datasets in which neither spectral
nor improvement clustering significantly outperformed the
random clustering. We consider these tests inconclusive, but
still report them in Table 4 for completeness. The performance
on the remaining datasets is displayed in Table 1.
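The translation of a document into a $K$-dimensional count vector can be
sketched as follows. The function name `cluster_count_vector`, the toy cluster
assignment and the decision to skip words outside the vocabulary are
assumptions of this sketch; the procedure actually used is described in
Supplement 15.1.

```python
def cluster_count_vector(tokens, cluster_of, K):
    """Count how often each of the K clusters occurs in a tokenised document."""
    v = [0] * K
    for w in tokens:
        k = cluster_of.get(w)      # words outside the vocabulary are skipped (assumption)
        if k is not None:
            v[k] += 1
    return v

cluster_of = {"cup": 0, "coach": 0, "rifl": 1}     # toy assignment of stemmed words to clusters
features = cluster_count_vector(["cup", "rifl", "cup", "xyz"], cluster_of, K=2)
```

The resulting vectors can then be passed to any standard logistic regression
implementation, for example `sklearn.linear_model.LogisticRegression`, if
scikit-learn is available.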
Observe that improvement clustering typically outperforms
plain spectral clustering. Further note that in the AG News,
Yahoo! and Wiki datasets the performance increases with the
dimensionality. The gain in performance from spectral and
improvement clustering as opposed to random clustering is
there comparable with an increase of dimensionality by a factor
of 4. On the other hand, for Books and CMU it appears that the
performance decreases with the dimensionality, although this
pattern is less clear. A possible explanation is that Books and
CMU have less training data so that overfitting may occur
when the dimensionality is large.
K    Algorithm  AG News  Yahoo!  Wiki   Book   CMU
50   Random     48.3%    27.4%   56.9%  31.0%  67.4%
50   Spectral   66.0%    39.8%   71.1%  44.4%  69.5%
50   Improved   68.5%    40.1%   71.5%  44.7%  71.8%
100  Random     55.5%    33.3%   68.4%  30.0%  67.4%
100  Spectral   72.7%    47.2%   81.6%  45.2%  70.0%
100  Improved   76.8%    49.0%   80.1%  46.3%  70.7%
200  Random     64.0%    41.7%   80.8%  28.2%  66.8%
200  Spectral   78.2%    51.7%   85.6%  44.4%  68.7%
200  Improved   80.7%    54.7%   86.5%  43.4%  69.0%
400  Random     72.8%    49.4%   87.8%  28.9%  66.8%
400  Spectral   81.5%    56.3%   88.0%  42.1%  67.9%
400  Improved   83.1%    58.6%   89.0%  44.4%  68.4%
TABLE 1
Performance of clustering before and after improvement as measured by
accuracy in the downstream task of document classification as compared
to a random clustering. Bold added for the best-performing method.
[Figure 5: two histogram panels, one for $\sqrt{n}\hat{L}$ and one for $\hat{N}/\sqrt{n}$; vertical axes show density.]
Fig. 5. Density-based histogram of singular values for $\sqrt{n}\hat{L}$ for the words
sequential data in blue bars and the theoretical predictions associated
with the improvement clustering with $K = 200$ as the red line. Not
visible in this figure is that both empirical distributions have long tails.
Still 9% of the singular values of $\hat{N}/\sqrt{n}$ exceed 10 and 1% of the singular
values of $\sqrt{n}\hat{L}$ exceed 30.
8.2.3 Comparing the histogram of singular values to the limiting
distribution of singular values of the inferred BMC
One may be tempted to deduce from the reasonable clusters
and the performance reported in Table 4 that the BMC is
an appropriate model for this dataset. The structure in the
spectral noise is however not as one would expect in a BMC.
Consider Figure 5 for a comparison of the empirical singular
value distribution with the theoretical predictions. Observe
that there is a good match for $\hat{N}$ but a discrepancy for
$\hat{L}$. The fact that $\hat{N}$ yields a good match can be
explained as being due to a strongly inhomogeneous equilibrium
distribution from Zipf’s law. The empirical Laplacian $\hat{L}$
removes this dominant effect after which it may be observed that
the empirical distribution has a heavy tail which is not present in
the BMC-based prediction. In Supplement 14.4 we demonstrate
by a numerical example that the discrepancy which is observed
in Figure 5 agrees precisely with the type of discrepancy which
is observed for a heavy-tailed perturbation of the BMC. The
fact that the entries of the matrices $\hat{N}$ and $\hat{L}$ are
heavy-tailed may also be verified by direct inspection.
8.2.4 Conclusion
The clustering algorithm was able to detect clusters which we
subjectively judge to be interesting. The performance on the
downstream task of document classification further indicated
that the improvement algorithm which is based on the BMC-
assumption improved the quality of the clusters. The spectral
noise indicated that there is some heavy-tailed component
in the matrix which cannot be accounted for under the BMC
assumption. It is correspondingly conceivable that a different
model could incorporate the heavy-tailed nature of the entries
and extract even better clusters.
8.3 Animal movement data
We now investigate the GPS animal movement data from the
Dunn Ranch Bison Tracking Project; recall Section 7.
8.3.1 Subjective evaluation
The results of the clustering algorithm are depicted in Figure 6.
It is subjectively evident that the clusters give more insight
than the scatter plot in Figure 3.
Observe that the clustering algorithm picks up on geographical
features: all clusters are connected regions, except for the two
largest clusters, cluster 1 (black dots) and cluster 2 (orange c’s),
whose individual components are nevertheless connected.
Clusters 1 and 2 contain the low-degree states, which explains
their geographical spread. For the other clusters geographical
boundaries are visible. For example, cluster 3 is bounded from
below by creeks and cluster 4 lies between two creeks. On
satellite imagery one can see a fence north of cluster 7 and of the
part of cluster 2 (orange c’s) that borders cluster 7; in fact, the
northern border of these two clusters follows that line.
We briefly note that the average rate of transitions within
each cluster is 0.79. The transitions that are shown on the map
thus do not represent the majority of transitions, but only the
transitions between different clusters that occur with
probability at least 0.01. The cluster transition matrix is
given in Supplement 16.1.
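The quantities mentioned in this paragraph can be sketched as follows. This is a minimal illustration, assuming that the cluster transition matrix is estimated by row-normalized transition counts between clusters, and that the within-cluster rate is the fraction of observed transitions that stay inside a cluster. The function names and the toy sequence are hypothetical.

```python
def cluster_transition_matrix(sequence, cluster_of, num_clusters):
    """Row-normalized counts of transitions between clusters."""
    counts = [[0] * num_clusters for _ in range(num_clusters)]
    for x, y in zip(sequence, sequence[1:]):
        counts[cluster_of[x]][cluster_of[y]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def within_cluster_rate(sequence, cluster_of):
    """Fraction of observed transitions that stay inside a cluster."""
    pairs = list(zip(sequence, sequence[1:]))
    return sum(cluster_of[x] == cluster_of[y] for x, y in pairs) / len(pairs)


def displayed_edges(matrix, threshold=0.01):
    """Transitions between different clusters with probability >= threshold."""
    return [(k, l, p)
            for k, row in enumerate(matrix)
            for l, p in enumerate(row)
            if k != l and p >= threshold]


# Toy data: four states, two clusters (illustrative only).
cluster_of = [0, 0, 1, 1]
sequence = [0, 1, 0, 2, 3, 2, 3, 0]
p_hat = cluster_transition_matrix(sequence, cluster_of, 2)
rate = within_cluster_rate(sequence, cluster_of)
edges = displayed_edges(p_hat)
```

Only the off-diagonal entries that pass the threshold are returned by `displayed_edges`, which mirrors the choice to omit self-transitions from the map.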
We summarize the findings: regions where the animals stay
or pass through are determined; barriers such as rivers, lakes
and fences can be seen; and average movement patterns of
the animals are deduced. This is actually surprising, because
the clustering algorithm identifies states by numbers and does
not use geographical information on the state labeling. The
labels of the states are in fact arbitrary to the algorithm:
states labeled, e.g., 10 and 11 need not be close to each other
geographically. We remark that geographically mixed clusters
would have been a valid outcome of the algorithm. Instead,
we see that the clustering algorithm clusters states in such
a manner that their geographical positions correspond to
connected regions with clear borders.
In short: the algorithm detects geographical features and
movement patterns, and the algorithm does so based on the
sequential data alone, i.e., it captures behavior of the animals.
8.3.2 Comparing the histogram of singular values to the limiting
distribution of singular values of the inferred BMC
Figure 7 next compares the spectral noise of (10) and (25) to
the theoretical predictions for BMCs (recall Proposition 2).
Observe that with $K = 15$ clusters, the theoretical prediction
captures the general shape of the distribution, but is inaccurate
for the smallest and largest singular values especially. With
more clusters, $K = 100$, the theoretical prediction for the
distribution of singular values is found to predict the distribution
of singular values better across the entire range. The prediction
however remains imperfect. The peak at zero is probably linked
to the fact that there are many states with a low degree.
Fig. 6. The Dunn Ranch with highlighted water, on top of which the detected clusters are mapped. Additionally, the cluster centers are indicated
with boxes containing the cluster number and edges between the boxes indicate the transitions between clusters with probability of at least 1%.
Thicker arrows correspond to higher transition probabilities. Self-transitions and the clusters 1 and 2 are omitted, because they are non-informative.
[Figure 7: two density histograms of singular values, one for
$\sqrt{n}\hat{L}$ (horizontal axis 0 to 60) and one for $\hat{N}/\sqrt{n}$
(horizontal axis 0 to 1); vertical axes: density; legend: $K = 15$,
$K = 100$.]
Fig. 7. Density-based histogram of singular values for $\sqrt{n}\hat{L}$
and $\hat{N}/\sqrt{n}$ for the animal movement data in blue bars and the
theoretical predictions associated with the improvement clustering with
$K = 15$ as the red line and with $K = 100$ as the purple dashed line.
8.3.3 Conclusion
We conclude that a BMC is a useful model for describing animal
movement data. In fact, surprisingly, the clustering algorithm
manages to deduce underlying geographical information (such
as regions, barriers, and movement patterns) from the mere
time dependency within the observation sequence. Because of
this visuo-spatial ability, the algorithm may have a potential
use as a tool for spatial recognition.
At the same time however, we also conclude that a BMC
does not describe the underlying complex process in its entirety.
For example, the distribution of singular values depicted in
Figure 7 is not predicted perfectly. This is likely caused by the
symmetry assumption between states within a BMC, which is
at odds with the geographical structure of the data. Indeed,
if we cut the region into more but smaller clusters and thus
reduce the amount of symmetry within the BMC modeling the
observation sequence, the BMC’s prediction of the distribution
of singular values improves.
8.4 Companies with the highest daily returns
We turn now to the last analyzed dataset: the sequence of
companies with the highest daily returns. The analysis for this
dataset was particularly delicate to conduct and we ultimately
arrive at the conclusion that a 0th-order BMC could already
be sufficient to explain the found clusters. This conclusion may
appear disappointing; it namely means that the clusters may
not encode any order-dependent dynamics of the process. It
is however important for a practitioner to be able to arrive at
this conclusion when appropriate. The fact that the evaluation
methods from Section 6 are able to suggest a 0th-order BMC is
correspondingly a good feature of the methods: the comparison
method would not be informative in the alternative scenario
where one always concludes in favor of the 1st-order BMC.
The main goal of this section is hence to demonstrate how
the methods of Section 6 can be used for the comparison of
different models; even in a difficult, sparse, regime.
Let us note that there are two main reasons why this dataset
is difficult to analyse. First, the data is sparse. It namely holds
that $\ell/n^2 \approx 0.027$. The sparsity can also be observed by
direct inspection of $\hat{N}$; see Figure 12 in Supplement 16.3.3.
This sparsity makes recovery of the clusters a hard problem, even
if the data-generating process is truly a BMC, and moreover
makes evaluation of the found clusters more difficult since the
associated confidence bounds are large. Second, it turns out
that the data contains a strong 0th-order BMC component.
This strong 0th-order component could potentially serve as a
nuisance factor and could conceal a 1st-order BMC component
even if it exists. This second factor may explain why the animal
movement data in Section 8.3 did not suffer from a difficult
analysis despite it being even sparser;
$\ell_{\mathrm{animal}}/n_{\mathrm{animal}}^2 \approx 0.0194$.
8.4.1 Subjective evaluation of the clusters
After some ad hoc experimentation, we fix $K = 3$.
The S&P500’s factsheet labels every constituent with a
sector. The sector breakdown within the dataset is shown in
Table 5 in Supplement 16.3.1. We can use this labeling to
obtain “fingerprints” of clusters, and as a reference against
which to compare the labeling of the clustering algorithm.
Such comparisons are plotted in Figure 8 for the three
largest clusters. The black bars show the relative percentages
of constituents in each sector within the clusters detected by
the spectral algorithm followed by the cluster improvement
algorithm. Observe the absence of most utilities constituents
within the 2nd and 3rd cluster; more than twice as many
are assigned to the 1st cluster than may be expected from a
random assignment. Industrial and health care constituents
are also mostly absent within the 3rd cluster. Similarly, note
the negligible number of consumer discretionary constituents
within the 1st cluster; most are assigned to the 2nd and
3rd cluster. Finally, consider that 29% of the 3rd cluster
consists of information technology constituents. These
contents suggest that the clusters are not entirely random. The
subsequent experimentation aims to determine what type of
information has been encoded in the clusters.
As a subjective way to evaluate the meaning of the clusters,
let us inspect the relative cluster sizes $\hat{\alpha}_k := \#\hat{V}_k / n$,
cluster equilibrium distribution $\hat{\pi}$, and cluster transition
matrix $\hat{p}$ of the associated BMC:
\[
\hat{\alpha}^{T} \approx \begin{pmatrix} 0.45 \\ 0.45 \\ 0.10 \end{pmatrix},
\quad
\hat{\pi}^{T} \approx \begin{pmatrix} 0.49 \\ 0.10 \\ 0.41 \end{pmatrix},
\quad
\hat{p} \approx \begin{pmatrix}
0.50 & 0.10 & 0.40 \\
0.54 & 0.11 & 0.35 \\
0.46 & 0.10 & 0.44
\end{pmatrix}.
\]
Note that the rows of $\hat{p}$ are close to but not quite equal; it
namely holds that $\hat{p}_{kl} \approx \hat{\pi}_l$ for every $k, l$.
This observation may suggest a strong 0th-order BMC component.
One can however not immediately conclude that all the deviations
from constant columns are due to noise: the data is sparse relative
to $n^2$ but not when compared to $K^2 = 9$.
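The near-equality of the rows can be checked numerically from the rounded values reported above. The following sketch computes the stationary distribution of the reported cluster transition matrix by power iteration and measures the largest deviation of any entry from the corresponding stationary probability. The function name is hypothetical, and the inputs are the rounded values from the text, so the output is only approximate.

```python
def stationary_distribution(p, iterations=200):
    """Approximate the stationary distribution of a row-stochastic matrix."""
    k = len(p)
    pi = [1.0 / k] * k
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(k)) for j in range(k)]
        total = sum(pi)
        pi = [v / total for v in pi]  # guard against rounding drift
    return pi


# Rounded cluster transition matrix as reported in the text.
p_hat = [
    [0.50, 0.10, 0.40],
    [0.54, 0.11, 0.35],
    [0.46, 0.10, 0.44],
]
pi = stationary_distribution(p_hat)

# Largest deviation of a row entry from the stationary probability of its column.
max_dev = max(abs(p_hat[k][l] - pi[l]) for k in range(3) for l in range(3))
```

With these inputs the stationary distribution lies within 0.01 of the reported equilibrium distribution in every coordinate, and the largest deviation is about 0.06. The rows are therefore close to the equilibrium distribution but not identical to it.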
8.4.2 Comparing against alternative models
Recall that, using validation data, we can compare the
performance of different models by the KL divergence rate
difference estimator (19). Consider the following models:
$\hat{P}$: A 1st-order BMC with $m = 3$ clusters determined by
the spectral algorithm followed by the improvement
algorithm.
$\hat{Q}_1$: A 1st-order BMC with $m = 11$ clusters given by the
constituents’ sector labels.
$\hat{Q}_2$: A 1st-order BMC with $m = 3$ clusters, determined by the
spectral algorithm.
$\hat{Q}_3$: A 0th-order BMC with $m = 3$ clusters, determined by
sorting according to the state’s sample equilibrium distribution
and determining clusters of equal probability mass.
$\hat{Q}_4$: A 0th-order BMC with $m = 3$ clusters, determined by the
spectral algorithm followed by an improvement algorithm
(appropriately modified for a 0th-order BMC).
One may also wonder about the relevance of the number of
parameters. By keeping the number of clusters fixed it namely
also follows that the 0th-degree models $\hat{Q}_3, \hat{Q}_4$ have
fewer parameters than the 1st-degree models $\hat{P}, \hat{Q}_2$.
To this end, consider for any positive integer $k \geq 1$ the
following model:
$\hat{Q}_{3,k}$: A 0th-order BMC with $k$ clusters, determined by sorting
according to the state’s sample equilibrium distribution and
determining $k$ clusters of equal equilibrium probability.
Note that the degrees of freedom $\mathrm{DF}_1(n, K)$ within a
1st-order BMC constrained to have fixed parameters $(n, K)$ equals
$\mathrm{DF}_1(n, K) = n + K(K - 1)$, whereas the degrees of freedom
$\mathrm{DF}_0(n, K)$ within a 0th-order BMC constrained to have fixed
parameters $(n, K)$ equals $\mathrm{DF}_0(n, K) = n + K - 1$. The model
$\hat{P}$ therefore has $n + 6$ degrees of freedom whereas
$\hat{Q}_{3,k}$ has $n + k - 1$ degrees of freedom. In terms of number
of degrees of freedom, $\hat{P}$ is thus comparable to $\hat{Q}_{3,7}$.
Observe furthermore that $\hat{P}$ allows for more inhomogeneity within
the columns of the transition matrix and less in the rows, whereas
$\hat{Q}_{3,7}$ allows for no inhomogeneity within the columns but more
in the rows.
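The parameter count behind this comparison is simple arithmetic, which the following sketch makes explicit. The function names are hypothetical; the two formulas and the value $n = 300$ are taken from the text and from the caption of Figure 9.

```python
def df_first_order(n, K):
    """Degrees of freedom of a 1st-order BMC: n + K(K - 1)."""
    return n + K * (K - 1)


def df_zeroth_order(n, K):
    """Degrees of freedom of a 0th-order BMC: n + K - 1."""
    return n + K - 1


n = 300  # number of constituents
df_p = df_first_order(n, 3)  # model P-hat with three clusters

# Number of clusters k for which the 0th-order model matches P-hat.
matching_k = next(k for k in range(1, 50) if df_zeroth_order(n, k) == df_p)
```

Here `df_p` equals $n + 6 = 306$ and `matching_k` equals 7, in agreement with the comparison of $\hat{P}$ to $\hat{Q}_{3,7}$.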
Observe in Figure 9(a) that the difference in KL divergence
rate on the validation data is positive when comparing $\hat{P}$
against $\hat{Q}_1$, $\hat{Q}_2$, barely positive when comparing against
$\hat{Q}_3$, and near-zero when comparing against $\hat{Q}_4$. The
0th-degree models $\hat{Q}_3$, $\hat{Q}_4$ perform comparably to the
1st-degree model $\hat{P}$.
Regarding the comparison with $\hat{Q}_{3,k}$ we may observe in
Figure 9(b) that the sign of the KL divergence rate difference
is probably positive for $k = 1, 2, 4$, possibly positive for
$k = 3, 11, 12$ but not much, possibly negative for $k = 6, 7, 8$
but not much, and inconclusive for $k = 5, 9, 10$. The downward
trend for small $k$ suggests that a strictly positive number of free
parameters in the cluster transition matrix is necessary to
accurately represent the data. Judging from the KL divergence
rate difference at $k \approx 7$, in which case the number of degrees
of freedom in both models is equal, it appears that the specific
freedoms allowed in $\hat{P}$ give a performance comparable to that
attained by the specific freedoms allowed in $\hat{Q}_{3,7}$.
8.4.3 Comparing the histogram of singular values to the limiting
distribution of singular values of the inferred BMC
Figure 9(c) depicts histograms of singular values, and the
theoretical predictions corresponding to the models $\hat{P}$,
$\hat{Q}_1$, $\hat{Q}_2$, $\hat{Q}_3$, $\hat{Q}_4$.
All theoretical predictions were calculated from the training
data, while the histograms of singular values were calculated
from the validation data.
Observe that all theoretical predictions give a fair description
of the laws. Models $\hat{P}$, $\hat{Q}_4$ outperform models
$\hat{Q}_1$, $\hat{Q}_2$, $\hat{Q}_3$ when it comes to describing the
distribution of singular values of
$\hat{N}_{\mathrm{validation}}/\sqrt{n}$. Observe that the empirical
observations for $\sqrt{n}\hat{L}_{\mathrm{validation}}$ as well as the
predictions associated to $\hat{P}$, $\hat{Q}_1$, $\hat{Q}_2$,
$\hat{Q}_3$, $\hat{Q}_4$ all appear to be quarter-circular. This
quarter-circular law is consistent with our suspicion of a strong
0th-degree model component: in a 0th-degree BMC, the limiting law of
$\sqrt{n}\hat{L}$ is known to be quarter-circular. The peak at zero in
the empirical observations is likely due to the sparsity.
8.4.4 Conclusion
In all considered performance measures we saw that the
1st-degree BMC model $\hat{P}$ performed approximately equally well as
the 0th-degree models $\hat{Q}_3$, $\hat{Q}_4$. The consideration of the
models $\hat{Q}_{3,k}$ suggested that one further requires a certain
number of parameters to achieve sufficient model expressivity.
The sparsity of the data makes it difficult to come to a
definitive conclusion since the confidence bounds remain rather
large. Still, one generally prefers models with fewer parameters.
Hence, in our opinion, a 0th-order BMC would be a suitable
model for this dataset.
Fig. 8. The relative percentages of constituents in each sector within the 300 constituents under consideration for models $\hat{P}$, $\hat{Q}_1$ and $\hat{Q}_2$ corresponding
to the black, blue and orange bars respectively. The left, middle, and right plots correspond to the 1st largest, 2nd largest, and 3rd largest detected
cluster, respectively. A bar’s color is more saturated when the difference in relative percentage exceeds 5% when compared to the black bars.
Observe for instance that for the black bars representing $\hat{P}$ the absence of consumer discretionaries in the 1st cluster is noteworthy, just like the
absence of most utilities constituents in the 2nd and 3rd cluster, as well as the absence of industrial and health care constituents in the 3rd cluster.
[Figure 9: (a) KL-divergence rate difference [arb.] against h [arb.], with curves for P vs Q1 (1st-degree BMC, Sectors), P vs Q2 (1st-degree BMC, Spectral), P vs Q3 (0th-degree BMC, Degree-based) and P vs Q4 (0th-degree BMC, Improvement);
(b) KL-divergence rate difference [arb.] against k [arb.], with curves for h = 600, 1200, 1800, 2400;
(c) density for $\sqrt{n}\hat{L}$ (top) and for $\hat{N}/\sqrt{n}$ (bottom) against singular value [arb.], with curves for Model P (1st-degree BMC, Improvement), Model Q1 (1st-degree BMC, Sectors), Model Q2 (1st-degree BMC, Spectral), Model Q3 (0th-degree BMC, Degree-based) and Model Q4 (0th-degree BMC, Improvement).]
Fig. 9. (a) The KL divergence rate difference estimator $\hat{D}(X_{\lfloor \ell/2 \rfloor + 1 : \lfloor \ell/2 \rfloor + h}; \hat{P}^{X_{1:\lfloor \ell/2 \rfloor}}, \hat{Q}_i^{X_{1:\lfloor \ell/2 \rfloor}})$ on the validation data. The 95% confidence
bounds were estimated using (44) in Supplement 13. The mixing time was (somewhat arbitrarily) guessed to be 20 market days. (b) The KL
divergence rate difference estimator $\hat{D}(X_{\lfloor \ell/2 \rfloor + 1 : \lfloor \ell/2 \rfloor + h}, \hat{P}^{X_{1:\lfloor \ell/2 \rfloor}}, \hat{Q}_{3,k}^{X_{1:\lfloor \ell/2 \rfloor}})$ for different sample path lengths $h \in \mathbb{N}_+$, and as a function of $k$. The
95% confidence bounds were estimated using (44) in Supplement 13. The mixing time was (somewhat arbitrarily) guessed to be 20 market days. (c)
The top and bottom figures display the singular values of the Laplacian $\sqrt{n}\hat{L}$ and the empirical frequency matrix $\hat{N}/\sqrt{n}$ respectively. Both figures
were calculated from all $n = 300$ constituents and exclude the $K = 3$ leading singular values.
9 Detected orders within the data
In this section, we consider the sequence of clusters
$Y_{1:\ell} = \sigma_n(X_{1:\ell})$ provided by the clustering algorithm
and select the order of the MC that best fits the data. We look at model
selection described in Section 6.3 for the order in the chain
using an information criterion. We will focus on the DNA,
GPS, and the S&P500 dataset. The Wikipedia dataset is not
considered due to its impractical size, and because it does not
consist of a single sample path but rather a number of small
sample paths. We compute information criteria for all datasets,
except for when we study the power of the test. For this task we
then focus just on the DNA dataset and the S&P500 dataset.
9.1 Results
We compute (22) for $r = 0, 1, 2, 3, 4$ of the following models:
$\hat{Q}^{r,\mathrm{MLE}}$: The Maximum-Likelihood Estimator of an $r$th-order
MC estimated from the observation sequence $Y_{1:\ell}$.
The results are in Table 2. We see that the magnitude of
the CAIC in Table 2 depends strongly on the observation
sequence and the number of clusters. In particular, for the
GPS dataset the differences are notable for most orders due
to the large number of clusters $K = 15$, where higher orders
become highly penalized. For the DNA dataset, the criterion
suggests that orders $r \in \{1, 2\}$ best approximate the data.
For the S&P500 dataset, on the other hand, orders $r \in \{0, 1\}$
appear to be the best. We expect that there is a large variance
in Table 2 and that some over- or underfitting of the order is possible.
The criterion indicates nonetheless that the transitions of the
found clusters, except maybe for the S&P500 dataset, can be
better approximated by a nonzero order Markovian process.
We will now support this conclusion empirically with the error
models for the DNA and S&P500 datasets.
r   DNA      incr. (%)   GPS (×10^3)   incr. (%)   S&P500   incr. (%)
0   432650   n.a.        960.63        n.a.        9853     n.a.
1   431502   -0.27       626.54        -34.8       9860     +0.07
2   431263   -0.06       571.49        -40.5       9940     +0.81
3   435228   +0.69       1121.90       +16.8       10253    +3.1
4   458512   +5.3        9789.27       +1019       11162    +8.9
TABLE 2
The CAIC in (22) for the different datasets. Note that the relative
difference between the values pertaining to different orders is often
small. For example, the differences are less than 0.1% between orders 1, 2
for the DNA data, and between orders 0, 1 for the stock market data.
This is not the case, however, with the animal data.
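The order selection in this subsection can be sketched as follows. Equation (22) is not reproduced here; the sketch assumes the standard form of the consistent AIC, namely $-2 \log L + d(\log N + 1)$ with $d = K^r (K - 1)$ free parameters for an $r$th-order chain on $K$ clusters and $N$ the number of scored transitions, which may differ in detail from (22). The function name and the synthetic cluster sequence are hypothetical.

```python
import math
import random
from collections import Counter


def caic(sequence, num_symbols, order, max_order):
    """CAIC of the maximum-likelihood Markov chain of the given order.

    The first `max_order` symbols are only used as context, so that all
    candidate orders are scored on the same transitions.
    """
    context_counts = Counter()
    transition_counts = Counter()
    for t in range(max_order, len(sequence)):
        context = tuple(sequence[t - order:t])
        context_counts[context] += 1
        transition_counts[(context, sequence[t])] += 1
    log_likelihood = sum(
        count * math.log(count / context_counts[context])
        for (context, _), count in transition_counts.items()
    )
    num_params = num_symbols ** order * (num_symbols - 1)
    num_obs = len(sequence) - max_order
    return -2.0 * log_likelihood + num_params * (math.log(num_obs) + 1.0)


# Synthetic 1st-order cluster sequence on two clusters (illustrative only).
rng = random.Random(1)
p = [[0.1, 0.9], [0.9, 0.1]]
y = [0]
for _ in range(4999):
    y.append(0 if rng.random() < p[y[-1]][0] else 1)

scores = {r: caic(y, 2, r, max_order=3) for r in range(4)}
best = min(scores, key=scores.get)
```

The penalty grows as $K^r$, which is consistent with the remark above that higher orders become highly penalized when the number of clusters is large.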
9.2 Evaluation of the CAIC criterion
In order to probe how significant the information criteria are,
we will use the empirical transition law $\hat{P}^{r,\mathrm{MLE}}$ from
the original data $X_{1:\ell}$, which we remark is on the full state
space $[n]$. With $\hat{P}^{r,\mathrm{MLE}}$ for $r \in \{0, 1\}$ we will
consider two data-generating models with errors determined by a
parameter $\varepsilon \in [0, 1)$, and then look at the clustered
process $Y^{r}_{\varepsilon}$ generated by these models. The models are:
$W^{1}_{\varepsilon}$: A perturbed 1st-order BMC with probability distribution
$\hat{P}^{1,\mathrm{MLE}}$ and a perturbation given by a heavy-tailed
0th-order perturbation defined in Section 3.2. In contrast
to the perturbed models described in Section 3.2, the
perturbation here is assumed to be a 0th-order MC.
$W^{0}_{\varepsilon}$: A perturbed 0th-order BMC with probability distribution
$\hat{P}^{0,\mathrm{MLE}}$ and a perturbation given by a heavy-tailed
1st-order MC defined in Section 3.2.
Assume that we are interested in a particular criterion
$C : [K]^{\ell+1} \times [0, 1]^{K \times K} \to \mathbb{R}$ such as the
CAIC. Denote by $Y^{r,\varepsilon}_{1:\ell} = \sigma_n(X^{\varepsilon}_{1:\ell})$
the cluster process if $X^{\varepsilon}_{1:\ell} \sim W^{r}_{\varepsilon}$.
We will study the robustness of the criterion by examining how often it
over- and underfits when selecting $s$ for the models
$\hat{Q}^{s,\mathrm{MLE}}$ with the clustered sequence
$Y^{r,\varepsilon}_{1:\ell}$. To study this aspect, we will consider
two targets for the CAIC and restrict to the orders $r \in \{0, 1\}$.
The first target is the overfit error probability
\[
e_{\mathrm{over}}(\varepsilon) := \mathbb{P}_{X^{\varepsilon}_{1:\ell} \sim W^{0}_{\varepsilon}}\bigl(\operatorname{argmin}_{r \in \{0,1\}} \mathrm{CAIC}(Y^{r,\varepsilon}_{1:\ell}) = 1\bigr), \tag{31}
\]
that is, the probability that the criterion selects a 1st-order
process for the chain of cluster transitions when the underlying
generating process is $\hat{P}^{0,\mathrm{MLE}}$ and the only higher
order contributions come from perturbations. The second target is the
underfit error probability defined as
\[
e_{\mathrm{under}}(\varepsilon) := \mathbb{P}_{X^{\varepsilon}_{1:\ell} \sim W^{1}_{\varepsilon}}\bigl(\operatorname{argmin}_{r \in \{0,1\}} \mathrm{CAIC}(Y^{r,\varepsilon}_{1:\ell}) = 0\bigr), \tag{32}
\]
that is, the probability that the criterion selects a 0th-order
process as the best candidate while $\hat{P}^{1,\mathrm{MLE}}$ is the
actual underlying data-generating process.
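A Monte Carlo estimate of the underfit error probability can be sketched as follows. The sketch makes several simplifying assumptions that differ from the setting of the paper: the perturbation acts directly on the cluster sequence rather than on the full state space, it is uniform rather than heavy-tailed, and the CAIC takes the standard form $-2 \log L + d(\log N + 1)$ rather than necessarily that of (22). All function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def caic(sequence, num_symbols, order):
    """CAIC of the maximum-likelihood chain of order 0 or 1 (first symbol is context)."""
    context_counts, transition_counts = Counter(), Counter()
    for t in range(1, len(sequence)):
        context = tuple(sequence[t - order:t])
        context_counts[context] += 1
        transition_counts[(context, sequence[t])] += 1
    log_likelihood = sum(
        count * math.log(count / context_counts[context])
        for (context, _), count in transition_counts.items()
    )
    num_params = num_symbols ** order * (num_symbols - 1)
    return -2.0 * log_likelihood + num_params * (math.log(len(sequence) - 1) + 1.0)


def sample_perturbed_chain(length, p, epsilon, rng):
    """1st-order chain whose steps are replaced by a uniform draw w.p. epsilon."""
    k = len(p)
    y = [rng.randrange(k)]
    for _ in range(length - 1):
        if rng.random() < epsilon:
            y.append(rng.randrange(k))  # 0th-order perturbation (uniform: an assumption)
        else:
            y.append(rng.choices(range(k), weights=p[y[-1]])[0])
    return y


def underfit_probability(p, epsilon, length, repetitions=30, seed=0):
    """Fraction of repetitions in which order 0 is selected over order 1."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(repetitions):
        y = sample_perturbed_chain(length, p, epsilon, rng)
        if caic(y, len(p), 0) <= caic(y, len(p), 1):
            errors += 1
    return errors / repetitions


p = [[0.1, 0.9], [0.9, 0.1]]
e_under_clean = underfit_probability(p, 0.0, 500)
```

The overfit error probability could be estimated analogously by sampling from a 0th-order chain with a 1st-order perturbation and counting how often order 1 is selected.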
We will look at parameters retrieved from the DNA and
S&P500 datasets. Because the S&P500 dataset is the least clear
dataset, we also consider a synthetic observation sequence. This
synthetic observation sequence is generated using the same
model $W^{r}_{\varepsilon}$ as is obtained for the stock market, but
will be five times as long: $5\ell_{\mathrm{SM}}$, where
$\ell_{\mathrm{SM}}$ is the length of the path of the
S&P500 dataset. We will refer to this synthetic observation
sequence as the “extended stock market model.” In this manner
we can see the effect of sparsity on the criterion robustness as
if we could have access to more data.
In Figure 10, the error probabilities as well as centered
CAIC values can be seen. We see that both the underfit error
$e_{\mathrm{under}}(\varepsilon)$ and the overfit error
$e_{\mathrm{over}}(\varepsilon)$ are usually small for small
$\varepsilon$. The overfit error is, however, considerably larger for
the DNA dataset than for the S&P500 dataset. This supports the claim
that the CAIC chooses the model with fewest parameters for the same
amount of information, that is, the criterion is less prone to
overfit when the data is sparse. The underfit error is on the
contrary small for the DNA dataset, also for
$\varepsilon \in [0.1, 0.2]$. This suggests that order selection via
information criteria is robust to misclassification error.
The case of the S&P500 dataset is especially interesting.
In Table 2, the criterion chooses $r = 0$ whereas in the
$W^{1}_{\varepsilon}$ model in Figure 10(a)–(b), the criterion selects
$r = 1$ up to $\varepsilon \sim 0.1$. Afterwards, deviating from the BMC
model by just 1 out of 10 jumps in the S&P500 dataset will make the
criterion behave similarly as in Table 2. This is also supported
by Figure 10(b), where the difference between the criterion for
$r = 0$ and $r = 1$ in the S&P500 dataset takes values in $[0, 10]$,
which we coincidentally also see in Table 2. This suggests that
there may be a 1st-order Markovian structure in the S&P500
dataset but also a strong degree dependence (or 0th-order
process). Alternatively, the data may simply be too sparse for
the CAIC to select a suitable order. This hypothesis is also
supported by the stock market extended dataset, where model
selection with five times more data has fewer such problems.
We finally remark that looking at information criteria for
the unclustered observation sequences $X_{1:\ell}$ provides no useful
insights due to the large dimensionality of the models. In
particular, the CAIC for the unclustered observation
sequences for order $r \in \{0, 1\}$ can be seen in Table 3. As
the data shows, the CAIC simply picks the model with the
smallest number of parameters. This is even more extreme in
the GPS and S&P500 datasets, where on top of large model
dimension we have sparse data.
r   DNA            GPS          S&P500
0   1339.5 ×10^3   2943 ×10^3   54.27 ×10^3
1   1361.9 ×10^3   ≈1 ×10^8     882 ×10^3
TABLE 3
The CAIC in (22) for the sequence $X_1, \ldots, X_\ell$ for different datasets.
9.3 Conclusion
The main takeaways of this section are as follows:
• Model selection is feasible if we use the clustered sequence
$Y_{1:\ell} = \sigma_n(X_{1:\ell})$ obtained after the clustering algorithm.
This namely reduces the number of free parameters of the
models considerably.
• For the DNA and GPS datasets, the CAIC selects MCs of
nonzero order.
• For the S&P500 dataset the CAIC shows that the data
is too sparse for selecting a specific order with certainty.
However, there are indications that the values obtained in
the CAIC for the S&P500 dataset are consistent with a
1st-order BMC model with a strong 0th-order MC baseline.
[Figure 10: (a) $e_{\mathrm{under}}(\varepsilon)$ against $\varepsilon$ (0 to 0.2), with curves for Stocks Extended, Stocks and DNA;
(b) Centered Averaged CAIC($\varepsilon$) against $\varepsilon$ (0 to 0.2), with curves for Stocks r = 0, Stocks r = 1, Stocks Extended r = 0 and Stocks Extended r = 1;
(c) $e_{\mathrm{over}}(\varepsilon)$ against $\varepsilon$ (0 to 0.2), with curves for Stocks Extended, Stocks and DNA.]
Fig. 10. (a) Underfit error probability $e_{\mathrm{under}}(\varepsilon)$ in (32) depending on the perturbation $\varepsilon$ for the DNA, S&P500, and extended S&P500
datasets assuming the data-generating process is $W^{1}_{\varepsilon}$ for their respective datasets. Note the effect of reducing sparsity for the extended S&P500
dataset in reducing the underfit error probability. The plot suggests that the probability of underfitting for small $\varepsilon$, while nonzero, is less than the
probability of selecting the correct order. (b) Centered average of CAIC for the S&P500 and extended S&P500 datasets assuming
the data-generating process is $W^{1}_{\varepsilon}$. Note that the criterion can no longer select the correct order for the S&P500 dataset after $\varepsilon \sim 0.08$. However,
by increasing the dataset size 5-fold, it can very robustly select the correct order even for larger error $\varepsilon$. We remark that the empirical variance is an
order of magnitude too large to be represented in the plot ($\mathrm{Var}(\mathrm{CAIC}(Y_{1:\ell}(\varepsilon))) \simeq O(10^2)$). Despite this very large variance, the selection process is
robust for small error $\varepsilon$. (c) Overfit error probability $e_{\mathrm{over}}(\varepsilon)$ depending on the perturbation $\varepsilon$ for the DNA, S&P500, and extended S&P500
datasets assuming the data-generating process is $W^{0}_{\varepsilon}$. Compared to the underfit error probability, the CAIC is robust at selecting a model with lower
order, provided there is enough information. In particular, for the DNA, the 1st-order perturbation becomes dominant in the criterion fairly quickly
after $\varepsilon > 0.05$. In all tests the number of repetitions was $R = 30$.
10 Conclusions
We have found that using a BMC model for exploratory
data analysis in unlabeled observation sequences does in
fact produce useful insights. Although there is no guarantee
that there are clusters or that a cluster structure is actually
revealing of a ground truth model we can still evaluate the
clusters and associated models. The DNA example uncovered
known, nontrivial and biologically relevant structure. In the
text-based example, the improvement algorithm enhanced
performance on downstream tasks and the spectral noise
identified the heavy-tailed nature of some model violations.
The animal movement example uncovered features which could
not have been extracted from only the GPS coordinates. For
the daily best performing stocks in the S&P500, we saw that a
0th-order BMC can describe its statistical aspects, but there
are indications that a 1st-order BMC is also a suitable model.
Acknowledgments
This publication is part of the project Clustering and Spectral
Concentration in Markov Chains (with project number
OCENW.KLEIN.324) of the research programme Open Competition
Domain Science – M which is (partly) financed by the
Dutch Research Council (NWO).
The authors also acknowledge support by the European
Union’s Horizon 2020 research and innovation programme
under the Marie Skłodowska-Curie grant agreement no. 945045,
and by the NWO Gravitation project NETWORKS under grant
no. 024.002.003.
We would finally like to thank Mike van Santvoort for useful
discussions while writing this paper.
References
[1]
A. Zhang and M. Wang, “Spectral state compression of Markov
processes,” IEEE Transactions on Information Theory, 2019.
[2]
J. Sanders, A. Proutière, and S.-Y. Yun, “Clustering in Block
Markov Chains,” The Annals of Statistics, 2020.
[3]
S. Singh, T. Jaakkola, and M. Jordan, “Reinforcement Learning
with soft state aggregation,” Advances in Neural Information
Processing Systems, 1994.
[4]
R. Ortner, “Adaptive Aggregation for Reinforcement Learning
in Average Reward Markov Decision Processes,” Annals of
Operations Research, 2013.
[5]
J. Sanders and A. Van Werde, “Singular value distribution of
dense random matrices with block Markovian dependence,” arXiv
preprint arXiv:2204.13534, 2022.
[6]
J. Sanders and A. Senen-Cerda, “Spectral norm bounds
for Block Markov Chain random matrices,” arXiv preprint
arXiv:2111.06201, 2021.
[7]
Y. Duan, T. Ke, and M. Wang, “State aggregation learning
from Markov transition data,” Advances in Neural Information
Processing Systems, 2019.
[8]
S. Bi, Z. Yin, and Y. Weng, “A low-rank spectral method for
learning Markov models,” Optimization Letters, 2022.
[9]
Z. Du, N. Ozay, and L. Balzano, “Mode clustering for Markov
jump systems,” in 2019 IEEE 8th International Workshop on
Computational Advances in Multi-Sensor Adaptive Processing
(CAMSAP), 2019.
[10]
Z. Zhu, X. Li, M. Wang, and A. Zhang, “Learning Markov models
via low-rank optimization,” Operations Research, 2021.
[11]
C. Gao, Z. Ma, A. Y. Zhang, and H. H. Zhou, “Achieving optimal
misclassification proportion in Stochastic Block Models,” The
Journal of Machine Learning Research, 2017.
[12]
S. Fortunato, “Community detection in graphs,” Physics Reports,
2010.
[13]
S. Zolhavarieh, S. Aghabozorgi, and Y. W. Teh, “A Review
of Subsequence Time Series Clustering,” The Scientific World
Journal, 2014.
[14]
S. Aghabozorgi, A. S. Shirkhorshidi, and T. Y. Wah, “Time-series
clustering – A decade review,” Information Systems, 2015.
[15]
T. W. Liao, “Clustering of time series data – A survey,” Pattern
Recognition, 2005.
[16]
J. Lin, E. Keogh, and W. Truppel, “Clustering of Streaming Time
Series is Meaningless,” in Proceedings of the 8th ACM SIGMOD
workshop on Research Issues in Data Mining and Knowledge
Discovery, 2003.
[17]
T. Rakthanmanon, E. J. Keogh, S. Lonardi, and S. Evans, “Time
series epenthesis: Clustering time series streams requires ignoring
some data,” in 2011 IEEE 11th International Conference on Data
Mining, 2011.
[18]
S. Rodpongpun, V. Niennattrakul, and C. A. Ratanamahatana,
“Selective subsequence time series clustering,” Knowledge-Based
Systems, 2012.
[19]
A. Gionis and H. Mannila, “Finding Recurrent Sources in
Sequences,” in Proceedings of the seventh annual international
conference on Research in computational Molecular Biology, 2003.
[20]
A. Ultsch and F. Mörchen, ESOM-Maps: Tools for clustering,
visualization, and classification with Emergent SOM. 2005.
[21]
F. Mörchen, A. Ultsch, and O. Hoos, “Extracting interpretable
muscle activation patterns with time series knowledge mining,”
International Journal of Knowledge-based and Intelligent
Engineering Systems, 2005.
[22]
O.-A. Maillard and S. Mannor, “Latent bandits,” in
International Conference on Machine Learning, 2014.
[23]
M. G. Azar, A. Lazaric, and E. Brunskill, “Sequential transfer
in Multi-Armed Bandit with finite set of models,” Advances in
Neural Information Processing Systems, 2013.
[24]
L. Li, T. J. Walsh, and M. L. Littman, “Towards a unified theory
of state abstraction for MDPs,” in AI&M, 2006.
[25]
H. Y. Ong, “Value function approximation via low-rank models,”
arXiv preprint arXiv:1509.00061, 2015.
[26]
K. Azizzadenesheli, A. Lazaric, and A. Anandkumar,
“Reinforcement Learning in rich-observation MDPs using spectral methods,”
arXiv preprint arXiv:1611.03907, 2016.
[27]
Y. Yang, G. Zhang, Z. Xu, and D. Katabi, “Harnessing structures
for value-based planning and Reinforcement Learning,” arXiv
preprint arXiv:1909.12255, 2019.
[28]
A. A. Markov, “Primer statisticheskogo issledovaniya nad tekstom
“evgeniya onegina”, illyustriruyuschij svyaz’ispytanij v cep’,”
Izvestiya Akademii Nauk, 1913.
[29]
C. Manning and H. Schutze, Foundations of Statistical Natural
Language Processing. 1999.
[30]
H. Almagor, “A Markov analysis of DNA sequences,” Journal of
Theoretical Biology, 1983.
[31]
R. Jorre and R. Curnow, “A model for the evolution of the
proteins: Cytochrome c: mammals, reptiles, insects,” Biochimie,
1976.
[32]
S. Robin, F. Rodolphe, and S. Schbath, DNA, words and models:
statistics of exceptional words. 2005.
[33]
I. Gialampoukidis, K. Gustafson, and I. Antoniou, “Time
operator of Markov chains and mixing times. Applications to financial
data,” Physica A: Statistical Mechanics and its Applications,
2014.
[34]
D. Zhang and X. Zhang, “Study on forecasting the stock
market trend based on stochastic analysis method,” International
Journal of Business and Management, 2009.
[35]
J. van der Hoek and R. J. Elliott, “Asset pricing using finite state
Markov chain stochastic discount functions,” Stochastic Analysis
and Applications, 2012.
[36]
R. S. Mamon and R. J. Elliott, Hidden Markov Models in Finance.
2007.
[37]
P. Billingsley, “Statistical methods in Markov chains,” The Annals
of Mathematical Statistics, 1961.
[38]
M. Menéndez, D. Morales, L. Pardo, and K. Zografos,
“Statistical inference for finite Markov chains based on divergences,”
Statistics & Probability Letters, 1999.
[39]
L. Pardo, Statistical inference based on divergence measures. 2018.
[40]
M. Menéndez, J. Pardo, and L. Pardo, “Csiszar’s-divergences for
testing the order in a Markov chain,” Statistical Papers, 2001.
[41]
F. Barsotti, A. Philippe, and P. Rochet, “Hypothesis testing for
Markovian models with random time observations,” Journal of
Statistical Planning and Inference, 2016.
[42]
H. Akaike, “A new look at the statistical model identification,”
IEEE Transactions on Automatic Control, 1974.
[43]
H. Bozdogan, “Model selection and Akaike’s Information
Criterion (AIC): The general theory and its analytical extensions,”
Psychometrika, 1987.
[44]
D. Anderson and K. Burnham, Model selection and multi-model
inference. 2nd ed. NY: Springer-Verlag, 2004.
[45]
S. Kullback and R. A. Leibler, “On information and sufficiency,”
The Annals of Mathematical Statistics, 1951.
[46]
J. Ding, V. Tarokh, and Y. Yang, “Model selection techniques:
An overview,” IEEE Signal Processing Magazine, 2018.
[47]
National Library of Medicine, “OCA2 melanosomal transmembrane
protein homo sapiens (human).” https://www.ncbi.nlm.nih.gov/gene/4948,
2021. Accessed in October 2021, RefSeq Accession NC_000015.10.
[48]
B. Wilson, “The Unknown Perils of Mining Wikipedia.”
https://www.lateral.io/resources-blog/the-unknown-perils-of-mining-wikipedia,
June 2015.
[49]
S. Bird, E. Klein, and E. Loper, Natural language processing with
Python: analyzing text with the natural language toolkit. 2009.
[50]
D. L. Stephen Blake, Randy Arndt, “Movebank.”
https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study8019591,
2017. Accessed: 2022-08-16.
[51] Alpha Vantage Co, “Stock data API,” 2021.
[52]
G. A. Gutman and G. W. Hatfield, “Nonrandom utilization of
codon pairs in Escherichia coli,” Proceedings of the National
Academy of Sciences, 1989.
[53]
J. R. Coleman, D. Papamichail, S. Skiena, B. Futcher, E. Wimmer,
and S. Mueller, “Virus Attenuation by Genome-Scale Changes in
Codon Pair Bias,” Science, 2008.
[54]
D. Kunec and N. Osterrieder, “Codon Pair Bias Is a Direct
Consequence of Dinucleotide Bias,” Cell Reports, 2016.
[55]
B. Irwin, J. D. Heck, and G. W. Hatfield, “Codon Pair Utilization
Biases Influence Translational Elongation Step Times,” Journal
of Biological Chemistry, 1995.
[56]
G. Moura, M. Pinheiro, R. Silva, I. Miranda, V. Afreixo, G. Dias,
A. Freitas, J. L. Oliveira, and M. A. Santos, “Comparative context
analysis of codon pairs on an ORFeome scale,” Genome Biology,
2005.
[57]
A. Alexaki, J. Kames, D. D. Holcomb, et al., “Codon and codon-
pair usage tables (CoCoPUTs): facilitating genetic variation
analyses and recombinant gene design,” Journal of Molecular
Biology, 2019.
Supplementary material of “Detection and Evaluation of Clusters within Sequential Data”
11 Pseudo-code describing the clustering procedure
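Algorithm 1: Spectral clustering algorithm.
Input: $n$, $K$ and $\hat{N}$.
Output: Cluster assignment $\hat{\mathcal{V}}_1, \ldots, \hat{\mathcal{V}}_K$.
1 $\hat{N}_\Gamma \leftarrow \mathrm{Trim}(\hat{N})$;
2 $\hat{R} \leftarrow$ rank-$K$ approximation of $\hat{N}_\Gamma$;
3 $\hat{\mathcal{V}}_1, \ldots, \hat{\mathcal{V}}_K \leftarrow K\text{-means}([\hat{R}, \hat{R}^{\mathrm{T}}])$;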
The spectral clustering in Algorithm 1 is used to obtain a good initial estimate for the clusters. A $K$-means algorithm is applied to a rank-$K$ approximation of $\hat{N}$ (or of $\hat{N}_\Gamma$, the trimmed version of $\hat{N}$) to yield an initial guess for the clusters. It can be proved, however, that this step yields a number of misclassified states that is sublinear in $n$ but not of constant order [69].
A second step is then required to attain exact recovery. In Algorithm 2, we see that a procedure similar to a likelihood ratio maximization is used to improve the cluster assignment. With this extra step it can be proven that the number of misclassified states is of constant order in expectation.
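Algorithm 2: Cluster improvement algorithm (for 1st-order BMCs)
Input: $n$, $K$, $\ell$, $\hat{N}$ and initial cluster assignment guess $\hat{\mathcal{V}}_1, \ldots, \hat{\mathcal{V}}_K$.
Output: New cluster assignment $\hat{\mathcal{V}}'_1, \ldots, \hat{\mathcal{V}}'_K$.
1 for $a \leftarrow 1$ to $K$ do
2     $\hat{\pi}_a \leftarrow \hat{N}_{\hat{\mathcal{V}}_a,[n]}/\ell$, $\hat{\alpha}_a \leftarrow \#\hat{\mathcal{V}}_a/n$;
3     $\hat{\mathcal{V}}'_a \leftarrow \emptyset$;
4     for $b \leftarrow 1$ to $K$ do
5         $\hat{p}_{a,b} \leftarrow \hat{N}_{\hat{\mathcal{V}}_a,\hat{\mathcal{V}}_b}/\hat{N}_{\hat{\mathcal{V}}_a,[n]}$;
6     end
7 end
8 for $x \leftarrow 1$ to $n$ do
9     $c \leftarrow \arg\max_{l \in [K]} \Big\{ \sum_{k=1}^{K} \big( \hat{N}_{x,\hat{\mathcal{V}}_k} \ln(\hat{p}_{l,k}) + \hat{N}_{\hat{\mathcal{V}}_k,x} \ln(\hat{p}_{k,l}/\hat{\alpha}_l) \big) - \frac{\ell}{n}\frac{\hat{\pi}_l}{\hat{\alpha}_l} \Big\}$;
10    $\hat{\mathcal{V}}'_c \leftarrow \hat{\mathcal{V}}'_c \cup \{x\}$;
11 end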
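The improvement step of Algorithm 2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation released with the paper: states and clusters are 0-indexed, every cluster in the initial guess is assumed nonempty, and the `floor` parameter is a hypothetical safeguard introduced here so that the logarithm is never evaluated at zero.

```python
import math

def improve_clusters(N, clusters, ell, floor=1e-12):
    """One pass of the cluster improvement step (sketch of Algorithm 2).

    N        : n x n list of lists, N[x][y] = number of observed transitions x -> y
    clusters : list of K nonempty lists of state indices (the initial guess)
    ell      : length of the sample path
    floor    : hypothetical lower bound on estimated probabilities (avoids log(0))
    """
    n, K = len(N), len(clusters)
    # Transition counts from each state into each cluster, and from each cluster into each state.
    out_to = [[sum(N[x][y] for y in clusters[k]) for k in range(K)] for x in range(n)]
    in_from = [[sum(N[y][x] for y in clusters[k]) for k in range(K)] for x in range(n)]
    # Cluster-level counts N_{V_a, V_b} and N_{V_a, [n]}.
    block = [[sum(out_to[x][b] for x in clusters[a]) for b in range(K)] for a in range(K)]
    row = [sum(block[a]) for a in range(K)]
    pi = [row[a] / ell for a in range(K)]
    alpha = [len(clusters[a]) / n for a in range(K)]
    p = [[max(block[a][b] / max(row[a], 1), floor) for b in range(K)] for a in range(K)]

    new_clusters = [[] for _ in range(K)]
    for x in range(n):
        def objective(l):
            total = sum(
                out_to[x][k] * math.log(p[l][k])
                + in_from[x][k] * math.log(p[k][l] / alpha[l])
                for k in range(K)
            )
            return total - (ell / n) * (pi[l] / alpha[l])
        # Assign x to the cluster maximising the objective of line 9.
        best = max(range(K), key=objective)
        new_clusters[best].append(x)
    return new_clusters
```

For instance, with four states whose within-group transitions are ten times as frequent as the others, a single pass moves a wrongly placed state back to its group.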
12 Robustness of the clustering procedure to model violations
Recall that the asymptotic consistency of the clustering procedure has been theoretically studied in [69] under the assumption that the data-generating process is a BMC. In this section we study the robustness of the clustering procedure to violations of this model assumption. That is, we investigate the performance of the clustering procedure when the data-generating process is not actually a BMC. We study two main measures of performance. First, in Supplement 12.1, we consider the number of misclassified states. Second, in Supplement 12.2, we consider the approximation error in a parameter estimation problem where the objective is to estimate the true transition matrix $P$ of a Markovian data-generating process which need not be a BMC.
The first measure of performance requires that the notion of misclassification is sensible even though the data-generating process is not a BMC. To this end we restrict ourselves to models where communities are still well-defined. More precisely, we consider the perturbed BMC model which was defined in Section 3.2 and assign as ground-truth communities those of the BMC kernel which was used to construct the perturbed model. Recall that the definition of a perturbed BMC requires specifying the perturbation kernel $\Delta$. The following kernels are used to model different types of model violations:
(i) Uniform Stochastic: The matrix $\Delta$ is sampled uniformly at random in the set of stochastic matrices. This is accomplished by sampling each row independently from a $\mathrm{Dirichlet}(1/n, \ldots, 1/n)$ distribution.
(ii) Degree 0: Fix some $\pi_1, \ldots, \pi_n > 0$ with $\sum_{i=1}^{n} \pi_i = 1$ and let $\Delta_{ij} = \pi_j$ for all $i, j \in [n]$. We construct the $\pi_i$ by sampling independent exponential random variables $e_1, \ldots, e_n \sim \mathrm{Exponential}(1)$ and normalizing $\pi_i = e_i / (\sum_{j=1}^{n} e_j)$.
(iii) Heavy Tailed: Let $X$ be a random matrix whose entries $X_{ij}$ are i.i.d. positive random variables with a heavy-tailed distribution. The kernel $\Delta$ is then found by normalizing the rows in order to achieve a stochastic matrix $\Delta := \operatorname{diag}\big((\sum_j X_{ij})^{-1}\big)_{i=1}^{n} X$. We sample the heavy-tailed entries $X_{ij}$ from a Zipf distribution with exponent $s = 3/2$.
(iv) Sparse: Consider constants $d > 0$ and $c > 0$ and construct a random matrix $X = A + cJ$ where $A$ is the adjacency matrix of a directed Erdős–Rényi random graph with average outgoing degree $d$ and $J$ is the constant matrix $J_{ij} = 1/n$. The kernel $\Delta$ is then found by rescaling the rows in order to achieve a stochastic matrix $\Delta = \operatorname{diag}\big((\sum_j X_{ij})^{-1}\big)_{i=1}^{n} X$. We take $d = 5$ and $c = 0.1$.
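As an illustration of how such kernels can be generated, the following sketch builds the Degree 0 kernel (ii) and the Sparse kernel (iv) with the Python standard library and mixes one of them with a two-cluster BMC kernel as $(1-\varepsilon) P_{\mathrm{BMC}} + \varepsilon\Delta$. The function names are hypothetical, and the sparse kernel assumes that every directed edge (self-loops included) is present independently with probability $d/n$, which is one possible convention for an average out-degree of $d$.

```python
import random

def degree0_kernel(n, rng):
    """Kernel (ii): every row equals the same random distribution pi."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    pi = [v / total for v in e]
    return [pi[:] for _ in range(n)]

def sparse_kernel(n, rng, d=5.0, c=0.1):
    """Kernel (iv): row-normalised X = A + c*J with J_ij = 1/n."""
    rows = []
    for _ in range(n):
        # Assumed convention: each directed edge is present with probability d/n.
        x = [(1.0 if rng.random() < d / n else 0.0) + c / n for _ in range(n)]
        s = sum(x)
        rows.append([v / s for v in x])
    return rows

def bmc_kernel(p, n):
    """BMC kernel with two equally sized clusters {0..n/2-1} and {n/2..n-1}."""
    half = n // 2
    sigma = [0 if i < half else 1 for i in range(n)]
    return [[p[sigma[i]][sigma[j]] / half for j in range(n)] for i in range(n)]

def perturb(P_bmc, Delta, eps):
    """Perturbed kernel (1 - eps) * P_BMC + eps * Delta."""
    n = len(P_bmc)
    return [[(1 - eps) * P_bmc[i][j] + eps * Delta[i][j] for j in range(n)]
            for i in range(n)]
```

Because both ingredients are row-stochastic, the mixture returned by `perturb` is again a valid transition matrix for any perturbation level between 0 and 1.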
In our subsequent experimentation we take $n = 2m$ to be an even integer. The BMC which is perturbed is chosen to have two equally-sized clusters ($K = 2$) and cluster transition matrix given by
\[ p = \begin{pmatrix} 0.6 & 0.4 \\ 0.4 & 0.6 \end{pmatrix}. \]
12.1 Misclassification ratio for perturbed BMCs
This section concerns the number of misclassified states when clustering on a perturbed BMC model. Recall that we chose the BMC model to have two equally-sized clusters, which means that we may pick the cluster assignment map to be given by $\sigma_n(i) = 1 + \mathbb{1}[i > n/2]$. Let $\hat{\sigma}_n : [n] \to \{1, 2\}$ be an estimated cluster assignment which is output by the clustering procedure. Then, the misclassification ratio $\mathcal{E}$ is defined as
\[ \mathcal{E} := \frac{1}{n} \min_{\rho \in S_2} \#\{v \in [n] : \sigma_n(v) \neq (\rho \circ \hat{\sigma}_n)(v)\}. \tag{33} \]
Here $S_2$ denotes the set of permutations of $\{1, 2\}$.
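The minimum over relabelings in (33) is easy to get wrong in practice, so a small sketch may help. The function below is an illustrative helper (its name and the list-based representation of the assignments are assumptions); it tries every permutation of the labels, which is cheap for $K = 2$.

```python
from itertools import permutations

def misclassification_ratio(sigma, sigma_hat, K=2):
    """Fraction of misclassified states, minimised over relabelings of the estimate.

    sigma, sigma_hat : lists of labels in {1, ..., K}, one entry per state
    """
    labels = list(range(1, K + 1))
    n = len(sigma)
    best = n
    for perm in permutations(labels):
        relabel = dict(zip(labels, perm))
        errors = sum(1 for s, s_hat in zip(sigma, sigma_hat) if s != relabel[s_hat])
        best = min(best, errors)
    return best / n

# Ground truth sigma_n(i) = 1 + 1[i > n/2] for n = 6.
n = 6
sigma = [1 + (i > n / 2) for i in range(1, n + 1)]
```

An estimate that recovers the two groups with swapped labels has ratio zero, which is exactly the invariance that the minimum over $S_2$ is meant to provide.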
Recall from (9) that the parameter $\varepsilon$ of the perturbed BMC measures the fraction of transitions which are affected by the perturbation. In other words, $\varepsilon$ measures the strength of the perturbation. The estimated expected misclassification ratio $\mathbb{E}[\mathcal{E}]$ is displayed as a function of the perturbation level $\varepsilon$ for a numerical experiment in Figure 2(a). Up to $\varepsilon \approx 0.1$ the algorithm succeeds in recovering the exact cluster assignment for all four models. The exact threshold will naturally depend on the parameters of the BMC which was perturbed and will consequently be different in different contexts. At any rate, we conclude from this experiment that the algorithm appears to be robust with regard to small to medium-sized model violations.
The observation that some model violations can be tolerated may be understood theoretically in terms of the construction of the algorithm. Indeed, this robustness is natural at the level of the spectral step of the algorithm. Consider that in a perturbed BMC one has the following decomposition:
\begin{align*}
\hat{N}_{\mathrm{Perturbed}} &= \mathbb{E}[\hat{N}_{\mathrm{BMC}}] + \big(\mathbb{E}[\hat{N}_{\mathrm{Perturbed}}] - \mathbb{E}[\hat{N}_{\mathrm{BMC}}]\big) + \big(\hat{N}_{\mathrm{Perturbed}} - \mathbb{E}[\hat{N}_{\mathrm{Perturbed}}]\big) \\
&=: \mathbb{E}[\hat{N}_{\mathrm{BMC}}] + E_{\mathrm{Perturbation}} + E_{\mathrm{Noise}}.
\end{align*}
The sampling noise $E_{\mathrm{Noise}}$ is small in operator norm relative to $\mathbb{E}[\hat{N}_{\mathrm{BMC}}]$ when the sample path is sufficiently long. It may further be expected that $E_{\mathrm{Perturbation}}$ is small in operator norm whenever the perturbation level $\varepsilon$ is small. Now recall that the spectral step in the algorithm relies on a singular value decomposition to compute a rank-$K$ approximation. The purpose of this rank-$K$ approximation, when the process is truly a BMC, is to separate the sampling noise $E_{\mathrm{Noise}}$ from the low-rank signal $\mathbb{E}[\hat{N}_{\mathrm{BMC}}]$. In a small perturbation of a BMC the singular value decomposition will however also regard $E_{\mathrm{Perturbation}}$ as an error term. Consequently, for small perturbations, the spectral step has the beneficial effect that it separates the perturbative error $E_{\mathrm{Perturbation}}$ from the low-rank signal $\mathbb{E}[\hat{N}_{\mathrm{BMC}}]$.
12.2 Bias–variance tradeoff for parameter estimation in a perturbed BMC
It may occur in some cases that one is not interested in the clusterings themselves but rather views them as a means to an end. Consider the scenario where one desires to estimate the transition kernel of a Markovian process which need not be a BMC. Assume that one has prior reason to suspect that there could be some underlying clusters in the data but also that there could be parts of the dynamics which do not respect the clusters. In such a case a perturbed BMC would be a suitable model for the data. Let us emphasize that one is here not intrinsically interested in the BMC component $P_{\mathrm{BMC}}$ but rather desires to estimate the ground truth $P_{\mathrm{True}} := (1 - \varepsilon) P_{\mathrm{BMC}} + \varepsilon \Delta$. It could however be the case that one can exploit the underlying clusters to improve the performance of estimation.
Assume that one knows the number of underlying clusters $K$ and has access to a sample path $X^{\varepsilon}_0, \ldots, X^{\varepsilon}_\ell$ of length $\ell$ of a perturbed BMC. Let $\hat{N}$ denote the associated empirical frequency matrix. A natural general-purpose estimator for the transition matrix, which does not rely on the existence of clusters, is given by the empirical transition matrix $\hat{P}_{\mathrm{Empirical}}(\ell)$. The entries of the empirical transition matrix are given by
\[ \hat{P}_{\mathrm{Empirical}}(\ell)_{ij} := \begin{cases} \dfrac{\hat{N}_{ij}}{\sum_{k=1}^{n} \hat{N}_{ik}}, & \text{if } \hat{N}_{ij} \neq 0, \\ 0, & \text{if } \hat{N}_{ij} = 0. \end{cases} \tag{34} \]
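Another estimator may be found by first computing a clustering $\hat{\mathcal{V}}_1, \ldots, \hat{\mathcal{V}}_K$. One can then hope that, since $P_{\mathrm{True}} \approx P_{\mathrm{BMC}}$ for $\varepsilon \approx 0$, it would be sufficient to consider an estimator $\hat{P}_{\mathrm{BMC}}$ for $P_{\mathrm{BMC}}$ whose entries are given by
\[ \hat{P}_{\mathrm{BMC}}(\ell)_{ij} := \begin{cases} \dfrac{1}{\#\hat{\mathcal{V}}_{\hat{\sigma}_n(j)}} \dfrac{\sum_{x \in \hat{\mathcal{V}}_{\hat{\sigma}_n(i)},\, y \in \hat{\mathcal{V}}_{\hat{\sigma}_n(j)}} \hat{N}_{x,y}}{\sum_{m=1}^{K} \sum_{x \in \hat{\mathcal{V}}_{\hat{\sigma}_n(i)},\, y \in \hat{\mathcal{V}}_m} \hat{N}_{x,y}}, & \text{if } \sum_{x \in \hat{\mathcal{V}}_{\hat{\sigma}_n(i)},\, y \in \hat{\mathcal{V}}_{\hat{\sigma}_n(j)}} \hat{N}_{x,y} \neq 0, \\ 0, & \text{if } \sum_{x \in \hat{\mathcal{V}}_{\hat{\sigma}_n(i)},\, y \in \hat{\mathcal{V}}_{\hat{\sigma}_n(j)}} \hat{N}_{x,y} = 0. \end{cases} \tag{35} \]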
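To make the two estimators concrete, the following sketch computes the empirical frequency matrix from a sample path and then evaluates the entries of the empirical and cluster-based estimators as in (34) and (35). It is a minimal illustration with hypothetical function names; states and cluster labels are 0-indexed, and the estimated assignment `sigma_hat` is assumed to be given.

```python
def count_transitions(path, n):
    """Empirical frequency matrix: N[x][y] = number of transitions x -> y."""
    N = [[0] * n for _ in range(n)]
    for x, y in zip(path, path[1:]):
        N[x][y] += 1
    return N

def empirical_estimator(N):
    """Row-normalised counts, with zero entries left at zero."""
    n = len(N)
    P = []
    for i in range(n):
        row_total = sum(N[i])
        P.append([N[i][j] / row_total if N[i][j] != 0 else 0.0 for j in range(n)])
    return P

def bmc_estimator(N, sigma_hat, K):
    """Cluster-level transition frequencies spread uniformly over the target cluster."""
    n = len(N)
    sizes = [sigma_hat.count(k) for k in range(K)]
    block = [[0] * K for _ in range(K)]
    for x in range(n):
        for y in range(n):
            block[sigma_hat[x]][sigma_hat[y]] += N[x][y]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a = sigma_hat[i]
        out_total = sum(block[a])
        for j in range(n):
            b = sigma_hat[j]
            if block[a][b] != 0:
                P[i][j] = block[a][b] / out_total / sizes[b]
    return P
```

The cluster-based estimator has only $K^2$ free parameters, whereas the empirical estimator has $n^2$, which is the source of the variance reduction discussed in this section.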
Finally, for comparison we also consider the following trivial estimator which does not even use the data:
\[ \hat{P}_{\mathrm{Uniform}}(\ell)_{ij} = \frac{1}{n}. \]
We measure the performance of these estimators as a function of the length of the sample path using the expected estimation error:
\[ R_{*}(\ell) := \mathbb{E}\big[\| P_{\mathrm{True}} - \hat{P}_{*}(\ell) \|\big] \quad \text{where } * \in \{\mathrm{Empirical}, \mathrm{BMC}, \mathrm{Uniform}\}. \tag{36} \]
Here, $\|\cdot\|$ denotes the operator norm $\|M\| = \sup_{\|v\|_2 = 1} \|Mv\|_2$.
We conduct a numerical experiment with a state space of size $n = 1000$ and a heavy-tailed perturbation model of perturbation strength $\varepsilon = 0.05$. Figure 2(b) displays estimated values of the expected estimation error $R_{*}(\cdot)$ as a function of the length $\ell$ of the sample path. A number of different regimes may be identified. First, there is the regime where the sample path is very short, meaning that $\ell \approx 10^4$. Here the empirical estimator $\hat{P}_{\mathrm{Empirical}}$ and the BMC estimator $\hat{P}_{\mathrm{BMC}}$ are both unable to outperform the trivial estimator $\hat{P}_{\mathrm{Uniform}}$. The empirical estimator even performs significantly worse than the trivial estimator in this regime. Second, there is the regime where the sample path is medium-sized, meaning that $\ell \approx 10^5$. Here the clustering procedure succeeds and $\hat{P}_{\mathrm{BMC}}$ becomes the best-performing estimator. Finally, there is the regime where the sample path grows long, meaning that $\ell > 10^6$. Here the empirical estimator becomes the best-performing estimator. These different regimes can be understood in terms of a bias–variance tradeoff. Namely, consider that for short to medium-sized sample paths the BMC estimator $\hat{P}_{\mathrm{BMC}}$ has significantly less variance than the empirical estimator $\hat{P}_{\mathrm{Empirical}}$ because it depends on fewer parameters. This decreased variance is the dominant consideration for the approximation error in this regime. On the other hand, for long sample paths both estimators $\hat{P}_{\mathrm{BMC}}$ and $\hat{P}_{\mathrm{Empirical}}$ have low variance and the bias incurred by the approximation $P_{\mathrm{True}} \approx P_{\mathrm{BMC}}$ becomes dominant.
13 Confidence bounds when estimating D(T;P,Q)
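We here state a concentration inequality from which we deduce the confidence interval in (45). Recall that these confidence intervals are used in Figure 9. The proof is based on a result from [67] whose assumptions we first verify.
Assume that the true process $\{X_t\}_{t \geq 0}$ generating the sequential data $X_1, \ldots, X_\ell$ is a MC, which need not be time-homogeneous. Let us refer to the law of $\{X_t\}_{t \geq 0}$ as $\mathbb{T}$. The mixing time of $\{X_t\}_{t \geq 0}$ is defined as
\[ \tau_{\mathrm{mix}} := \min\Big\{ t \geq 1 : d(t) \leq \frac{1}{2} \Big\}, \tag{37} \]
where
\[ d(t) := \max_{1 \leq i \leq \ell - t} \sup_{x, y \in [n]} d_{\mathrm{TV}}\big( \mathbb{T}[X_{i+t} = \cdot \mid X_i = x],\, \mathbb{T}[X_{i+t} = \cdot \mid X_i = y] \big). \tag{38} \]
Here, $d_{\mathrm{TV}}$ denotes the total variation distance:
\[ d_{\mathrm{TV}}\big( \mathbb{T}[X_{i+t} = \cdot \mid X_i = x],\, \mathbb{T}[X_{i+t} = \cdot \mid X_i = y] \big) := \frac{1}{2} \sum_{z \in [n]} \big| \mathbb{T}[X_{i+t} = z \mid X_i = x] - \mathbb{T}[X_{i+t} = z \mid X_i = y] \big|. \tag{39} \]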
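For a small chain the mixing time in (37) can be computed directly from the definition. The sketch below is an illustration under a simplifying assumption that is not made in the text: the chain is time-homogeneous with transition matrix `P`, so that the maximum over $i$ in (38) is unnecessary and $d(t)$ is the largest total variation distance between two rows of $P^t$. The function names and the `t_max` cut-off are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mixing_time(P, threshold=0.5, t_max=10_000):
    """Smallest t >= 1 with d(t) <= threshold for a time-homogeneous chain."""
    n = len(P)
    Pt = [row[:] for row in P]  # holds P^t, starting with t = 1
    for t in range(1, t_max + 1):
        d_t = max(total_variation(Pt[x], Pt[y]) for x in range(n) for y in range(n))
        if d_t <= threshold:
            return t
        Pt = mat_mul(Pt, P)
    raise ValueError("d(t) did not fall below the threshold within t_max steps")
```

In applications the transition matrix is unknown, so such a computation can only provide an estimate of $\tau_{\mathrm{mix}}$, for instance by plugging in a fitted transition matrix.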
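We claim that the MC of transitions $\{E_{X,t}\}_{t \geq 0}$, where $E_{X,t} := (X_t, X_{t+1})$, then has mixing time at most $\tau_{\mathrm{mix}} + 1$. Indeed, observe that for any $t \geq \tau_{\mathrm{mix}} + 1$, $x_1, x_2, y_1, y_2 \in [n]$ and $1 \leq i \leq \ell - t - 1$,
\begin{align}
&\frac{1}{2} \sum_{z_1, z_2 \in [n]} \big| \mathbb{P}[E_{X,i+t} = (z_1, z_2) \mid E_{X,i} = (x_1, x_2)] - \mathbb{P}[E_{X,i+t} = (z_1, z_2) \mid E_{X,i} = (y_1, y_2)] \big| \nonumber \\
&\quad = \frac{1}{2} \sum_{z_1, z_2 \in [n]} \mathbb{P}[X_{i+t+1} = z_2 \mid X_{i+t} = z_1] \, \big| \mathbb{P}[X_{i+t} = z_1 \mid X_{i+1} = x_2] - \mathbb{P}[X_{i+t} = z_1 \mid X_{i+1} = y_2] \big| \tag{40} \\
&\quad = \frac{1}{2} \sum_{z_1 \in [n]} \big| \mathbb{P}[X_{i+t} = z_1 \mid X_{i+1} = x_2] - \mathbb{P}[X_{i+t} = z_1 \mid X_{i+1} = y_2] \big| \leq \frac{1}{2}. \tag{41}
\end{align}
Here, the Markov property was used to conclude (40). The fact that $\mathbb{P}(X_{i+t+1} = \cdot \mid X_{i+t} = z_1)$ defines a probability distribution, together with the assumption that $t \geq \tau_{\mathrm{mix}} + 1$ and the property that $d(t)$ is nonincreasing in $t$, was used to arrive at (41).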
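Now suppose that we are given two MCs with fixed transition matrices $P$ and $Q$, whose laws we will refer to as $\mathbb{P}$ and $\mathbb{Q}$, respectively. Assume furthermore that $\max_{i,j \in [n]} |\ln(P_{i,j}/Q_{i,j})| \leq \delta$ for some $\delta > 0$. For any two sample paths $X_1, \ldots, X_\ell$ and $Y_1, \ldots, Y_\ell$, it then holds that
\[ |\hat{D}(X_1, \ldots, X_\ell; P, Q) - \hat{D}(Y_1, \ldots, Y_\ell; P, Q)| \leq \frac{2\delta}{\ell} \sum_{t=1}^{\ell - 1} \mathbb{1}[E_{X,t} \neq E_{Y,t}]. \tag{42} \]
Consequently, [67, Corollary 2.10] applied to the MC $\{E_{X,t}\}_{t \geq 0}$ yields the desired concentration inequality:
\[ \mathbb{P}\big[ |\hat{D}(X_0, \ldots, X_\ell; \mathbb{P}, \mathbb{Q}) - D(\mathbb{T}; \mathbb{P}, \mathbb{Q})| > t \big] \leq 2 \exp\Big( -\frac{t^2 \ell^2}{18 \delta^2 (\tau_{\mathrm{mix}} + 1)} \Big). \tag{43} \]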
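In conclusion: if we are given two MCs with fixed transition matrices $P$ and $Q$ for which $\max_{i,j \in [n]} |\ln(P_{i,j}/Q_{i,j})| > 0$, together with an estimate for $\tau_{\mathrm{mix}}$, we can then construct for $z \in [0, 1]$ a $100(1 - z)\%$ confidence interval of size
\[ c_z := \frac{1}{\ell} \max_{i,j \in [n]} \Big| \ln \frac{P_{i,j}}{Q_{i,j}} \Big| \sqrt{18 (\tau_{\mathrm{mix}} + 1) \ln \frac{2}{z}}. \tag{44} \]
This is to say that
\[ \mathbb{P}\Big[ D(\mathbb{T}; \mathbb{P}, \mathbb{Q}) \in \big[ \hat{D}(X_0, \ldots, X_\ell; \mathbb{P}, \mathbb{Q}) - c_z,\; \hat{D}(X_0, \ldots, X_\ell; \mathbb{P}, \mathbb{Q}) + c_z \big] \Big] \geq 1 - z. \tag{45} \]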
14 Shape of the spectral noise
Recall that it was stated in Section 6.4 that the spectral noise in $\hat{N}$ can be dominated by an inhomogeneous equilibrium distribution. It was further claimed that the Laplacian $\hat{L}$ does not suffer from this issue. The main goal in this section is to argue that this claim is true.
Some preliminary notation and concepts are introduced in Supplement 14.1, after which a theoretical result concerning the limiting singular value distribution of $\hat{L}$ is established in Supplement 14.2. A model with an inhomogeneous equilibrium distribution is introduced in Supplement 14.3. The claim that $\hat{L}$ can also detect violations of the model assumptions in the presence of an inhomogeneous equilibrium distribution is verified in Supplement 14.4 by a simulation experiment.
14.1 Preliminaries
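The empirical singular value distribution $\nu_M$ of a matrix $M \in \mathbb{R}^{n \times n}$ with singular values $s_1(M) \geq \ldots \geq s_n(M)$ is the probability measure on $\mathbb{R}_{\geq 0}$ defined by
\[ \nu_M(A) := \frac{1}{n} \#\{ i \in [n] : s_i(M) \in A \} \tag{46} \]
for every measurable set $A \subseteq \mathbb{R}$. A sequence of random probability measures $\{\mu_n\}_{n \geq 1}$ on the real line is said to converge weakly in probability to a probability measure $\mu$ if for every continuous bounded function $f : \mathbb{R} \to \mathbb{R}$ it holds that $\int f \,\mathrm{d}\mu_n$ converges in probability to $\int f \,\mathrm{d}\mu$. The symmetrization of a probability measure $\mu$ on the nonnegative real line $\mathbb{R}_{\geq 0}$ is the probability measure $\mu_{\mathrm{sym}}$ on $\mathbb{R}$ given by
\[ \mu_{\mathrm{sym}}(A) := \frac{1}{2} \Big( \mu(\{ a : a \in A,\, a \geq 0 \}) + \mu(\{ -a : a \in A,\, a \leq 0 \}) \Big) \tag{47} \]
for any measurable $A \subseteq \mathbb{R}$. Note that $\mu$ can be recovered from its symmetrization since for any measurable $A \subseteq \mathbb{R}_{\geq 0}$ it holds that
\[ \mu(A) = 2 \mu_{\mathrm{sym}}(A \setminus \{0\}) + \mu_{\mathrm{sym}}(\{0\}). \tag{48} \]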
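The Stieltjes transform of a probability measure $\mu$ is the analytic function $s : \mathbb{C}^+ \to \mathbb{C}^-$ given by $s(z) = \int 1/(z - x) \,\mathrm{d}\mu(x)$. Here, $\mathbb{C}^+ := \{ z \in \mathbb{C} : \operatorname{Im}(z) > 0 \}$ denotes the upper half-plane and $\mathbb{C}^- := \{ z \in \mathbb{C} : \operatorname{Im}(z) < 0 \}$ denotes the lower half-plane. The Stieltjes inversion formula [59, Theorem B.8] allows one to recover $\mu$ from its Stieltjes transform: for any continuity points $a < b$ of $\mu$,
\[ \mu([a, b]) = -\frac{1}{\pi} \lim_{\varepsilon \to 0^+} \int_a^b \operatorname{Im}\big( s(x + \sqrt{-1}\,\varepsilon) \big) \,\mathrm{d}x. \tag{49} \]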
14.2 Limiting law of singular value distribution of the Laplacian $\hat{L}$
Fix some positive integer $K \geq 1$ and a transition matrix $p \in \mathbb{R}^{K \times K}$ of an ergodic MC on $[K]$. Denote by $\pi \in [0, 1]^K$ the equilibrium distribution of the MC associated to $p$. For every $n \geq 1$ consider a partition $\mathcal{V}_1 \cup \ldots \cup \mathcal{V}_K = [n]$ of the state space into $K$ nonempty groups $\mathcal{V}_i$. The subsequent results are concerned with the asymptotic regime where $n \to \infty$. We here assume that there are $\alpha_1, \ldots, \alpha_K > 0$ such that $\#\mathcal{V}_i = \alpha_i n + o(n)$ and $\sum_{i=1}^{K} \alpha_i = 1$.
Proposition 2. Let $\hat{L}$ be the empirical normalized Laplacian associated to a sample path $X_1, \ldots, X_\ell$ of the above BMC. Assume that as $n$ tends to infinity it holds that $\ell = \lambda n^2 + o(n^2)$. Then, the empirical singular value distribution $\nu_{\sqrt{n}\hat{L}}$ converges weakly in probability to a compactly supported probability measure $\nu$ on $\mathbb{R}_{\geq 0}$. Moreover, the symmetrization $\nu_{\mathrm{sym}}$ has Stieltjes transform $s(z) = \sum_{i=1}^{K} \alpha_i (a_i(z) + a_{K+i}(z))/2$ where $a_1, \ldots, a_{2K}$ are the unique analytic functions from $\mathbb{C}^+$ to $\mathbb{C}^-$ such that the following system of equations is satisfied:
\begin{align}
a_i(z)^{-1} &= z - \sum_{j=1}^{K} \lambda^{-1} \pi(j)^{-1} \alpha_j p_{ij}\, a_{K+j}(z), \tag{50} \\
a_{i+K}(z)^{-1} &= z - \sum_{j=1}^{K} \lambda^{-1} \pi(i)^{-1} \alpha_j p_{j,i}\, a_j(z) \tag{51}
\end{align}
for $i = 1, \ldots, K$.
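The system (50)–(51) can be explored numerically. The sketch below solves it by plain fixed-point iteration at a single point $z$ in the upper half-plane and returns $s(z)$. The choice of fixed-point iteration, the starting point $1/z$ and the iteration count are assumptions made for this illustration, and convergence is only to be expected when $\operatorname{Im}(z)$ is not too small. Indices are 0-based.

```python
import math

def stieltjes_sym(z, p, alpha, pi, lam, iters=500):
    """Evaluate s(z) from the system (50)-(51) by fixed-point iteration.

    z     : complex number with positive imaginary part
    p     : K x K cluster transition matrix
    alpha : cluster ratios alpha_1..alpha_K
    pi    : equilibrium distribution of p
    lam   : the constant lambda in ell = lambda * n^2
    """
    K = len(alpha)
    a = [1 / z] * (2 * K)  # starting guess: the large-|z| behaviour 1/z
    for _ in range(iters):
        new = []
        for i in range(K):  # equation (50)
            s = sum(alpha[j] * p[i][j] * a[K + j] / (lam * pi[j]) for j in range(K))
            new.append(1 / (z - s))
        for i in range(K):  # equation (51)
            s = sum(alpha[j] * p[j][i] * a[j] / (lam * pi[i]) for j in range(K))
            new.append(1 / (z - s))
        a = new
    return sum(alpha[i] * (a[i] + a[K + i]) / 2 for i in range(K))

# Sanity check for K = 1 and lambda = 1: the system reduces to a = 1 / (z - a),
# whose solution in the lower half-plane at z = 3i is i * (3 - sqrt(13)) / 2.
s_single = stieltjes_sym(3j, [[1.0]], [1.0], [1.0], 1.0)
closed_form = complex(0.0, (3 - math.sqrt(13)) / 2)
```

Evaluating $s$ on a grid of points $x + \sqrt{-1}\,\varepsilon$ with small $\varepsilon$ and applying the inversion formula (49) would then give a numerical approximation of the limiting density, although small $\varepsilon$ may require a damped iteration.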
The proof of Proposition 2 is similar to the proof of [70, Theorem 1.2] which is there given below [70, Proposition 4.7]. The intermediate [70, Lemma 4.4(ii)] should however be replaced by Lemma 1 below, and the role of [70, Equation (22)] is taken over by Lemma 2 below.
Lemma 1. Let $\Pi_X \in [0, 1]^n$ denote the equilibrium distribution of the BMC, and define
\[ \hat{Q} := \operatorname{diag}((\ell + 1)\Pi_X)^{-1/2} \, (\hat{N} - \mathbb{E}[\hat{N}]) \, \operatorname{diag}((\ell + 1)\Pi_X)^{-1/2}. \tag{52} \]
Assume that $\nu_{\sqrt{n}\hat{Q}}$ converges weakly in probability to some probability measure $\nu$ on $\mathbb{R}_{\geq 0}$. Under the assumptions of Proposition 2, it then holds that $\nu_{\sqrt{n}\hat{L}}$ converges weakly in probability to $\nu$.
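Proof. Consider the following notation:
\begin{align}
C_n &:= \operatorname{diag}((\ell + 1)\Pi_X)^{-1/2} \, \mathbb{E}[\hat{N}] \, \operatorname{diag}((\ell + 1)\Pi_X)^{-1/2}, \nonumber \\
D_{n,l} &:= \operatorname{diag}\Big( \Big( \sum_{k=1}^{n} \hat{N}_{ik} \Big)_{i=1}^{n} \Big)^{-1/2} \operatorname{diag}((\ell + 1)\Pi_X)^{1/2}, \nonumber \\
D_{n,r} &:= \operatorname{diag}((\ell + 1)\Pi_X)^{1/2} \operatorname{diag}\Big( \Big( \sum_{k=1}^{n} \hat{N}_{kj} \Big)_{j=1}^{n} \Big)^{-1/2}. \tag{53}
\end{align}
Observe that $\hat{L} = D_{n,l} \hat{Q} D_{n,r} + C_n$. Furthermore, $\max_{i=1}^{n} |(\ell + 1)^{-1} \Pi_{X,i}^{-1} \sum_{k=1}^{n} \hat{N}_{ik} - 1|$ converges to zero in probability by [70, Corollary 6.11]. Since $x \mapsto 1/\sqrt{x}$ is continuous in a neighborhood of 1 and the operator norm of a diagonal matrix is the maximal absolute value on its diagonal, it follows that $\|D_{n,l} - \mathrm{Id}\|_{\mathrm{op}}$ converges to zero in probability.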
Note that transitions coming into state $i$ are almost in bijection with the outgoing transitions out of state $i$. The only possible exceptions occur when $i = X_1$ or $i = X_\ell$. This is to say that for every $i$
\[ \Big| \sum_{k=1}^{n} \hat{N}_{ik} - \sum_{k=1}^{n} \hat{N}_{ki} \Big| \leq 2. \tag{54} \]
Hence, using that $(\ell + 1)\Pi_{X,i} = \Theta(n)$ and the fact that we already know that $\max_{i=1}^{n} |(\ell + 1)^{-1} \Pi_{X,i}^{-1} \sum_{k=1}^{n} \hat{N}_{ik} - 1|$ converges to zero in probability, it follows that $\max_{i=1}^{n} |(\ell + 1)^{-1} \Pi_{X,i}^{-1} \sum_{k=1}^{n} \hat{N}_{ki} - 1|$ converges to zero in probability. By the continuity of $1/\sqrt{x}$ near 1 we may now also conclude that $\|D_{n,r} - \mathrm{Id}\|_{\mathrm{op}}$ converges to zero in probability.
By two applications of [70, Lemma 6.8.(iii)] we conclude that $\nu_{\sqrt{n} D_{n,l} \hat{Q} D_{n,r}}$ converges weakly in probability to $\nu$. Further, by the fact that the BMC starts in equilibrium it holds that $\operatorname{rank}(\mathbb{E}[\hat{N}]) \leq K$. Hence, using the general fact that $\operatorname{rank}(AB) \leq \operatorname{rank}(A)$ for any two matrices $A, B$ of compatible size, we find that
\[ \operatorname{rank}(\sqrt{n}\, C_n) \leq \operatorname{rank}(\mathbb{E}[\hat{N}]) \leq K. \tag{55} \]
An application of [70, Lemma 6.8.(ii)] now yields the desired result, since $\nu_{\sqrt{n}\hat{L}} = \nu_{\sqrt{n}(D_{n,l} \hat{Q} D_{n,r} + C_n)}$.
Lemma 2. Under the assumptions of Proposition 2 and with notation as in Lemma 1 it holds that as $n$ tends to infinity
\[ \max_{i,j = 1, \ldots, n} \big| \operatorname{Var}[\hat{Q}_{ij}] - \lambda^{-1} \pi(\sigma_n(j))^{-1} p_{\sigma_n(i)\sigma_n(j)} \big| = o(1). \tag{56} \]
Proof. This is immediate from [70, Corollary 4.6] using the fact that $\operatorname{Var}[cX] = c^2 \operatorname{Var}[X]$ for any real random variable $X$ and scalar $c \in \mathbb{R}$.
14.3 Inhomogeneous equilibrium distribution: Degree-corrected block Markov chain (DC-BMC)
In order to allow for an inhomogeneous equilibrium distribution we consider the following model, which is inspired by the analogous degree-corrected stochastic block model for communities in graphs with inhomogeneous degrees. Let $K \geq 1$ be a positive integer, consider a transition matrix $p \in \mathbb{R}^{K \times K}$ for an ergodic MC on $[K]$ and equip the state space with a group-assignment map $\sigma_n : [n] \to [K]$. As was the case for BMCs we define the groups $\mathcal{V}_1, \ldots, \mathcal{V}_K$ by $\mathcal{V}_i = \{ v \in [n] : \sigma_n(v) = i \}$. Assume moreover that every group $\mathcal{V}_i$ is equipped with a probability distribution $\mu_i : \mathcal{V}_i \to [0, 1]$. Then, a MC $X_t$ on $[n]$ is called a DC-BMC if
\[ \mathbb{P}(X_{t+1} = j \mid X_t = i) = p_{\sigma_n(i)\sigma_n(j)} \, \mu_{\sigma_n(j)}(j). \tag{57} \]
Recall that in a BMC it holds that conditional on $\sigma_n(X_t) = k$ for $t > 1$ the observation $X_t$ is chosen uniformly at random in the cluster $\mathcal{V}_k$. In a DC-BMC it instead holds that conditional on $\sigma_n(X_t) = k$ the observation $X_t$ is chosen from the cluster $\mathcal{V}_k$ according to the probability measure $\mu_k$.
Note that the usual BMC is recovered when all $\mu_i$ are taken to be the uniform measures on their respective groups $\mathcal{V}_i$. Furthermore, by taking a larger number of groups $\widetilde{K} = MK$ one can still approximate a DC-BMC model by a BMC model. This is to say that one can use the additional clusters to separate each true group $\mathcal{V}_i$ of the DC-BMC model into $M$ subgroups $\widetilde{\mathcal{V}}_{i,1}, \ldots, \widetilde{\mathcal{V}}_{i,M}$ such that $\mu_i$ is approximately constant on every $\widetilde{\mathcal{V}}_{i,j}$.
We expect that the limiting measure for $\nu_{\sqrt{n}\hat{L}}$ in a DC-BMC is equal to the limiting measure of a BMC with the same cluster transition matrix $p$ and the same cluster ratios $\alpha_i$ provided that $\max_{i=1,\ldots,n} \mu_{\sigma_n(i)}(i) = \Theta(1/n)$ and $\min_{i=1,\ldots,n} \mu_{\sigma_n(i)}(i) = \Theta(1/n)$. If this conjecture is true then the limiting measure does not depend at all on the $\mu_i$ since these do not occur in Proposition 2. This insensitivity to the $\mu_i$ ensures that the spectral noise in $\hat{L}$ is not dominated by an inhomogeneous equilibrium distribution. The main reason for this conjecture is that the proof of Proposition 2 implicitly relies on a universality principle of [70] which states that the limiting singular value distribution of a (sufficiently well-behaved) random matrix only depends on the variances of its entries. We will subsequently argue that the variance profile of $\hat{L}$ is approximately independent of the distributions $\mu_k$; see (64).
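Denote by $\pi$ the cluster equilibrium distribution of the Markov chain associated to $p$ and note that the state equilibrium distribution of a DC-BMC is then given by $\Pi_{X,i} = \pi(\sigma_n(i)) \mu_{\sigma_n(i)}(i)$. Correspondingly, up to approximation errors on the order of $\sqrt{\ell}$,
\[ \sum_{k=1}^{n} \hat{N}_{i,k} \approx \#\{ t = 1, \ldots, \ell : X_t = i \} \approx \ell\, \Pi_{X,i} = \ell\, \pi(\sigma_n(i)) \mu_{\sigma_n(i)}(i). \tag{58} \]
Therefore, by the continuity of $x \mapsto \sqrt{x}$ it may be expected that
\[ \sqrt{\sum_{k=1}^{n} \hat{N}_{ik}} \approx \sqrt{\ell\, \pi(\sigma_n(i)) \mu_{\sigma_n(i)}(i)} \quad \text{and} \quad \sqrt{\sum_{k=1}^{n} \hat{N}_{kj}} \approx \sqrt{\ell\, \pi(\sigma_n(j)) \mu_{\sigma_n(j)}(j)}. \tag{59} \]
The variance of a sum of independent random variables is equal to the sum of the variances. If we write $\hat{N}_{i,j} = \sum_{t=1}^{\ell - 1} \mathbb{1}[X_t = i, X_{t+1} = j]$ then these summands are not independent, but nonetheless we do expect the variance to approximately distribute over the sum. Therefore, it is expected that
\[ \operatorname{Var}[\hat{N}_{i,j}] \approx (\ell - 1) \operatorname{Var}[\mathbb{1}[X_t = i, X_{t+1} = j]] \approx (\ell - 1)\, \pi(\sigma_n(i)) \mu_{\sigma_n(i)}(i)\, p_{\sigma_n(i)\sigma_n(j)}\, \mu_{\sigma_n(j)}(j). \tag{60} \]
By combining (59) and (60) it follows that
\begin{align}
\operatorname{Var}[\hat{L}_{ij}] &\approx \operatorname{Var}\Bigg[ \frac{\hat{N}_{ij}}{\sqrt{\ell\, \pi(\sigma_n(i)) \mu_{\sigma_n(i)}(i)} \sqrt{\ell\, \pi(\sigma_n(j)) \mu_{\sigma_n(j)}(j)}} \Bigg] \tag{61} \\
&\approx \frac{\ell\, \pi(\sigma_n(i))\, p_{\sigma_n(i)\sigma_n(j)}\, \mu_{\sigma_n(i)}(i) \mu_{\sigma_n(j)}(j)}{\big(\ell\, \pi(\sigma_n(i)) \mu_{\sigma_n(i)}(i)\big) \big(\ell\, \pi(\sigma_n(j)) \mu_{\sigma_n(j)}(j)\big)} \tag{62} \\
&= \ell^{-1} \pi(\sigma_n(j))^{-1} p_{\sigma_n(i)\sigma_n(j)} \tag{63} \\
&\approx (\lambda n^2)^{-1} \pi(\sigma_n(j))^{-1} p_{\sigma_n(i)\sigma_n(j)}. \tag{64}
\end{align}
Observe that this agrees with the variance profile which was used in Lemma 2.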
14.4 Simulation experiment
We here use a simulation to measure the sensitivity of the spectral noise in $\hat{L}$ and $\hat{N}$ to violations of the model assumptions in the presence of an inhomogeneous equilibrium distribution. The violation is introduced by perturbing the DC-BMC model defined in Supplement 14.3.
For the DC-BMC model we take $K = 2$ and we consider clusters of size $\#\mathcal{V}_1 = \#\mathcal{V}_2 = 1000$. The cluster transition matrix $p$ is defined by $p_{11} = p_{22} = 0.8$ and $p_{12} = p_{21} = 0.2$. The probability measures $\mu_i$ are found for $i = 1, 2$ by sampling a vector of i.i.d. exponentially distributed random variables of rate 1 and normalizing this vector to have $L^1$-norm equal to 1.
We may further consider a perturbation of this DC-BMC. Let $\Delta$ be a heavy-tailed transition matrix as defined in Section 3.2 and denote $P_{\mathrm{perturbed}} := 0.95\, P_{\text{DC-BMC}} + 0.05\, \Delta$. Recall that the DC-BMC component $P_{\text{DC-BMC}}$ can be approximated with a BMC with more groups but note that such an approximation is not possible for $\Delta$. Consequently, we may think of the decomposition for $P_{\mathrm{perturbed}}$ as splitting the ground truth model into a main part which can be approximated with a BMC and a second part which requires a different explanation.
In the subsequent experiment we consider observation sequences $\{X_t\}_{t=1,\ldots,\ell}$ and $\{Y_t\}_{t=1,\ldots,\ell}$ with length $\ell = 2000^2$ from the DC-BMC model and the perturbed model respectively. The singular value densities of the $\hat{N}$-matrix constructed from $X$ and $Y$ are displayed in Figure 11 (a). Also displayed in Figure 11 is the theoretical prediction corresponding to a BMC found by executing the clustering algorithm with $\widetilde{K} = 4$ clusters. Recall that taking $\widetilde{K} > K$ allows for the algorithm to split the groups to ensure that $\mu_i$ is roughly constant. We observe that the empirical densities associated to the DC-BMC model and the perturbed model look quite similar apart from the fact that the perturbed model has a longer tail. The theoretical prediction associated to the BMC further provides an acceptable match for the DC-BMC model but there is also some small part of the tail of the DC-BMC model which escapes the theoretical prediction. Here the issue regarding the sensitivity of $\hat{N}$ becomes apparent: there are at least two plausible explanations why in empirical data some part of the tail may escape the support of the theoretical density. A first explanation is the presence of a perturbation $\Delta$ which we view as a violation of the model assumptions. A second explanation is that the ground truth is a DC-BMC and one should take $\widetilde{K}$ to be larger. These two explanations are difficult to distinguish from the spectral noise in $\hat{N}$. In the current example one may argue that the amount of the tail which escapes the theoretical density is larger in the perturbed model. Such a judgement regarding the size of the tail is however undesirable since it is vague and subjective.
[Figure 11: two panels plotting density against singular value, each with series Perturbed, DC-BMC and Theory. Panel (a): singular value density of $\hat{N}/\sqrt{n}$. Panel (b): singular value density of $\sqrt{n}\hat{L}$.]
Fig. 11. (left) The singular value density of $\hat{N}$ for a simulated DC-BMC (blue bars) as compared to the theory (blue line) and a perturbed model (red bars). (right) The singular value density of $\hat{L}$.
The singular value density of $\hat{L}$ for the two sample paths $X_{1:\ell}$ and $Y_{1:\ell}$ is displayed in Figure 11 (b). Here we observe that the empirical densities of the DC-BMC model and the perturbed model are markedly different. The theoretical prediction associated to the BMC moreover provides a good match to the DC-BMC model, as was expected from the conjecture of Supplement 14.3. We conclude that the spectral noise in the Laplacian $\hat{L}$ is more sensitive to violations of the model assumptions than the spectral noise in $\hat{N}$, particularly in the presence of an inhomogeneous equilibrium distribution.
15 Extra tools for some of the different data sets
15.1 The cf-idf vectorization method
Let $V$ denote the vocabulary, which is a set of words, and fix a clustering $\sigma_n : V \to [K]$. In order to turn documents into vectors we make use of a straightforward modification of the common term frequency-inverse document frequency document vectorization method. We refer to the modification as cluster frequency-inverse document frequency (cf-idf). Let $\mathcal{D}$ denote a collection of documents. Every document $d \in \mathcal{D}$ is here viewed as a sequence of words, meaning that $d \in \prod_{t=1}^{\ell_d} V$ for some $\ell_d > 0$. For every cluster $k \in [K]$ and document $d \in \mathcal{D}$ we define

$$\mathrm{cf}(k, d) := \ln\bigl(1 + \#\{t = 1, \ldots, \ell_d : \sigma_n(d_t) = k\}\bigr),$$

$$\mathrm{idf}(k, \mathcal{D}) := \ln \frac{\sum_{k'=1}^{K} \bigl(1 + \sum_{d \in \mathcal{D}} \#\{t = 1, \ldots, \ell_d : \sigma_n(d_t) = k'\}\bigr)}{1 + \sum_{d \in \mathcal{D}} \#\{t = 1, \ldots, \ell_d : \sigma_n(d_t) = k\}},$$

$$\text{cf-idf}(k, d) := \mathrm{cf}(k, d) \cdot \mathrm{idf}(k, \mathcal{D}).$$

Observe that $\text{cf-idf}(\cdot, d)$ assigns a $K$-dimensional vector to any document $d \in \mathcal{D}$. A word which occurs multiple times in a document also yields a higher contribution of the corresponding cluster to $\mathrm{cf}(k, d)$. Finally, words which are not in the vocabulary of the clustering are not counted at all in our processing.
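As an illustration, the cf-idf definition can be sketched in a few lines of Python. This is a minimal sketch and not the implementation released with the paper: the function name `cf_idf_vectors`, the representation of a document as a list of words, and the dictionary `cluster_of` standing in for the clustering $\sigma_n$ are all assumptions made for the example.

```python
import math
from collections import Counter


def cf_idf_vectors(documents, cluster_of, num_clusters):
    """Return one K-dimensional cf-idf vector per document.

    documents:    list of documents, each a list of words (hypothetical format)
    cluster_of:   dict mapping each vocabulary word to a cluster label in 1..K
    num_clusters: the number of clusters K
    """
    # Cluster counts per document; words outside the vocabulary are skipped.
    counts = [
        Counter(cluster_of[w] for w in doc if w in cluster_of)
        for doc in documents
    ]
    # Corpus-wide number of word occurrences per cluster.
    total = Counter()
    for c in counts:
        total.update(c)
    clusters = range(1, num_clusters + 1)
    # Numerator of idf: sum over all clusters of (1 + corpus count).
    smoothed_sum = sum(1 + total[k] for k in clusters)
    idf = {k: math.log(smoothed_sum / (1 + total[k])) for k in clusters}
    # cf(k, d) = ln(1 + count of cluster k in d); cf-idf is the product.
    return [[math.log(1 + c[k]) * idf[k] for k in clusters] for c in counts]
```

A `Counter` returns zero for clusters that never occur, so a cluster absent from a document contributes $\ln(1) = 0$ to that document's vector.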
15.2 Algorithm for creating a grid for animal movement data
Given a desired grid side length $x$ (in kilometers), we can calculate the (regional) latitudinal and longitudinal degrees corresponding to that distance $x$ (assuming the earth to be a near perfect sphere). A given difference in latitude amounts to the same distance in kilometers everywhere. One degree of latitude is roughly 1/360th of the earth's circumference (40 075 km), and we take one degree of latitude to be equivalent to 110.574 km. Accordingly, $x$ km are represented by $x/110.574$ degrees of latitude. One degree of longitude however represents a different amount of kilometers, depending on the latitude: one degree of longitude is $|111.320 \cdot \mathrm{cosdd}(\text{latitude})|$ km and $x$ km are represented by $|x / (111.320 \cdot \mathrm{cosdd}(\text{latitude}))|$ degrees of longitude, at a specific latitude. Here $\mathrm{cosdd}$ is the cosine acting on decimal degrees. We are making use of a small angle approximation, which breaks down near the poles. The process of creating squares and assigning the GPS points $y_i$ to them is described in Algorithm 3.
The algorithm assigns all the GPS points to squares. This procedure may not work well if the animal is moving close to the poles, because the small angle approximation is no longer justified there. The procedure is also not particularly well suited for regions where the earth deviates noticeably from a sphere, for example if the animal is moving on mountains.
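The conversion from a side length in kilometers to degrees, and its degradation near the poles, can be illustrated as follows. This is a minimal sketch: the constants are the ones quoted in the text, while the function name `cell_size_in_degrees` and the sample latitudes are assumptions chosen for the example.

```python
import math

KM_PER_DEG_LAT = 110.574          # km per degree of latitude, as used in the text
KM_PER_DEG_LON_EQUATOR = 111.320  # km per degree of longitude at the equator


def cell_size_in_degrees(x_km, latitude_deg):
    """Side lengths (in degrees of latitude and longitude) of an x_km square."""
    lat_deg = x_km / KM_PER_DEG_LAT
    # cosdd in the text: the cosine of an angle given in decimal degrees.
    cos_lat = math.cos(math.radians(latitude_deg))
    lon_deg = abs(x_km / (KM_PER_DEG_LON_EQUATOR * cos_lat))
    return lat_deg, lon_deg


if __name__ == "__main__":
    # The longitudinal width grows without bound as the latitude approaches 90.
    for latitude in (0.0, 45.0, 60.0, 89.0):
        print(latitude, cell_size_in_degrees(10.0, latitude))
```

The latitudinal side length is independent of position, whereas the longitudinal side length scales with the reciprocal of the cosine of the latitude.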
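Algorithm 3: GPS data to transitions between squares of side length $x$ km
Input: Grid size $x$, GPS data $(y_{i1}, y_{i2}, y_{i3})_{i=1,\ldots,n}$.
Output: Sequence of transitions between states $s = (s_1, \ldots, s_n)$
$j_{\mathrm{lat}} \leftarrow \lfloor y_{11} \cdot 110.574 / x \rfloor$
$j_{\mathrm{long}} \leftarrow \lfloor y_{12} \cdot 111.320 \cdot |\mathrm{cosdd}(j_{\mathrm{lat}} \cdot x / 110.574)| / x \rfloor$
$S_1 := \bigl[ j_{\mathrm{lat}} \cdot \tfrac{x}{110.574},\ (j_{\mathrm{lat}} + 1) \cdot \tfrac{x}{110.574} \bigr] \times \bigl[ j_{\mathrm{long}} \cdot \bigl| \tfrac{x}{111.320 \cdot \mathrm{cosdd}(j_{\mathrm{lat}} \cdot x / 110.574)} \bigr|,\ (j_{\mathrm{long}} + 1) \cdot \bigl| \tfrac{x}{111.320 \cdot \mathrm{cosdd}(j_{\mathrm{lat}} \cdot x / 110.574)} \bigr| \bigr]$
$\mathcal{S} \leftarrow \{S_1\}$
$s \leftarrow (1)$
for $\mathrm{indexcoord} \leftarrow 1$ to $n$ do
    $\mathrm{added} \leftarrow \mathrm{False}$
    for $\mathrm{indexsquare} \in \{1, 2, \ldots, \#\mathcal{S}\}$ do
        if $(y_{\mathrm{indexcoord},1}, y_{\mathrm{indexcoord},2}) \in S_{\mathrm{indexsquare}}$ then
            $s \leftarrow (s_1, \ldots, s_{\mathrm{indexcoord}-1}, \mathrm{indexsquare})$
            $\mathrm{added} \leftarrow \mathrm{True}$
        end
    end
    if $\mathrm{added} = \mathrm{False}$ then
        $j_{\mathrm{lat}} \leftarrow \lfloor y_{\mathrm{indexcoord},1} \cdot 110.574 / x \rfloor$
        $j_{\mathrm{long}} \leftarrow \lfloor y_{\mathrm{indexcoord},2} \cdot 111.320 \cdot |\mathrm{cosdd}(j_{\mathrm{lat}} \cdot x / 110.574)| / x \rfloor$
        $S_{\#\mathcal{S}+1} := \bigl[ j_{\mathrm{lat}} \cdot \tfrac{x}{110.574},\ (j_{\mathrm{lat}} + 1) \cdot \tfrac{x}{110.574} \bigr] \times \bigl[ j_{\mathrm{long}} \cdot \bigl| \tfrac{x}{111.320 \cdot \mathrm{cosdd}(j_{\mathrm{lat}} \cdot x / 110.574)} \bigr|,\ (j_{\mathrm{long}} + 1) \cdot \bigl| \tfrac{x}{111.320 \cdot \mathrm{cosdd}(j_{\mathrm{lat}} \cdot x / 110.574)} \bigr| \bigr]$
        $\mathcal{S} \leftarrow \mathcal{S} \cup \{S_{\#\mathcal{S}+1}\}$
        $s \leftarrow (s_1, \ldots, s_{\mathrm{indexcoord}-1}, \#\mathcal{S})$
    end
end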
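The assignment of GPS points to grid squares can also be sketched in Python. This is a sketch under stated assumptions and not the authors' implementation: it assumes that $(y_{i1}, y_{i2})$ are latitude and longitude in decimal degrees, that the third component is not needed for the assignment, and that the cosine is evaluated at the latitude $j_{\mathrm{lat}} \cdot x / 110.574$ of the lower edge of the row of squares. It further replaces the scan over all known squares by a dictionary lookup keyed on the integer pair $(j_{\mathrm{lat}}, j_{\mathrm{long}})$. The names `grid_cell` and `gps_to_states` are hypothetical.

```python
import math

KM_PER_DEG_LAT = 110.574          # constants quoted in the text
KM_PER_DEG_LON_EQUATOR = 111.320


def grid_cell(lat, lon, x_km):
    """Integer index (j_lat, j_long) of the square containing a GPS point."""
    j_lat = math.floor(lat * KM_PER_DEG_LAT / x_km)
    # Latitude in degrees of the lower edge of this row of squares (assumption).
    row_lat = j_lat * x_km / KM_PER_DEG_LAT
    # Width of one square in degrees of longitude at that latitude.
    width = abs(x_km / (KM_PER_DEG_LON_EQUATOR * math.cos(math.radians(row_lat))))
    j_long = math.floor(lon / width)
    return j_lat, j_long


def gps_to_states(points, x_km):
    """Map GPS points (lat, lon, ...) to state labels 1, 2, ... by first visit."""
    label_of_cell = {}
    states = []
    for lat, lon, *_ in points:
        cell = grid_cell(lat, lon, x_km)
        if cell not in label_of_cell:
            # A square that has not been seen before receives the next label.
            label_of_cell[cell] = len(label_of_cell) + 1
        states.append(label_of_cell[cell])
    return states
```

Squares are labeled in the order in which they are first visited. Points that lie exactly on a shared boundary are assigned by the floor operation here, which may differ from a membership test on closed intervals.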
16 Raw data of, and extra material on, some of the datasets
16.1 Transition matrix for bison clusters
Below is the cluster transition matrix for the improvement clustering depicted in Figure 6; the numbers are rounded to two decimal places.
0.82 0.05 0 0.01 0.02 0.01 0.01 0.01 0.02 0.02 0 0.01 0.01 0 0
0.07 0.76 0.02 0.03 0.05 0 0.01 0 0.01 0 0 0.02 0.02 0 0.01
0.01 0.05 0.87 0.03 0 0 0 0 0 0 0.03 0 0 0 0
0.02 0.06 0.03 0.85 0.03 0 0 0 0 0 0 0 0 0 0.01
0.04 0.07 0 0.02 0.77 0 0 0 0 0 0 0 0 0 0.1
0.04 0 0 0 0 0.88 0.02 0 0 0 0 0 0.03 0.02 0
0.03 0.01 0 0 0 0.01 0.86 0 0.01 0 0 0 0.04 0.03 0
0.06 0 0 0 0 0 0 0.84 0.07 0 0 0.01 0 0 0
0.09 0.01 0 0 0 0 0.01 0.06 0.72 0.01 0 0.1 0 0 0
0.06 0 0 0 0 0 0 0 0.01 0.79 0.1 0.03 0 0 0
0.01 0 0.04 0 0 0 0 0 0 0.09 0.83 0.02 0 0 0
0.03 0.06 0 0 0 0 0 0.01 0.1 0.04 0.02 0.73 0 0 0
0.03 0.06 0 0 0 0.04 0.05 0 0 0 0 0 0.77 0.05 0
0 0 0 0 0 0.05 0.07 0 0 0 0 0 0.14 0.72 0
0.02 0.03 0 0.01 0.31 0 0 0 0 0 0 0 0 0 0.63
16.2 Groups of words for improvement with 200 groups
16.2.1 Document classification datasets
We here describe the datasets which are used to construct Table 1 and report on some other datasets where our findings are
inconclusive in Table 4.
16.2.1.1 AG News.: This dataset provided by [71] consists of tuples $(x, y, z)$ where $x$ is the title of a news article, $y$ is a description of the news article and $z$ is an assigned class. There are four possible classes which $z$ can take as values, namely World, Sports, Business and Sci/Tech. For each such class the dataset contains precisely 30 000 training samples and 1 900 testing samples. In our processing we concatenated $x$ and $y$ into a single string and the task is to predict the class label $z$ based on this string.
16.2.1.2 Yahoo!.: This dataset provided by [71] contains questions and answers from Yahoo! Answers. The dataset consists of tuples $(x, y_1, y_2, z)$ where $x$ is a question, $y_1, y_2$ are answers to this question and $z$ is the category to which the question belongs. It can also occur that the question has fewer than two answers, in which case $y_1$ or $y_2$ is the empty string. There are ten possible classes which $z$ can take as values, namely Society & Culture, Science & Mathematics, Health, Education & Reference, Computers & Internet, Sports, Business & Finance, Entertainment & Music, Family & Relationships and Politics & Government. For each such class the dataset contains precisely 140 000 training samples and 5 000 testing samples. In our processing we concatenated $x$, $y_1$ and $y_2$ into a single string and the task is to predict the class label $z$ based on this string.
16.2.1.3 Wiki.: This dataset comes from the DBPedia ontology project [65] and the precise version used here is constructed by [71]. The dataset consists of tuples $(x, y, z)$ where $x$ is a title of a Wikipedia page, $y$ is the abstract of the page and $z$ is the category to which the page belongs. There are 14 possible classes which $z$ can take as values. For each such class the dataset contains precisely 40 000 training samples and 5 000 testing samples. In our processing we did not use the title $x$ so the task is to predict the class label $z$ based on the abstract $y$.
16.2.1.4 Book.: This dataset is constructed from books on Project Gutenberg, with genres as assigned on GoodReads; the dataset was obtained from [68]. The dataset contains tuples $(x, y, z)$ where $x$ is the title of a book, $y$ is the full text of this book and $z$ contains a set of genres. We only retained those data points for which $z$ is a set with a single element from one of the following six categories: cookbooks, fantasy, horror, politics, religion or science-fiction. We further randomly selected 2 000 training samples which left 387 samples for testing. In our processing we did not use the title $x$ so the task is to predict the genre $z$ based on the text $y$.
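The filtering and splitting step described for the Book dataset can be sketched as follows. This is only an illustration: the record layout `(title, text, genres)` with `genres` a Python set, the function name `filter_and_split`, and the seeded shuffle are assumptions; the source does not specify how the random selection was carried out.

```python
import random

# The six retained categories named in the text.
ALLOWED = {"cookbooks", "fantasy", "horror", "politics", "religion", "science-fiction"}


def filter_and_split(records, num_train, seed=0):
    """Keep single-genre books from ALLOWED and split them at random.

    records: iterable of (title, text, genres) with genres a set of strings
             (hypothetical layout). Returns (train, test) lists of (text, genre).
    """
    kept = [
        (text, next(iter(genres)))  # the title is dropped, as in the text
        for _title, text, genres in records
        if len(genres) == 1 and genres <= ALLOWED
    ]
    rng = random.Random(seed)       # fixed seed only for reproducibility
    rng.shuffle(kept)
    return kept[:num_train], kept[num_train:]
```

With `num_train=2000` on the retained books, the remaining samples would form the test set.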
16.2.1.5 CMU.: The CMU Book Summary Dataset contains plot summaries for books which are extracted from Wikipedia by [60]. The dataset contains tuples $(x, z)$ with $x$ a plot summary and $z$ the category to which the book belongs. We retain all datapoints whose category $z$ occurs at least 50 times, which leaves us with two genres, namely Fantasy and Science-Fiction. We randomly select 1 138 datapoints for training, which leaves us with 380 testing samples. The task is to predict the genre $z$ based on the summary $x$.
16.2.1.6 20news.: This dataset contains newsgroup postings for 20 different newsgroups which are collected by [64]. The dataset is accessed using the function fetch_20newsgroups from sklearn.datasets. The dataset contains tuples $(x, z)$ where $x$ is a message sent to the newsgroup and $z$ is the label of the newsgroup. There are 20 possible classes which $z$ can take as values. There are 11 314 training samples and 7 532 testing samples. The task is to predict the newsgroup $z$ given the message $x$.
16.2.1.7 Spam.: This dataset contains text messages which are either legitimate or spam, collected by [63], [62], [61]. The dataset is accessed from [58] and contains tuples $(x, z)$ where $x$ is a text message and $z$ is a label indicating if the message is spam. The possible values for $z$ are spam or ham. The task is to predict $z$ given the message $x$. Unfortunately, due to a mistake, the experiment was executed without splitting the dataset into training and testing samples. This means that the 4 179 available samples are used both during training and testing. Splitting into training and testing samples would however not change the inconclusive finding from Table 4, so the experiment was not repeated with a train-test split.
16.2.1.8 Reuters.: The Reuters RCV1 corpus [66] consists of a collection of news stories and was accessed using nltk.download('reuters'). The dataset contains tuples $(x, z)$ with $x$ a news article and $z$ the category to which it belongs. There are 58 possible values for $z$. There are 6 577 training samples and 2 570 testing samples. The task is to predict the category $z$ based on the text in $x$.
Algorithm            20news    Spam     Reuters
Random    K = 50     23.2%     86.4%    65.4%
Spectral  K = 50     23.0%     86.1%    63.0%
Improved  K = 50     25.2%     86.1%    63.3%
Random    K = 100    31.0%     86.7%    67.7%
Spectral  K = 100    31.1%     86.9%    66.2%
Improved  K = 100    33.6%     87.2%    68.9%
Random    K = 200    38.0%     87.2%    68.4%
Spectral  K = 200    36.2%     87.0%    69.0%
Improved  K = 200    40.2%     87.2%    70.8%
Random    K = 400    44.7%     87.6%    87.8%
Spectral  K = 400    41.4%     87.7%    88.0%
Improved  K = 400    43.9%     87.7%    89.0%
TABLE 4
Results for performance on document classification where neither method significantly outperformed a random clustering.
16.2.2 Detected groups
Here are the detected groups when using the cluster improvement algorithm:
V1=mauser, blackout, wari,yak, sprite, puff, nightlif, capitalis, vhf, shroud, athena, featurelength, workflow, fright, grasshopp, misunderstood, aeroplan, farreach, prequel, ascii,
veterinarian, heyday, metalwork, timeout, nod, cavern, nf, utilitarian, chevi, ting, aphrodit, unsatisfactori, pieti, inund, heist, cl, fullfledg, autopsi, intang,deregul, hyphen,
hdtv, gild, majest, nasti, discretionari, computer, ambival, invinc, ide, rt, radiohead, brainwash, slur, teaser, pl, indepth, outag, pak, rebirth, pun, barksdal, crypt, outtak,
crosscultur, blaster, spit, simplist, amt, bn, bd, bikini, dea, miscarriag, reenact, makeshift, synergi, uninterrupt, surrealist, toolkit, pervas, shini, dismal, dizzi, spire, dm,
lancia, hadrian, viacom, foal, hippi, bonnet, subplot, cfa, poseidon, inhuman, ecstasi, drawer, subaru, diminut, til, amiga, beggar, yoke, twinengin, redefin, stomp, giraff,
elisa, preambl, servitud, ridership, thoroughbr, miser, lingua, medusa, unreal, gl, delinqu, garuda, equit, earmark, tesco, §, nb, bogi, dod, sx, impromptu, balconi, fastbal,
ingam, coerciv, adjunct, carib, embezzl, disrespect, smuggler, bitch, freestand, slipper, netscap, textual, vp, glam, highestr, falsifi, facetofac, shadi, yamaha, cradl, sceptic,
londonbas, weari, utopian, sigmund, contenti, counterfeit, medley,vigilant, weakest, superimpos, fg, retel, solstic, vibrant, tapestri, martian, illustri, lander, reevalu, kneel,
involuntari, jug, hl, dingo, eel, mn, wealthier, earner, affection, learnt, mermaid, tempest, nz, rhapsodi, astrophys, wiki, vaudevil, wager, leaflet, dazzl, approx, bloodsh,
ode, rung, shovel, lorri, indiscrimin, proudli, xs, ata, pastim, bane, dar, unidentifi, usda, climber, idiom, sabl, gogo, purs, klan, threeday, withhold, remington, cling,
shouldnt, agoni, delin, applaus, plagiar, toughest, meta, mingl, profan, tc, interdict, requiem, shutdown, participl, synth, jab, meme, misl, gaia, hightech, buoy, contriv,
despis, sceneri, mimick, labyrinth, larval, fsa, panelist, tangibl, ns, geophys, simplif, geforc, selfhelp, mv, amulet, hurdl, midsiz, delimit, underag, animos, hegel, goofi,
afi, preoccupi, sinner, cheetah, ict, waiver, panda, timet, ate, wand, kar, levit, foreshadow, voucher, batsmen, strabo, interscop, styliz, charisma, exploratori, hyundai,
kitten, reaper, redress, tabul, terrif, vindic, warranti, hitch, manx, blink, hiro, boomerang, fantasia, hieroglyph, gloss, galley, jumper, remit, industrialis, idiot, safari,
crunch, linger, wrc, macaqu, emplac, biker, xxx, retort, uh, fao, carp, endus, playground, michelin, viewership, meteor, fw, sf, toon, greenpeac, flamingo, sleepi, ordinarili,
overton, fd, gamer, pluto, shill, primal, rattl, tg, kawasaki, northrop, ser, unseen, cola, backfir, tricki, confound, slick, cfr, purportedli, knuckl, erp, amish, shrew, cad,
deterr, sco, ju, spaceship, godzilla, bitten, supervillain, intrud, thunderbird, boogi, mammoth, hg, tt, annot, sway, quip, splinter, videotap, vj, namepl, scarf, mf, teleport,
panorama, treacher, downplay, aircrew, pedigre, brahman, loki, gmt, cordial, gal, souvenir, asiat, limp, virtuou, rebuk, barrag, unfavor, extort, exhort, kde, vr, manhunt, hiss,
memorabilia, rsa, conglomer, twodoor, woodwork, expend, gauntlet, leech, acp, stallion, elv, appendix, artefact, dime, prophesi, rambler, anteced, expropri, overtur, curli,
bm, richer, wellreceiv, devour, oc, expressli, glyph, gull, reemerg, swimsuit, puppi, nu, stumbl, overrid, ub, narcissist, selfdestruct, ontolog, geni, tekken, nec, mya, payabl,
threequart, thug, utterli, latenight, fallout, worthwhil, cpm, superpow, swiftli, eco, jellyfish, usn, joystick, spoon, grim, gimmick, irc, clarif, figurin, caregiv, hesh, nearer,
dsm, laughter, hum, slider, hannabarbera, easiest, sg, caricatur, honesti, humankind, calypso, constru, irrespect, utmost, deceit, dislodg, antic, finalis, subconsci, surfer,
alterc, maze, azur, ska, packard, sari, conduc, sar, reconnect, stagnat, undead, starscream, sculpt, », poach, psp, geopolit, seaplan, fallaci, untitl, unbalanc, sticker, fiasco,
dentistri, intimaci, stormi, wc, ui, outcrop, friez, triplet, bald, vertigo, bot, courtship, superb, rl, ansi, sash, footwear, interrel, gemston, doorway, cynic, evoc, kodak, insecur,
alchemi, disobey, aac, devalu, neo, cohort, longestrun, hairstyl, jtwc, stasi, ipo, bh, descriptor, oratori, atf, porcelain, foolish, briberi, vase, fireplac, nourish, electronica,
institution, widest, cisco, allnew, cute, seduct, dummi, msdo, sweater, rss, witti, mallet, mazda, unfit, snare, oem, anoint, raspberri, scorn, highaltitud, debtor, habitu,
peril, rockstar, reckless, sturgeon, combo, theyv, unravel, townspeopl, bf, wildli, offlin, bleak, chameleon, dogg, lsd, videogam, pon, fret, spaciou, spaniel, allround, garnet,
livelihood, info, savior, kelvin, wither, arson, twa, lte, lowbudget, taker, ova, attende, lamborghini, stalem, sikorski, funerari, harem, pillow, forfeit, middleearth, embroid,
cv, interdepend, underestim, lawless, seamless, sag, shrapnel, thee, sci, plummet, rollsroyc, pythagorean, olympu, expound, incens, centralis, ipcc, airtoair, rake, pantomim,
queer, atv, vaniti, nsa, repercuss, asp, transpos, internship, domesday, hypnosi, encycl, lifeboat, hibern, cloak, hyena, lager, sic, unsolv, worldli, chomski, vulcan, stealth,
pep, blasphemi, entrepreneuri, msc, paw, burlesqu, nuisanc, eta, backyard, dumb, freemason, clearer, dreamcast, icbm, heartbreak, lego, sewn, shun, decrypt, miscellan,
feral, sarcast, toru, gunmen, omiss, lad, forefront, timbr, cdrom, dread, authorit, brink, feign, plutarch, oct, squat, gunfir, grecoroman, naiv, stagger, toast, derid, antler,
apa, loophol, misdemeanor, tunic, tnt, cpr, issuanc, magnolia, fearless, preposit, nitro, hid, ua, rioter, nr, chiropract, dagger, notoc, mist, pixar, kosher, enron, encor, carnat,
thirdperson, gunman, cliché, minion, unreason, palladium, ticker, overtli, republish, notif, vacanc, punit, kraft, bosch, salsa, tracker, diy, geek, adjourn, supercomput,
harp, goblin, hatchback, tub, whichev, eyebrow, octagon, unspecifi, folli, bombardi, questionnair, pf, undevelop, bop, hardtop, regal, haplogroup, salesman, harrier, quak,
willingli, okay, crumbl, salient, infrequ, bailout, britannica, choreographi, shit, npc, phobia, reborn, groundwork, octopu, rum, fric, airship, gr, hone, luger, humil, homo,
onesid, echelon, viet, wardrob, gazel, nonn, opel, crossbord, unlicens, locust, catchi, revolution, cherish, rh, façad, inconclus, exalt, hex, towel, mummi, hostess, prowess,
monograph, centaur, mozilla, bidder, deepwat, moog, impass, lifes, partak, aristotelian, unheard, fad, gw, ornat, seren, rhino, rove, awak, motorist, chilli, underdog,
downgrad, illiter, apprenticeship, gadget, californian, sportsman, makeov, dissatisfact, eman, chopper, isa, wikileak, coalesc, deepest, fullscal, tiein, wig, pathologist,
melodrama, empower, centerpiec, telepath, lite, protract, deu, minigam, robber, twentytwo, jeopardi, buff, sodomi, unsur, valiant, mute, powerless, amg, stump, pygmi,
volley, lavish, naa, energ, penc, afterlif, ax, dogmat, weasel, playlist, apparel, moos, ascertain, bi, oss, deliver, epitaph, «, newslett, klingon, snatch, raccoon, unbroken,
iliad, trivia, disillus, ol, flick, dude, twoweek, solitud, montag, uncut, howl, everlast, tl, phonograph, beforehand, rp, aol, proverb, crucifixion, audiovisu, hb, topless, sutra,
overweight,retro, toad, spar, distinctli, minimalist, jerk, laps, sear, license, synthesis, goon, ntsc, arbitrarili, drape, wrought, borderlin, selector, necklac, mileag, reap, lick,
wrapper, nymph, orc, peerreview, pip, destabil, hurri, courtesi, biospher, fax, reassur, surreal, therein, bra, cock, statur, handi, sentri, upsid, intermediari, sensual, iucn,
publicis, acdc, tranquil, glanc, biplan, eloqu, backlash, focuss, dismount, coloss, extracurricular, widescreen, topdown, roar, technicolor, pictori, quiz, hyster, neapolitan,
oneman, rebroadcast, polygami, underscor, vedanta, funki, shorthand, interdisciplinari, tamper, spelt, pdp, perch, reexamin, jingl, voc, subpoena, exhum, kettl, elektra,
culprit, hallway, ital, silo, lovecraft, divest, flamenco, budgetari, fuzzi, almighti, assail, decoy, aptitud, septuagint, turntabl, impal, underwrit, prepaid, intro, leve, dice,
glare, nurtur, nirvana, gunfight, readership, indigo, maniac, concoct, wow, minaret, pelt, miracul, excurs, transnat, behold, witchcraft, sleeper, mirag, mercedesbenz, fest,
personifi, lastminut, daimler, stud, dada, limousin, baffl, platon, confuciu, symposium, objectori, cx, whiski, parrot, troublesom, swarm, biometr, seabird, preclud, lunat,
speedi, sunglass, attic, merlin, closeup, sober, sha, resuppli, categoris, blight, maximis, rg, inr, unfamiliar, payoff, devoid, bonus, acrobat, mash, rook, nyse, utensil, coercion,
soyuz, twig, champagn, enigmat, imparti, colossu, mindset, allig, detractor, preproduct, hump, deadliest, immor, skinni, tuner, irrespons, odysseu, vo, hitter, cybertron, geo,
ara, backstori, sh, righteous, tester, disorgan, kingship, ipa, explanatori, thrash, selfcontain, oo, nk, annuiti, af, handson, resel, themat, electrif, twentyon, scam, superhuman,
compuls, dissatisfi, clinician, onslaught, indign, disdain, sublim, shove, handler, kippur, payout, panoram, mana, habea, hedgehog, joker, pg, totem, myriad, straighten,
allegor, urgenc, calf, outgo, quilt, acacia, apprais, subgenr, curtail, pol, pup, unicorn, decri, misti, leibniz, mardi, autist, galact, vibe, collag, junip, symbolis, breakout,
embellish, propens, dire, shortfal, rudimentari, paralyz, meticul, riddl, machinegun, snp, fi, lupin, quorum, disparag, fetish, masquerad, monologu, oar, copul, dl, theyd,
spec, highdefinit, unnot, bernoulli, lowercas, dx, slump, countercultur, barbecu, impregn, inaccess, intellect, hoop, ppp, windmil, junk, anu, catchphras, wilt, isoiec, hideout,
¥, lash, curtiss, terra, tighter, someday, trebl, brawl, pandora, behindthescen, stag, pejor, existenti, debit, kc, cyclop, firefox, nvidia, grudg, epithet, fuell, mela, ancillari,
wrongli, †, lastli, guillotin, firstgener, sadli, gtr, solemn, roadsid, ur, shopper, oneshot, finder, slew, tramp, sl, mutil, ri, linnaeu, hoist, gorgeou, esteem, boar, cider, sled,
whistleblow, lowpow, handmad, surrog, electro, whereupon, peugeot, abstain, lavend, endear, instil, verizon, priestli, embroil, horsesho, tenyear, monolith, obvers, outset,
limbo, feroci, capcom, cowl, shaker, wreckag, implicitli, bracelet, homeown, optimu, twentythre, tripod, inocul, goodi, uplift, rejoic, sideway, overdub, jackpot, aclu, ee,
chore, permeat, sevenyear, psychiatri, tranc, immacul, codex, bsa, cbi, mono, newfound, sparrow, nightli, cessna, coca, hospic, priestess, advert, nil, monstrou, ridden,
lute, anthrax, yam, unhealthi, luthor, cosmopolitan, directv, superstit, mink, catfish, captor, imax, dwindl, sprung, veda, grotto, indispens, tauru, uhf, dt, grail, parabl,
gopher, bayonet, longev, enix, pepsi, honorif, barter, centauri, inquest, dy, maharishi, aton, pud, pulsar, mic, outcast, poincaré, reinvent, bk, voiceless, voiceov, pumpkin,
britney, underlin, nike, zodiac, nostalg, fokker, rejuven, magneto, tabloid, standoff, fang, authorship, toddler, acl, rapist, clumsi, glossi, goliath, avi, deconstruct, rhinocero,
subterranean, gpl, dlc, ·, stride, cgi, entangl, firefli, tor, humanoid, postproduct, vane, blew, bootleg, bluegrass, hawker, kickstart, flea, tortois, kr, helper, prerequisit,
entrylevel, thinli, sorcer, philanthrop, pegasu, rein, hacker,jive, binocular, ovid, interlac, kin, enigma, dat, transpir, suv, punctuat, parlor, pinch, preset, alchemist, wouldb,
pharmacist, piti, sentient, gangsta, spectacl, mattress, plough, melancholi, sender, adida, subchannel, stigma, watcher,valor, trespass, accru, windi, infanc, konami, passer,
whiskey, billiard, interraci, hord, yeah, applaud, hera, medallion, antelop, pax, paranorm, starbuck, elf, tlc, kepler, hysteria, dusk, landbas, passov, nostalgia, snoop, tr,
issuer, bl, anxiou, kite, fairchild, scifi, raisin, entrepreneurship, protestor, mahogani, womb, perl, hk, vegan, frantic, bastard, covet, turboprop, urn, corvett, stela, sentinel,
agnost, winfrey, youngster, pelican, requisit, coop, odyssey, tacitu, unharm, hypocrisi, wellestablish, upanishad, bodywork, parenthes, rj, puberti, misspel, reckon, auspici,
sapien, curfew, cymbal, twopart, loom, blacksmith, pancak, multilingu, egalitarian, dp, unambigu, viper, assemblag, turnaround, rn, cull, weep, peach, autobot, impati,
participatori, volvo, awesom, ericsson, derail, mayhem, atrium, recuper, hen, oprah, leftov, omnibu, saab, frenzi, oedipu, circumv, marconi, xmm, unfairli, summaris, puck,
reassembl, paranoid, whatsoev, intertwin, department, stargat, mx, sip, namco, unbeliev, sig, facelift, hype, relentless, chakra, jewelleri, supplementari, chime, markup,
torrent, generos, shortcom, deciph, chaser, tartan, guevara, carriageway, extraordinarili, spaceflight, disengag, impetu, farc, comprehend, shipwreck, galileo, poorer, bw,
execution, unmark, compliant, winemak, din, horseback, clockwis, stupa, creas, tulip, strangl, conscienti, outperform, lexu, glamor, tougher, stricter, sandal, inexperienc,
com, hourli, payrol, jag, secondhand, suffic, reconfigur, diaper, groundbreak, neat, secondgener, rko, effigi, tweak, tonnag, cryptic, fetch, hsv, supergroup, affidavit, bg,
cun, grumman, uncanni, unleash, disallow, jg, braveri, pervers, folio, ratchet, impedi, um, orat, mnemon, holist, disgrac, earnest, gall, ati, newt, tata, webbas, rune, hog,
brandi, bandag, baccalaur, gallup, fowl, priorit, plank, ku, misfit, eo, csa, rooftop, bonfir, draught, quirki, hydra, vm, paranoia, loco, quan, reliant, wd, pitchfork, sprinkl,
lakh, coda, heap, hasten, harshli, xl, materialist, miseri, orderli, mundan, sprang, dreamwork, emu, iaea, patienc, heron, twoday, lax, undoubtedli, unison, stutter, barcod,
hooligan, longlast, hitherto, ass, sadist, staircas, exoner, dupont, saucer, nou, dolbi, psychopath, cellar, mele, volta, preorder, bodhisattva, pap, voltair, splash, techno,avert,
gallop, fieri, char, amazoncom, smear, aura, wildfir, suitcas, discord, worshipp, gnome, dissimilar, inconveni,substant, scrambl, detour, overcrowd, mismanag, gin, treacheri,
quadrupl, cambrian, extravag, cookbook, splendid, derelict, masteri, breastfeed, sixmonth, scari, stubborn, triton, domino, tempt, resuscit, standardis, mca, fn, consumm,
thunderbolt, apron, cctv, diphthong, cajun, nuditi, esperanto, monochrom, decidedli, reboot, flop, commonplac, overs, nra, masturb, shank, josephu, barren, kinship,
acquitt, rosari, mediocr, assort, unprotect, lien, apparit, cg, liar, multitud, gpa, factual, proprietor, typifi, rx, pimp, emphat, transgress, outdat, cauldron, esquir, astound,
tug, biscuit, handicraft, rescuer, novic, ls, sacrifici, broom, extraterrestri, cinemat, primu, contemporan, lightheart, nontradit, ddt, heroism, slant, handtohand, deepen,
fictiti, yamato, bale, mug, metaanalysi, papyru, incapacit, concis, rewritten, subvert, blueprint, oneoff, taglin, engulf, atc, attrit, pretens, loath, recordbreak, typewrit,
bun, interceptor, selfish, shack, indec, bharat, peta, upstair, mace, louder, fy, coupon, genitalia, malic, gambit, salamand, untouch, hottest, retitl, fisherman, tyrannosauru,
flamboy, handwrit, appal, matador, rendezv, solari, coproduct, sapphir, astra, reclassifi, hindustani, aoc, uri, progenitor, threemonth, apocalypt, untru, astral, rv, twoseat,
stricken, scotch, bullion, improperli, heracl, outburst, tp, onehalf, hasnt, antagon, ruse, icao, neptun, rad, sparkl, nypd, mart, asa, ero, discriminatori, grotesqu, javelin,
beginn, macroeconom, housew, etho, clover, ramadan, iss, invalu, disprov, afloat, typefac, rebat, worldview, freighter, gees, hermit, iaf, lame, haze, dynamit, eclect, lingeri,
interlud, tout, careless, precari, discover, reinterpret, peacetim, equinox, unplug, inflight, monik, racket, oldfashion, reputedli, tack, orb, unnatur, troll, nam, interspers,
burglari, digger, wallet, tame, uneth, galactica, shard, ingeni, misinterpret, apocryph, cannonbal, pn, precept, shaken, thale, compassion, iec, prologu, epistemolog, shuffl,
buyout, stare, drinker, pinnacl, booklet, mta, reintroduct, dilig, selfconsci, gs, extrapol, slack, ebook, citroën, refund, stardom, turban, gorilla, upbeat, spinner, cautiou,
bellow, decca, viennes, hardcov, stave, scooter, stretcher, euthanasia, anti, recast, operat, pli, buick, remad, eb, tesla, acorn, conjur, forgeri, onetim, pleistocen, needi,
resurfac, onehour, cinematographi, undetect, ama, solidifi, fabul, stat, backpack, epitom, motorola, mouthpiec, diversif, swastika, wheelchair, brightest, yom, phalanx,
multidisciplinari, lore, transpond, profoundli, pd, anarchi, internation, chinook, albatross, mu, westinghous,tombston, timeless, scoop, exquisit, gunneri, funniest, contextu,
midday, nighttim, pluck, ×mm, tuck, lampoon, concur, orchid, sire, worthless, kia, tore, foreclosur, mainstay, corset, calligraphi, antidot, cryptographi, disarma, xerox,
hastili, resumpt, castor, fugu, cuckoo, retribut, glorifi, apc, peg, foray, birch, lieu, introspect, mau, chequ, wreath, hitchhik, pew, spreadsheet, dropout, bulldoz, iu, abrupt,
loft, lucif, oa, caption, pe, beret, uneasi, penthous, lü, supplant, dh, entic, watchdog, neanderth, americana, taint, ot, jargon, pu, reclus, pinki, eater, silhouett, nov, féin,
rampag, snapshot, gass, pri, flatter, carousel, msa, forprofit, deadlock, seclud, fiddl, brokerag, skit, dualiti, mahal, ture, derogatori, stout, odin, chai, evas,fleetwood, adulter,
cyber, rehab, muppet, lex, elus, nonverb, raptur, martyrdom, aug, unjust, falun, keyston,nuanc, ordeal, headphon, chaotic, brillianc, penanc, saffron, hh, gentli, edific, blaze,
plung, slipperi, lexicon, standpoint, dubiou, unintent, gunshot, forgot, ro, keynot, underdevelop, preemptiv, futil, succe, aqua, burgeon, fourwheel, reprimand, rye, eboni,
asham, cadenc, append, multi, dionysu, glimps, blizzard, stripper, stylu, misconcept, éireann, rarer, roost, unsign, hasbro, gc, shook, lust, priu, orion, megatron, enchant,
rem, lg, fanfar, dike, keynesian, polem, sciencefict, eyesight, mag, irrat, var, opengl, unorthodox, ito, rampant, downturn, coerc, popularis, bohr, selfproclaim, ohm, eid,
artemi, sonnet, glitter, cocktail, disassembl, outcri, jeep, trash, promiscu, hypnot, gown, layoff, reconsid, recollect, netflix, misfortun, ubuntu, melon, dinar, fukushima,
tabernacl, hertz, brute, codec, dáil, kingfish, psych, renegad, infidel, parisian, elucid, picnic, paraphras, freemasonri, booti, firstperson, ak, paperwork, foo, subvers, vc,
wrongdo, steak, insolv, rocker, interplay, scholast, looney, amic, rancher, cracker, tn, thaw, yearlong, colli, barbar, obelisk, scoreboard, overt, plow, loudli, vf, corona, illfat,
gita, moonlight, twoway, firebal, ichigo, couch, abod, gundam, symphon, chute, lush, crate, kd, acm, sed, psa, magellan, tyrant, sow, pedest, gambler, craze, embroideri,
vt, karaok, apprehens, adept, lr, botani, hindustan, cu, unexplain, transsexu, grit, yahweh, youll, orangutan, slander, valkyri, diner, alt, calv, meander, preempt, tutori,
rustic, mover, polari, sloth, ponder, introductori, reshap, kierkegaard, tm, gigant, archeri, crisp, zedong, stairway, uav, skid, rariti, zeta, interestingli, checker, eschew, htc,
ramayana, kiwi, standbi, glaciat, reindeer, notforprofit, causat, ecommerc, repaint,pilat, juror, anvil, abridg, sauron, trident, quad, magpi, catapult, franca, indetermin, viz,
footnot, shortcut, horrif, meaningless, pinpoint, neoliber, werewolf, audiobook, accordion, tith, disclaim, wiener, backer, triumphant, biomed, thistl, showroom, curricula,
manslaught, crossbow, roadster, postcard, cockroach, blackandwhit, glitch, drm, psychotherapi, cactu, klux, kanji, kiosk, npr, moratorium, breez, overdr, dentist, bystand,
workout, underpin, psychoanalysi, dreadnought, taj, delphi, woe, woodpeck, syncop, ib, dialog, counterpoint, expressionist, adulteri, cohabit, overpow, lancer, dunlop, elud,
abyss, adverb, retroact, guis, whenc, conflat, dab, fragranc, childbirth, handbook, firsthand, unrestrict, smoothli, disrepair, barefoot, impressionist, ici, poke, contradictori,
glad, quarantin, castrat, lest, br, decepticon, phish, overtak, omen, crucifi, tangl, shapeshift, surmount, qa, ig, zenith, refit, tango, cyborg, widerang, talon, subprim, bland,
bugl, chernobyl, alias, phenomen, gmc, stylish, unrealist, silli, unearth, fanbas, sling, healer, accentu, remors, ww, mania, gratitud, vignett, tantra, maraud, atheism, bac,
signag, sunken, pti, corrobor, epilogu, mil, mbc, sidebysid, downsiz, kangaroo, dormant, midi, slid, hoc, kt, backstag,inton, stringent, delici, scribe, nikon, comma, pr, bhakti,
spectr, rediscov, promo, pixi, helpless, hive, sociopolit, stray, apt, fanat, trapper, boxoffic, vultur, infal, mantra, subcultur, mime, desol, proactiv, fullsiz, skype, autograph,
keel, overcam, idealist, fourdoor, heartbeat, mattel, pasteur, emerald, csi, hopeless, elaps, envisag, rabi, mahabharata, hug, vulgar, delus, outsourc, satisfactori, scuttl,
noncommerci, boomer, maru, remiss, lm, nexu, mane, esp, psychoanalyt, ale, suffoc, pois, almanac, etiquett, transcendent, euclid, restat, bondag, tintin, morph, inaccuraci,
unauthor, noncombat, falter, earthli, repaid, pda, henceforth, obliter, cadillac, swirl, dd, fledg, epoch, manoeuvr, displeas, vehicular, forese, allegori, oat, widget, maximu,
miocen, aegi, fend, kabbalah, manic, skunk, gaze, piraci, chevron, stifl, shabbat, improb, clap, unconvent, np, bmg, nasdaq, sucker, apprehend, sae, molotov, withheld, poss,
fsb, alevel, zebra, vip, horu, tusk, hs, messerschmitt, elk, goos, harlequin, reced, hourlong, drank, usbas, invoc, ebay, dichotomi, potion, subsum, electra, teas, cardboard,
ironi, gcse, awe, wutang, envi, insofar, jackal, grappl, incest, hobbyist, sampler, vet, starship, soc, cliqu, typolog, casket, isp, bmi, cumbersom, schemat, insignific, delicaci,
beech, grung, misrepres, paramed, overshadow, closet, torment, asu, laplac, stork, panason, rag, gamecub, phenomenolog, expon, policymak, mojo, esa, che, cybernet, afp,
nestl, walmart, exemplari, twohour, xx, handwritten, nerd, polka, ethnograph, antisubmarin, portmanteau, notwithstand, freshli, toc, courier, referr, stewardship, kei,
worldclass, foreground, disqualif, loneli, torchwood, enquiri, havoc, unpublish, unsettl, hobbit, swung, selfsuffici, beagl, gra, fab, accustom, druid, vodka, foundri, crave,
abound, rubl, preschool, malici, righteou, obligatori, reimburs, condon, millennia, yesterday, hoard, centurion, sinist, junker, personif, sli, decapit, adc, voodoo, herm,
waterg, kindl, attir, abelian, dowri, greed, yearbook, endgam, throwback, msn, camper, nag, cappella, ia, hardest, adjud, glamour, undo, hearth, semicircular, monopol, dew,
ks, rubbl, utopia, greedi, dsp, ccc, futurist, courtroom, rr, medicaid, blacklist, telegram, lazi, kink, sinn, symbiot, hade, php, outing, nazareth, scissor, minstrel, unresolv,
subsect, belliger, naughti, siren, uranu, dissuad, veer, prometheu, restitut, stateoftheart, sprinter, prerecord, cleanli, maimonid, exceedingli, overtaken, coupé, ddr, barricad,
anthropomorph, ope, empathi, neuter, notebook, cessat, nullifi, ox, shortwav, suitor, bandai, scorpion, startl, richli, underwear, bae, daredevil, horsemen, tumbl, doomsday,
cong, arisen, pinbal, visionari, kayak, thirst, peertop, graveyard, diacrit, oldsmobil, spaghetti, bitterli, poppi, displeasur, manli, tardi, ei,
V2=cher, bj, constanc, dani, bartlett, rene, melvil, rowl, barth, ryder, stephenson, hitchcock, kendal, brahma, elisabeth, burt, polli, foss, craven, manni, kerr, berni, klau,
alexandr, benoit, petri, bernstein, upton, rei, parri, maci, russ, townshend, fei, elop, baptis, gabe, melvin, osbourn, osman, sita, meng, stepmoth, dusti, ein, wheeler, beckett,
elain, becki, jai, descart, slade, khrushchev, bigg, weir, kaplan, bingham, elena, pahlavi, kissing, adolph, guthri, dramatist, griev, mayfield, thornton, eliot, kang, sisterinlaw,
voldemort, susi, mcgrath, cynthia, rahul, ame, xiang, chong, lazaru, mahmoud, priscilla, berg, paterson, fatima, silverman, cale, smokey, jakob, eno, sampson, cassidi, baum,
elmer, partridg, tong, mildr, guan, harlan, mcbride, cabaret, mccoy, karim, serg, eastwood, reilli, gee, jacobi, croft, neumann, sgt, joann, meg, reggi, zane, bongo, cullen, vito,
sax, trotski, bhatt, gilmour, nan, pam, lamont, braxton, ryu, vaughan, kobe, wesson, tyson, mcintyr, lea, hillman, capt, roth, vicki, boyl, emeri, brandt, marquess, earnhardt,
zappa, olaf, sutton, kaiser, astor, nikolai, ringo, ashok, minogu, bard, stacey, mcclellan, calvert, kramer, hagen, cartwright, modi, farrel, walden, bai, ella, anand, ulyss,
exchequ, sargent, kamal, fiancé, jacquelin, godfrey, custer, jacobson, brezhnev, childless, louie, madden, jawaharl, busch, humphri, jonni, anwar, donaldson, filmographi,
draper, goddard, kirbi, huey, grimm, sherri, cbss, flair, nath, aerosmith, begum, lowel, milo, zhu, emmanuel, abbott, jamess, fatherinlaw, patterson, linu, seaman, deacon,
woodward, shortlist, heartbroken, abram, rori, vera, horowitz, lorrain, joli, indra, mehm, reluctantli, kabir, rosenth, ste, mckenna, coleridg, boo, scarlett, aguilera, jinnah,
bellami, dent, remarri, pei, lam, carla, meredith, wilbur, dodd, courtney, rishi, sanford, newsweek, ell, henrik, annul, gregg, mcgovern, lott, poe, bridget, hale, joplin, giuliani,
gerhard, ramon, heidegg, biopic, serena, ulrich, herod, illegitim, tiffani, braun, dorian, concubin, ridley, dre, robbin, trombon, bundi, hanson, tun, epstein, sylvia, hooker,
josef, titu, obituari, nme, dariu, vanc, miln, ty, rudolph, exclaim, sadi, thatcher, dion, hodg, prima, rajah, henrietta, davenport, dowag, dorsey, rousseau, manu, taunt,
kubrick, mahmud, radha, herzog, cecilia, sigismund, tobago, hayn, maha, djokov, reza, daryl, kathleen, johan, beal, obo, valentino, pandava, armand, whitlam, arjun, orton,
hui, ewe, lulu, heinz, sardar, shakira, rani, tinker, trier, pia, kern, jp, hai, reev, unita, lal, sumner, ke, bint, coltran, cabot, weston, hine, fullback, anastasia, jefferi, anita,
aloud, britten, gordi, whereabout, woolf, quintet, halen, robson, picker, tilli, stafford, olivi, vikram, ek, yadav, aurangzeb, eisner, maher, puri, honeymoon, cartman, tal,
woodrow, lindsey, leann, eliza, flynn, durant, netanyahu, asimov, shamrock, keenan, shaun, happili, konstantin, hendrix, namesak, hume, courtier, diva, orson, himmler,
welch, becker, yate, cinderella, valeri, mae, zack, naomi, cromwel, andretti, corneliu, sidekick, sanjay, cutler, burnett, teuton, barnard, davey, vick, clapton, brigham,
winthrop, khalid, gradi, handel, shapiro, norma, brodi, ritter, connor, bequeath, schneider, hammerstein, collier, samantha, marlon, whitehead, barkley, eyr, yeat, ritchi,
mastermind, waugh, rosi, olson, madelein, desmond, alicia, tolstoy, bei, gail, adler, buster, daw, josiah, chandra, abigail, dmitri, shelbi, ramsay, michelangelo, harmonica,
schmidt, mcgraw, liszt, wolff, lev, gideon, holliday, finch, gomez, sheldon, englishman, tung, evelyn, magdalen, cowrit, jun, howel, galen, infatu, mckenzi, priya, muller,
corsair, lyricist, halfsist, lindbergh, edna, shea, goldberg, germain, streisand, theodosiu, christen, raphael, rutherford, austen, prescott, senna, maclean, alban, uncredit,
hain, buckley, bianca, organist, kung, reginald, ramsey, nightingal, macfarlan, boyz, entourag, guo, ripper, sj, peyton, favr, slept, horton, landi, chun, blanch, zachari, timur,
cello, offbroadway, barrymor, trey, bain, lu, gough, ping, menzi, gladston, menon, muir, barlow, nguyen, ganesha, murad, adi, cedric, bentley, ing, richter, grayson, pearc,
nana, bree, cassandra, wilk, brigg, mullen, varma, helmut, aj, doherti, olli, ruskin, hubbard, moran, dicken, stonewal, nemesi, hershey, stoog, snyder, hendrick, bate, hari,
napier, yin, ingram, duff, staffer, protégé, peck, mcdonnel, palin, sergei, nakamura, ja, jenna, hansen, lau, romano, papa, ric, slater, leonid, winger, rockwel, hollyoak,
nevil, duan, albrecht, cinematograph, spector, cantor, irwin, gaiu, sweeney, hutchinson, harley, kellogg, choi, dow, nikola, stein, maureen, narayana, sylvest, spielberg,
hartman, ander, arya, leah, lucil, siegfri, clemen, geffen, blackwel, tanner, jing, ayer, igor, melani, bartend, jolli, saxophonist, howe, taft, claudia, nat, picard, dobson,
carmichael, monti, mulder, carver, duran, grover, flo, moodi, natalia, nathaniel, gabl, brando, kimbal, wainwright, maynard, pj, dunham, alfa, gilli, parton, tendulkar,
coowner, baird, blanchard, jang, springsteen, sati, markham, miriam, berat, thierri, rous, hernandez, sharif, patsi, carolyn, anjou, ang, dyer, houghton, pauli, oppenheim,
underwood, novella, nader, clarinet, jb, damian, waltz, tennant, cohn, og, mustafa, kemal, saul, beyoncé, omalley, freder, dutt, beaumont, mckinley, minh, greatgrandfath,
ayr, gan, malon, oti, tao, mcpherson, rabin, donovan, huffington, agatha, gueststar, cobain, dun, rollin, pir, rae, benton, clau, kyli, karan, gaga, relent, linden, fulton, jj,
marcel, cato, tutelag, salvator, orr, compton, canning, ruben, nolan, sila, mcgregor, bernhard, sinatra, chaplin, ao, hector, engel, priestley, gibbon, forsyth, mugab, mcdowel,
melinda, pamela, burr, merl, ashton, lawler, virtuoso, ripley, yamamoto, chu, erni, prasad, dalton, paisley, narayan, brutu, cara, housem, arden, dil, vasili, barrow, ala, nicki,
bandlead, luciu, hick, cicero, ellison, steiner, hayek, cbe, hubert, reuter, odonnel, compatriot, molest, violinist, andersen, tomlinson, footstep, famer, phoeb, obe, foreword,
kuhn, pollard, eusebiu, akira, teller, iqbal, duffi, leela, katharin, kaufman, thorp, iyer, dhabi, bea, shu, ozzi, pickett, gottfri, bender, orléan, carlyl, bono, alexi, göring, fisk,
kean, dustin, schumann, lister, cass, oconnor, snl, donni, keegan, ail, benefactor, letterman, kamen, unmarri, darryl, bonham, syke, stefani, sham, madoff, kala, layton,
konrad, dixi, yusuf, aur, erich, popper, garrett, merri, philanthropist, mansfield, acharya, rowan, brennan, luka, loretta, jeremiah, tj, cush, darrel, babu, skipper, lacey,
hester, kimberli, kazan, bryce, hepburn, mercer, sinha, jovi, graf, asher, burgess, om, fielder, tudor, zelda, lori, zimmerman, greenwood, xu, ballard, terrel, addam, ballerina,
putnam, rai, kobayashi, martini, fowler, wiley, brock, alec, massey, kitt, cunningham, julien, loeb, bourn, villeneuv, rubin, slain, squir, gorbachev, bhai, schwarzenegg,
harald, mara, persh, romney, simeon, connolli, alf, frazier, rolf, ich, guido, bertrand, doesn, juda, metallica, waitress, foley, spade, mather, oconnel, playbyplay, mohan, abd,
cena, hallow, blyth, atkin, tanya, louisa, dolli, surya, zu, yao, gertrud, mandir, rigg, yan, seinfeld, georgi, gareth, chow, inferno, ava, merton, forster, bede, brenda, annett,
shakur, larsen, huang, mai, mahler, butch, stefan, skye, smiley, gale, metcalf, ezekiel, bradman, claudiu, hobb, tex, denis, plini, mcguir, dickinson, baxter, vern, mandi,
edda, pavel, maximilian, keaton, rhi, chloe, coppola, lillian, liddel, khanna, amar, bachchan, flanagan, jedi, payn, mcqueen, sasha, damon, goldsmith, marian, mccormick,
alain, bess, garland, accomplic, émile, caldwel, cosbi, sheen, skinner, eduard, shakti, nair, supper, mosley, raoul, cowork, jamal, burrough, bran, soninlaw, leung, gillard,
irvin, megadeth, garth, kendrick, pryor, dandi, majorgener, moffat, booker, derrida, hadley, sheppard, marlow, behest, mariann, nawab, ajay, tweed, beatric, laurent, yd,
bhutto, mcgee, phylli, guggenheim, ravi, dirk, curt, wittgenstein, mage, gogh, bliss, allman, prem, derrick, unbeknownst, keller, ching, antoinett, keat, kart, epistl, crockett,
ellington, taco, luciano, siva, trudeau, mabel, reuben, boyer, socialit, axel, terenc, hayley, hanna, saviour, eastend, exhusband, holt, barr, baro, babe, dong, larkin, mehta,
walton, playmat, scroog, dem, beckham, stravinski, conway, hussain, minni, foreman, peng, matti, godfath, chaucer, rashid, warden, shin, crowley, brewster, messi, fitch,
andrei, olsen, knox, simm, frankenstein, barbi, montagu, maa, clayton, lilli, godwin, rees, rosemari, galloway, sweetheart, royc, mori, editorinchief, darci, salim, payton,
deborah, kingsley, zia, sharma, brabham, hooper, infuri, ida, flirt, oneal, gillett, unborn, gonzalez, dietrich, nur, ó, mors, finley, sal, waller, higgin, bandmat, foucault,
archduk, chand, osullivan, genghi, reddi, roxi, coe, hark, emil, sadler, duma, paddi, edith, slay, addison, jessi, mandela, angi, didn, theresa, leno, duli, devin, clarkson,
gerard, perez, jericho, brent, quinci, strauss, davidson, nilsson, rawl, sheila, nico, ignatiu, nelli, cochran, yuri, musa, yoko, clifford, mackenzi, lola, friedman, johansson,
mogul, bose, peacock, gemma, tiberiu, hewitt, mariu, housekeep, haig, jona, hess, mirza, moe, eminem, christensen, tobi, gong, larson, prost, lyle, ike, mahesh, vaughn, luc,
drummond, lawson, bloch, yong, eastman, hearst, mansel, kwan, oreilli, brisco, gerrard, frenchman, everett, ariel, kathryn, bauer, sexiest, hemingway, octavian, hilari, zheng,
publicist, yun, stewi, manson, ghulam, hoffman, padma, mbe, demetriu, qc, erasmu, graem, wordsworth, viscount, ursula, shackleton, selena, berger, vettel, maud, bogart,
boa, barnaba, garfield, abbi, grandmast, murdoch, swann, mcconnel, costello, rascal, tyron, middleton, dina, policeman, kat, berman, tak, sandman, ying, gaston, puja, gina,
tess, mobster, hoffmann, caleb, schwartz, cassi, oakley, opin, ji, sculli, manfr, beatti, cheng, dyke, lana, liang, eaton, greer, faraday, spitzer, bowman, mei, timberlak, clifton,
arti, hoyt, jensen, schulz, chappel, spock, audrey, bullock, titular, trajan, roberta, buffett, dewey, rana, paig, salman, hathaway, lai, erwin, whitak, tristan, salomon, damien,
shepard, crouch, arun, merril, virgil, marcia, ezra, gifford, bismarck, shearer, rosen, wynn, arjuna, alumnu, aubrey, beau, colbi, goeth, patton, mackay, aziz, cleopatra, angu,
sutcliff, stringer, stevenson, mukherje, oswald, dai, debra, vijay, eugèn, edi, zoe, ohara, bey, mayer, gama, goodwin, heali, ramakrishna, wren, ness, schultz, faust, spenc,
betsi, dahl, olympian, aldin, abe, waldo, jare, gilmor, rowland, hopper, morley, wonderland, dane, albright, stoner, camil, wendel, bene, traver, vinci, fabian, parvati, wilkin,
bett, hu, penelop, ahmadinejad, russo, mendelssohn, goebbel, burnham, loren, coolidg, lehman, fleme, distraught, brecht, mein, xiao, maguir, abi, osama, zach, tweet, milli,
hindenburg, williamson, nero, lear, josephin, alvin, newel, munro, dominiqu, jock, joachim, bloomberg, warhol, joanna, tagor, kidd, sabrina, saunder, watkin, lowri, tammi,
baudelair, nietzsch, hayden, seward, ada, rohan, paleontologist, sherwood, cobb, kai, walther, kara, meek, erica, rudi, holloway, yogi, für, atkinson, henley, yvonn, lara, fiona,
deng, haydn, springer, js, merritt, veronica, adel, lerner, joshi, hawthorn, sima, theo, lew, mia, kri, charley, rajiv, samson, juliet, archangel, hayward, meteorologist, weiss,
sloan, eileen, alistair, randal, shan, farley, dimitri, hasan, pott, twain, wyatt, mariah, matthia, conqueror, roe, weinstein, hilda, ginsberg, colbert, bakr, heidi, gage, nicholson,
overhear, heller, grandpar, marguerit, nugent, jonah, woo, cori, wee, cassel, poirot, esther, alam, willard, durga, hurley, putin, cheney, pollock, leroy, singleton, clint, altman,
behead, provost, eulog, mira, norri, raju, dreamer, lakshmi, mir, daley, evangelist, libretto, mitt, betroth, hanuman, barton, maharaj, regina, webber, jasmin, roach, jokingli,
katz, nikita, erin, bahadur, mathew, xfile, faber, hippo, lennox, rommel, steinberg, suleiman, jarvi, martel, shen, cavendish, kristen, kaishek, polk, béla, dev, angrili, cho,
pooja, viola, bert, marjori, mott, richi, cj, nora, gladi, henson, jayz, simmon, tanaka, dali, anil, wr, gunn, firth, blackston, rep, jasper, capo, cathi, nate, emanuel, gamespot,
faulkner, chopin, hodgson, speer, darl, hazrat, alison, grossman, carlton, tchaikovski, mosh, cornerback, shanti, zhao, tarzan, leland, pandit, iain, cheung, hazel, jagger,
patel, staci, wharton, groen, viktor, yve, monet, mack, wanda, ogden, schubert, elijah, harriet, chung, garri, keyn, cari, libbi, macbeth, aga, cheryl, lieberman, phelp, vivian,
fairbank, talbot, abel, apologis, lucia, peirc, alexei, mclean, kimmel, smyth, copeland, coward, lar, tha, bradshaw, roommat, savil, lena, cowel, dempsey, earp, wilkinson,
def, chopra, marr, averi, carmin, ethel, hon, jude, ledger, hotspur, kristin, hua, daphn, tobia, tian, stow, suzi, jiang, templar, mortim, sutherland, bariton, bergman, sai,
hatfield, hartley, janic, gao, denton, bradburi, morrow, shivaji, jameson, carli, wilcox, amelia, ricci, mcnamara, pell, conn, ono, arlen, ismail, himach, ej, tam, morrissey,
feng, housewif, pundit, exwif, heme, sahib, matilda, gallagh, scorses, cha, jani, harrington, brig, decker, yeltsin, lauri, percussionist, haa, townsend, olga, soloist, ist, sach,
gershwin, grime, gardin, sawyer, malik, lim, bret, jodi, hamid, gotti, dori, thom, dreyfu, francesca, amir, waiter, barbarossa, bunt, cindi, keyboardist, lizzi, tracey, siegel,
cy, marta, sabin, botanist, amadeu, macleod, orwel, stanton, grandchildren, xv, kahn, kelley, easton, felic, frasier, macmillan, gile, nikki, wasn, nanni, conni, umar, müller,
hahn, dickson, barron, shankar, gopal, jimi, estrang, sonia, rankin, elliot, jarrett, headmast, amo, fonda, lamar, megan, middleag, novak, ambros, vinni, marley, nehru,
jeanbaptist, deathb, prodigi, maxi, getti, dyson, macpherson, cocreat, druri, stepfath, swanson, harman, psycho, shelton, pasha, greenberg, dawkin, ganesh, tycoon, mona,
henchmen, douglass, rasmussen, herbi, judd, countess, diaz, whistler, ling, comedydrama, yue, msnbc, dillon, gillian, huxley, ren, ree, haley, duval, mccall, bunni, jahan,
fallon, bowen, rusti, qi, iren, horrifi, jung, rudd, codi, goldstein, minaj, natasha, bartholomew, kemp, dora, whitman, biden, swore, asha, bragg, knowl, radcliff, xian, lydia,
granni, mimi, wen, dunbar, jermain, rosenberg, hem, darbi, archibald, prakash, calhoun, fay, uthman, elia, blain, regi, dumont, nadia, gretzki, mckay, enoch, petra, edmond,
isaiah, congratul, winslow, aquitain, mister, burrel, clanci, rooney, carlson, clare, jax, sammi, gillespi, chrétien, rufu, gwen, deva, erickson, faisal, vader, kipl, jc, gleason,
banjo, amr, billionair, suzann, yew, jeann, bower, hutton, graci, dole, weinberg, aquina, barnett, corbett, mandolin, scotti, glover, saraswati, lenni, feldman, earldom, falk,
corey, heiress, gavin, mclaughlin, pai, marlen, xviii, huston, rihanna, werner, mitch, fran,
V3=predic, ellipt, ganglion, gpu, stockpil, taller, asynchron, edema, unpleas, circumfer, ultrasound, placenta, interlock, nanotechnolog, highqual, planar, gestat, blocker,
pentium, glide, myocardi, disinfect, modulu, polygon, gnu, fingerprint, plutonium, lubric, conspicu, overdos, uptak, gaussian, vastli, seismic, sedimentari, decompress,
truncat, conif, agonist, plum, rotari, xy, cleaner, blister, sq, silt, deterg, alkaloid, broth, jelli, boson, minimis, lettuc, circuitri, fasten, norepinephrin, leach, conduit, barley,
subspac, aft, herbivor, testicl, sinu, mismatch, carbohydr, phosphat, lymphocyt, sewag, paralysi, coolant, modular, dementia, hivaid, actin, monom, perpendicular, tangent,
longitudin, asphalt, trough, dung, lid, superconduct, fractal, swollen, phosphoru, blackberri, cantilev, disconnect, enamel, discomfort, foam, hydrid, lifecycl, parametr,
cretac, smelt, mustard, benzodiazepin, valuat, heater, nicotin, sync, entropi, seam, igneou, avian, airspe, retrofit, coli, shrink, slit, pdf, refract, electromechan, hydroxyl,
unintend, starch, longrang, uneven, pineappl, phosphor, tray, quantifi, permeabl, incis, basalt, facet, pancrea, ≤, phenol, thruster, graft, lisp, permut, biomass, silic,
unman, neurosci, carburetor, incendiari, nonhuman, mussel, inward, recharg, lowcost, venou, pediatr, trajectori, ct, inlet, micro, taxabl, dilat, compressor, cytokin, pvc,
masonri, shoal, optimum, grower, highend, lagrangian, granul, predetermin, cranial, inerti, neurotransmitt, inhibitori, alkyl, bowel, gum, pesticid, nitrat, pickl, stamina,
headlight, eukaryot, fascia, cytoplasm, verif, porou, pendulum, lupu, retent, hubbl, lumbar, δ, creep, halogen, torsion, recombin, evergreen, adhes, nt, knit, meteorit,
methamphetamin, selfesteem, steroid, stellar, harden, coke, subsist, unicod, ϕ, voip, cellulos, congenit, sweeten, covent, taxonomi, abdomen, manur, fructos, measl, depreci,
rodent, firmwar, gsm, pancreat, burner, herbal, yaw, aftermarket, improp, pacemak, hue, handheld, hallucin, intracellular, cmo, alga, slug, mucu, crank, cramp, inflow,
megawatt, submachin, hemp, cereal, prognosi, refractori, zoom, subsurfac, insomnia, adob, reflector, cholera, snout, vortex, xenon, platelet, woven, penicillin, fermi, fungi,
rust, muddi, graze, polymeras, duplex, evenli, garbag, fungu, tau, horsepow, forehead, tradeoff, sine, debug, flammabl, amnesia, buckl, decidu, nonzero, convect, clog,
sulphur, prefront, contraind, finer, syring, peat, atrophi, incub, reload, baggag, auditori, spectromet, lactat, mucosa, sulfid, inorgan, calori, parabol, aircondit, hepat, bruis,
decomposit, metamorph, spong, khz, pollin, notch, lichen, mtdna, scalp, null, peni, bleach, lifespan, uv, scarciti, wastewat, filesystem, latenc, thicker, countermeasur,
phenotyp, foliag, router, rippl, trivial, soak, psychot, braid, itch, kinas, neonat, ipv, canin, morphin, constrict, amp, bloodstream, airbag, phonolog, inductor, deflat,
primordi, macroscop, pastur, matric, proportion, lumber, yeast, pv, spleen, rot, thorac, undul, reddish, gait, microprocessor, dorsal, acet, infus, locu, ballast, tuna, benign,
analges, köppen, multiplex, analogu, hotter, gravi, nozzl, tomographi, loosen, morbid, coval, malwar, anomal, quotient, ultrason, hilbert, regimen, pollen, resin, insensit,
wifi, workstat, appetit, onboard, beryllium, disson, precaut, infest, inert, builtin, logarithm, grind, eeg, leukemia, reusabl, millet, camshaft, inhal, lighten, pellet, perfor,
phylogenet, encapsul, manmad, withstand, cdma, nausea, flex, sap, smallscal, snowfal, covari, tentacl, asymmetri, pandem, crosssect, coagul, vagin, cinnamon, mole, ach,
chill, bottleneck, firewal, boolean, pcr, tread, antidepress, firstord, tecton, formaldehyd, lithium, hn, maiz, refil, acetylcholin, taper, subtyp, perenni, chimpanze, inexpens,
psychosi, siphon, refresh, highperform, ditch, anesthesia, particul, breadth, sharpen, microbi, ribosom, litter, strawberri, hf, scalar, url, lifethreaten, ounc, acryl, bait,
centrifug, quench, perfum, chimney, pci, baselin, powerpc, dilut, stabilis, epsilon, inertia, boni, kombat, fetal, unequ, shutter, plumb, nucleotid, rangefind, ammonia,
magnesium, neutrino, cough, unsaf, razor, filtrat, protrud, pulley, rectangl, dissect, boron, cylindr, milder, retina, thunderstorm, endogen, welldefin, waterproof, mening,
addon, overload, eyelid, salin, pore, quadrat, reentri, halflif, traction, undesir, drawback, latex, vertebra, socket, scuba, diod, opaqu, herbicid, graphit, cervic, starvat, bulki,
sedat, isom, ef, cn, aspirin, carcinogen, reclam, apoptosi, carnivor, stainless, crankshaft, cholesterol, beak, euclidean, fuze, ovul, twostrok, wedg, geotherm, fluoresc, polymer,
waist, mediums, sutur, testosteron, proteas, spp, stool, helix, tremor, oven, backbon, aldehyd, glutam, gearbox, gaseou, gastrointestin, apertur, amphibian, metabolit,
audibl, flake, calculu, fiberglass, tint, tether, unsuit, bodili, adren, prune, mould, prenat, crosslink, brine, serotonin, assay, accret, deactiv, pharmacolog, chalk, turbul,
elicit, wafer, xml, plankton, euler, termit, antimicrobi, prokaryot, theta, otter, diarrhea, tonic, heterogen, ligament, allel, σ, chromatographi, valenc, headlamp, oyster,
ether, mgkg, ioniz, bedrock, taxonom, fibrosi, mammalian, prosthet, crt, noisi, shortest, farmland, parallax, electrochem, gel, ejacul, buoyanc, bmp, seawe, doppler, pelvic,
axial, tensil, welldevelop, allergi, arabl, lumen, telephoni, flap, dredg, intox, amd, scaffold, plume, atm, deforest, lc, dipol, tungsten, excis, monoton, hydroelectr, scanner,
tightli, rash, bladder, nebula, applianc, seafood, gut, css, plumag, viscos, clariti, λ, bulg, anemia, deceler, antioxid, citru, dehydr, purifi, scrub, amplitud, kbit, chipset,
homolog, gunpowd, octav, lessen, costeffect, asymptot, millisecond, footprint, hypothet, cumul, amphetamin, pariet, extinguish, resistor, supercharg, spectral, binomi,
opioid, dashboard, viabil, vesicl, benzen, lcd, landfil, buildup, aerosol, barb, allerg, pneumat, spectra, urinari, grassland, tannin, numb, inflect, macrophag, capacitor,
cleav, emuls, tcp, mitochondri, monoxid, photosynthesi, placebo, topographi, counteract, intermitt, fission, petal, etiolog, tyrosin, beet, cathet, lug, ghz, inciner, muscular,
excret, toxin, ampl, subunit, syphili, tick, damper, safer, duct, endocrin, sweat, steril, epidemiolog, glue, embryon, leakag, mesh, transluc, dosag, methanol, pear, attenu,
palett, convolut, latent, semiautomat, shunt, cleavag, fluorid, reactant, continuum, obliqu, hamiltonian, wearer, ht, ellips, errat, ev, hydrolog, dough, centimetr, µm, asthma,
biopsi, inflammatori, electrolyt, amplif, spoiler, sausag, regen, interperson, cobalt, timer, lath, diffract, lowlevel, fece, highpressur, subduct, washer, impart, munit, handset,
autoimmun, powertrain, plywood, vagina, salti, malnutrit, halid, asbesto, microorgan, submerg, relativist, necrosi, debilit, hygien, throughput, manganes, unload, magnifi,
smallpox, perturb, lymphoma, iq, avion, π, saliva, pasta, probabilist, schrödinger, recurs, methan, hairi, sac, virul, slab, outpati, mangrov, glaze, retard, tactil, tonal, ventral,
smoother, pastri, waveform, thermomet, ultraviolet, hotspot, transmembran, cuff, unix, fungal, kwh, triangular, scrape, fixat, anaerob, walnut, isomorph, charcoal, seawat,
legum, movabl, exacerb, toplevel, cocoa, contour, appendag, breech, vomit, qualit, pavement, stew, splice, realworld, agil, clad, drier, dataset, cortic, pars, epithelium,
seab, impur, immatur, serum, dim, shred, atyp, hydrat, hydropow, gust, thyroid, stochast, urea, ovarian, uteru, medial, endotheli, bacterium, damp, decompos, prism,
synaps, primer, thirdparti, mri, scalabl, selfpropel, ganglia, javascript, lorentz, brood, usabl, airflow, coars, sticki, orthogon, germ, polyethylen, opportunist, ester, gastric,
yearround, hydrolysi, calibr, reset, zip, ionic, convex, slender, chew, migrain, pebbl, curvatur, groundwat, snack, peptid, rainwat, outweigh, potent, contigu, nmr, harmless,
cadmium, brittl, modem, postag, nectar, clade, anal, follicl, generalpurpos, stove, vend, cathod, yogurt, contracept, capacit, ozon, kappa, indistinguish, milki, neon, cation,
thicken, theropod, solidst, uterin, howitz, arthriti, electrostat, nylon, ε, dissoci, localis, anatom, celsiu, softer, thinner, warhead, hash, dendrit, rectifi, tubular, dimer, ulcer,
anesthet, viscou, linearli, highenergi, smoker, hemoglobin, lobster, warmth, carbid, biodivers, occlus, vanilla, diaphragm, α, microscopi, tar, anod, hover, sideeffect, seedl,
brighter, thereof, aircool, cdc, heaviest, topograph, vitro, folder, innerv, ripen, reagent, ailment, eucalyptu, cornea, supernova, illicit, immunolog, quark, nuclei, queue,
edibl, mildli, quantiz, concav, innat, bipolar, determinist, cigar, onsit, aromat, smartphon, polymorph, titanium, slr, actuat, interconnect, mixer, outward, centimet, pylon,
dohc, zx, resili, riemann, hydro, loader, asexu, lamin, windshield, floral, weakli, latch, emitt, garlic, peroxid, tricycl, ovari, lng, antipsychot, intric, subfamili, compost, bile,
bulb, abras, mitochondria, redirect, synapt, tandem, histolog, ventricular, gelatin, olfactori, hippocampu, histon, mpa, transvers, cabbag, alkalin, flang, filament, shrimp,
polio, torso, ammonium, condition, β, raft, unaffect, nematod, asymmetr, laundri, swine, denser, planck, cartesian, infinitesim, mach, endpoint, intraven, necessit, ccd,
cushion, helium, mimic, exce, maneuver, arthropod, readabl, cyanid, booster, chloroplast, aneurysm, saltwat, chemotherapi, unpredict, petrochem, cutoff, mandibl, carcass,
alveolar, fixedw, ppm, microb, fore, fig, kv, hyperbol, increment, θ, transistor, suction, situ, perceptu, tumour, impract, carboxyl, truss, µm, fume, fertilis, spacetim,
semiarid, ineffici, etch, incandesc, squeez, alluvi, interoper, buttock, motil, runoff, aquif, brightli, pelvi, sew, tighten, lag, greas, lymph, nucleophil, binder, projector,
odor, conic, vinegar, aroma, dirac, enzymat, colorless, projectil, bhp, darken, poultri, drip, embryo, bromid, ligand, shrub, ieee, symptomat, scent, pcb, aortic, germin,
biodiesel, eigenvalu, manpow, chromat, lh, fern, millimet, collater, funnel, homemad, toxicolog, malign, crystallin, mango, ±, brows, irrevers, biochem, migratori, spindl,
coaxial, disproportion, semen, distal, moist, constrain, predatori, fibrou, frontal, loaf, syntact, respir, luggag, strata, mosquito, hierarch, warmer, immobil, elast, sturdi,
camouflag, kiln, forearm, invertebr, offroad, iodin, lexic, hydrophob, spectroscopi, rom, chlorin, spheric, dimorph, feeder, byproduct, thigh, schema, psi, mbit, faulti,
cytochrom, orgasm, isoform, autosom, biotechnolog, highpow, chromium, sprout, dopamin, infarct, basal, flatten, scsi, vat, mpeg, cucumb, rub, lumin, irradi, markov,
aberr, insolubl, aquacultur, granular, plasmid, coronari, nomenclatur, ripe, fissur, wheelbas, foodstuff, biofuel, cleanup, pathophysiolog, handgun, apic, relaps, planetari,
corneal, sulfat, sheath, hexagon, quadrant, halv, cyclic, menstrual, cataract, sonar, influenza, yarn, conveyor, shingl, malfunct, yellowish, barium, inlin, jpeg, methyl,
genotyp, luminos, nonstandard, bayesian, carbonyl, outflow, androgen, glu, overh, loudspeak, lipid, antisoci, poisson, suppressor, â, shellfish, condom, replenish, fetu, ipad,
clot, postur, fourier, superfici, bsd, savanna, lengthen, rudder, lump, carrot, recoil, magnif, encas, catalyt, iodid, taxa, fluorin, preferenti, suck, hamstr, avers, massproduc,
hydroxid, weed, hz, capillari, pallet, sanitari, lattic, epitheli, soybean, markedli, anion, ic, imbal, carcinoma, ecm, overlay, carbin, cryptograph, helic, caffein, microwav,
alkali, sore, tab, higg, heurist, purif, clamp, plaster, lightli, transloc, http, anomali, twodimension, extracellular, estrogen, malt, linen, geodes, workload, mainfram, indent,
firepow, threedimension, scaveng, touchscreen, collagen, pariti, radiant, antiinflammatori, cryogen, hose, kerosen, subtract, nucleic, squid, clam, subclass, nitric, opensourc,
pituitari, insecticid, flu, motherboard, macintosh, metadata, ubiquit, γ, glycol, palsi, abstin, canopi, retrograd, colder, affix, airfram, vivo, sickl, dehydrogenas, chunk,
potenc, html, predictor, adhd, epilepsi, rgb, canist, phylogeni, linkag, cassava, radial, hardwood, nostril, hemorrhag, ruptur, biosynthesi, aqueou, nodul, flare, pounder, dn,
metallurgi, volt, cipher, biochemistri, mrna, complementari, ω, crustacean, nippl, solder, wingspan, strut, microbiolog, almond, intrus, vapour, extratrop, magma, mite,
spore, fahrenheit, diurnal, soy, forag, uncontrol, cleft, glacial, vertex, disloc, lemur, marrow, bluetooth, photovolta, unwant, transfus, droplet, sludg, retin, fp, sewer, molar,
cartilag, sql, galvan, delic, agar, instantan, powerpl, molten, sunflow, subsystem, palat, thorium, untreat, quicker, punctur, transient, uniformli, spici, soften, soda, gui,
ethylen, polyest, vascular, µ, axon, petrol, somat, superstructur, functor, precess, turbocharg, hydrocarbon, syrup, oneway, sugarcan, tendon, knob, dryer, furnac, hyperact,
kb, arid, benchmark, streamlin, pouch, skew, minu, runtim, stationari, macro, fourcylind, apex, cystic, phosphoryl, overflow, infertil, dope, sclerosi, chiral, interstellar,
sequenti, tuber, hing, filler, repositori, undersid, ethyl, lambda, transduc, sn, nocturn, nanoparticl, skelet, rearrang, amorph, spam, keyword, inactiv, ethernet, caterpillar,
bog, longitud, massag, bio, snail, throttl, ventricl, radiolog, elong, quartz, malform, aerob, weaponri, hypertens, groin, ipod, concuss, redistribut, summat, rivet, cultivar,
pheromon, takeoff, catalyz, dielectr, silica, floppi, paddl, amput, rf, mollusc, keton, cyst,
V4=anchorag, uci, bremen, metropoli, holstein, terrier, sagar, airstrip, decommiss, rink, pisa, burgundi, showdown, wichita, raleigh, honolulu, playhous, hillsborough, essen,
openair, yucatán, sooner, careerhigh, cheltenham, augusta, bazaar, suntim, avon, internazional, regatta, awa, luzon, taekwondo, sw, tahiti, hereford, galatasaray, wat, punic,
wyom, swindon, stirl, samoa, surrey, boardwalk, goaltend, lynx, zurich, midwest, cypress, hackney, fruition, lineman, pendleton, hampstead, pike, sinai, warwick, paralymp,
britannia, lowli, tripoli, eskimo, qs, hom, vicechancellor, durham, chengdu, triest, lsu, barangay, somerset, hermitag, dakar, payperview, baja, metz, silesian, williamsburg,
antigua, galway, fillmor, kochi, heathrow, patna, lauderdal, grizzli, jamestown, swat, chattanooga, equestrian, chesapeak, hilton, farmhous, headtohead, arcadia, heidelberg,
genoa, sofia, suffolk, dorset, borneo, berkshir, racetrack, fk, tallinn, ghent, auditorium, northbound, utc, calai, canuck, centenari, goali, shortstop, limerick, ut, yearend,
volga, granada, atol, thenc, blazer, nugget, chinatown, nxt, ipswich, geelong, southward, cologn, auckland, glastonburi, seoul, xm, galveston, thoroughfar, schleswig, ural,
reenter, brunel, usl, salford, iaaf, raptor, deco, palermo, tavern, haifa, turf, infield, dresden, georgetown, reschedul, gamewin, northumbria, unbeaten, outskirt, threepoint,
cairn, biennial, wolverhampton, aggi, huski, inver, bodybuild, rotunda, fordham, squash, dormitori, encamp, colspan, somm, qanta, redshirt, mcgill, bahama, nagar, caf,
humber, monmouth, stoni, uruguayan, kathmandu, ff, wellesley, bathurst, cove, carlisl, tucson, antwerp, tf, upstat, centenni, everglad, disembark, northerli, myrtl, taluk,
durban, kolkata, turin, midget, penang, doneg, langley, schoolboy, sacramento, straddl, yokohama, thessaloniki, indu, viaduct, niagara, backtoback, newport, prom, vermont,
greenwich, postcod, aleagu, guangdong, subregion, savoy, eal, seasid, olympiad, twotim, mangalor, zürich, nagpur, allamerica, mrt, barbari, northernmost, wta, burbank,
argyl, jurass, nj, ventura, freiburg, tmobil, woke, fewest, semi, roh, kennel, madeira, staterun, asean, vicker, siberian, shawne, argonaut, hsbc, precinct, lill, nippon, verona,
fresno, monterey, belgrad, jaya, guangzhou, sumatra, redskin, bhopal, redoubt, flint, varanasi, interc, tryout, showtim, norwich, heisman, bogotá, intercollegi, occident,
renumb, anfield, undraft, abscbn, saratoga, snowboard, wiltshir, bois, boulder, cheyenn, dealership, starboard, ute, boutiqu, vale, claremont, tonga, canberra, shropshir,
arbor, hinterland, tehran, gladiat, salem, seminol, olympia, sept, cavit, nrl, steamship, coldest, minneapoli, edmonton, turnpik, midwestern, anaheim, taipei, sesam, derri,
belfast, mersey, entrant, beacon, stuttgart, salzburg, maui, fujian, ohl, ctv, jaipur, grang, savannah, irb, picket, motel, northumberland, callup, upland, rochest, pyongyang,
cnbc, fedex, kiel, roadblock, nagoya, luton, colombo, bungalow, gettysburg, horsedrawn, eastwest, loyola, staten, adriat, mariana, vauxhal, powerhous, potsdam, humboldt,
raceway, clemson, coimbator, embank, waterfront, caen, camden, fremantl, hillsid, bere, caledonia, ashram, sarajevo, upn, goalscor, indycar, burnley, bergen, bronx, astro,
wildcat, repertori, shipyard, doncast, barnsley, hawkey, wakefield, defenceman, aaa, swansea, strikeout, wb, andalusia, hq, début, fulham, annapoli, storey, mma, sichuan,
fjord, firstround, knoxvil, kickbox, pagoda, heineken, baden, basel, longdist, promenad, dunde, panhandl, azor, captainci, dortmund, panchayat, exet, siriu, waterford,
perth, condor, croydon, pontiac, nanj, gothenburg, regularseason, mekong, sumo, byu, nit, kraków, auburn, islet, ajax, eureka, newark, lisbon, peke, watford, simulcast,
zion, himalaya, oiler, barnet, steamboat, nassau, bash, sunda, mainz, godavari, armori, hampton, fai, caledonian, nyc, rté, dor, wilmington, soho, hobart, calgari, aspen,
sorti, westchest, coeduc, doha, shutout, galile, alsac, windsor, albani, olympiaco, bordeaux, bsc, oncampu, highestgross, lucknow, airbas, westfield, rotterdam, trolley,
transvaal, vanderbilt, regroup, nyu, spruce, jakarta, landfal, superson, oricon, allegheni, disus, saracen, skylin, nautic, westbound, wharf, nl, casablanca, mesa, fiesta,
antarctica, hammersmith, louisvil, pacer, wadi, fuji, twentysix, pba, judo, quezon, freshmen, amphitheatr, sevilla, albuquerqu, halfhour, stepp, gloucestershir, avalanch,
southampton, raffl, panathinaiko, badminton, dhaka, novi, aegean, lièg, collieri, cfl, dinamo, barclay, cheshir, bali, colchest, wessex, akron, crosscountri, toledo, blackpool,
parkland, juventu, augsburg, massif, calcutta, allaround, macau, parma, midatlant, caspian, postworld, eindhoven, hanov, pune, leyt, cayman, wba, marriott, fl, airlift,
peripheri, merseysid, meridian, papua, halifax, blitz, cbd, seaport, glendal, kota, cyclist, stalingrad, kingston, malibu, ballpark, outfield, purposebuilt, leipzig, worcest,
expo, lufthansa, eri, guantanamo, euphrat, fir, oriol, golfer, copenhagen, barg, amherst, uptown, bison, carpathian, aba, dartmouth, gloucest, peterborough, eastbound,
aero, vodafon, voivodeship, leinster, sprawl, smoki, loch, steamer, peninsular, sinojapanes, nave, dunk, caldera, palisad, concordia, secondlargest, diamondback, prep,
collingwood, ghat, skier, gala, fiba, loir, wellington, kyoto, sixday, bermuda, golan, valencia, cumberland, southernmost, blackhawk, aerodrom, walkway, fremont, danzig,
streetcar, tacoma, midseason, highris, spree, scarborough, oclock, omaha, highschool, timor, allireland, albion, zee, lafayett, matricul, bayer, elgin, wrexham, piccadilli,
aqueduct, caraca, homecom, tasmania, caravan, bournemouth, sill, stoppag, dweller, yeshiva, transcontinent, yangtz, timeslot, unc, distilleri, sellout, weeknight, midsumm,
avro, poli, whitehal, catchment, sk, cavali, saskatchewan, worcestershir, electrifi, suez, warwickshir, oslo, piedmont, middlesbrough, hilli, belmont, watersh, manitoba,
gotham, shetland, antil, selangor, rodeo, hilltop, sabr, bucharest, bethlehem, stockholm, midtown, fb, tundra, aberdeen, severn, potomac, churchyard, pomerania, kany,
concours, oceania, norwood, comcast, wildcard, alta, ballroom, pennant, stamford, grandstand, subdistrict, realign, hove, sevil, dockyard, clipper, overtook, firstyear,
norfolk, hostel, chatham, pdc, lancast, sioux, dorm, ovat, daytona, semifinalist, baylor, ave, odessa, bastion, bandar, nowdefunct, fairfield, condominium, uninhabit,
avalon, johannesburg, faro, euroleagu, benfica, beirut, guam, buckinghamshir, lazio, singleseason, shandong, alaskan, cod, portsmouth, phra, millwal, concord, nairobi,
buccan, napoli, utrecht, boca, threetim, moat, jaffa, wbc, bookstor, pretoria, fairfax, eton, monorail, riversid, aurora, uppsala, jutland, spokan, bukit, grenadi, shenzhen,
kimberley, lockout, greenfield, platt, greyhound, ipl, himalayan, upscal, skyscrap, shrewsburi, wineri, shorelin, slough, monza, bluff, saigon, helsinki, grassi, warmup, pyrene,
unincorpor, arlington, feb, düsseldorf, charleston, overland, waterloo, corinthian, stockton, twentyfour, mannheim, nohitt, penultim, madurai, algier, handbal, borussia,
soldout, bobcat, canari, cork, yosemit, mare, coliseum, equatori, snooker, corinth, lexington, bridgehead, yukon, anzac, nw, addi, stratford, zagreb, dover, danub, isthmu,
ravin, tiebreak, guernsey, chesterfield, quay, pregam, okinawa, inducte, cumbria, tuft, kandahar, winnipeg, portico, brandenburg, afb, verd, palo, chestnut, guildford, cska,
northampton, wesleyan, lookout, argo, downhil, causeway, lans, natal, undisput, stockport, elm, porto, bonn, naia, concacaf, parramatta, coyot, bologna, strasbourg, knick,
fargo, italia, tyne, breaker, spitfir, gma, az, badger, falkland, tvb, budapest, harlem, enfield, beaufort, threeway, anglia, devon, dynamo, canadien, oasi, triplea, christchurch,
rada, tehsil, raaf, aachen, racecours, carmel, tbilisi, comanch, dayton, nottingham, sunris, rampart, bilbao, yellowston, agra, chichest, gaa, rochdal, marlborough, cougar,
snowi, asiapacif, arrondiss, champ, mcc, brighton, psv, purdu, oxfordshir, inelig, constructor, disneyland, maroon, sein, fenway, rutger, pembrok, fia, bromwich, carleton,
tko, roundabout, aisl, bedford, eurasia, tx, foothil, usaaf, staffordshir, redwood, riyadh, salisburi, transatlant, citadel, calder, derbyshir, darlington, huntington, deccan,
busiest, rerun, tianjin, ahl, kindergarten, mexicanamerican, cheerlead, essex, gmbh, outli, harrow, trafalgar, allstat, amtrak, charlton, woodstock, elb, pasadena, pullman,
amman, yunnan, professorship, scoreless, courthous, marquett, burlington, everest, hertfordshir, toulous, munster, ecoregion, kensington, ymca, sv, jfk, nant, bye, marlin,
andean, cdp, bloomfield, malmö, havana, allahabad, confluenc, málaga, tulsa, nsw, escarp, buena, sussex, circumnavig, reno, tramway, ganga, bakeri, topten, stamped,
chittagong, montevideo, agglomer, standout, hartford, buckey, coventri, crossroad, clubhous, bangkok, westwood, idaho, surat, appalachian, lima, multipurpos, firstev,
leiden, southbound, middlesex, nouveau, bari, glee, picturesqu, siemen, phnom, waiv, tuni, gymnasium, oldham, shire, kremlin, australasia, lago, westernmost, syracus,
hanoi, cebu, lincolnshir, tyrol, paddington, tee, mercia, whaler, welterweight, northsouth, rooster, hc, trenton, penitentiari, lausann, longhorn, huddersfield, ahmedabad,
tuscani, guadalcan, slum, zoolog, marseil, cafeteria, orkney, wnba, mainlin, lighthous, kilkenni, roanok, osaka, sidewalk, lancashir, macquari, ashor, rerout, mindanao,
feyenoord, hurl, homestead, brunswick, trafford, sahara, alto, württemberg, raze, equalis, puma, fife, ecw, plymouth, seneca, mohawk, huron, porch, oneday, stagecoach,
govt, matchup, scrimmag,
V5=vain, misconduct, glider, practis, solitari, traitor, vengeanc, vanish, pertain, safeguard, remembr, subdu, axiom, salut, nobodi, bump, judgement, gentl, deaf, restart,
runaway, eighti, smuggl, heroic, sneak, trooper, timelin, smile, bulli, exemplifi, allud, sheer, bb, thiev, breeder, immers, bp, veto, noon, sympathi, realism, lifelong, phantom,
thor, cosmolog, karat, sermon, tailor, hesit, anecdot, ch, wive, rational, liken, defi, straightforward, deadli, reincarn, perpetr, risen, creditor, curb, achil, imper, multimedia,
token, vandal, penni, sb, tremend, dilemma, rhyme, terribl, spoof, bail, loot, forgiv, redempt, arbitr, lan, banish, spartan, repuls, conjectur, genealog, pill, blur, semin,
donkey, ci, awaken, regret, veil, taxpay, landslid, ark, bizarr, newest, ge, exagger, pentagon, pal, hr, groom, inhous, masculin, atroc, wholesal, midst, entireti, distract,
unexpectedli, purg, embarrass, graffiti, isnt, homag, undertook, revert, onscreen, rand, prop, acquit, stab, mess, summon, slayer, pre, nonexist, jade, contempt, dissemin,
twothird, reallif, cannib, racist, bred, clown, probat, caution, wick, brilliant, ll, nude, evok, firework, roam, inact, messeng, gotten, weird, rm, revisit, theyr, somebodi,
incap, hallmark, unwil, tori, ineffect, homeless, calm, pretend, bias, diversifi, extant, problemat, unpreced, ts, wouldnt, motto, allot, transcend, pragmat, sincer, remodel,
spectacular, guilt, funni, cheat, obey, undercov, banknot, flock, notifi, quotat, somehow, vener, repent, colloqui, mayb, auster, mega, anymor, grate, dalek, nowaday, sharpli,
trump, annihil, epa, amin, ware, unto, novelti, melod, atari, wield, invis, courag, saloon, atheist, onethird, decim, surprisingli, disastr, starv, swear, eager, countdown,
inadvert, outrag, belov, cosmo, disclosur, ki, dividend, highprofil, agit, disapprov, promptli, talmud, await, halo, proxi, justif, inmat, onair, astonish, deduc, drunk, cp,
parcel, thrill, reassign, dull, satan, feloni, plausibl, mural, ec, eccentr, espionag, sensibl, transgend, yell, twentyf, yahoo, rever, unfair, unus, wherein, connot, eras, deserv,
erad, storytel, coven, couldnt, sabotag, optimist, taboo, restless, ve, ni, elderli, vow, shame, supposedli, intuit, countless, jealou, sunshin, inappropri, nurseri, sorri, rage,
hamper, slaughter, pharaoh, samurai, psychic, comed, hadith, plaintiff, katrina, clue, fcc, scarc, firstli, paus, thou, excerpt, incompet, unaccept, demolit, thoroughli, hijack,
tomorrow, annoy, bisexu, marketplac, faint, misunderstand, burger, ..., profici, erot, spotlight, gunner, indulg, cr, transcrib, secreci, stir, avatar, dear, humbl, signifi,
psychedel, salon, feat, lure, secretli, insult, curios, newborn, infin, peculiar, worthi, fx, suspici, rhythmic, sexi, wherev, ascent, ka, startup, endeavour, excus, flashback,
viciou, foe, propheci, chronolog, mistaken, gag, sacrif, fundrais, hello, injustic, popularli, treason, blunt, revamp, immort, stakehold, bloodi, clever, awkward, liabl, puriti,
whisper, compliment, £m, reconcili, imperson, heighten, unpopular, pascal, zeu, oversaw, supersed, ridicul, drone, plc, skateboard, abid, mock, werent, denial, yoga, famous,
seldom, socrat, slash, est, rc, disposit, em, gi, motown, ugli, stylist, disqualifi, weaker, charm, ari, metaphys, makeup, messiah, dell, es, wont, allah, commend, societ, triad,
endless, metacrit, thwart, rpg, au, thief, ra, serpent, decisionmak, disintegr, mitsubishi, whoever, viewpoint, receipt, seduc, credenti, merci, cruel, subsid, indict, afflict,
ambit, altogeth, occult, frighten, roleplay, ah, scrutini, spars, va, mistakenli, highlevel, ai, adulthood, workforc, bust, fanci, contractu, hadnt, arguabl, overturn, skater,
exposit, broaden, foremost, ambiti, arent, dec, incorrectli, gentlemen, interf, banquet, scarlet, porn, abduct, amid, imperfect, batch, slim, acronym, obscen, bunch, soror,
perish, plato, notori, everybodi, blackmail, fist, brothel, satisfact, torah, perfectli, ae, meantim, si, chariot, nc, prehistor, buffi, landlord, forthcom, amc, persona, insan,
bounti, deceiv, fiat, verdict, various, amidst, evad, tran, pleasant, reappear, lucki, reintroduc, prelud, archetyp, discern, enumer, marg, hercul, rework, afraid, lefthand,
loser, quietli, menac, fresco, lgbt, crippl, unnam, criterion, coma, stupid, oneself, bestow, interrog, youv, strive, gestur, weve, beneficiari, proud, holocaust, paradox, sporad,
amass, neglig, postmodern, arrog, ransom, hinder, hack, devote, loud, marit, persuas, nice, inflict, bedroom, noteworthi, daddi, terrifi, compass, hungri, decept, femin,
impend, adversari, suspicion, litig, destini, unreli, scare, harder, medicar, evidenc, chiefli, freak, lunch, repel, robberi, stipul, fragil, entail, amnesti, fraudul, ruthless, alik,
hatr, mod, reclaim, freud, mourn, vigil, asleep, shout, reconcil, babylon, gangster, kant, importantli, rumour, therapist, demis, orphan, riski, despair, bt, shi, drunken, sad,
jealousi, bowler, python, geniu, obvious, daylight, password, aforement, sub, indefinit, standalon, resent, aristotl, undergon, casual, handicap, implicit, injunct, urgent,
goodby, simplic, keeper, attest, intimid, nonsens, pend, unconstitut, wipe, quarrel, rfc, guess, swimmer, jehovah, outlook, millionair, useless, tolkien, righthand, vintag,
negat, hisher, disgust, tsunami, umpir, skeptic, provoc, bargain, mythic, rewrit, pivot, relic, unconsci, eyewit, rid, backdrop, atlanti, intensifi, dislik, clarifi, apocalyps,
humili, dispens, interpol, noir, sacrament, erron, turnov, sp, masterpiec, aggrav, seventi, prematur, halloween, cremat, bearer, incit, parol, versatil, akin, retrospect, opium,
climax, discrep, quota, willing, broker, vest, enrag, fond, haunt, samsung, vagu, magnific, psalm, grasp, digniti, ng, dracula, wane, sympathet, overli, nowher, ostens, allus,
slang, ufo, homicid, formid, boycott, instantli, intim, xmen, karma, comrad, unawar, conscienc, slap, detriment, offspr, energet, astrolog, anyway, havent, asylum, repay,
lucr, purana, obsolet, hardship, setback, incident, certainti, voluntarili, bold, unlaw, ar, plead, reminisc, resurg, memorandum, hereditari, deceas, hare, customari, myself,
nineteen, confisc, mom, ok, cocacola, mileston, vivid, overnight, isi, fabl, tragic, obedi, autobiograph, magician, rude, dad, circumcis, narcot, min, leverag, unfold, foul, hoax,
decent, preexist, ought, honest, obsess, wise, dissent, invok, drown, inscrib, postul, ex, vigor, irrelev, infiltr, ninja, nonstop, discredit, famin, brutal, imaginari, sting, lender,
characteris, fascin, memo, adolesc, temptat, plea, ultra, luck, cs, underway, bodyguard, ourselv, onstag, bankrupt, ir, muse, defer, breakup, salvag, shooter, mao, relianc,
refut, incarcer, salvat, lengthi, firmli, envis, rogu, redeem, slogan, shatter, ea, reluct, everywher, eleg, underworld, theoriz, succumb, misus, immin, celesti, blockbust,
ancestr, notorieti, damn, bandit, re, utilis, learner, covert, nineti, costli, sage, undisclos, outright, coincident, cruelti, hedg, usher, hed, dharma, mighti, cohes, mankind,
allegi, unfinish, mutant, gon, anybodi, rode, spous, yearli, pawn, gentleman, rubi, retali, furnish, caller, gypsi, sorrow, panic, absurd, finest, nonviol, inaccur, secondli,
unavail, seeker, unlock, confidenti, stranger, doom, hound, forbid, wan, farewel, reel, instinct, mislead, outspoken, invalid, purport, ape, hardli, summar, yourself, shaman,
stanza, oracl, sake, dare, heterosexu, lament, tempo, forcibl, tu, vegetarian, phonet, buzz, uncov, incred, outlaw, furiou, maya, musket, ego, idl, iri, pornographi, dismay,
sd, delight, curiou, unlimit, deed, baptism, crook, ti, indiffer, censor, archaic, gossip, relaunch, heavenli, empow, corps, ptolemi, poetic, wellb, ascrib, abruptli, codenam,
restrain, inquir, randomli, cobra, hint, intoler, shelv, cop, batsman, keen, handsom, stuff, constel, restraint, bribe, fool, mob, ador, marijuana, commodor, prejudic, endeavor,
wrath, reinstat, longstand, emphasis, prestig, tb, enthusiasm, miracl, horribl, grief, bother, stun, lent, merced, coexist, feminin, stole, corpu, refrain, illus, oh, contempl,
skip, umbrella, fuck, hunger, authoris, pardon, incompat, intrigu, libel, gym, forgotten, soar, audi, verg, crise, supernatur, porsch, par, aint, pray, humour, suzuki, utter,
paragraph, deter, censorship, firefight, disregard, prefac, nightmar, exodu, unexpect, ko, tempera, uncomfort, pointer, forget, conspir, amen, furi, unhappi, evict, beg, ya,
swap, coffin, thorough, recogniz, solicit, overhaul, hobbi, aw, spoil, reiter, infam, stunt, ta, prank, disguis,
V6=cartel, guerilla, huntergather, nazism, sparta, stronghold, baptiz, cleans, sloven, planner, detaine, quo, baháí, jat, kurdish, unpaid, parthian, swede, orphanag, authoritarian,
colonis, gestapo, extermin, moravian, militar, abolit, sicilian, unicef, annal, assyria, nepali, signatori, hama, safavid, royalist, academia, reorganis, cornish, islamabad, sabah,
acced, seleucid, chaplain, xiongnu, alqaeda, yoruba, anatolia, demarc, grassroot, vehement, nagasaki, expeditionari, abolitionist, cree, epiru, aleppo, nkvd, indoaryan,
plunder, kazakh, hispania, nationalis, détat, vicar, upbring, breakaway, ghanaian, mesopotamian, angloamerican, suffrag, anarch, bicamer, bourgeoi, goodwil, aristocraci,
gentri, rajasthan, iroquoi, priori, orthodoxi, auditor, paratroop, kgb, malacca, amerindian, galician, extremist, charismat, milošević, joseon, automak, rwandan, paraguay,
curia, huguenot, xinjiang, oriya, uzbekistan, wehrmacht, chechnya, bantu, chairperson, celt, offshoot, montenegrin, dissid, zulu, hezbollah, mesoamerican, unrest, samaritan,
maori, lakota, fianna, entrench, druze, eucharist, scandinavia, privatis, parliamentarian, frisian, prerog, gupta, mayan, uncondit, johor, hannib, austriahungari, gibraltar,
freedmen, downfal, extradit, insular, codifi, arafat, protestant, burgundian, tasmanian, rhineland, carthag, transylvania, silla, caretak, ankara, franciscan, brethren,
herodotu, palatin, guiana, truce, yiddish, confucian, mestizo, francophon, intergovernment, hegemoni, enshrin, senatori, somali, naacp, hutu, nizam, eurasian, liberia,
peerag, oman, imf, antioch, politburo, staunch, kashmiri, iit, espous, peso, akkadian, arian, uzbek, bedouin, oust, algerian, banu, aeronaut, estonian, gnostic, junta, outnumb,
jiujitsu, seced, auschwitz, frankish, germanspeak, guyana, namibia, conven, esoter, puritan, pinyin, unitarian, briton, overthrew, bureaucrat, dacia, basra, crackdown,
precolumbian, overrul, azerbaijani, hittit, heraldri, meiji, modernis, priesthood, mutini, kuomintang, launder, outreach, seljuk, aryan, romani, transliter, meteorolog,
caucu, assad, heret, tantric, jurisprud, rescind, sumerian, ascens, bohemian, claimant, barbarian, athenian, pretext, apartheid, secess, turmoil, sinhales, multiparti, privi,
hakka, strife, achaemenid, weimar, lahor, bolshevik, gallic, benin, appeas, elizabethan, acadian, canaan, gentil, sizeabl, habsburg, seneg, creol, defam, carthaginian,
moldavia, burgh, unilater, ombudsman, sukarno, maltes, recaptur, bohemia, burmes, masjid, ngo, regenc, capitul, edo, peacekeep, cadr, singaporean, pali, demot, tunisian,
preemin, edict, monast, manchu, czechoslovak, catalonia, diplomaci, englishspeak, rhodesia, ensign, spaniard, knesset, defianc, intermarriag, schism, unoppos, subjug,
hagu, andalusian, policemen, nordic, subcommitte, goth, barbado, belarusian, fledgl, consular, majesti, seafar, croat, coinag, noncommiss, iberia, malawi, khmer, sharia,
haryana, servicemen, ceylon, archeolog, eunuch, selfgovern, antislaveri, subcontin, counterterror, bihar, artisan, riviera, plight, ceasefir, catalan, roug, malaya, hardlin,
devout, zen, ultimatum, mistreat, odisha, dragoon, airmen, nobleman, turkmenistan, solidar, bylaw, slovenia, kabul, dalit, paraguayan, warlord, inuit, darfur, tort, imposit,
zoroastrian, rector, bosnian, serf, rupe, seventhday, instig, bavaria, gop, expatri, rumbl, entrust, iconographi, niger, dynast, jacobit, excommun, taoist, unitari, babylonian,
benevol, ru, lawmak, latvian, gaddafi, notari, paleolith, censur, khyber, historiographi, austrohungarian, marxism, hokkien, tyranni, postsecondari, populist, fishermen,
bolster, bhutan, nuremberg, sarawak, constitution, liturg, reformist, subsaharan, cornerston, individualist, sui, semit, travancor, mamluk, somalia, fiji, taiwanes, byzantium,
baroni, separatist, zambia, disarm, ravag, zanzibar, kazakhstan, hillari, multin, civilis, devolv, chalukya, leningrad, roc, seamen, mongolian, nonchristian, heresi, kali,
sindh, lowincom, hajj, gubernatori, nicaragua, vilniu, quaker, grievanc, uyghur, swahili, assent, mujahideen, postcoloni, quell, prefect, wealthiest, patrician, mon, belaru,
igbo, suharto, tripartit, grenada, incurs, spearhead, governorgener, saxoni, hellen, nco, hussar, silesia, overrun, charlemagn, sindhi, botswana, lombard, revit, baton, gurkha,
bipartisan, nonmuslim, sabha, romantic, ssr, viceroy, gregorian, kiev, overseen, mahayana, vassal, pacifist, slav, traditionalist, shogun, manchuria, adventist, togo, ministeri,
byelect, thebe, oecd, hun, uboat, leftist, overthrown, highrank, envoy, augustinian, macedon, primaci, taoism, javanes, breton, ioc, forerunn, thrace, loyalist, sworn, jihad,
orthographi, planter, prehistori, dictatorship, ugandan, mali, damascu, frenchspeak, selfdefens, dogma, emissari, caucasian, wwii, māori, calvinist, gloriou, reichstag, angola,
bourbon, cypriot, ismaili, guatemalan, basqu, distrust, syncret, tutsi, manageri, pashtun, bangladeshi, imperialist, admiralti, judah, bjp, anglo, pacifi, indoeuropean, sami,
fief, justinian, mauritiu, bishopr, prc, kurd, moro, geologist, frontlin, legat, relinquish, prewar, aceh, bosniak, tajikistan, retak, bonapart, oblast, sufi, supervisori, lobbyist,
rhodesian, jain, paramilitari, outpost, impoverish, craftsmen, aztec, malayan, anc, rightw, fascism, sikhism, janata, slovak, afrikaan, navajo, choctaw, upperclass, mecca,
landown, vichi, populac, gaul, saladin, bern, commanderinchief, mesoamerica, reactionari, despot, environmentalist, carolingian, jordanian, antitrust, depos, sassanid, fugit,
pontif, archdioces, resettl, novgorod, spd, boycot, polynesia, flander, romanesqu, faa, appointe, fide, forbad, federalist, armada, ashkenazi, galicia, agrarian, uphold, siames,
antiwar, vernacular, cloister, vanguard, enclav, industrialist, cognat, uae, hasid, bourgeoisi, bahrain, guatemala, baath, venezuelan, contra, nonpartisan, zionist, sixyear,
scientolog, decentr, mep, sudanes, promulg, constabulari, disobedi, pompey, gunboat, moorish, businessmen, gujarati, encroach, fundamentalist, madagascar, shang, khanat,
expuls, friar, eritrea, traine, mesopotamia, shinto, selfdetermin, mysor, papaci, balochistan, emancip, corsica, benedictin, crimea, tatar, karachi, anglosaxon, wto, insurrect,
haitian, goguryeo, uttarakhand, skirmish, nicaraguan, brunei, undocu, yemeni, syriac, chechen, latino, sizabl, wight, kyrgyzstan, legitimaci, mandarin, cosponsor, solicitor,
muster, piou, eurozon, secretarygener, tokugawa, nara, sephard, dravidian, rebelli, mozambiqu, inquisit, armistic, hmong, hellenist, mauritania, horticultur, florentin,
cambodian, diocesan, dday, turnout, nascent, tipu, discont, boer, militarili, enslav, anatolian, naga, bolivia, maoist, bureaucraci, ascet, azad, stateown, aden, turkic, judea,
venetian, flemish, surveyor, impeach, rajput, illyrian, moroccan, cham, vedic, repatri, barrist, moldova, judiciari, consecr, conservat, islamist, diaspora, latvia, jharkhand,
renounc, medina, baku, sectarian, plebiscit, dal, politi, magnat, maldiv, flotilla, reunif, nepales, phoenician, abdic, pentecost, adjut, etruscan, mubarak, bolivian, candidaci,
cochair, embargo, reaffirm, daytoday, anticommunist, poorest, jammu, pla, upheav, libyan, chairmanship, indentur, kmt, repar, checkpoint, siam, ghetto, cochin, goa,
israelit, fatah, ratif, workingclass, bnp, maratha, overwhelmingli, counterinsurg, airspac, moot, ecuadorian, castilian, liechtenstein, crimean, repudi, mobilis, supremaci,
mercantil, disciplinari, revok, unionist, presumpt, oversight, overlord, bavarian, pillag, jainism, congoles, influx, assyrian, demographi, autocrat, plo, coptic, whig, strategist,
mla, affluent, peshawar, surinam, inca, shiit, catech, brahmin, sardinia, osteopath, orissa, ndp, tanzania, writ, staf, magyar, tenet, cantones, chola, polynesian, riga, moravia,
hondura, philanthropi, ecumen, nasser, multicultur, kurdistan, gdr, scythian, sikkim, clandestin, leftw, rwanda, tunisia, kenyan, heartland, musharraf, multilater, peasantri,
fluent, beliz, idf, pantheon, jamaican, ltte, messian, totalitarian, cameroon, middleclass, genoes, labrador, hiroshima, nongovernment, statewid, thracian, statesman, levant,
visigoth, samoan, usurp, qin, desecr, crete, ionian, foothold, ardent, steward, umayyad, pogrom, zionism, reconstitut, cityst, ashoka, jurist, cossack, mausoleum, diocletian,
assam, mennonit, auspic, cpc, richest, loanword, scandinavian, dalmatia, dacian, berber, battlecruis, peruvian, governorship, chieftain, iberian, tagalog, secretariat,
shipbuild, gazett, abbasid, arama, malabar, myanmar, neolith, liaison, erstwhil, slovenian, pursuant, siberia, statehood, dignitari, indochina, circa, nguyen, registrar, toppl,
synod, reassert, interwar, unarm, lordship, sympath,
V7=tyre, pigeon, flora, mous, upright, chicken, groov, ant, tap, coconut, hind, knive, rig, curl, hat, corn, tooth, marbl, underwat, cooki, foil, lit, bud, wrap, beneath, snap, feather,
cosmic, potato, pie, explod, timber, wolv, burrow, fuse, bread, eject, stuf, rack, plenti, gray, pepper, helmet, blown, deton, shade, calib, grill, owl, worm, straw, granit, badg,
collid, deer, chrome, lace, pile, weld, outfit, tile, ribbon, chocol, miniatur, axe, tilt, beard, whip, monkey, haul, candi, sabbath, basement, bamboo, spike, pod, crest, cam,
candl, nut, envelop, juvenil, bark, mint, brass, limeston, fog, shirt, powder, chees, cherri, skeleton, saddl, dot, garment, bolt, enclosur, dungeon, sandston, wrist, furnitur,
salad, blond, axl, adorn, whistl, lotu, bake, duck, dune, pork, velvet, collar, canva, comb, peanut, lamb, weav, rip, fri, enclos, squirrel, locker, stuck, bracket, butter, click,
blast, triangl, exot, camel, cart, robe, clutch, engrav, spray, brick, pyramid, pale, brace, strand, cream, pink, torch, carpet, mast, grip, scratch, chick, knife, aluminum, cab,
hollow, underneath, darker, thrown, wasp, vent, jupit, rib, spider, bitter, bite, toss, stripe, dash, slice, flesh, jewel, larva, dwarf, omega, diver, pickup, leather, crab, lean,
liveri, toe, rabbit, fauna, pierc, twist, lip, cement, flip, bucket, pencil, mat, shed, honey, dip, tongu, turret, halfway, burst, punch, deflect, coral, forg, brew, bead, stalk,
dairi, swan, crocodil, motif, wash, comet, pizza, rainbow, wore, aquat, backward, nail, mosaic, jar, emblem, roast, orchard, piston, pedal, chi, crane, magnum, belli, violet,
ceram, wax, arrow, strap, blossom, cow, bumper, beef, shake, flavour, maiden, shotgun, lever, sleev, goat, reef, herd, preciou, debri, alley, screw, bug, pant, tattoo, bent,
mortar, shine, vault, pan, ginger, bee, beetl, mule, fuselag, hung, lemon, onion, kitchen, pad, lizard, liquor, rotten, coaster, jewelri, dial, crescent, trim, claw, recip, banana,
pot, curtain, smell, crawl, hood, replica, fade, scrap, plaqu, blank, bloom, pulp, flour, fed, mud, butterfli, dust, wheat, purpl, swallow, peel, wool, willow, mapl, poni, chin,
spear, tan, noodl, gravel, crust, soup, cockpit, sandwich, scroll, pig, dye, sheep, eclips, botan, leap, fin, lantern, tini, textil, thread, butt, dinosaur, dig, basket, oval, trouser,
header, mushroom, lightweight, lightn, illumin, fender, nickel, void, sock, ore, roller, chop, discard, ink, sauc, turtl, swing, stain, atp, frozen, rectangular, shave, diagon, grab,
ivori, glove, pour, herb, ornament, shower, insignia, sox, slip, trout, elbow, blanket, exterior, ceil, cigarett, asteroid, brush, gem, venom, pigment, shark, ladder, drift, juic,
stair, lime, dump, shaft, horizon, mold, dug, glow, spiral, sour, torn, skate, cheek, pea, silk, leaf, spun, sink, capsul, rim, flush, floyd, dessert, thumb, jacket, heel, accessori,
cane, bounc, potteri, dirt, leopard, grape, rope, fur, launcher, microphon, stitch, choke, liner, mantl, bean, moth, tomato, nake, cage, cone, tin, frog, eleph, hammer, grenad,
crimson, reptil, thunder, amber, bathroom, chili, tshirt, tight, hut, eaten, needl, batter, cube, har, warp, cutter, throat, hatch, spice, vine, toilet, burnt, plug, hook, salmon,
retract, cake, bubbl, skirt, balloon,
V8=devi, lester, wrestlemania, lili, calvin, stephani, freddi, akbar, sandra, agn, kathi, humphrey, philipp, mickey, doc, liu, vishnu, ronald, gil, stevi, patern, dant, jami, ned, rudolf,
anton, piu, melissa, rao, col, winston, louis, clive, bonni, saddam, andr, gu, seth, wang, aaron, liam, valentin, edmund, patti, isaac, paulin, wu, brotherinlaw, maggi, judi,
sue, lionel, brad, doug, shannon, darren, clarenc, randi, tina, jeremi, kyle, ronni, niec, marvin, hermann, joel, cum, stan, trevor, geoffrey, hassan, betti, nina, madhya, jeffrey,
gustav, mahatma, pierr, horac, nicol, wong, trent, sharon, lynn, zhou, holli, ludwig, hulk, wolfgang, eleanor, dee, swami, benni, emma, rachel, miranda, tel, leigh, eugen, elton,
bo, mo, basil, mohammad, xavier, yu, yi, rama, hannah, sherlock, clement, timothi, halfbroth, bryan, bori, marilyn, erik, edwin, wei, shirley, grandson, nephew, kirk, indira,
allison, anni, friedrich, shane, hal, shri, rupert, sidney, kati, chiang, mauric, archi, luci, isabel, travi, paula, helen, omar, alexandra, ernest, tai, karen, buddi, maid, herbert,
johann, isabella, qb, sy, gregori, marc, bernard, marion, barack, jennif, andrea, glenn, bon, barney, butcher, katherin, leonard, jenni, vanessa, laurel, dalai, prof, ahmad,
kitti, jess, cao, vladimir, jacki, ahm, lesli, marcu, daisi, sophia, lyndon, patricia, lil, dana, jessica, boyfriend, amanda, marti, felix, alfonso, jo, christin, constantin, pratt,
laurenc, sonni, wilhelm, debbi, shawn, chen, joan, emili, sheikh, jerom, perci, ethan, sen, conan, mama, edgar, reverend, kurt, befriend, dorothi, diana, fu, dale, vic, lauren,
ashley, kicker, bruno, mose, fritz, mick, dwight, sara, alia, noel, augustin, uttar, dexter, ernst, cowritten, heather, byron, franki, robbi, josh, dudley, guru, mikhail, theodor,
julia, matern, kumar, heinrich, brett, malcolm, teresa, abdul, clair, vernon, christina, bing, judith, brendan, granddaught, randolph, ellen, jin, glen, benedict, te, petersburg,
loi, janet, sebastian, laura, baba, claud, raj, donna, clyde, raymond, liz, wendi, florenc, nichola, lindsay, rev, gerald, woodi, ho, lok, magnu, juliu, ivan, jule, leopold, mel,
rené, tottenham, romeo, madam, teddi, abdullah, salli, grandmoth, angela, trinidad, cal, carol, cyru, fr, allan, vinc, moham, françoi, empress, jacqu, brittani, notr, stella,
noah, jake, kenni, congressman, maj, nathan, martha, milton, consort, herman, vincent, joey, seymour, walli, und, nigel, molli, eva, kapoor, nicola, abba, peggi, gerri, zhang,
adrian, lt, ibrahim, cecil, mozart, jill, kenneth, colin, rodney, julian, hey, sid, conrad, olivia, krishna, nanci, ricki, brandon, imam, neal, raja, adolf, derek, goldman, joshua,
lou, alma, linda, antoni, eli, otto, augustu, gloria, lanc, leonardo, sophi, rita, franz, beth, roland, kuala, ruth, dian, chad, fanni, rex, carolin, andhra, hank, mistress, rebecca,
lin, natali, bart, traci, frontman, tara, catherin, geoff, née, ferdinand, helena, maharaja, elvi, yang, aunt,
V9=dealer, mortgag, agenda, forbidden, embodi, lab, propon, whenev, medit, compli, behav, credibl, deficit, organiz, gambl, durat, bilater, discrimin, heroin, advocaci, penal,
portfolio, scholarli, cheap, rhetor, overview, abort, uncertain, biblic, comprehens, uncommon, pursuit, sociolog, depriv, abstract, strictli, sentiment, perpetu, rehabilit,
inclus, proven, inspect, anonym, monetari, self, bigger, identif, pronunci, prevail, strict, pleasur, ordin, escal, sudden, incorrect, formul, implic, firearm, tender, justifi,
articul, dictat, judgment, abbrevi, relax, conjunct, liabil, terminolog, proposit, retriev, augment, shorten, overlap, weaken, traffick, lifestyl, statutori, imit, legitim,
contradict, reliev, curriculum, bia, monopoli, proce, deem, antisemit, specialti, ideolog, contrari, placement, habit, stanc, conceptu, restructur, albeit, hierarchi, voluntari,
specialis, loyalti, humanitarian, theft, copyright, etymolog, ambigu, discours, setup, immens, privaci, inconsist, classroom, metaphor, endur, methodolog, synonym, remedi,
authent, silenc, simplifi, taxat, intact, alarm, procur, conspiraci, disclos, feasibl, steadili, vital, incomplet, wholli, verifi, workplac, plagu, maxim, appreci, norm, reward,
infring, constraint, dealt, concurr, tough, compel, criteria, assumpt, homosexu, imageri, fratern, critiqu, manifest, omit, endang, racism, inabl, predomin, lineag, adher,
anticip, humor, complianc, vocabulari, quran, complement, expenditur, fulfil, correctli, diminish, strongest, harsh, broadli, feminist, peer, profound, mediat, nonetheless,
puzzl, eas, accordingli, modest, explicit, openli, flaw, partit, sophist, artifact, cope, practition, unrel, adequ, donor, claus, seemingli, forecast, spite, disagre, irregular,
deepli, inher, hypothesi, largescal, chemistri, crucial, confin, fiscal, guidanc, aspir, obscur, realist, convey, frustrat, absent, breach, outlin, buyer, offenc, disagr, wisdom,
postal, submiss, conform, royalti, compromis, extraordinari, obviou, merit, broader, healthi, properli, overcom, stereotyp, prioriti, systemat, affirm, quiet, chao, encompass,
undertaken, capitalist, logist, aesthet, analyz, rigor, charit, poorli, scenario, healthcar, adject, neglect, provok, repress, astronom, segreg, oppress, verb, essenc, racial,
guidelin, explicitli, deterior, fraud, enlarg, distant, collector, deduct, pace, buddha, steadi, autonomi, government, disadvantag, burden, alert, fare, offend, exempt,
compulsori, wors, tendenc, trait, enorm, enlighten, noun, discourag, wartim, advent, singular, fault, accent, astronomi, everyday, mandatori, freeli, visa, insight, genuin,
harass, assur, harmoni, overwhelm, primit, scope, obstacl, heal, premis, regardless, underw, categor, aros, unclear, verbal, boost, lend, percept, non, plural, wherebi,
conscious, likewis, expertis, geolog, tenant, inevit, uniti, sphere, anthropolog, trademark, necess, inventori, incent, undertak, regulatori, assimil, virtu, conceal, moreov,
prescrib, profess, consciou, exam, forens, registri, iso, pharmaceut, clone, embrac, devis, consensu, undermin,
V10 =sol, roo, à, libertador, rivera, barrio, dauphin, carmen, flore, québec, revu, javier, alessandro, roi, iglesia, lope, félix, rodríguez, alfredo, gran, avant, je, león, pérez, banda,
français, provenc, ain, rancho, willem, pont, argentinian, sarkozi, peña, oro, ángel, khomeini, marqu, sul, allend, salazar, davao, silvio, chico, mort, delgado, claudio, blanc,
antoin, maestro, niño, salina, cid, ole, international, brasil, universidad, córdoba, enrico, navarro, navarr, varga, val, tito, guadalup, banco, mariano, jaim, vila, paolo, côte,
benito, guadalajara, nord, garibaldi, bam, vittorio, sergio, castillo, qaeda, école, bravo, jardin, witt, hustl, moreno, molina, catalina, rey, comt, batista, serra, rochel, parc,
libr, julio, gael, ferrer, bernardo, mend, dio, ortiz, sant, veracruz, américa, estadio, historia, luna, ernesto, vill, eduardo, campo, angelo, españa, pietro, cerro, teatro, oaxaca,
laguna, carrera, emilio, vasco, ignacio, opu, perón, haut, toro, toma, lombardi, hernández, terr, marcello, ricardo, laci, siena, gonzález, ruiz, deportivo, una, casa, coco, puebla,
pico, jong, rossi, estrada, chavez, juárez, tarantino, aux, santana, bella, dei, capon, gómez, fernández, loma, grupo, padr, raúl, nacion, dalí, vita, vizier, gonzaga, lobo, quentin,
ramo, roch, méxico, temp, ramirez, della, guerrero, paso, ivanov, blanco, alvarez, asturia, vin, mata, sánchez, mina, stefano, pueblo, khalifa, boi, laval, mal, räikkönen, ciudad,
gard, garcía, alonso, yankov, césar, zaragoza, château, sur, guillermo, domingo, nuevo, ramón, ronaldo, francesco, herrera, ou, sera, dia, martinez, mendoza, joaquin, cort,
tijuana, arroyo, ayatollah, giorgio, isla, montoya, leyland, aragon, yo, chevali, saba, fontain, sanchez, ferrara, martínez, bel, novo, castil, alejandro, piero, para, canto, aquino,
arturo, luigi, messina, pinto, marqui, ligu, gore, federico, romero, dino, mateo, gambino, stade, lair, scala, centro, quito, divoir, museo, guillaum, rodrigo, vida, telenovela,
salvador, rizal, nueva, mussolini, palazzo, alamo, por, mond, duchess, national, dom, división, maccabi, trujillo, santand, dolor, ateneo, borg, vicent, verdi, diablo, fray, amor,
rue, sonora, vie, fernand, palai, alba, bolívar, samba, aguinaldo, bahia, mayo, primera, femm, felip, hidalgo, cali, cabrera, corté, torino, jazeera, soto, coutur, joão, nort, que,
viva, tre, gallo, nadal, louvr, como, río, díaz, martín, monterrey, fernandez, paz, suárez, lac, greco, musé, massa, cesar, enriqu, rosario, société, renn, vall, ponc, giusepp,
lópez, fontana, chanel, conquistador, piazza, chávez, cristina, picasso, porta, croix, lux, saud, gonzal, académi, mora,
V11 =roadway, dine, luxuri, unveil, excav, travers, grove, fring, countrysid, sedan, subspeci, harbour, convoy, bend, ridg, trench, thrive, closur, builder, ambush, fortress, frigat,
java, voyag, meadow, renault, pipelin, tanker, att, pave, escort, coastlin, ski, leisur, strait, steep, highland, fountain, perimet, beaver, sm, aerospac, downstream, shelter,
scenic, junction, gorg, trunk, bunker, usaf, rebuilt, cedar, ferri, inland, portal, toll, pedestrian, northward, alpin, marsh, subdivid, loung, torpedo, tent, intersect, uss, detach,
expressway, pt, plaza, greenland, bangalor, sunk, hemispher, aboard, tractor, freestyl, terrain, raf, boom, cafe, ambul, antitank, lagoon, swept, sniper, café, wreck, ca, il, boe,
breweri, wilder, antiaircraft, interst, refineri, toyota, wetland, canyon, cascad, hm, refurbish, fork, armament, observatori, dwell, smallest, bs, demolish, pier, chennai, cruiser,
surg, motorway, taxi, waterway, racer, tram, nokia, zoo, aa, pavilion, volcano, lawn, tributari, paradis, palm, ramp, bypass, hike, vineyard, flew, inn, rebrand, harbor, baltic,
mk, thame, anywher, warehous, honda, shelf, nightclub, mig, hamlet, pub, fortif, winchest, oak, upstream, hangar, barrack, telecom, quarri, vista, nissan, refug, beverli,
pearl, groceri, somewher, crater, slope, mall, ny, arctic, consortium, ranch, distributor, ab, atla, widen, reconnaiss, rug, forestri, pillar, vicin, supermarket, sandi, redevelop,
fisheri, parkway, flank, pine, divert, outlet, overlook, cano, facad, mt, aquarium, eastward, monsoon, marina, corridor, cliff, hudson, atop, flown, rocki, ordnanc, depot, erod,
ballist, ag, offshor, auxiliari, capitol, encircl, tornado, parachut, swamp, buse, erect, chevrolet, rift, bike, waterfal, mansion, volkswagen, suburban, pa, sank, airbu, antarct,
dock, llc, nearest, glacier, runway, refuel, aerial, apach, po, airfield, neighbourhood, fortifi, maneuv, amalgam, gm, sunset, gateway, cf, panama, woodland, chrysler, lodg,
erupt, fenc, airplan, surf, plantat, estuari, boulevard, alp, carriag, warship, interchang, hub, amazon, casino, remnant, amphibi, lowland, mi, endem, nile, redesign, stapl,
jungl, rhine, prairi, stall, boast, dismantl, battleship, terminu, hawaiian, sanctuari, luftwaff, terrac, nh, altar, haven, courtyard, cottag, en, subdivis, rental, volcan, subway,
plateau, battlefield, fs, cater, adjoin, farther, sweep, freeway, reopen, platoon, typhoon, westward, tow, tallest, bombard, delawar, pond, manor, hamburg, wagon, shipment,
garag, cruis, flagship, wildlif, cabin, mound, spa, township,
V12 =physiolog, byte, degener, dental, insulin, dispar, radioact, nervou, enzym, varianc, aerodynam, lung, recurr, diagnosi, antibiot, virus, obstruct, collis, diagnos, patholog,
textur, fractur, infecti, surgic, implant, facial, mice, decay, inadequ, regener, vertebr, cognit, transplant, evolutionari, viabl, lesion, passiv, limb, thermodynam, socioeconom,
arteri, pathogen, volatil, abdomin, irrit, insuffici, neural, gamma, neurolog, sensat, reflex, exponenti, tract, mood, reproduc, sensori, viral, feedback, nonlinear, cardiac,
chromosom, uncertainti, momentum, neutron, primat, diagnost, duplic, bacteria, arous, viru, inflamm, impuls, renal, liver, schizophrenia, synthet, deviat, oscil, slight,
synthesi, psychiatr, simpler, cerebr, analyt, sigma, chronic, substrat, reactiv, subgroup, vein, phonem, defici, quantit, benefici, morpholog, incur, vibrat, polynomi, nucleu,
cardiovascular, coher, encod, seizur, dysfunct, focal, acut, cure, reciproc, fusion, accuraci, allevi, prone, react, ankl, instabl, phi, robust, angular, bleed, headach, genom,
drastic, phenomena, gravit, genit, microscop, replic, obes, gland, intercours, malaria, lobe, urin, antagonist, semant, impair, templat, pronoun, shortterm, affin, diverg,
trauma, congest, stimuli, nutrit, indirect, unstabl, proton, harmon, autism, cellular, apparatu, distress, anatomi, invert, nerv, scar, fatigu, hormon, tumor, lethal, deform,
stiff, prolong, epidem, syndrom, likelihood, catalyst, mild, propag, tempor, mitig, unchang, ecosystem, genera, tens, underli, toxic, meaning, development, posterior, unnecessari,
infer, traumat, inequ, advers, pulmonari, exert, queri, conson, stimul, recess, conjug, invers, stomach, nich, tensor, inclin, dietari, caviti, pregnanc, beta, indirectli,
bandwidth, drought, parasit, hazard, transpar, spectrum, multipli, symmetr, molecular, grammat, simplest, neuron, causal, occurr, reson, imped, onset, catastroph, regress,
breast, anxieti, digest, inferior, invari, alzheim, correl, cord, suffix, hypothes, subtl, kidney, intestin, spatial, suscept, transcript, corros, marker, parkinson, therapeut,
degrad, reus, iter, rigid, dimension, syntax, cortex, homogen, paradigm, fever, decod, cannabi, spinal, syllabl, bacteri, subset, coeffici, arithmet, efficaci, cue, repetit,
worsen, ingest, pneumonia, semiconductor, nasal, stimulu, antibodi, spine, reproduct, swell, durabl, inhibitor, modal, prostat, polym, peripher, tuberculosi, breakdown,
redund, antigen, induc, symmetri, geometr, tertiari, cocain, mutat, addict, metabol, electrod, respiratori, intrins, magnitud, spontan, xray, hiv, rna, precursor, prolifer,
equilibrium, prescript, inhibit, arbitrari, pest, pathway, vaccin, anterior, pi, analys, cosmet, isotop, distort, diabet, abnorm,
V13 =gibb, fischer, rodriguez, stalin, goodman, sherman, macdonald, gill, troy, levi, lenin, solomon, pearson, porter, rodger, dunn, casey, cameron, thomson, thompson, stern,
lama, berri, boon, bradford, fletcher, gandhi, dame, mater, ferguson, carpent, hawkin, reynold, caesar, wallac, perkin, weaver, barrett, harper, bowi, gould, curri, myer,
drake, chapman, byrn, owen, mclaren, hussein, wright, canterburi, cole, forrest, benson, reid, hoover, morrison, sheridan, newton, spencer, bailey, gilbert, fraser, freeman,
walsh, emerson, fuller, griffith, carey, jen, starr, morri, scotia, ix, blake, helm, sinclair, livingston, phillip, carrol, levin, quinn, mccarthi, watson, wagner, rahman, xvi, curti,
fitzgerald, crosbi, harrison, bro, montana, armstrong, lynch, hammond, elli, xi, webster, walker, lennon, xii, chan, parker, maxwel, archer, tate, potter, edison, dixon, bradi,
nichol, osborn, kent, reed, logan, nash, bennett, finn, lang, thorn, allmus, stuart, eisenhow, holden, fisher, whitney, clara, presley, booth, montgomeri, dylan, beck, luther, kay,
murray, irv, hogan, cohen, arnold, webb, lyon, lambert, christi, obama, blair, heath, newman, gibson, burk, churchil, shelley, ebert, powel, crow, shaw, eden, carson, truman,
watt, wade, jenkin, henderson, butler, harvey, vii, lumpur, mustang, cain, roosevelt, murphi, riley, penn, reagan, sander, mccain, viii, baldwin, monica, boyd, barker, hyde,
kane, swift, mann, tyler, doyl, bach, palmer, crawford, coleman, barber, carr, jefferson, schumach, hast, luca, barn, nixon, griffin, md, aka, pitt, mead, sim, koch, macarthur,
wesley, oneil, campbel, preston, holm, gardner, dawson, bradley, tucker, sr, vi, meyer, den, fe, hay, hardi, chamberlain, collin, frost, savag, mccartney, mason, morton,
robertson, burton, klein, baker, chester, laden, byrd, piper, der, robinson, sterl, norton, einstein, hopkin, sullivan, buchanan, stark, johnston, elliott, duncan, stewart, laud,
bacon, hart, moss, hancock, peterson, mitchel, parson, marx, kerri, buck, darwin, mcdonald, turner, hawk, lopez, rockefel, monro, mcmahon, obrien, xiv, richardson, holland,
hamilton, xiii, chandler, bryant, cox, weber, joyc, madison,
V14 =anthem, carniv, spinoff, wii, followup, reissu, sequel, conductor, apollo, cameo, slate, vh, eve, gig, unreleas, smart, directori, contributor, trio, embark, sung, ds, horror,
rendit, poster, puppet, lp, pb, sang, arcad, facebook, manga, rca, crazi, hd, recount, broadway, genesi, rap, editori, simpson, seller, vol, herald, myspac, rbhiphop, mc,
percuss, jam, smash, diari, jockey, funk, bet, preview, cassett, amaz, latest, hail, bollywood, triumph, protagonist, favourit, airplay, forev, regga, guin, rapper, fantast,
mtv, artwork, batman, xbox, hardcor, liveact, duo, cheer, choir, recur, commentari, headlin, quartet, rehears, selftitl, con, choru, vinyl, doll, breakthrough, fulllength, oz,
dinner, hymn, dancer, maker, trilog, pen, remix, enthusiast, podcast, referenc, telegraph, remast, saga, sitcom, ace, summari, sketch, ballad, cohost, duet, wizard, remak,
villain, marvel, hiphop, reunion, soni, melodi, spawn, idol, theatric, lone, backup, epic, aria, teen, zombi, joy, madonna, nintendo, mini, superhero, cbc, bestknown, itv,
spiderman, repris, coproduc, emi, recit, monthli, legendari, glori, flute, chat, bang, breakfast, unoffici, showcas, bonu, midnight, shortliv, upload, crossov, banner, greet,
bluray, reprint, superman, screenplay, lesson, memor, chant, improvis, merchandis, rereleas, stereo, dirti, catalogu, satir, platinum, tonight, ne, tribut, ensembl, pokémon,
thriller, delux, cnn, echo, garner, playabl, instant, numberon, cowrot, parodi, discographi, demo, dj, sonic, imprint, catalog, incarn, scream, hiatu, antholog, disco, upcom,
memoir, mgm, trek, itun, circu, dawn, nickelodeon, twitter, espn, photographi, hbo, playboy, footag, autobiographi, clip, riaa, longrun, orchestr, paramount, tragedi, trailer,
youtub, mixtap, eurovis, beatl, sega, playstat, miniseri, ap, bside, eponym, blog, repertoir, bestsel, rerecord,
V15 =sc, wwf, tenni, streak, sec, cub, chess, podium, lap, talli, spectat, seventeen, sheffield, texan, rooki, boxer, allstar, wolverin, afl, fixtur, everton, sat, laker, gt, comeback,
springfield, sixteen, vacat, softbal, overtim, bulldog, surpass, blackburn, brewer, pageant, tna, cowboy, brisban, tackl, leed, rivalri, fierc, odi, speedway, trainer, conced,
roster, postseason, refere, freshman, sack, barcelona, leicest, av, wicket, striker, raven, poker, dodg, wimbledon, penguin, knockout, bruin, eighteen, –present, millennium,
acc, sixti, ucla, richmond, scorer, stint, packer, thanksgiv, quarterback, raider, â, runnersup, dodger, golf, bundesliga, division, hometown, aggreg, stoke, trophi, pac, thirti,
semest, thirteen, trojan, flyer, bout, fumbl, ½, sunderland, gymnast, ufc, bristol, nebraska, gator, raini, usc, replay, falcon, nashvil, td, oakland, volleybal, easter, shootout,
lacross, maverick, icc, milwauke, brave, hattrick, mlb, marathon, wander, vacant, midfield, nwa, narrowli, vike, pitcher, wcw, bench, ferrari, forti, charger, indianapoli, ––,
wwe, rbi, jacksonvil, indoor, postpon, adelaid, lotteri, premiership, fourteen, memphi, undef, chelsea, colt, lineback, semifin, sophomor, preseason, fifteen, incept, derbi,
smackdown, dart, clinch, starter, cincinnati, runner, hornet, fastest, inning, rebound, bronco, panther, kickoff, cardiff, celtic, offseason, rover, ensu, rejoin, punt, philli,
quarterfin, goalkeep, remaind, nhl, threw, arsen, varsiti, ivi, steeler, collegi, tier, qualif, beaten, wigan, ml, orlando, dolphin, firstclass, feud, duel, mascot, columbu, berth,
er, panzer, deadlin, inter, motorsport, seahawk, europa, halftim, preliminari, newcastl, sidelin, titan, wrestler, bolton, yacht, alltim, midway, runnerup, releg, autumn,
nascar, wembley, yanke, spur, intercept, catcher, rematch, jaguar,
V16 =lao, sailor, ukrainian, kosovo, serbian, loyal, bloc, modernday, slavic, morocco, arabian, ulster, citizenship, guinea, uruguay, dissolut, persia, aristocrat, provision, midland,
armenia, colombian, croatian, azerbaijan, arabia, cornwal, taliban, filipino, yugoslav, confederaci, hyderabad, ethiopia, feudal, cuisin, jamaica, refuge, treasuri, ghana,
lebanes, cuban, madra, bengal, legion, punjab, ham, istanbul, negro, blockad, balkan, ethiopian, frontier, colonist, malta, concess, ottawa, tang, yuan, nomad, czechoslovakia,
ming, gaelic, counterattack, alexandria, vietnames, roma, mercenari, serb, newfoundland, kenya, uganda, gaza, expel, dominion, peasant, gujarat, iraqi, westminst,
patriarch, insurg, consul, trader, dominican, settler, migrant, ancestri, mongol, herzegovina, colon, palestin, montenegro, archipelago, syrian, malay, malaysian, mafia,
nors, qing, haiti, libya, bombay, pilgrim, papal, hispan, homeland, lithuania, freed, pragu, upris, cede, zimbabw, qatar, yemen, ira, chilean, kerala, detain, isl, commando,
macedonian, dakota, besieg, conscript, karnataka, warsaw, pact, canton, kuwait, nepal, hostag, detent, bulgarian, folklor, aborigin, predominantli, protector, normandi,
maharashtra, prussian, afghan, victorian, pilgrimag, monarchi, incumb, macedonia, dubai, overthrow, denounc, bosnia, algeria, genocid, militia, cypru, indonesian, sudan,
prefectur, albanian, ussr, pakistani, brussel, mongolia, romanian, embassi, congo, napl, catholic, pow, baghdad, vatican, tibet, garrison, cambodia, nobil, saxon, caliph, unif,
conting, thai, slovakia, nationalist, lithuanian, constantinopl, luxembourg, passport, yugoslavia, georgian, airway, turk, caucasu, albania, reestablish, takeov, guerrilla,
estonia, burma, scot, nigerian, proclam, argentin, emir, partisan, yorkshir, finnish, sicili, fascist, synagogu, deport, flourish, cairo, ecuador, sovereignti, cheroke, napoleon,
kashmir, milit, prussia,
V17 =demon, cri, knew, creatur, asid, creator, devil, blind, constantli, wolf, infant, fate, hitler, wed, revel, stolen, competitor, doubt, crush, pretti, pleas, worri, surpris, exactli,
lover, wonder, persuad, childhood, accident, trick, els, quick, badli, thank, lesbian, fun, sword, samesex, sure, ye, na, friendship, rider, prompt, tortur, teenag, testimoni,
pride, ma, worst, bare, everyon, pronounc, doesnt, insist, aliv, mad, lifetim, grave, certainli, till, killer, disappoint, alien, desper, hate, realis, troubl, ghost, repeatedli,
shadow, anyon, fallen, narrat, bride, homer, su, none, spark, captiv, dialogu, temporarili, kidnap, guilti, scandal, joke, wake, ive, monster, versu, recal, quest, angri, resurrect,
bless, welcom, toy, funer, confess, robot, wasnt, prostitut, pregnant, specul, anger, imagin, costum, testifi, sin, comfort, forth, conceiv, unfortun, guardian, companion, rape,
hang, hide, sacrific, foster, whatev, fake, reportedli, ms, heaven, shall, devast, id, hunter, grace, sick, storylin, suddenli, presum, audit, witch, fortun, mistak, passion, spare,
steal, rumor, survivor, interrupt, wrong, truli, boss, acknowledg, suppos, blame, vampir, kiss, inherit, curs, silent, apolog, evil, coincid, betray, contend, etern, guy, mar,
upset, deliber, mate, odd, remind, allegedli, cant, jail, portrait, predecessor, complain, laugh, hell, spoke, serious, confid, kid, custodi, hurt, strang, suicid, repli, imprison,
beast, aveng, closest, dozen, innoc, confront, reveng,
V18 =sugar, mask, habitat, explos, narrow, flavor, cotton, nois, concret, accumul, tea, sight, bag, carv, arc, insert, tail, thin, cloud, tear, cylind, insect, valv, sand, flash, meat,
raw, mammal, tall, arch, clay, drill, pocket, galaxi, shoe, rod, ft, beam, tast, suspens, patch, wire, dive, bore, ammunit, dish, outer, meal, rough, teeth, mirror, tip, barrel,
shoulder, hull, stroke, fossil, cattl, fold, shorter, cm, beer, discharg, bow, disk, loop, copper, swim, inner, sheet, plastic, bath, thick, locomot, finger, pistol, float, crystal,
diamet, bicycl, neck, blade, steam, pack, penetr, empti, exhaust, ash, bed, chest, steer, gaug, snake, blend, sweet, predat, rat, clock, bullet, pipe, flat, bottl, flame, spin, gear,
rocket, motorcycl, wet, tone, axi, gap, fat, tobacco, slide, ear, milk, trap, laser, mercuri, inch, smoke, poison, genu, ingredi, roof, curv, boot, horn, fabric, nest, brake, rubber,
skull, deck, polar, cluster, circular, barrier, grain, grass, pet, tire, breath, scatter, pit, knee, mouth, stamp, blow, lamp, stick, soft, chip, hidden, specimen, fragment, outdoor,
cartridg, shell, propel, log, harvest, lift, sharp, prey, kit, button, batteri, drag, slot, smooth, delta, whale, crack, medium, pin, bright, coffe, pool, vertic, cannon, artifici,
faster, cultiv, horizont, dispos, appl, nose, wooden, egg, metr, alpha, dome,
V19 =mathematician, trumpet, superstar, preacher, sonata, patronag, psychologist, isbn, chancellor, sculptor, encyclopedia, physicist, endow, avantgard, elementari, berkeley,
conservatori, rave, beethoven, curat, soprano, modernist, tsar, avid, philharmon, sergeant, archaeologist, reich, comedian, mit, jd, tenor, cyril, brigadi, prolif, unesco,
birthplac, manifesto, paperback, apprentic, telugu, malayalam, humanist, deutsch, chef, spokesperson, punjabi, wikipedia, councillor, magna, diploma, bibliographi,
magistr, singersongwrit, uc, counselor, truste, biograph, ballet, creed, alumni, newcom, mentor, synopsi, ign, economist, yale, pamphlet, postgradu, englishlanguag,
banker, choral, businessman, princeton, smithsonian, math, clerk, coauthor, librarian, sheriff, cornel, thesi, dictionari, superintend, tuition, freelanc, violin, entrepreneur,
culinari, seminar, astronaut, urdu, sociologist, forb, screenwrit, petti, emin, troup, vocat, vogu, polytechn, •, pianist, veterinari, discipl, tutor, regent, inspector, yorker,
nonfict, biologist, shepherd, concerto, ba, neoclass, rabbi, textbook, abbot, op, preparatori, mba, jointli, filmmak, standup, spokesman, parttim, vicepresid, surgeon, pupil,
supervisor, choreograph, pornograph, citat, marathi, bilingu, psychiatrist, playwright, treatis, renown, pseudonym, quarterli, naturalist, bulletin, fellowship, classmat,
theorist, kannada, hindi, acquaint, anthropologist, constabl, columnist, baroqu, appel, pharmaci, dissert, shakespear, defunct, saxophon, cartoonist, playback, inventor,
grammar, blogger, chemist, instructor, upheld, campus, prose, subtitl, africanamerican, fairi, riff, pp, bengali, thinker, emeritu, technician, stanford, novelist,
V20 =render, sensit, proper, fatal, perceiv, toler, sole, rapidli, earthquak, necessarili, solv, bound, owe, repair, satisfi, emphas, craft, strengthen, explan, wider, ecolog, fals, excit,
somewhat, poverti, framework, wealth, tension, simultan, landscap, imposs, manipul, immun, temporari, massiv, exact, matur, grown, behaviour, elabor, outcom, disrupt,
closer, emphasi, flexibl, pose, ordinari, defect, minim, notion, complic, understood, circumst, ration, strain, valuabl, superior, similarli, furthermor, longterm, vowel, reli,
wage, stronger, exploit, fairli, passag, unless, comparison, dissolv, rapid, counter, extinct, heavili, shock, routin, oral, ongo, easi, accomplish, ignor, afford, modif, absenc,
socal, compens, suppress, expos, preval, perspect, partli, phenomenon, vulner, widespread, slowli, moder, therebi, dramat, attain, mainten, confus, character, circul, displac,
ideal, absolut, familiar, destruct, reinforc, theoret, violent, margin, deliveri, recoveri, stabl, suffici, resolv, substanti, strongli, aris, unusu, disabl, accur, sort, neutral, huge,
facilit, fewer, accommod, attitud, valid, reconstruct, relev, greatli, harm, suitabl, safe, radic, alloc, disturb, impli, optim, isol, persist, weak, easier, visibl, mere, supplement,
broad, proof, diet, reliabl, aggress, trend, interfer, borrow, trigger, align, preced, undergo, gradual, gender, loos, clearli, exclud,
V21 =forum, hostil, evacu, counsel, guarante, stake, seiz, constitu, voter, ceas, inquiri, referendum, terrorist, servant, friendli, ratifi, halt, interven, ralli, administ, reorgan,
prospect, parliamentari, auction, unifi, advisor, prosecut, equiti, liberti, demograph, eu, rebel, submit, leas, permiss, telecommun, pledg, salari, autonom, civic, lawsuit,
tribal, faction, tribun, withdraw, fulltim, withdrawn, enact, judici, tenur, commenc, physician, consolid, recipi, mandat, contractor, resum, bureau, begun, consent,
nonprofit, accredit, reelect, complaint, bid, clinton, sa, diplomat, elector, charter, withdrew, rent, nasa, unsuccess, volunt, behalf, unanim, nationwid, merger, decre, admiss,
nato, intervent, propaganda, cia, petit, landmark, repeal, elit, endors, sovereign, elig, deleg, subordin, abolish, spi, commerc, holder, dispatch, licenc, ss, legislatur, analyst,
cadet, terror, auto, aftermath, democraci, culmin, registr, oblig, maritim, enlist, archiv, specialist, criticis, clash, warrant, bankruptci, coalit, pension, opt, welfar, interim,
advic, condemn, privileg, recognis, prosecutor, workshop, mutual, regain, rebuild, expir, disband, sharehold, slaveri, casualti, sanction, riot, advisori, publicli, fda, prosper,
ballot, lobbi, statut, activist, coup, congression, un, supervis, surrend, urg, fbi, renov, overse, postwar, discontinu,
V22 =infrar, vapor, fibr, inflat, nitrogen, residu, ambient, knot, lowest, surplu, absorpt, −, mw, silicon, commod, detector, torqu, subsidi, heavier, temper, offset, distil, solvent,
dens, gase, ventil, boil, uranium, crude, ph, bulk, melt, humid, deeper, mph, greenhous, gradient, lava, payload, spill, °c, lighter, deplet, muzzl, mb, overhead, abund, sanit,
rpm, fatti, upward, plasma, threshold, kmh, hydraul, eros, shortag, alloy, glucos, evapor, cheaper, compart, refriger, sulfur, tide, wavelength, cooler, shear, median, photon,
thrust, enrich, moistur, petroleum, gasolin, buffer, freez, width, zinc, cyclon, radiu, boiler, puls, subtrop, veloc, satur, mg, machineri, potassium, sodium, amino, pollut,
thermal, combust, shale, dioxid, kinet, diffus, nm, drain, calcium, dispers, emit, tidal, friction, nutrient, propuls, unemploy, ignit, flux, °f, shallow, contamin, freshwat, recycl,
turbin, precipit, tariff, clearanc, ethanol, rainfal, sunlight, slower, rainforest, discount, beverag, reservoir, insul, sperm, lesser, intak, aluminium, chlorid, irrig, °, fluctuat,
vacuum, latitud, dissip, livestock, solubl, fertil, jaw, manifold, lunar, coil, drainag, literaci, condens, altitud, mortal, sediment, proxim, dose, electromagnet, ferment, rotor,
downward, fraction,
V23 =realtim, telescop, cpu, protocol, codic, node, interv, compat, diagram, grid, terrestri, finit, server, lens, io, freight, ps, email, cach, googl, stack, navig, automot, proprietari,
sensor, compact, plugin, notat, transmit, static, xp, autom, commut, connector, api, subscrib, portabl, pc, broadband, desktop, q, matrix, gb, browser, layout, ibm, linear,
consol, array, android, ×, turbo, probe, graviti, modul, mac, kernel, relay, hp, hybrid, denot, →, prefix, tablet, radar, iphon, ac, quantum, integ, pixel, graph, cc, kw, shuttl,
mhz, automobil, embed, supplier, db, vitamin, delet, z, intermedi, vendor, emul, random, simul, gameplay, diesel, chassi, ip, default, gp, laptop, remot, wireless, subscript,
newer, bmw, converg, download, topolog, os, prototyp, vector, algebra, scan, encrypt, usb, antenna, cargo, font, transmitt, theorem, leak, surveil, chord, intel, synchron, refin,
bundl, amplifi, len, app, readili, menu, interfac, premium, printer, analog, multiplay, reactor, linux, synthes, hardwar, conveni, paramet, dual, infinit, processor, spacecraft,
databas, packet, configur, highspe, geometri, discret, binari,
V24 =danni, leo, justin, neil, maria, ibn, nelson, colleagu, kevin, warren, ted, dean, russel, lisa, nova, eric, billi, tim, pat, dick, steven, princess, longtim, teammat, matthew, kim,
fred, benjamin, jean, willi, singh, carter, max, cousin, jon, jan, pete, hugh, carl, bassist, kelli, kate, larri, widow, da, craig, ralph, eldest, harold, ron, susan, abu, eddi, santa,
lloyd, terri, nick, charlott, franklin, ross, yearold, jay, costar, grandfath, greg, anna, jane, gari, bruce, jeff, alan, charli, shah, elder, wayn, jacob, li, albert, phil, sibl, michel,
bin, christoph, drummer, alic, karl, ed, archbishop, ian, ryan, victor, margaret, leon, bobbi, johnni, tommi, denni, rick, ken, robin, perri, luke, todd, ben, sarah, norman,
morgan, anthoni, girlfriend, gordon, matt, sean, brook, andi, gen, jerri, donald, evan, graham, dougla, jason, jonathan, barri, oliv, abraham, uncl, reunit, chuck, alfr, brian,
roy, walter, cofound, youngest, baron, ami, mario, muhammad, keith, alex, frederick, jimmi, dave, rob, dan, barbara, samuel,
V25 =sampl, expens, classif, index, upgrad, innov, strategi, algorithm, otherwis, enhanc, topic, difficulti, wherea, stabil, variabl, equival, input, usag, experiment, automat, evalu,
client, visual, context, motion, coordin, fundament, shift, discoveri, graphic, dynam, mode, intens, accid, represent, classifi, segment, util, variat, revers, differenti, variant,
modifi, evolut, laboratori, fast, monitor, revis, core, virtual, assess, error, logic, henc, dimens, map, zero, enabl, mathemat, pure, transmiss, delay, sustain, procedur, calcul,
essenti, alter, evolv, handl, extern, correct, weather, appropri, composit, bit, packag, orient, check, add, specifi, extra, predict, descript, equat, statist, precis, scheme,
manual, balanc, updat, fix, andor, divers, partial, strength, manner,
V26 =hungari, contin, exil, mumbai, norway, turkey, ukrain, patriot, tokyo, beij, frankfurt, caribbean, ontario, bulgaria, athen, delhi, romania, nigeria, afghanistan, peninsula,
lebanon, cuba, taiwan, belgium, cemeteri, austria, iran, iceland, malaysia, munich, finland, hawaii, switzerland, northeastern, greec, vancouv, jerusalem, neighbor, fled,
thailand, mainland, alaska, sieg, amsterdam, queensland, geneva, croatia, southwestern, hampshir, venic, glasgow, villa, serbia, peru, netherland, nevada, manila, brazil,
emigr, annex, pirat, dublin, indonesia, syria, ambassador, metro, chile, summit, madrid, invad, singapor, orlean, northwestern, southeastern, shanghai, portug, colombia,
edinburgh, poland, montreal, alberta, moscow, sweden, presentday, venezuela, bangladesh, denmark, milan, elsewher, vienna, argentina, quebec, abroad, neighbour,
V27 =parad, injur, slow, trace, nicknam, touch, oppon, caught, pull, ahead, wound, penalti, crowd, chase, broke, induct, vs, journey, fought, straight, bat, sail, besid, row, climb,
ram, longest, shut, cap, twin, knock, twelv, bought, disappear, struck, jump, stood, departur, twice, trip, broken, span, driven, substitut, laid, pitch, suspend, ward, throw,
twenti, tiger, retreat, lane, hurrican, kick, rush, goe, u, ran, drove, whilst, drawn, eleven, giant, gone, buri, tripl, gang, wait, drew, plu, sit, strip, fell, catch, exit, warrior, lay,
push, readi, collaps,
V28 =coron, worship, mytholog, sultan, abbey, shrine, monk, monasteri, mosqu, wealthi, persian, realm, ce, missionari, han, thcenturi, ruler, shiva, feast, antiqu, bce, ruin, heir,
rebellion, myth, renaiss, priest, conquest, chapel, cathedr, revolt, ancestor, mystic, descent, commemor, surnam, goddess, burial, buddhist, gothic, tibetan, byzantin, mediev,
onward, throne, rite, nobl, sikh, clan, denomin, alphabet, proclaim, mughal, conquer, prophet, hindu, crusad, buddhism, dynasti, patron, monarch, ascend, sanskrit, deiti,
flee, calendar, inscript, massacr, treasur, armenian, sacr, counterpart, archaeolog, monument, saudi, tomb,
V29 =shape, sequenc, abil, signal, target, integr, qualiti, uniqu, interact, symbol, etc, presenc, detail, directli, devic, focu, equal, eg, principl, fit, resourc, factor, categori, pattern,
knowledg, messag, user, definit, advantag, mass, contrast, capabl, characterist, environ, correspond, detect, consum, interpret, matter, distinct, phase, demonstr, sens,
reflect, transform, item, kind, impact, techniqu, root, option, deriv, simpl, analysi, solut, consider, content, tool, skill, compon, display, mechan, multipl, basic, restrict,
safeti, altern, address, consequ, ie, implement, aspect, electron, instanc, ident,
V30 =wast, bind, dri, membran, storag, depth, fluid, clean, consumpt, oxid, orbit, substanc, radiat, atmospher, fuel, compress, skin, solar, adjust, fruit, heat, veget, deposit, soil,
hydrogen, protein, crop, layer, acceler, decreas, feed, emiss, particl, oxygen, tissu, inject, load, maximum, ion, carbon, atom, filter, pump, dna, fiber, exposur, acid, reduct,
solid, bone, muscl, mixtur, angl, tropic, molecul, warm, coal, stem, tube, salt, absorb, cool, miner, receptor, liquid, rotat, fresh, drink, optic, excess, extract, alcohol, constant,
minimum,
V31 =photograph, occasion, possess, cite, ban, enjoy, pilot, request, depict, suit, confirm, unlik, guid, meant, pursu, abandon, rescu, repeat, encount, descend, favor, obtain,
watch, chosen, distinguish, incorpor, dedic, paid, respond, choos, fashion, sought, search, warn, explor, invent, convert, preserv, experienc, perman, permit, regist, introduct,
convers, encourag, gather, assign, engag, count, ensur, creation, seek, grew, restor, kept, threaten, attribut, buy, recruit, accus, deni, send, deliv, recommend, recov, belong,
princip, split, accompani, conclud,
V32 =currenc, enterpris, membership, certif, household, expans, tourist, sector, loan, interior, subsidiari, insur, net, ltd, payment, export, consult, farmer, rural, disast, visitor, inc,
renew, worldwid, destin, partnership, profit, fair, relief, asset, merg, fee, budget, geograph, ticket, viewer, revenu, residenti, exclus, survey, entiti, sponsor, cash, transact,
compris, recreat, crisi, estat, censu, investor, trust, employe, vast, ownership, chariti, ministri, illeg, patent, acquisit, infrastructur, ventur, debt, co, donat, tourism, domest,
retail, newli, telephon, financ,
V33 =austin, virginia, maryland, kentucki, connecticut, atlanta, jersey, portland, seattl, boston, oregon, kansa, chicago, manchest, iowa, wisconsin, melbourn, avenu, reloc,
pittsburgh, illinoi, baltimor, michigan, ranger, arizona, downtown, miami, liverpool, brooklyn, houston, phoenix, detroit, arena, toronto, dalla, colorado, birmingham,
louisiana, denver, philadelphia, pennsylvania, texa, berlin, suburb, tech, minnesota, cardin, manhattan, buffalo, indiana, usa, fc, utah, massachusett, metropolitan,
cleveland, georgia, missouri, florida, alabama, sydney, borough, ohio, arkansa, oklahoma, tennesse, mississippi,
V34 =storm, plate, cycl, boat, chamber, rear, winter, bomb, apart, meter, floor, leg, bottom, steel, frame, burn, flood, ring, door, insid, wave, fli, bar, switch, panel, tabl, block,
attach, height, column, parallel, spring, glass, tank, onto, edg, gate, mill, stone, forward, bond, lock, wheel, circl, vessel, crash, deep, chain, mm, stream, seed, shop, path,
circuit, pair, wood, garden, tower, feet, fill, mount, foot, factori, plane, truck,
V35 =medicin, cooper, attract, statu, librari, mainli, declin, formal, prepar, focus, hospit, agent, demand, programm, architectur, whole, organis, recogn, themselv, file, divid,
attent, foundat, agenc, purpos, primarili, discuss, mostli, document, legal, teach, benefit, domin, conduct, controversi, employ, regul, basi, effort, aid, exhibit, mission,
economi, intellig, custom, job, cours, situat, convent, money, emerg, oppos, aim, branch, progress, expand, opportun, secret, worker, contact, conflict,
V36 =convict, ‘, someon, answer, hear, divorc, herself, neither, didnt, truth, nor, im, alon, mind, heard, impress, babi, admit, sentenc, promis, fear, dead, punish, perfect, noth,
older, anyth, bad, sleep, remark, gift, wish, moment, birth, wit, victim, convinc, spirit, everyth, intent, talent, commit, jesu, notic, inde, suspect, soul, unknown, realiz,
mysteri, whi, occas, beauti, coupl, happi, chanc, rememb, gay, holi, dream, listen,
V37 =sourc, concept, method, problem, protect, appli, rather, object, experi, rel, subject, particular, measur, individu, occur, reason, condit, combin, certain, specif, improv,
normal, theori, express, concern, complex, approach, evid, sound, physic, typic, imag, structur, properti, applic, materi, function, formula, itself, defin, signific, element,
observ, speci, remov, code, indic, compar, valu, therefor, either, data,
V38 =account, futur, learn, shown, suggest, seen, recent, face, particularli, achiev, relationship, past, adopt, introduc, hold, propos, full, memori, charg, reveal, surviv, maintain,
separ, numer, carri, contribut, today, share, respect, promot, subsequ, regard, select, key, particip, gave, advanc, earn, accept, despit, saw, especi, rais, gain, whose, identifi,
except, least, argu, toward, extend,
V39 =toni, clark, o, miller, tom, bell, jordan, scott, adam, harri, jone, marshal, frank, ford, brown, kennedi, jr, chri, allen, johnson, mike, moor, simon, howard, anderson, knight,
don, bush, ray, jack, van, daniel, jim, von, roger, lee, iv, taylor, jackson, lewi, joe, davi, sam, biographi, wilson, ali, steve, smith, khan, bob,
V40 =phrase, bibl, prayer, legaci, illustr, ritual, poet, tale, liter, gospel, narr, poetri, scholar, influenti, cinema, dub, amongst, reader, hebrew, essay, linguist, devot, romanc,
painter, legend, dialect, chapter, icon, philosoph, spoken, poem, manuscript, speaker, vers, heritag, spell, canon, romant, cult, quot, chronicl, literari, earliest, wellknown,
sculptur, reviv, pioneer, script, literatur,
V41 =bird, yellow, sun, wine, resembl, lion, dress, grey, flower, spot, dragon, eat, hair, dog, planet, moon, breed, belt, coin, wild, cloth, colour, magic, orang, hunt, iron, eagl, wear,
worn, rain, snow, coat, cat, decor, ride, rose, hors, rich, diamond, bull, rice, hole, ice, dark, uniform, seal, cook, bear, shield,
V42 =anglican, liturgi, congreg, sunni, clergi, pagan, ld, apostol, oath, pradesh, lutheran, baptist, methodist, pastor, shia, judaism, theologian, scriptur, brotherhood, rabbin, sect,
nun, episcop, cleric, apostl, theolog, basilica, dioces, persecut, secular, hinduism, evangel, ecclesiast, communion, parish, triniti, seminari, jesuit, marxist, christ, mormon,
presbyterian, orthodox, anarchist, preach, libertarian, ordain, martyr,
V43 =manuel, juan, rosa, carlo, pedro, são, lorenzo, sierra, di, rafael, josé, giovanni, lui, roberto, mont, pablo, andré, fernando, marco, jorg, gabriel, alberto, silva, aviv, miguel,
hugo, ana, cruz, copa, fidel, maría, torr, garcia, monaco, paulo, du, polo, marino, castro, antonio, santo, jose, franco, bernardino, santiago,
V44 =northeast, basin, migrat, cave, corner, creek, resort, southwest, tunnel, railroad, nearbi, southeast, pacif, mediterranean, inhabit, geographi, pole, atlant, coastal, boundari,
restaur, municip, canal, dam, desert, km, highway, headquart, ocean, adjac, trail, cape, northwest, hotel, fort, stretch, castl, plain, entranc, beach, shore, mile, underground,
neighborhood,
V45 =sever, manag, chang, base, found, area, provid, although, produc, product, creat, power, intern, complet, report, each, open, line, within, local, act, point, anoth, remain,
lead, own, compani, oper, major, addit, accord, continu, receiv, design, set, under, present, build, current, form, hous, same, support,
V46 =violenc, threat, leadership, belief, opinion, faith, recognit, motiv, scientist, resolut, argument, freedom, divin, speech, intellectu, spiritu, duti, philosophi, alleg, advoc,
conclus, ethic, disput, moral, corrupt, instruct, exercis, excel, choic, expert, examin, favour, vision, nevertheless, debat, reput, reject, doctrin, disciplin, statement, creativ,
dismiss, assert,
V47 =natur, those, special, consist, activ, though, limit, repres, engin, bodi, possibl, market, further, involv, test, project, exampl, model, standard, respons, industri, contain,
effect, issu, type, land, event, exist, human, period, class, control, case, term, way, great, process, ad, anim, offer, requir,
V48 =figur, mention, credit, print, pictur, piec, collabor, plot, earlier, label, novel, text, theme, screen, celebr, websit, journal, newspap, adapt, arrang, instrument, press, comment,
magazin, scene, audienc, interview, page, letter, volum, articl, voic, paper, background, doctor, inspir, card, edit, fan, mix, paint,
V49 =peabodi, jubile, daytim, pulitz, bafta, primetim, filmfar, prizewin, awardwin, allamerican, nobel, posthum, telecast, desk, firstteam, accolad, emmi, globe, saturn, finalist,
cann, sundanc, oscar, nomine, brit, gemini, prestigi, baseman, medalist, carnegi, nielsen, laureat, guild, juno, dove, mvp, honorari, cw, mellon, grammi,
V50 =founder, deputi, formerli, mayor, bishop, meanwhil, successor, chose, chairman, admir, lincoln, advis, secretari, fellow, hire, cabinet, crown, invit, inaugur, ceo, scout, editor,
politician, resign, lawyer, colonel, journalist, assassin, mp, veteran, lieuten, architect, chair, rival, renam, presidenti, briefli, victoria, commission, attorney,
V51 =tonn, usd, capita, lb, ton, ago, litr, cubic, cent, trillion, exceed, metric, kilomet, €, pound, gram, annum, dollar, revolv, yen, fifti, euro, kg, kilogram, crore, gallon, kilometr,
se, gdp, weigh, acr, gross, hectar, rs,
V52 =cavalri, jet, helicopt, rifl, warfar, battalion, raid, assault, naval, airborn, artilleri, fleet, bomber, missil, guard, strateg, squadron, submarin, regiment, patrol, expedit, fighter,
corp, deploy, infantri, brigad, armour, carrier, combat, tactic, aviat, personnel, armor,
V53 =egyptian, portugues, polish, welsh, palestinian, vice, austrian, commonwealth, oversea, nazi, puerto, dutch, czech, scottish, confeder, swiss, hungarian, tamil, continent,
merchant, sri, turkish, mexican, irish, iranian, danish, isra, belgian, imperi, swedish, norwegian, provinci, brazilian,
V54 =behavior, mental, cancer, brain, emot, neg, therapi, symptom, psycholog, compound, diseas, abus, clinic, seriou, ill, reaction, depress, gene, treatment, surgeri, biolog, pain,
stress, treat, patient, sexual, infect, chemic, disord, failur, genet, risk,
V55 =via, varieti, link, extens, construct, flight, ground, manufactur, store, transport, equip, ship, access, distribut, suppli, rang, facil, space, avail, format, free, commerci, car,
weapon, fire, aircraft, food, comput, plant, vehicl, connect,
V56 =coverag, regularli, hollywood, comedi, disney, logo, weekli, anchor, amateur, serial, daili, theater, fox, fm, poll, documentari, realiti, bbc, cancel, holiday, venu, affili, franchis,
drama, nbc, mail, syndic, sky, cb, abc, cartoon,
V57 =brief, genr, adventur, tape, disc, photo, compil, biggest, cd, soundtrack, mainstream, dvd, hero, entitl, highlight, signatur, lyric, favorit, ep, christma, audio, certifi, lineup,
fantasi, tune, greatest, session, sing, solo, string,
V58 =civilian, jew, citizen, occup, immigr, egypt, era, affair, regim, pakistan, occupi, tribe, settl, settlement, soldier, republ, coloni, israel, airlin, allianc, invas, slave, flag, revolut,
alli, troop, camp, philippin, rome, navi,
V59 =previous, soon, simpli, refus, heart, frequent, better, yet, longer, might, hope, agre, actual, intend, commonli, —, here, hard, immedi, unabl, expect, ultim, quickli, alway,
alreadi, fail, ever, probabl,
V60 =miss, crew, schedul, youth, challeng, elimin, prior, entri, incid, beat, lose, struggl, driver, strike, compet, owner, crime, offens, enemi, shoot, partner, defend, draw, contest,
tie, latter,
V61 =abl, claim, doe, must, hand, upon, attempt, still, need, action, order, initi, without, onc, instead, find, should, decid, help, never, right, plan, eventu, tri,
V62 =softwar, advertis, joint, licens, rail, microsoft, transit, onlin, web, instal, digit, mobil, camera, bu, traffic, satellit, platform, window, cabl, phone, passeng, termin, internet,
V63 =appar, desir, fulli, lot, increasingli, highli, beyond, quit, understand, difficult, prove, perhap, prefer, easili, clear, rare, danger, sex, awar, necessari, extrem, tend,
V64 =stop, stand, keep, behind, taken, brought, escap, travel, drop, destroy, fall, rest, discov, captur, sent, save, arriv, bring, visit, drive, stay, put,
V65 =orchestra, hop, acoust, symphoni, hip, guitarist, vocalist, bass, folk, drum, guitar, vocal, keyboard, warner, rb, punk, jazz, pop, rhythm, piano, songwrit,
V66 =japan, canada, territori, zealand, germani, europ, spain, mexico, england, ireland, kingdom, franc, australia, border, china, capit, russia, provinc, itali, scotland, pari,
V67 =dont, think, seem, ask, want, told, feel, felt, happen, realli, done, someth, thought, am, let, know, talk, explain, tell, got,
V68 =concentr, profil, elev, capac, extent, ratio, yield, significantli, domain, incom, output, quantiti, densiti, sum, percentag, slightli, frequenc, effici, voltag, proport,
V69 =bridg, villag, road, front, hill, site, templ, section, street, middl, outsid, rout, valley, mountain, centr, cross, resid, port, nativ,
V70 =again, led, return, die, sign, join, leav, replac, meet, attack, togeth, begin, enter, left, lost, reach, kill, held, mark,
V71 =younger, husband, historian, succeed, alongsid, saint, reign, ladi, lord, queen, emperor, pope, portray, sir, mr, dr, le, captain, princ,
V72 =boy, king, wife, parent, friend, son, murder, marri, father, met, brother, woman, girl, mother, daughter, marriag, sister, whom, child,
V73 =soap, bueno, thth, fifteenth, buckingham, sixteenth, tampa, nineteenth, fourteenth, thirteenth, eleventh, midth, twentieth, twentyfirst, twilight, eighteenth, twelfth, pga,
seventeenth,
V74 =sinc, origin, member, success, appear, live, life, them, group, against, histori, peopl, home, famili, seri, show, countri, perform,
V75 =park, side, along, region, across, river, western, central, london, built, northern, town, eastern, island, near, locat, throughout, southern,
V76 =undergradu, cambridg, oldest, oxford, lectur, bachelor, faculti, enrol, graduat, phd, professor, taught, teacher, scholarship, nurs, campu, harvard, galleri,
V77 =legisl, administr, seat, committe, congress, vote, parliament, campaign, opposit, commiss, council, leader, governor, senat, bill, candid, assembl,
V78 =june, born, februari, novemb, august, januari, career, juli, septemb, forc, york, decemb, march, april, octob, announc, began,
V79 =given, person, allow, result, becaus, consid, veri, even, refer, among, describ, mean, give, see, make, caus, due,
V80 =oil, spread, upper, urban, climat, lie, portion, agricultur, ga, zone, farm, forest, surround, bay, wind, flow, mine,
V81 =sunday, monday, saturday, pm, walt, tuesday, afterward, wednesday, weekend, shortli, friday, thereaft, weekday, morn, thursday, afternoon, newscast,
V82 =medici, bayern, aston, jure, plata, liga, facto, sall, moin, atlético, rothschild, havilland, janeiro, palma, vega, versail, gaull,
V83 =b, c, x, v, iii, g, e, r, f, j, d, k, p, l, w, h,
V84 =next, summer, previou, ten, hour, five, everi, nine, seven, six, eight, big, hall, night, entir, few,
V85 =research, commun, servic, inform, studi, scienc, educ, organ, econom, center, institut, polit, busi, program, train, social,
V86 =elizabeth, loui, arthur, alexand, stephen, patrick, martin, mari, philip, joseph, lawrenc, andrew, franci, edward, ann, duke,
V87 =blood, gun, light, tree, wing, room, eye, sea, machin, surfac, heavi, color, earth, metal, wall, fish,
V88 =draft, nba, pick, rugbi, nfl, squad, junior, winner, tournament, playoff, qualifi, confer, ncaa, retir, senior,
V89 =christian, greek, thousand, latin, arab, protest, islam, muslim, hundr, speak, minor, translat, ancient, religion, jewish,
V90 =exchang, sold, acquir, invest, sale, brand, purchas, sell, transfer, corpor, global, fund, pay, stock, privat,
V91 =true, thing, littl, idea, your, question, my, whether, how, our, look, fact, god, too, me,
V92 =famou, note, write, list, short, wrote, danc, classic, compos, collect, notabl, read, written, cover,
V93 =justic, investig, appeal, judg, crimin, approv, declar, constitut, peac, decis, arrest, prison, trial, grant,
V94 =avoid, damag, prevent, suffer, affect, grow, potenti, poor, resist, loss, drug, injuri, strong, lack,
V95 =languag, cultur, practic, word, relat, view, interest, histor, movement, influenc, associ, modern, tradit, societi,
V96 =russian, indian, french, canadian, german, australian, spanish, royal, english, foreign, italian, japanes, chines,
V97 =differ, small, larg, main, ani, close, similar, import, popular, common, good, variou, increas,
V98 =charl, john, georg, robert, michael, jame, paul, david, william, henri, thoma, peter, richard,
V99 =twoyear, lockhe, rhode, oneyear, rio, virgin, zeppelin, fouryear, threeyear, amus, sponsorship, fiveyear,
V100 =athlet, pro, cricket, basebal, profession, wrestl, soccer, bowl, basketbal, hockey, super, stadium,
V101 =final, start, late, second, end, four, earli, until, last, befor, three,
V102 =price, speed, level, energi, pressur, temperatur, degre, rate, water, cell, cost,
V103 =stage, fight, defeat, race, club, battl, victori, player, competit, match, win,
V104 =pass, come, turn, go, came, get, run, date, just, move, went,
V105 =fa, afc, stanley, middleweight, fifa, intercontinent, sprint, nfc, uefa, heavyweight, costa,
V106 =charact, book, music, role, featur, stori, star, titl, direct, version,
V107 =larger, less, below, greater, higher, smaller, abov, reduc, lower, low,
V108 =so, i, we, could, what, do, did, you, if, like,
V109 =negoti, firm, provis, proceed, jurisdict, impos, amend, violat, prohibit, enforc,
V110 =prize, outstand, honor, silver, honour, bronz, ceremoni, medal, golden,
V111 =channel, movi, launch, host, entertain, news, media, broadcast, sport,
V112 =consecut, eighth, tenth, ninth, seventh, rd, sixth, fifth, nd,
V113 =children, himself, young, man, women, age, men, death, old,
V114 =annual, total, tax, percent, highest, rise, estim, averag, growth,
V115 =socialist, reform, labour, communist, liber, democrat, republican, labor, conserv,
V116 =former, attend, chief, assist, elect, board, serv, head, appoint,
V117 =spent, roughli, almost, squar, spend, approxim, worth, nearli, £,
V118 =academ, environment, scientif, technic, medic, primari, financi, health, secondari,
V119 =outbreak, gulf, vietnam, cold, iraq, korean, tag, revolutionari,
V120 =establish, offici, church, rule, offic, author, independ, parti,
V121 =ball, roll, walk, cut, step, break, shot,
V122 =weight, size, length, distanc, enough, scale, amount,
V123 =indi, korea, asian, coast, wale, asia, carolina,
V124 =doubl, femal, male, promin, adult, guest, cast,
V125 =champion, cup, premier, coach, footbal, championship,
V126 =song, top, singl, album, track, band,
V127 =command, polic, staff, execut, post, box,
V128 =green, white, red, blue, black, gold,
V129 =hit, uk, studio, concert, debut, tour,
V130 =versa, latterday, rico, nadu, rica, rican,
V131 =y, grand, et, del, el, al,
V132 =say, said, ’, believ, love,
V133 =museum, contemporari, fine, master, martial,
V134 =american, west, east, south, north,
V135 =america, africa, india, african, bank,
V136 =singer, musician, actress, writer, actor,
V137 =secur, task, defens, reserv, defenc,
V138 =through, back, away, down, off,
V139 =third, regular, finish, rank, fourth,
V140 =half, largest, round, decad, quarter,
V141 =now, becom, best, becam, well,
V142 =artist, rock, style, director, video,
V143 =olymp, theatr, festiv, airport, trade,
V144 =yard, field, score, touchdown, goal,
V145 =prix, juri, slam, duchi, testament,
V146 =motor, magnet, real, nuclear, electr,
V147 =million, around, popul, us,
V148 =british, union, presid, govern,
V149 =earl, birthday, grade, anniversari,
V150 =univers, art, law, student,
V151 =billboard, hot, peak, chart,
V152 =usual, often, thu, sometim,
V153 =season, leagu, war, team,
V154 =fiction, technolog, polici, care,
V155 =recept, review, prais, acclaim,
V156 =francisco, angel, diego, kong,
V157 =divis, minut, overal,
V158 =militari, armi, depart,
V159 =opera, marin, palac,
V160 =nomin, academi, won,
V161 =n, m, t,
V162 =lake, britain, deal,
V163 =long, much, far,
V164 =determin, vari, depend,
V165 =known, took, take,
V166 =agreement, contract, treati,
V167 =copi, billion, per,
V168 =televis, episod, tv,
V169 =day, month, week,
V170 =washington, district, suprem,
V171 =soviet, feder, european,
V172 =religi, indigen, ethnic,
V173 =counti, colleg, california,
V174 =mid, bc,
V175 =critic, posit,
V176 =retain, assum,
V177 =ottoman, roman,
V178 =minist, prime,
V179 =ii, civil,
V180 =radio, railway,
V181 =de, la,
V182 =air, arm,
V183 =comic, publish,
V184 =columbia, dc,
V185 =station, network,
V186 =lanka, lankan,
V187 =cathol, empir,
V188 =court, school,
V189 =high, public,
V190 =lo, hong,
V191 =“, ”,
V192 =st,
V193 =san,
V194 =fame,
V195 =award,
V196 =place,
V197 =wide,
V198 =centuri,
V199 =th,
V200 =∅
16.3 Companies with the highest daily returns
16.3.1 Sector breakdown within the S&P500 and the dataset
Table 5 contains the sector breakdown within the dataset as well as the S&P500. These numbers are based on S&P500’s factsheet
from 2022.
16.3.2 Ticker symbols of the 300 constituents
Here are the ticker symbols of the 300 companies that we considered, in order:
V=IP, CB, ZBH, AAPL, GS, IBM, AMGN, MMM, CVX, FDX, COST, CMI, UNP, AVB, BLK, SPG, HD, LMT, JNJ, KMB, JPM, GD, MCK, ESS, CI, UNH, CSCO, PXD, MCD,
NVDA, INTC, PSA, MTB, HON, BXP, GWW, NOC, TMO, BA, INTU, APD, TRV, RTX, PEP, CAT, AMAT, TXN, ORCL, WHR, BDX, PPG, QCOM, SHW, UPS, PH, LRCX,
PFE, NSC, HUM, ECL, DE, ADP, GE, SRE, ROK, WMT, EOG, MLM, PG, RE, DIS, NEE, T, ITW, KLAC, XOM, PNC, RL, AON, EA, LOW, BAC, AXP, VZ, CMCSA, SYK,
EBAY, STZ, WFC, HPQ, ROP, AMT, ABT, CLX, BEN, C, LLY, SNA, SWK, MS, CTXS, KSU, MCO, MRK, EL, FRT, KO, HAL, APA, WM, SJM, ADI, DHR, FCX, JCI,
VMC, MSI, IFF, SBUX, GILD, CTAS, CVS, ALL, UHS, COO, SLB, MMC, TT, MCHP, NLOK, MDT, HSY, TGT, BMY, TROW, NKE, USB, COP, EFX, XLNX, MO, DRI,
ROST, DTE, JNPR, CCI, BBY, NTAP, DUK, OXY, TFX, VLO, LHX, PAYX, FITB, SBAC, ETR, GPS, COF, NEM, MRO, KR, YUM, CL, DD, DGX, WBA, SCHW, MAR,
GLW, SO, BK, JBHT, NTRS, PGR, TJX, AIG, ADM, HWM, A, CAG, STT, SYY, HES, ABC, DOV, CSX, EIX, GIS, TFC, WMB, NUE, VFC, ETN, BAX, EMR, EXC, JKHY,
CAH, AEP, AFL, XEL, TECH, ATVI, ARE, DVN, HIG, AVY, NWL, WY, OMC, PCAR, FE, D, MAS, POOL, LEN, BBWI, VNO, EQR, NI, TER, CPB, DHI, PEG, K, LUMN,
PPL, HAS, MU, MKC, PLD, LNC, ZION, ED, APH, MGM, CNP, PVH, CMA, CTSH, EMN, FAST, TSN, IEX, RSG, AEE, EXPD, TSCO, TXT, ES, CINF, MOS, CHRW,
CERN, PBCT, RCL, UDR, CTRA, PEAK, TAP, CCL, SEE, KIM, ALB, KEY, XRAY, RMD, STE, DRE, BWA, WEC, RHI, GPC, FMC, L, J, LEG, OKE, MAA, CMS, PHM,
VTR, IRM, PKI, O, ODFL, SWKS, AES, HRL, BLL, AME, AJG, IVZ, RJF, PNR, GL, LUV, IPG, PNW.
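A list of symbols such as the one above, and the observation sequence in Section 16.3.3, can be turned into transition counts between consecutive observations, which is the kind of summary a Markov-chain-based clustering method typically starts from. The following is a minimal sketch only: the helper names `parse_symbols` and `transition_counts` are hypothetical and are not taken from the implementation accompanying this paper, and the input is assumed to be a comma-separated string of ticker symbols.

```python
from collections import Counter


def parse_symbols(text):
    """Split a comma-separated list of symbols (hypothetical helper).

    Tolerates missing spaces after commas, line breaks inside the list,
    and a single trailing period at the end of the list.
    """
    cleaned = text.strip().rstrip(".")
    return [s.strip() for s in cleaned.split(",") if s.strip()]


def transition_counts(sequence):
    """Count how often symbol x is immediately followed by symbol y."""
    return Counter(zip(sequence, sequence[1:]))


# First eight observations of the processed sequence in Section 16.3.3.
observations = parse_symbols("ADI, AES, PVH, HUM, NTAP, AMT, EBAY, NTAP")
counts = transition_counts(observations)

# Each key is an ordered pair (x, y); each value is the number of x -> y steps.
for (x, y), n in sorted(counts.items()):
    print(f"{x} -> {y}: {n}")
```

A sparse `Counter` keyed by ordered pairs is used here instead of a dense 300 by 300 matrix because most pairs of companies never follow one another in a sequence of this length.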
Sector                    Index weight    Percentage in dataset
Industrials               7.8%            16.3%
Health Care               12.7%           11.0%
Information Technology    29.3%           10.0%
Consumer Discretionary    13.2%           10.7%
Communication Services    10.4%           3.0%
Consumer Staples          5.6%            10.7%
Utilities                 2.4%            7.7%
Financials                10.8%           14.0%
Materials                 2.5%            6.3%
Real Estate               2.6%            7.3%
Energy                    2.7%            5.3%
TABLE 5
Sector breakdown of the S&P500 by index weight, together with the relative percentages of each sector within the 300 constituents considered in the dataset.
16.3.3 Processed sequence of observations
Here is the complete, processed sequence of observations:
X1:ℓ = ADI, AES, PVH, HUM, NTAP, AMT, EBAY, NTAP, J, RL, PVH, ROST, ODFL, DVN, EOG, XLNX, ODFL, LOW, A, INTU, CCI, NTAP, ODFL, ATVI, TSCO, EBAY, STE,
BLL, MLM, EXPD, CCI, ODFL, LUV, J, AAPL, ZBH, HAS, TGT, ROK, AJG, CTXS, ODFL, NEM, NLOK, ATVI, SWKS, MSI, SWKS, EA, SWKS, MGM, MU, NLOK,
JNPR, NEM, JNPR, CTXS, SCHW, SWKS, JNPR, XLNX, CTSH, TAP, USB, FCX, NTAP, NI, JNPR, HPQ, QCOM, SBAC, TSCO, PVH, SBAC, CTSH, SBAC, TSCO,
SWKS, MOS, TER, SBAC, MKC, WMB, FCX, RMD, PKI, ODFL, USB, MRO, NTAP, ODFL, NTAP, ODFL, DRI, SWKS, AMT, WMB, EL, TSN, CTSH, ATVI, AES, GLW,
ODFL, JNPR, TER, JBHT, HAL, ODFL, HAL, SWKS, HUM, ZBH, NVDA, SWKS, HUM, ABC, STE, HAL, QCOM, RCL, FCX, JCI, LMT, STE, AAPL, HUM, MOS, CCI,
GLW, CCI, SBAC, MAR, PVH, PAYX, SJM, STE, TT, ROP, AES, WMB, ATVI, AMT, CCI, JNPR, WMB, SBAC, AES, JNPR, CTXS, ODFL, PNR, ODFL, AES, RSG, FCX,
QCOM, PHM, JKHY, FMC, TSCO, JNPR, QCOM, PXD, AES, BMY, ODFL, NVDA, POOL, TSN, MKC, CCI, LRCX, SBAC, AMT, UNH, JKHY, MSI, SBAC, APD, NTAP,
STE, NVDA, JCI, CMCSA, JCI, RMD, HPQ, EFX, NTAP, RL, RSG, NTAP, SWKS, CNP, JCI, TECH, CNP, NEM, WMB, SWKS, CNP, NEM, ODFL, NTAP, ROK, SJM,
CTSH, ORCL, ODFL, MCHP, JCI, HUM, AES, JCI, AMGN, SBAC, JBHT, AMT, LUV, AMAT, CCI, AES, AMT, JCI, NEM, JNPR, ORCL, LRCX, CNP, CTXS, UPS, JCI,
ROP, NTAP, AMT, SBAC, COF, AES, SBAC, JKHY, WMB, CNP, CCI, WMB, NWL, GLW, MO, MGM, IPG, LRCX, XEL, CCI, AON, SWKS, SBAC, CCI, GLW, WMB,
AES, TSN, GLW, TSN, AMT, GLW, SBAC, CCI, NLOK, BBY, JKHY, COF, NVDA, AES, MOS, AES, HON, JCI, SWKS, PEP, DUK, WFC, EIX, MCHP, CNP, MCHP,
WMB, DD, MU, VZ, GPS, XEL, BK, DGX, SBAC, AES, AMT, TSCO, APH, MU, AES, XEL, IPG, AMT, CCI, SBAC, WM, PKI, SBAC, AES, JNPR, NI, LMT, SWKS, CNP,
LLY, WMB, NVDA, SBAC, WMB, SJM, AMT, SBAC, TER, AMT, SBAC, MSI, TER, HAL, AES, DUK, AES, STE, AMT, AES, LRCX, SBAC, AES, WMB, SEE, CCI, DRI,
ODFL, NKE, SWKS, AES, SBAC, ATVI, TGT, WMB, SBAC, CERN, SBAC, GLW, TECH, AMT, LHX, WMB, GLW, STE, SBAC, WMB, SBUX, SBAC, AES, SBAC, TER,
VLO, WMB, PKI, NUE, SBAC, GLW, AES, A, AES, ADI, WMB, DGX, SBAC, UHS, WMB, CMCSA, MU, CTAS, BBWI, GLW, CCI, WMB, IPG, GILD, TSCO, IPG, AME,
ODFL, SBAC, WMB, SBAC, NTAP, SBAC, FCX, CVS, CMS, CTXS, FAST, AES, MCD, SYK, CCI, TGT, SBAC, CMI, CMS, JKHY, WMB, SBAC, ROK, SBAC, BDX, CMS,
TAP, AMT, TECH, AES, SBAC, MCD, CCI, ROST, SBAC, HIG, SWKS, MSI, CTSH, AES, SBAC, INTU, MO, NI, SBAC, TXN, GLW, BBWI, SBAC, MU, CTSH, BBY,
MCD, SWKS, MU, AMT, ODFL, PGR, BAX, HON, TER, SWKS, GILD, GPS, SBAC, CMS, CTSH, SBAC, CTXS, CCI, SBAC, SWKS, SBAC, TER, HPQ, GILD, LRCX,
BLL, CERN, HAL, TT, CTSH, TSCO, POOL, FAST, CCI, SBAC, CI, CMI, XEL, SBAC, EXPD, RL, BBY, NKE, LRCX, TECH, KSU, TSN, TXN, MU, SBAC, NTAP, NLOK,
JNPR, FCX, LRCX, AAPL, PKI, COO, J, COO, UHS, SBAC, NVDA, CMS, ROK, FE, RL, AES, CTSH, SCHW, SLB, SBAC, NLOK, MKC, LMT, AES, GLW, AMT, DHI,
CCI, FAST, SBAC, SWKS, ODFL, TJX, CTSH, SBAC, IPG, GWW, EFX, SBAC, CERN, RE, POOL, PHM, STE, RCL, TER, CCI, AES, CI, MMC, EXPD, TXN, TJX, FCX,
L, ABC, CERN, ABT, PGR, HAL, FCX, UNH, ATVI, RCL, NTAP, MOS, EA, LRCX, MOS, J, MOS, EOG, KSU, SBAC, CCL, MCHP, SWKS, AES, NEM, RSG, IPG, CTAS,
CERN, MCD, SBAC, AAPL, NVDA, SBAC, FMC, MCK, SBAC, IRM, GILD, AMT, SBAC, JNPR, SBAC, TMO, TER, JNPR, LHX, TSCO, CAH, TSCO, TSN, AVY, PKI,
LEG, RMD, TSN, PBCT, RMD, TER, WMB, EXPD, FCX, TFX, SBUX, MU, FCX, HSY, MCD, ADI, SWKS, PHM, AES, MCHP, CCI, COO, CCI, AAPL, BDX, AAPL,
NLOK, TECH, FAST, HUM, ODFL, CTXS, TSN, MCD, SWK, NVDA, TER, PNR, FMC, NVDA, PBCT, BLK, TRV, SBAC, JBHT, PEAK, CERN, VLO, VTR, ABT, GWW,
SBAC, ODFL, MCK, MSI, MCHP, BXP, TROW, AES, LUMN, MCK, EOG, JNPR, SWKS, DVN, TSN, FCX, DHI, FITB, NTAP, CMI, NTAP, NUE, ROST, ECL, PVH, PXD,
ODFL, GLW, STE, NVDA, EXPD, AAPL, EA, JBHT, SBAC, MGM, EL, CMS, STZ, AME, AAPL, LEG, CNP, MOS, SBAC, JNPR, SCHW, A, COO, TER, HUM, WMB,
MOS, SYK, FCX, XLNX, CTSH, TGT, HUM, CAH, UNH, PXD, MCHP, NTAP, AME, LRCX, HWM, PCAR, CCI, PKI, ZBH, SBAC, ADM, OKE, SWKS, ROST, GPS, MOS,
TSCO, ADI, TGT, ADI, LUV, GLW, SWKS, INTU, SWKS, PHM, FMC, ATVI, AAPL, BLK, MOS, COO, ROST, COO, PHM, SBAC, NVDA, SBAC, MCHP, CAH, VTR, IEX,
CTAS, TER, PXD, FITB, DRI, HAL, MRK, FMC, SBAC, CTSH, CTXS, SWKS, ORCL, DHR, ROP, SBAC, GLW, SBAC, AAPL, SBAC, GPC, AJG, SWK, TER, MCHP,
MMC, VFC, OMC, TMO, IRM, PKI, HUM, AON, EXPD, MO, RJF, ROK, PVH, ATVI, PVH, HRL, EA, LHX, MO, AMAT, EOG, AAPL, CTRA, MOS, NEM, SBAC, LRCX,
NTAP, SBAC, NTAP, CL, FCX, XLNX, ATVI, MOS, VTR, PHM, MRK, LEG, OXY, KSU, ODFL, EXPD, CAG, CTSH, MU, TER, HRL, EFX, HSY, EXPD, MGM, AAPL,
ODFL, ATVI, HAL, ATVI, PXD, ADM, TGT, INTU, SBAC, ETR, J, AME, NVDA, MDT, NUE, VLO, RL, TSCO, RMD, TSCO, VFC, AON, CTSH, NTAP, ATVI, A, WMB,
TSN, MRK, NEM, MMC, PHM, SLB, FMC, CTXS, TECH, FCX, JNPR, TAP, PVH, ADI, SBAC, HAL, MGM, VLO, DVN, HAL, NVDA, PAYX, VLO, FMC, AIG, HPQ,
TXN, CMS, VLO, MCK, ODFL, VLO, RMD, TSCO, EOG, SEE, LLY, SYK, ABT, VLO, PVH, GLW, JKHY, GILD, ATVI, PKI, GLW, MCHP, HUM, EOG, PVH, EBAY,
CCI, ATVI, GLW, SBAC, GLW, TMO, ORCL, LOW, A, YUM, AAPL, JNPR, AES, SWKS, PVH, CCI, STZ, PEAK, GPS, ZBH, ATVI, SWKS, EOG, MGM, MMC, EA, BBY,
SBAC, POOL, TECH, IPG, KR, CCI, DVN, SBAC, STZ, PAYX, VTR, MKC, CTRA, ROK, IVZ, SCHW, TER, EA, RHI, CMI, CTAS, SWK, EA, TER, EBAY, TSCO, WHR,
PNR, MCHP, IRM, ATVI, TSN, TAP, PBCT, ROST, NVDA, AME, CNP, APA, MCD, AAPL, NTAP, TECH, HPQ, CMS, CAT, SBAC, RMD, ATVI, ABC, HRL, LRCX, VLO,
VMC, HES, COF, AAPL, MGM, COO, ATVI, CMI, RE, MU, MCD, RE, AAPL, NSC, AON, PGR, ALL, EOG, RMD, STZ, EBAY, MAR, EIX, CLX, LUV, TJX, ROST, APH,
CTRA, TXN, AAPL, SWKS, MO, FITB, MGM, AAPL, SBAC, CTRA, HPQ, IRM, ADM, MSI, EXPD, JBHT, WMB, AON, GPS, SBAC, LRCX, PVH, BMY, TSN, CERN,
EOG, INTU, MCHP, SBAC, DE, MMC, TER, COO, FCX, SBAC, TER, ZBH, GLW, NEM, ES, CTRA, ECL, INTU, PFE, JBHT, MO, DRI, MRK, TRV, GILD, HUM, ODFL,
STZ, HES, ATVI, HES, ALB, ADM, LHX, ATVI, DHI, FCX, ATVI, NTAP, PEG, MOS, JNPR, APH, SLB, ETN, CHRW, GLW, TXT, CMI, AAPL, SBAC, MLM, JKHY,
MOS, MU, RMD, VFC, TSN, AON, BMY, CTRA, ROP, VLO, COO, RHI, PHM, IEX, ROP, PAYX, JNPR, GLW, ADM, SWKS, MAA, CVS, TSN, FCX, ATVI, JNPR, SWKS,
EOG, SBAC, RMD, PVH, IPG, PHM, PXD, CHRW, EBAY, WHR, NLOK, JKHY, SWKS, NVDA, PHM, CCI, AMT, MOS, UNH, HON, VFC, PHM, APH, AME, LEG, PEG,
ODFL, TSN, SNA, NVDA, MOS, LRCX, EXPD, ATVI, ADM, HUM, TSN, COO, UNH, TFX, HUM, JNPR, KR, TSN, JNPR, WHR, SBAC, ATVI, RMD, OKE, RCL, JNPR,
WBA, DRE, TRV, TAP, NSC, CERN, DIS, NTAP, DVN, SBAC, PXD, CCL, AES, SWKS, AAPL, JBHT, SWKS, MRO, NVDA, FCX, SBAC, WMB, POOL, MO, EIX, GPS,
KLAC, ESS, CTXS, DHI, MAR, ODFL, RL, GL, CERN, HAS, ALB, XLNX, IRM, NVDA, HUM, VMC, MCO, PHM, LMT, ADM, EBAY, CERN, CHRW, NVDA, KIM, JNPR,
NVDA, EBAY, NTAP, FCX, WY, SJM, MDT, SWKS, ODFL, EBAY, LRCX, RCL, SWKS, DVN, PBCT, PHM, EFX, NWL, BBY, BLK, SWKS, MGM, NUE, MMC, SWKS,
OXY, BWA, LRCX, HES, EXPD, EBAY, MCO, DHI, JBHT, NVDA, CMI, DD, NVDA, MRO, XLNX, YUM, LRCX, ETN, CAH, TSCO, CHRW, CERN, EBAY, NUE, NSC,
MCHP, PKI, NLOK, NVDA, D, ROST, CTRA, JNPR, BA, ADM, NUE, TMO, LRCX, HD, PNR, NTAP, EOG, LRCX, DE, FCX, PPL, CHRW, CSCO, PXD, PHM, VLO,
NUE, GLW, MAS, CTRA, PCAR, NLOK, TSN, COO, WY, SWKS, RMD, NUE, HPQ, JNPR, GPS, SWKS, AAPL, A, IVZ, ODFL, GLW, MSI, GPS, AAPL, COO, BWA,
MOS, GS, PHM, PEAK, CCI, COO, CSX, EBAY, FCX, KLAC, JNPR, EOG, CHRW, ODFL, POOL, CTSH, GLW, VNO, MLM, MOS, EXPD, CSX, NTAP, MOS, SJM, VMC,
HAL, INTU, TAP, MAS, CI, ORCL, TSCO, KLAC, LHX, VTR, PVH, APH, EL, T, JNPR, MCO, CSX, ODFL, MLM, ROP, CSX, EBAY, LEN, ATVI, HAL, AVB, JBHT,
CHRW, PXD, NVDA, AES, BBWI, CMCSA, FAST, PNC, WY, MMC, PHM, PPG, POOL, PXD, RMD, WHR, SWKS, MCHP, POOL, XRAY, WHR, IFF, MU, EXC, STE,
PVH, GPS, AMGN, AMAT, AMGN, HAL, LEN, SCHW, LEG, CERN, MCO, LRCX, MOS, AVB, MOS, BMY, WMT, CTAS, CMI, COO, DGX, ADM, NEE, CTRA, MOS,
PGR, NVDA, HUM, CMI, NVDA, SLB, FMC, TECH, FCX, AME, HES, SCHW, AAPL, DGX, TGT, CMI, HES, SWKS, FRT, TJX, KSU, LRCX, BLK, SHW, CAT, CI, UNP,
SNA, ESS, STZ, L, TROW, FRT, HUM, ATVI, TROW, EXPD, VTR, FRT, TFX, LUMN, NVDA, A, CMA, EXC, MOS, NVDA, NUE, BBWI, ROST, LMT, NEM, APH, IVZ,
CMI, FCX, RL, WY, MS, ATVI, AMGN, HUM, FCX, DHI, NWL, CMI, TAP, SWKS, EL, AES, IVZ, ALB, MLM, INTU, HAS, LEN, OXY, KSU, MGM, BEN, MOS, TJX,
JBHT, NTAP, STT, NVDA, UNP, PVH, PHM, NTAP, PCAR, NEE, MOS, JNPR, MLM, NEM, XRAY, SBAC, BWA, HES, RL, ODFL, PSA, LUV, MOS, LEG, AMAT, SBUX,
JCI, OXY, BBWI, TGT, HUM, SPG, ALB, ABC, DHI, IVZ, TECH, ORCL, LEN, PVH, PHM, JNPR, COO, LEN, IVZ, MAR, HES, VFC, MOS, IVZ, PEAK, CMI, PPG, TFX,
VTR, NEM, J, NEE, LLY, CAH, OKE, ROST, MOS, LUV, PVH, MMC, XLNX, LOW, PHM, NTAP, ALB, LEN, SBAC, RHI, PHM, NVDA, CMI, IEX, CI, MCO, SBAC, LEN,
EQR, ROK, CTRA, HRL, MOS, DHI, POOL, SBAC, MOS, LEN, JNPR, EOG, EA, NTAP, IVZ, TSN, IEX, CMA, PEAK, RJF, HUM, PHM, PEAK, JPM, MS, HUM, MS,
BLK, PVH, MOS, PAYX, KIM, MU, LEN, MU, POOL, MRK, COF, CTXS, MU, LOW, POOL, HES, SCHW, KLAC, LEG, SLB, HES, FITB, RJF, TROW, MOS, COF, PVH,
CMI, PHM, MRO, PXD, DHI, STE, EOG, ATVI, HWM, STZ, PHM, NVDA, LHX, VMC, HUM, CI, QCOM, HSY, MOS, BBY, CTSH, VLO, MCO, SWKS, CTRA, BAX,
APA, GPS, MOS, TER, MS, ZION, ATVI, MS, CCL, KEY, HAL, LUMN, KR, MAS, MOS, STZ, RMD, DTE, ECL, MGM, LEN, PNW, HWM, FITB, PHM, USB, TFC, ZION,
KEY, MOS, RCL, DRE, SNA, POOL, RE, BAC, HES, WEC, CTSH, CI, MGM, MLM, ABC, DHI, MGM, ATVI, MOS, PHM, UHS, ES, EOG, VLO, LEN, MGM, POOL, PVH,
LEN, PHM, MGM, YUM, PVH, APA, CMA, WHR, AVY, MU, AIG, ZION, JBHT, AIG, NEM, EQR, RJF, NEM, VMC, LEN, TSN, JPM, RHI, STT, FITB, MU, FMC, AIG,
AES, STT, GLW, LNC, UNH, KEY, WFC, PPG, SBAC, AES, KEY, TRV, MRO, PNC, ODFL, IPG, MGM, RSG, AIG, HIG, J, MGM, DHI, AES, HRL, GS, AES, ALL, HIG,
PLD, STZ, GLW, IPG, KIM, LNC, LEN, LNC, CTAS, PLD, MS, HIG, DRE, NVDA, PEAK, UNH, DRE, NEM, PLD, KSU, MU, RCL, TECH, ARE, MU, PCAR, HES, IEX,
IRM, MGM, MU, LNC, ATVI, DHI, GPS, COO, LNC, VFC, DRE, MU, PH, PNC, STT, HIG, VLO, ZION, FITB, ODFL, BA, ROK, DHI, APH, CI, FITB, GE, EXPD, IPG,
BWA, DHI, IVZ, COF, MO, KEY, LEN, COF, FITB, USB, FITB, LEN, SWKS, GLW, SWKS, UNH, USB, DRE, MU, COF, AIG, IP, AIG, HWM, NEM, LNC, IP, PLD, NWL,
LNC, HUM, HIG, TXT, BWA, MGM, TXT, UNH, ROK, DRE, FITB, ZBH, AXP, DRE, MGM, ORCL, STT, SNA, IVZ, MAS, UHS, TECH, CERN, NWL, PKI, FITB, MGM,
ZION, CI, ZION, MGM, MOS, TFC, HIG, ODFL, ZION, MU, UHS, CTXS, BLK, UDR, TER, CI, CSX, PCAR, HWM, SBUX, KEY, IPG, LEN, BLK, IP, KEY, VMC, RL,
CI, TJX, HUM, CI, CNP, IP, LEN, JKHY, LNC, MAA, RCL, MTB, KIM, CI, ROST, MU, AIG, IPG, AXP, WHR, AIG, VNO, IPG, KEY, HIG, COF, ZION, TXT, C, FMC,
ODFL, KEY, ARE, AIG, BBWI, MAR, WMB, NSC, HIG, FITB, TFC, HUM, DE, AIG, EXPD, RMD, KIM, AIG, KSU, FITB, MOS, AIG, BBY, CMCSA, SPG, AIG, MGM,
MSI, AIG, ZION, CMA, VLO, LEG, AIG, RJF, EA, CTAS, MGM, HIG, MCO, YUM, HSY, AIG, BBWI, SCHW, MU, LEN, MU, BWA, LEN, KIM, VLO, LHX, MSI, FITB,
DE, LEN, TROW, ROP, UNH, AIG, IVZ, NEM, COF, AIG, PHM, AIG, SWKS, PLD, TXT, ODFL, CTAS, MAS, EL, MOS, DRE, PGR, MOS, CI, COO, BBWI, MGM, EBAY,
SWKS, MOS, KLAC, RHI, CMCSA, ODFL, CMA, UNH, LUV, MGM, WHR, NVDA, ODFL, DRI, IVZ, AIG, MGM, FAST, AIG, ODFL, A, WFC, KEY, HIG, ZION, CMI,
TFX, MGM, FITB, TXT, VLO, C, ZION, RJF, MGM, HAL, UHS, IVZ, CTXS, PBCT, NVDA, STE, AJG, COO, NEM, LEN, AIG, MGM, MSI, IP, GILD, TXT, JNPR, AIG,
TMO, KEY, SRE, IPG, FITB, PCAR, AIG, PVH, AIG, C, PCAR, LNC, MGM, MLM, FDX, ROST, MGM, RCL, KEY, AIG, MGM, AES, MCK, PXD, MGM, ZION, AIG,
MGM, J, AIG, VNO, ZION, VMC, MLM, C, ZION, CHRW, TXT, PHM, KSU, UHS, HUM, LHX, RMD, MLM, UDR, GILD, STE, AON, MAS, AIG, TXT, EA, UHS, SEE,
BAX, RL, ROST, PXD, BAX, CMCSA, TXT, NTAP, CAH, HSY, HAL, SBAC, PBCT, NEM, ZION, ODFL, LEN, NTAP, AIG, TER, MCO, MTB, KR, NEE, SJM, MCO, MO,
ODFL, TXT, BBWI, JNPR, JCI, AES, MCO, AES, MSI, IVZ, PGR, GS, MOS, WY, STT, ABT, FITB, TFX, AIG, ECL, CVS, CTXS, SWK, MOS, UHS, MOS, IFF, PKI,
AES, CL, BWA, EL, AEE, CSCO, SWK, TGT, NTAP, INTU, MOS, LEN, LNC, DRI, LNC, ABC, HIG, APH, PVH, GS, NTAP, NLOK, RCL, TER, MU, BLK, MCK, MOS,
ORCL, WHR, NVDA, HWM, KLAC, MU, LUV, NEM, CTRA, TSN, BBWI, EA, JNPR, STZ, CI, MGM, MCO, FITB, WMB, CHRW, GPC, WFC, ZION, WFC, FITB, NVDA,
TSCO, AME, ODFL, FMC, HAL, MCK, VMC, MGM, DHI, BK, HAL, EOG, HAL, JNPR, CAT, TSN, RCL, HAL, CTRA, HUM, MU, SWKS, HRL, WBA, LOW, MAS, LEN,
MGM, TER, JBHT, LNC, AIG, MSI, CTRA, AIG, STZ, MSI, CERN, DRE, RCL, XLNX, TER, AIG, KEY, MOS, GPS, NVDA, LNC, MSI, AIG, NVDA, IPG, NVDA, HES,
NVDA, UHS, TER, NOC, LMT, LOW, WY, SEE, SHW, HAL, NWL, HES, AME, SEE, NVDA, IPG, A, HAS, LEN, RL, JNPR, CLX, MRO, IVZ, TER, NVDA, INTU, PEG,
CTRA, PKI, VLO, EIX, AES, XLNX, VLO, A, EXC, LUV, NLOK, SBUX, WY, CTRA, F
This sequence of observations yields the sparse frequency matrix shown in Figure 12.
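As a rough illustration of how such a frequency matrix, and the summary statistics reported for Figure 12, can be obtained from a ticker sequence, consider the following Python sketch. It assumes that the entry $\hat{F}_{ij}$ counts the transitions from state $i$ directly to state $j$. The function names, the toy sequence, and the alphabetical ordering of the states are illustrative assumptions and not the implementation released with the paper.

```python
from statistics import mean, median


def frequency_matrix(sequence):
    """Count transitions i -> j between consecutive observations.

    Assumption: F[i][j] is the number of times state i is immediately
    followed by state j. States are ordered alphabetically here, whereas
    the figure sorts rows and columns by the detected clusters.
    """
    states = sorted(set(sequence))
    index = {state: k for k, state in enumerate(states)}
    n = len(states)
    F = [[0] * n for _ in range(n)]
    for current, following in zip(sequence, sequence[1:]):
        F[index[current]][index[following]] += 1
    return states, F


def indicator(F):
    """Return the 0/1 matrix with entry 1 wherever F[i][j] > 0."""
    return [[1 if entry > 0 else 0 for entry in row] for row in F]


def entry_summary(F):
    """Return (minimum, median, mean, maximum) over all entries of F."""
    entries = [entry for row in F for entry in row]
    return min(entries), median(entries), mean(entries), max(entries)


# Hypothetical toy sequence, far shorter than the real trajectory.
toy_sequence = ["AIG", "MGM", "AIG", "MGM", "MOS", "AIG"]
states, F = frequency_matrix(toy_sequence)
print(states)
print(F)
print(indicator(F))
print(entry_summary(F))
```

With $n$ distinct states and a trajectory of length $\ell$, the matrix has $n^2$ entries but only $\ell - 1$ transitions to distribute among them, which is why most entries are zero when $\ell$ is small relative to $n^2$.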
Fig. 12. A plot of the matrix $\{\mathbb{1}[\hat{F}_{ij} > 0]\}_{i,j}$, where the rows and columns are sorted according to the improved clustering. We plot this indicator matrix rather than $\hat{F}$ itself because $\hat{F}$ is quite sparse, owing to the short trajectory length $\ell = 2451$: the minimum, median, mean, and maximum of the entries of the matrix $\{\hat{F}_{i,j}\}_{i,j}$ are $0$, $0$, $\ell/n^2 \approx 0.027$, and $14$, respectively.
16.3.4 Detected groups
The detected groups in the model $\hat{P}$ are as follows:
V1=CB (Financials), GS (Financials), IBM (Information Technology), AMGN (Health Care), MMM (Industrials), CVX (Energy), FDX (Industrials), COST (Consumer Staples),
UNP (Industrials), AVB (Real Estate), SPG (Real Estate), HD (Consumer Discretionary), JNJ (Health Care), KMB (Consumer Staples), JPM (Financials), GD (Industrials),
ESS (Real Estate), CSCO (Information Technology), INTC (Information Technology), PSA (Real Estate), MTB (Financials), HON (Industrials), BXP (Real Estate),
GWW (Industrials), NOC (Industrials), BA (Industrials), APD (Materials), TRV (Financials), RTX (Industrials), PEP (Consumer Staples), CAT (Industrials), AMAT
(Information Technology), BDX (Health Care), PPG (Materials), SHW (Materials), UPS (Industrials), PH (Industrials), PFE (Health Care), NSC (Industrials), ECL
(Materials), DE (Industrials), ADP (Information Technology), GE (Industrials), SRE (Utilities), WMT (Consumer Staples), PG (Consumer Staples), DIS (Communication
Services), NEE (Utilities), T (Communication Services), ITW (Industrials), XOM (Energy), PNC (Financials), BAC (Financials), AXP (Financials), VZ (Communication
Services), SYK (Health Care), CLX (Consumer Staples), BEN (Financials), C (Financials), LLY (Health Care), SNA (Industrials), FRT (Real Estate), KO (Consumer
Staples), APA (Energy), WM (Industrials), DHR (Health Care), IFF (Materials), CVS (Health Care), ALL (Financials), TT (Industrials), MDT (Health Care), BMY (Health
Care), NKE (Consumer Discretionary), COP (Energy), EFX (Industrials), DTE (Utilities), DUK (Utilities), OXY (Energy), ETR (Utilities), KR (Consumer Staples), YUM
(Consumer Discretionary), CL (Consumer Staples), DD (Materials), DGX (Health Care), WBA (Consumer Staples), SO (Utilities), BK (Financials), NTRS (Financials),
CAG (Consumer Staples), SYY (Consumer Staples), DOV (Industrials), EIX (Utilities), GIS (Consumer Staples), TFC (Financials), ETN (Industrials), BAX (Health Care),
EMR (Industrials), EXC (Utilities), AEP (Utilities), AFL (Financials), ARE (Real Estate), AVY (Materials), OMC (Communication Services), FE (Utilities), D (Utilities),
VNO (Real Estate), EQR (Real Estate), NI (Utilities), CPB (Consumer Staples), PEG (Utilities), K (Consumer Staples), LUMN (Communication Services), PPL (Utilities),
HAS (Consumer Discretionary), MKC (Consumer Staples), ED (Utilities), EMN (Materials), RSG (Industrials), AEE (Utilities), ES (Utilities), CINF (Financials), UDR
(Real Estate), CCL (Consumer Discretionary), XRAY (Health Care), WEC (Utilities), GPC (Consumer Discretionary), L (Financials), OKE (Energy), MAA (Real Estate),
O (Real Estate), BLL (Materials), AJG (Financials), PNR (Industrials), GL (Financials), PNW (Utilities)
V2=IP (Materials), ZBH (Health Care), CMI (Industrials), BLK (Financials), LMT (Industrials), MCK (Health Care), CI (Health Care), UNH (Health Care), PXD (Energy),
MCD (Consumer Discretionary), TMO (Health Care), INTU (Information Technology), TXN (Information Technology), ORCL (Information Technology), WHR (Consumer
Discretionary), QCOM (Information Technology), ROK (Industrials), EOG (Energy), MLM (Materials), RE (Financials), KLAC (Information Technology), RL (Consumer
Discretionary), AON (Financials), EA (Communication Services), LOW (Consumer Discretionary), CMCSA (Communication Services), EBAY (Consumer Discretionary),
STZ (Consumer Staples), WFC (Financials), HPQ (Information Technology), ROP (Industrials), ABT (Health Care), SWK (Industrials), MS (Financials), CTXS (Information
Technology), KSU (Industrials), MCO (Financials), MRK (Health Care), EL (Consumer Staples), SJM (Consumer Staples), ADI (Information Technology),
JCI (Industrials), VMC (Materials), MSI (Information Technology), SBUX (Consumer Discretionary), GILD (Health Care), CTAS (Industrials), UHS (Health Care),
SLB (Energy), MMC (Financials), MCHP (Information Technology), NLOK (Information Technology), HSY (Consumer Staples), TGT (Consumer Discretionary), TROW
(Financials), USB (Financials), XLNX (Information Technology), MO (Consumer Staples), DRI (Consumer Discretionary), ROST (Consumer Discretionary), BBY (Consumer
Discretionary), TFX (Health Care), LHX (Industrials), PAYX (Information Technology), GPS (Consumer Discretionary), COF (Financials), MRO (Energy), SCHW
(Financials), MAR (Consumer Discretionary), JBHT (Industrials), PGR (Financials), TJX (Consumer Discretionary), ADM (Consumer Staples), HWM (Industrials), A
(Health Care), STT (Financials), HES (Energy), ABC (Health Care), CSX (Industrials), NUE (Materials), VFC (Consumer Discretionary), JKHY (Information Technology),
CAH (Health Care), XEL (Utilities), TECH (Health Care), DVN (Energy), HIG (Financials), NWL (Consumer Discretionary), WY (Real Estate), PCAR (Industrials), MAS
(Industrials), POOL (Consumer Discretionary), BBWI (Consumer Discretionary), DHI (Consumer Discretionary), PLD (Real Estate), LNC (Financials), ZION (Financials),
APH (Information Technology), CNP (Utilities), CMA (Financials), CTSH (Information Technology), FAST (Industrials), IEX (Industrials), EXPD (Industrials), TSCO
(Consumer Discretionary), TXT (Industrials), CHRW (Industrials), CERN (Health Care), PBCT (Financials), RCL (Consumer Discretionary), CTRA (Energy), PEAK
(Real Estate), TAP (Consumer Staples), SEE (Materials), KIM (Real Estate), ALB (Materials), KEY (Financials), RMD (Health Care), STE (Health Care), DRE (Real
Estate), BWA (Consumer Discretionary), RHI (Industrials), FMC (Materials), J (Industrials), LEG (Consumer Discretionary), CMS (Utilities), VTR (Real Estate), IRM
(Real Estate), PKI (Health Care), HRL (Consumer Staples), AME (Industrials), IVZ (Financials), RJF (Financials), LUV (Industrials)
V3=AAPL (Information Technology), NVDA (Information Technology), LRCX (Information Technology), HUM (Health Care), AMT (Real Estate), HAL (Energy), FCX
(Materials), COO (Health Care), JNPR (Information Technology), CCI (Real Estate), NTAP (Information Technology), VLO (Energy), FITB (Financials), SBAC (Real
Estate), NEM (Materials), GLW (Information Technology), AIG (Financials), WMB (Energy), ATVI (Communication Services), LEN (Consumer Discretionary), TER
(Information Technology), MU (Information Technology), MGM (Consumer Discretionary), PVH (Consumer Discretionary), TSN (Consumer Staples), MOS (Materials),
PHM (Consumer Discretionary), ODFL (Industrials), SWKS (Information Technology), AES (Utilities), IPG (Communication Services)
Supplementary Material Literature
[58] Tiago A. Almeida and Jose Maria Gomez Hidalgo. SMS Spam Collection Data Set. https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset.
[59] Zhidong Bai and Jack W Silverstein. Spectral Analysis of Large Dimensional Random Matrices. 2010.
[60] David Bamman and Noah A Smith. New Alignment Methods for Discriminative Book Summarization. arXiv preprint arXiv:1305.1319, 2013. Accessed at https://www.kaggle.com/datasets/ymaricar/cmu-book-summary-dataset.
[61] Gordon V. Cormack, José María Gómez Hidalgo, and Enrique Puertas Sánz. Spam Filtering for Short Messages. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, 2007.
[62] Gordon V. Cormack, José María Gómez Hidalgo, and Enrique Puertas Sánz. Feature Engineering for Mobile (SMS) Spam Filtering. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007.
[63] José María Gómez Hidalgo, Guillermo Cajigas Bringas, Enrique Puertas Sánz, and Francisco Carrero García. Content based SMS spam filtering. In Proceedings of the 2006 ACM symposium on Document engineering, 2006.
[64] Ken Lang. NewsWeeder: Learning to Filter Netnews. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.
[65] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Sören Auer, and Christian Bizer. DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic web, 2015.
[66] David D. Lewis, Yiming Yang, Tony Russell-Rose, and Fan Li. RCV1: A New Benchmark Collection for Text Categorization Research. Journal of machine learning research, 2004.
[67] Daniel Paulin. Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electronic Journal of Probability, 2015.
[68] Mike Russel. 10.000 Books and Their Genres standardized. Accessed at https://www.kaggle.com/code/michaelrussell4/gutenberg-book-genre-feature-engineering/data, 2021.
[69] Jaron Sanders, Alexandre Proutière, and Se-Young Yun. Clustering in Block Markov Chains. The Annals of Statistics, 2020.
[70] Jaron Sanders and Alexander Van Werde. Singular value distribution of dense random matrices with block Markovian dependence. arXiv preprint arXiv:2204.13534, 2022.
[71] Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems, 2015.