All content in this area was uploaded by Ribana Roscher on Dec 08, 2017

SPARSE REPRESENTATION-BASED ARCHETYPAL GRAPHS FOR SPECTRAL CLUSTERING

Ribana Roscher, Lukas Drees, Susanne Wenzel

University of Bonn, Institute of Geodesy and Geoinformation

ABSTRACT

We propose sparse representation-based archetypal graphs as

input to spectral clustering for anomaly and change detection. The graph consists of vertices defined by data samples and edges whose weights are determined by sparse representation. Besides relationships between all data samples, the

graph also encodes the relationship to extremal points, so-

called archetypes, which leads to an easily interpretable clus-

tering result. We compare our approach to k-means cluster-

ing performed on the original feature representation and to k-

means clustering performed on the sparse representation acti-

vations. Experiments show that our approach is able to deliver

accurate and interpretable results for anomaly and change de-

tection.

Index Terms—Sparse representation, spectral clustering,

sparse graphs, anomaly detection, change detection

1. INTRODUCTION

The task of ﬁnding clusters in data sets has been an active

research area for a long time. Especially if the cluster distri-

bution is complex and different clusters cannot be separated

linearly in input space, there is a demand for powerful clus-

tering techniques. Therefore, spectral clustering has become

a popular clustering approach (e.g., [1, 2]), since it often outperforms conventional clustering algorithms such as k-means or the EM algorithm. For spectral clustering, a singular value decomposition of a matrix representing relationships between data samples is used to derive the clustering result. One of the key factors influencing the performance of spectral clustering is the construction of the relationship-encoding graph from which the matrix is derived. Commonly used approaches

build dense graphs comprising pairwise distances between all

data samples. More efﬁcient sparse graphs comprise pairwise

distances between each sample and its neighbors in a small

neighborhood (e.g., [3]). The authors of [4] propose an alternative based on sparse representation, which fulfills the desired graph characteristics of high discriminating power, sparsity, and an adaptive neighborhood. They show that so-called L1-sparse graphs achieve a higher accuracy for clustering and for semi-supervised classification, and their success is also underlined by [5].

Our contribution is the introduction of a sparse represen-

tation-based graph as input to spectral clustering, which in-

cludes information about relationships between all data sam-

ples, but also information about the relation to archetypes

(i.e., extremal data samples). Archetypes have been shown to be suitable representatives of data samples in, e.g., [6] and [7], and they can be efficiently extracted using simplex volume maximization (SiVM, [8]; see footnote 1). Due to the relation of all samples to archetypes, the approach is able to find a suitable number of clusters and provides spectral information that can be easily interpreted as an anomaly in single images or as change in

time-series data. In our experiments, we show the suitability

of sparse representation-based graphs for spectral clustering

by means of plant disease symptom detection in a hyperspec-

tral image and by means of a change detection task. These

tasks have already been addressed by approaches using sparse representation (e.g., [9], [10]), but none of them uses sparse representation-based graphs.

2. METHODS

2.1. Spectral Clustering

Spectral clustering is the task of finding clusters using a spectral decomposition of a matrix, interpretable as a graph, derived from the data. In order to represent relationships between the samples, a weight matrix $\mathbf{W}$ is defined by weights $w_{nm} > 0$ being connections between two samples $\mathbf{x}_n$ and $\mathbf{x}_m$, and $w_{nm} = 0$ otherwise. We further define the normalized graph Laplacian $\mathbf{L}_{\mathrm{sym}} = \mathbf{D}^{-\frac{1}{2}} \mathbf{L} \mathbf{D}^{-\frac{1}{2}}$ with $\mathbf{L} = \mathbf{D} - \mathbf{W}$. The degree matrix is given by $\mathbf{D} = \operatorname{diag}\left(\sum_m \mathbf{w}_m\right)$, where $\mathbf{w}_m$ is the $m$-th row of $\mathbf{W}$. Since spectral decomposition of the graph Laplacian is intractable for large matrices, we use the Nyström approximation [11] to derive a smaller matrix for decomposition. We use the singular vectors to the smallest singular values as input to k-means clustering to obtain the final clustering result.
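The construction above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: it forms the full Laplacian and decomposes it directly, so it omits the Nyström approximation and the final k-means step; the function name `spectral_embedding` and the toy graph `W_toy` are hypothetical and not taken from the paper.

```python
import numpy as np

def spectral_embedding(W, k):
    """Return the k singular vectors of L_sym that belong to the
    k smallest singular values (one row per sample)."""
    W = np.asarray(W, dtype=float)
    # Degree vector: sum over the rows w_m of W, i.e. D = diag(sum_m w_m).
    d = W.sum(axis=0)
    if np.any(d <= 0):
        raise ValueError("every vertex needs a positive degree")
    D = np.diag(d)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D - W                              # unnormalized Laplacian
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt    # normalized Laplacian
    # NumPy returns singular values in descending order, so the last
    # k columns of U belong to the k smallest singular values.
    U, s, _ = np.linalg.svd(L_sym)
    return U[:, -k:]

# Hypothetical toy graph: two disconnected triangles with unit weights.
block = np.ones((3, 3)) - np.eye(3)
zeros = np.zeros((3, 3))
W_toy = np.block([[block, zeros], [zeros, block]])
E = spectral_embedding(W_toy, k=2)
```

In this toy case the rows of `E` coincide within each triangle and differ between the two triangles, which is the property the subsequent k-means step relies on.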

2.2. Graph Construction with Sparse Representation

In our approach, we follow the work of [4] by designing a sparse graph in which the estimated sparse coefficients, computed by a nonnegative least squares optimization, are used to characterize relationships between data samples $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n, \ldots, \mathbf{x}_N]$ with $N$ being the number of samples. We define a directed graph $G = \{\mathbf{X}, \mathbf{W}\}$ with the samples $\mathbf{X}$ being the vertices and the matrix $\mathbf{W}$ being the edge weights. The graph is constructed in an unsupervised way using sparse representation, where the activations are interpreted as weights in the graph.

1 www-ai.cs.uni-dortmund.de/weblab/code.html

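In terms of sparse coding, a sample $\mathbf{x}_n$ is represented by a weighted linear combination of a few elements taken from a $(M \times (N-1))$-dimensional dictionary $\mathbf{T}$, such that $\mathbf{x}_n = \mathbf{T}\boldsymbol{\alpha}_n + \boldsymbol{\gamma}$ with $\|\boldsymbol{\gamma}\|_2$ being the reconstruction error. The dictionary $\mathbf{T} = [\mathbf{x}_1, \ldots, \mathbf{x}_{n-1}, \mathbf{x}_{n+1}, \ldots, \mathbf{x}_N]$ is embodied by all other samples except $\mathbf{x}_n$. The coefficient vector, comprising the activations, is given by $\boldsymbol{\alpha}_n$, from which we can derive $\mathbf{w}_m = [\alpha_1, \ldots, \alpha_{n-1}, 0, \alpha_{n+1}, \ldots, \alpha_N]$, being the rows of $\mathbf{W}$. The optimization problem for the determination of optimal $\hat{\boldsymbol{\alpha}}_n$ is given by

$$\hat{\boldsymbol{\alpha}}_n = \operatorname{argmin} \|\mathbf{T}\boldsymbol{\alpha}_n - \mathbf{x}_n\|_2, \quad \text{subject to} \quad \boldsymbol{\alpha}_n \geq 0 \qquad (1)$$

where the first term is the reconstruction error and the second term is the non-negativity constraint. Since non-negativity alone leads to a sufficient sparsity, we do not introduce a further sparsity enforcing prior.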
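To make the construction of the weight matrix from (1) concrete, the following sketch solves the nonnegative least squares problem for every sample against the dictionary of all other samples. The solver is an assumption: the paper does not name one, so a plain projected gradient iteration is used here as a stand-in, and the names `nnls_projected_gradient`, `sparse_weight_matrix` and `X_toy` are hypothetical.

```python
import numpy as np

def nnls_projected_gradient(T, x, n_iter=2000):
    """Approximately solve min ||T a - x||_2 subject to a >= 0.
    Stand-in solver (assumption), not the one used in the paper."""
    # Step size 1 / sigma_max(T)^2, the inverse Lipschitz constant
    # of the gradient of 0.5 * ||T a - x||^2.
    step = 1.0 / (np.linalg.norm(T, 2) ** 2)
    a = np.zeros(T.shape[1])
    for _ in range(n_iter):
        grad = T.T @ (T @ a - x)
        a = np.maximum(a - step * grad, 0.0)  # project onto a >= 0
    return a

def sparse_weight_matrix(X, n_iter=2000):
    """Build W row by row; X holds one sample per column (M x N)."""
    M, N = X.shape
    W = np.zeros((N, N))
    for n in range(N):
        others = np.delete(np.arange(N), n)   # dictionary without x_n
        W[n, others] = nnls_projected_gradient(X[:, others], X[:, n], n_iter)
    return W

# Hypothetical toy data: four samples with two features each.
X_toy = np.array([[1.0, 0.0, 1.0, 2.0],
                  [0.0, 1.0, 1.0, 0.0]])
W_toy = sparse_weight_matrix(X_toy)
```

Each row of `W_toy` is nonnegative with a zero at the position of the sample itself, so a sample never contributes to its own reconstruction. For larger dictionaries a dedicated active-set solver would likely be preferable to this simple iteration.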

Let us call the weight matrix comprising the relationships between all samples $\mathbf{W}_X$. For spectral clustering, we construct the weight matrix $\mathbf{W} = [\mathbf{W}_X \; \mathbf{W}_A]$ by extending $\mathbf{W}_X$ by an additional weight matrix $\mathbf{W}_A$ based on sparse representation on archetypal dictionaries [6]. The weight matrix $\mathbf{W}_A$ includes the independently computed result of (1) for all samples, with the archetypes serving as dictionary. Therefore, the graph encodes not only the relationship between all samples but also the relationship to archetypes, leading to an easily interpretable clustering result. More specifically, the cluster assignments of the archetypes can be related to all samples in the same cluster, and thus each cluster can be interpreted by an expert by means of its assigned archetype.

3. EXPERIMENTAL SETUP AND RESULTS

3.1. Datasets

3.1.1. Hyperspectral Sugar Beet Plant

We use a hyperspectral image of a plant, cv. Pauletta (KWS,

Einbeck, Germany), which was cultivated for 8 weeks in

a controlled environment in a greenhouse. The plant was

inoculated with Cercospora beticola, the causal agent of

Cercospora leaf spot. We use the hyperspectral pushbroom

sensor unit VISNIR-camera ImSpector V10E (Specim, Oulu,

Finland) with 1600 pixels, observing a spectral signature with 211 spectral bands in the range of 400–1000 nm. For evaluation purposes, we manually annotated disease symptoms. Due to the error-prone labeling of the exact area of the symptoms, we decided to exclude this effect from the analysis by relying only on the symptom centers, which are labeled more robustly. This dataset is used to evaluate our proposed approach

for anomaly detection, where disease symptoms are deﬁned

as anomalies.

3.1.2. The Bastrop Complex Wildﬁre

This dataset comprises 4 satellite images of size 1534 × 808 pixels from different sensors, acquired over Bastrop County, Texas (USA). In September 2011, most of the area was destroyed by wildfire. One image acquired by the Landsat 5 TM sensor depicts the area before the event and

three images acquired by Landsat 5 TM, Advanced Land

Imager (ALI) from the EO-1 mission, and Landsat 8 Oper-

ational Land Imager show the area after the event. The data

was collected by NASA Land Processes Distributed Active

Archive Center Program and ground truth is available indi-

cating changed areas [12]. We use this dataset to evaluate our

proposed approach for change detection. Here, it is common

to take the differences of images as input features for cluster-

ing. Instead, for our experiments, we stack all images and use

this as input to our clustering approach, because we found that

spectral clustering with sparse representation-based graphs is

more stable for high-dimensional feature spaces. However, for indicating change, only the Landsat 5 TM images can be directly compared.

3.2. Experimental Setup

For our experiments, we construct an archetypal sparse graph

as explained in Section 2. In order to be efﬁcient in deriv-

ing WX, we restrict the dictionary for sparse representation

to the 3000 spatially nearest neighbors. The extraction of

archetypes follows the idea of [13] by using multiple ini-

tial points in order to ensure a complete ﬁnal set. The ﬁ-

nal number of clusters is ﬁxed to the number of extracted

archetypes for all tested algorithms. We compare our ap-

proach, using spectral clustering on a sparse representation-

based archetypal graph (SCASR), with k-means in original

data representation (k-meansO), and k-means on coefﬁ-

cients obtained from sparse representation with archetypal

dictionaries (k-meansASR).

For our approach, we define the archetypes as indicators for change. Please note that an archetype represents a pixel in the stacked image at four points in time. An archetype is assumed to indicate change if its RMS difference between

the two Landsat 5 TM images, before and after the event, is

higher than a threshold. The threshold is given as the 0.9

quantile value obtained from all archetypes’ Landsat 5 TM

RMS differences. For k-meansO we use the obtained cluster

centers in the same way, as indicators for change.
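The thresholding rule described above can be sketched as follows. The sketch assumes that each archetype is given as one row of band values for the pre-event and for the post-event Landsat 5 TM image; the function name `change_indicators`, the number of bands and the toy values are hypothetical.

```python
import numpy as np

def change_indicators(pre, post, q=0.9):
    """Flag archetypes whose pre/post RMS difference exceeds the
    q-quantile of all archetypes' RMS differences."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    # RMS difference over the spectral bands, one value per archetype.
    rms = np.sqrt(np.mean((post - pre) ** 2, axis=1))
    threshold = np.quantile(rms, q)
    return rms > threshold, rms, threshold

# Hypothetical toy data: 10 archetypes, 6 bands per image (assumption).
pre_toy = np.zeros((10, 6))
post_toy = np.arange(10, dtype=float)[:, None] * np.ones((1, 6))
is_change, rms, threshold = change_indicators(pre_toy, post_toy)
```

With the 0.9 quantile, roughly the 10% of archetypes with the largest pre/post difference are treated as change indicators; in the toy data this is only the last archetype. The same function could be applied to the k-means cluster centers.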

3.3. Results

Fig. 2 shows the obtained results for the hyperspectral plant

dataset. By visual inspection, it can be seen that our approach

SCASR is able to ﬁnd different clusters for healthy plant parts,

specular reﬂections, leaf veins and disease symptoms. More

precisely, it is able to cluster two types of disease symptoms

(blue and red areas in Fig. 2 (d) and (h)), which are assigned

to two disease archetypes, illustrated in Fig. 1. However, the leaf borders are also assigned to disease symptoms, resulting from erroneous measurements similar to the spectra of disease symptoms. In contrast, k-meansO is not able to cluster disease symptoms. The approach k-meansASR detects disease symptoms; however, it assigns only one cluster to all of them. Moreover, it gives less smooth results and noisy detections for some disease symptoms in comparison to SCASR. After a visual inspection of the archetypes and a manual assignment of specific archetypes to disease symptoms, a comparison with our annotation yields 163 out of 173 (94.2%) detected disease symptoms for SCASR and 137 out of 173 (79.2%) for k-meansASR.

Fig. 1. Left: Spectrum of healthy plant; Middle and right: Two spectra of disease symptoms, which were identified by archetypal analysis.

Fig. 3 shows the result for the Bastrop ﬁre dataset. By

visual inspection, all three approaches are able to distin-

guish most of the areas affected and not affected by ﬁre by

assigning different cluster sets to both areas. This means that the cluster centers detected by k-means as well as the obtained archetypes are able to describe clusters for change. However, a comparison of the detected change reveals significant differences. Although k-meansO delivers multiple clusters which could be assigned to change, some parts are missing and are thus assigned to clusters defined by no change. For SCASR, the region affected by fire is clearly visible and less affected by noise. A quantitative comparison to the ground truth yielded a kappa coefficient of 0.61 for k-meansO, 0.49 for k-meansASR and 0.80 for SCASR, which underlines that our approach is suitable for change detection with limited manual input. We tested various quantile values to decide on indicators for change, but for k-meansO as well as k-meansASR a decrease of the quantile value led to an increase of falsely detected change.
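For readers who want to reproduce the agreement measure, a kappa coefficient between a detected change map and the ground truth can be computed as in the sketch below. It assumes the standard Cohen's kappa on per-pixel labels; whether the reported values were computed in exactly this way is not stated in the paper, and the function name `cohens_kappa` and the toy labels are hypothetical.

```python
import numpy as np

def cohens_kappa(truth, pred):
    """Cohen's kappa between two label maps of equal size.
    Undefined (division by zero) if both maps contain a single,
    identical label only."""
    truth = np.asarray(truth).ravel()
    pred = np.asarray(pred).ravel()
    labels = np.union1d(truth, pred)
    # Observed agreement: fraction of pixels with identical labels.
    p_o = np.mean(truth == pred)
    # Agreement expected by chance from the label frequencies.
    p_e = sum(np.mean(truth == c) * np.mean(pred == c) for c in labels)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical toy maps: 1 = change, 0 = no change.
kappa_toy = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
kappa_same = cohens_kappa([1, 0, 1, 0], [1, 0, 1, 0])
```

Unlike plain accuracy, kappa discounts the agreement that would be expected by chance, which matters when unchanged pixels dominate the scene.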

4. CONCLUSION

We propose a novel design for a directed graph as input to

spectral clustering. The graph is designed using sparse repre-

sentation, which includes information about relationships be-

tween all data samples, but also information about the relation

to archetypes. Our experiments confirm that the clustering result yields discriminative and interpretable clusters for the tasks of change detection and anomaly detection. Our approach can also be used for multi-sensor datasets, since the archetypes are interpretable by experts.

Acknowledgements

The authors would like to thank Jan Behmann and Anne-

Katrin Mahlein for providing the hyperspectral image data

and helpful conversations. The Bastrop data is available from

the U.S. Geological Survey (http://earthexplorer.usgs.gov/).

REFERENCES

[1] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering:

Analysis and an algorithm,” Adv. Neural. Inf. Process. Syst.,

vol. 2, pp. 849–856, 2002.

[2] F. Hu, G.-S. Xia, Z. Wang, X. Huang, L. Zhang, and H. Sun, “Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification,” IEEE J. Sel. Topics Appl. Earth Observ. in Remote Sens., vol. 8, no. 5, 2015.

[3] U. Von Luxburg, “A tutorial on spectral clustering,” Stat. and

Comput., vol. 17, no. 4, pp. 395–416, 2007.

[4] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan,

“Sparse representation for computer vision and pattern recog-

nition,” Proc. of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.

[5] S. Yan and H. Wang, “Semi-supervised learning by sparse rep-

resentation.,” in SDM. SIAM, 2009, pp. 792–801.

[6] R. Roscher, C. Römer, B. Waske, and L. Plümer, “Landcover classification with self-taught learning on archetypal dictionaries,” in IEEE IGARSS, July 2015, pp. 2358–2361.

[7] C. Römer, M. Wahabzada, A. Ballvora, F. Pinto, M. Rossini, C. Panigada, J. Behmann, J. Léon, C. Thurau, and C. Bauckhage, “Early drought stress detection in cereals: simplex volume maximisation for hyperspectral image analysis,” Funct. Plant Biol., vol. 39, no. 11, pp. 878–890, 2012.

[8] C. Thurau, K. Kersting, and C. Bauckhage, “Yes we can:

simplex volume maximization for descriptive web-scale matrix

factorization,” in Proc. CIKM. ACM, 2010, pp. 1785–1788.

[9] R. Roscher, J. Behmann, A.-K. Mahlein, J. Dupuis, H. Kuhlmann, and L. Plümer, “Detection of disease symptoms on hyperspectral 3D plant models,” in ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2016, pp. 89–96.

[10] A. Adler, M. Elad, Y. Hel-Or, and E. Rivlin, “Sparse coding

with anomaly detection,” J. Signal. Process. Syst., vol. 79, no.

2, pp. 179–188, 2015.

[11] Anna Choromanska, Tony Jebara, Hyungtae Kim, Mahesh Mohan, and Claire Monteleoni, “Fast spectral clustering via the Nyström method,” in ALT. Springer, 2013, pp. 367–381.

[12] M. Volpi, G. Camps-Valls, and D. Tuia, “Spectral alignment

of multi-temporal cross-sensor images with automated kernel

canonical correlation analysis,” ISPRS J. Photogramm. Remote

Sens., vol. 107, pp. 50–63, 2015.

[13] R. Roscher, S. Wenzel, and B. Waske, “Discriminative archety-

pal self-taught learning for multispectral landcover classiﬁca-

tion,” in Proc. PRRS, Workshop at ICPR, 2016.

Fig. 2. Results obtained for hyperspectral plant dataset. (a) Image data, (e) Image data, detail, (b) and (f) Clustering result with

k-means on original representation, (c) and (g) Clustering result with k-means on sparse representation coefﬁcients, (d) and (h)

Spectral clustering result on a sparse representation-based archetypal graph. Colors are arbitrary and indicate assignment to

clusters. Therefore, they are not related between the images.

Fig. 3. Results obtained for the Bastrop ﬁre dataset. (a) Image data pre-event, (b) Image data post-event, (c) Ground-truth

information for change, (d) Clustering result with k-means on original representation, (e) Clustering result with k-means on

sparse representation coefﬁcients, (f) Spectral clustering result on a sparse representation-based archetypal graph, (g) Detected

change using k-means on original representation, (h) Detected change using k-means on sparse representation coefﬁcients, (i)

Detected change using spectral clustering result.