
Clustering Residential Burglaries Using Modus Operandi and Spatiotemporal Information

Anton Borg* and Martin Boldt†
Department of Computer Science and Engineering
Blekinge Institute of Technology
371 79, Karlskrona, Sweden
*anton.borg@bth.se
†martin.boldt@bth.se

Published 17 December 2015

To identify series of residential burglaries, detecting linked crimes performed by the same constellations of criminals is necessary. Comparing crime reports is difficult today, as crime reports traditionally have been written as unstructured text and often lack a common information basis. Based on a novel process for collecting structured crime scene information, the present study investigates the use of clustering algorithms to group similar crime reports based on combined crime characteristics from the structured form. Clustering quality is measured using Connectivity and the Silhouette index (SI), stability using the Jaccard index, and accuracy using the Rand index (RI) and a Series Rand index (SRI). The performance of clustering using combined characteristics was compared with that of the spatial characteristic alone. The results suggest that the combined characteristics perform better than, or similar to, the spatial characteristic. In terms of practical significance, the presented clustering approach is capable of clustering cases using a broader decision basis.

Keywords: Crime clustering; residential burglary analysis; decision support system; combined distance metric.

1. Introduction

Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders; e.g., in the USA, research suggests that 5% of offenders are involved in 30% of the convictions.[1] This is echoed by the Swedish law enforcement agencies. Law enforcement agencies are consequently required to detect whether a connection exists between crimes, i.e., whether crimes are linked. In this study, a link exists between residential burglaries that share one or more suspects. The detection of linked crimes is helpful to law enforcement for several reasons. First, the aggregation of information from crime scenes increases the available evidence. Second, the joint investigation of multiple crimes enables a more efficient use of law enforcement resources.[2] Third, crime linkage is also beneficial for crime prevention, community safety, and other general policing functions.

International Journal of Information Technology & Decision Making, Vol. 15, No. 1 (2016) 23–42. © World Scientific Publishing Company. DOI: 10.1142/S0219622015500339

Previously, clustering has been investigated as a method to group crimes based on characteristics, often spatial and temporal characteristics.[3,4] Recently, other characteristics have been investigated as well, on an individual basis.[5] Research into estimating linkage using regression analysis has suggested that a combination of characteristics provides higher accuracy in linkage estimation. This study investigates a combined-characteristics distance metric for use in clustering residential burglaries. Clustering residential burglaries based on different similarity aspects would potentially allow clustering solutions with better accuracy and a broader decision basis than individual characteristics. Similarly, it would potentially allow law enforcement to find series while reviewing a smaller number of residential burglaries, i.e., it could be used as a case-selection decision support system (DSS). Consequently, the use of a combined distance metric would allow law enforcement agencies to save resources while providing individual investigators with increased support.

1.1. Purpose statement

The purpose of this study is to investigate the effectiveness of a combined distance metric compared to a spatial distance metric. Similarly, the effectiveness of different clustering algorithms is also investigated. The clustering quality is measured using multiple evaluation metrics and evaluated using statistical tests. A modified version of the RI is used to better reflect the clustering solutions' accuracy with regard to series of residential burglaries. The data comprises residential burglaries from southern Sweden and the Stockholm area.

1.2. Outline

Section 2 presents the related work. Sections 3 and 4 explain the data and the methodology. The results are presented in Sec. 5 and analyzed in Sec. 6. Finally, the results are discussed in Sec. 7 and the conclusions of the paper are presented in Sec. 8.

2. Related Work

Intelligence-led policing and predictive policing are about making law enforcement less reactive and more proactive.[6] An important aspect of predictive policing is to link related crimes into series. Much research has focused on estimating series based on spatiotemporal characteristics, as well as investigating the effects concerning repeat and near-repeat victimization.[3,4,7–9] Linking crime cases has been investigated before, primarily by estimating whether pairs of crime cases are connected. The pair estimation has mostly been conducted for violent crimes with a high possibility of series.[2,10–14] But research has also been conducted into clustering crime cases as a means of reducing the number of cases law enforcement officers have to analyze when looking for possible series of crimes.[5,15] The clustering has been investigated for, e.g., residential burglaries. Hotspot detection is a commonly used technique that can group cases based on spatial information in order to predict future crime locations based on density.[16–21] The research into clustering and pairwise link estimation, however, has investigated using other crime characteristics besides spatial information.

There exist multiple crime characteristics that can be used for comparison, e.g., modus operandi (MO), spatial proximity, and temporal proximity. The MO can be further divided into three domains: entry behavior, target characteristics, and goods stolen.[22] Entry behavior describes the procedure used to enter the premises. Target characteristics describe characteristics of the residence being targeted.

Studies have computed the similarity between pairs of crimes based on various crime characteristics. Many of these studies have used similarity coefficients between cases, such as the Jaccard coefficient.[2] Previous research has suggested that there is a difference between the similarities of linked and unlinked residential burglaries when investigating pairs of crimes.[1,11,14,22]

Earlier clustering research has investigated clustering using the cut-clustering algorithm based on single, independent crime characteristics.[5] Pair-wise link estimation found that there are reasons to combine multiple characteristics.[14] This has been suggested to increase the accuracy of clustering-based solutions for grouping residential burglaries. Initial research has investigated model-based clustering to combine different aspects of crime data.[15]

The cut-clustering algorithm investigated previously did not produce clustering solutions with a high accuracy. The choice of clustering algorithm affects the clustering solution and is dependent on the data investigated.[23] As such, multiple clustering algorithms should be investigated to suggest an algorithm more suitable to the domain.

While previous research clustered crimes based on spatial data, temporal data, or single MO characteristics, this work extends this by also incorporating the additional MO data into the proposed combined distance metric that is used for clustering crimes. This enables the possibility to group burglaries based on MO characteristics.

3. Data

The data set consists of residential burglary reports, collected by law enforcement officers according to a two-page structured digital form that has been developed in close cooperation between law enforcement and academia. The content of the form is based on collected knowledge from crime analysts as well as relevant theory in the field. In total, the form consists of 114 binary parameters that capture specifics about the burglar's MO. All 114 parameters are represented as checkboxes in the form, and as such the values are either 1 or 0 depending on whether the checkboxes are ticked or not. Each form is divided into 11 subsections, as described in Table 1. As an example, one of the sections, including its parameters, is shown in Fig. 1. In addition to the binary parameters, the form also includes input fields concerning temporal and spatial data, i.e., date and time intervals as well as geographical position (latitude, longitude, and address).
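To make the checkbox representation concrete, the sketch below encodes a filled-in form as a single 0/1 vector. The section names and sizes follow Table 1; the `encode_form` helper and its dictionary-based input format are hypothetical conveniences for illustration, not the paper's actual data schema.

```python
# Hypothetical sketch: encoding a structured burglary form as a binary vector.
# Section names and checkbox counts follow Table 1 (114 parameters in total).
SECTION_SIZES = {
    "time_and_place": 7, "residential_area": 7, "type_of_residency": 12,
    "burglary_alarm": 5, "object_description": 10, "plaintiff": 15,
    "break_in": 26, "search_strategy": 3, "stolen_goods": 7,
    "trace_evidence": 18, "miscellaneous": 4,
}

def encode_form(ticked):
    """Map a dict {section: set of ticked checkbox indices} to one 0/1 vector."""
    vector = []
    for section, size in SECTION_SIZES.items():
        boxes = ticked.get(section, set())
        vector.extend(1 if i in boxes else 0 for i in range(size))
    return vector

# A form with two break-in boxes and one stolen-goods box ticked.
example = encode_form({"break_in": {0, 3}, "stolen_goods": {2}})
```

Each case thus becomes a fixed-length binary vector, which is the representation the similarity calculations in Sec. 4.1 operate on.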


The form is integrated with a structured data collection process that increases the quality of the collected data compared to traditional open-text reports. This is mainly because the form works as a checklist that guides law enforcement officers through mandatory questions to ask. Another positive effect of using the form is that the tick-based checkbox layout instantly discretizes the collected data, making it more easily interpreted by suitable analysis algorithms.

Once a form is filled out, it is automatically verified and the law enforcement officer is notified of any inconsistencies. When the automatic verification process is passed, and any inconsistencies have been addressed, the form is registered in a database and made accessible through a custom-developed software-based analysis system. In June 2015, there were approximately 12,000 residential burglary forms stored in the database, all collected in the southern part of Sweden and the Stockholm area. More details regarding both the form and the associated analysis system are available.[5]

In addition to the data collected in the form, law enforcement officers have provided anonymized data about suspects connected to the residential burglary forms. Using these labeled burglary forms, it is possible to connect cases that share one or more suspects, i.e., linking cases together into series. As such, a linked crime pair is a pair of residential burglaries that share one or more suspects.

Table 1. Summary of parameters collected from crime scenes using the digital form.

Name of subsection   Description                                                  #Parameters
Time and place       Date and time range as well as residence address             7
Residential area     Rural or urban, number of neighbors, etc.                    7
Type of residency    {Villa, townhouse, apartment, farm}, number of flats, etc.   12
Burglary alarm       If alarm existed, if it was enabled, activated, sabotaged    5
Object description   Lights lit in/outside, member in neighborhood watch, etc.    10
Plaintiff            Plaintiff away or home, prior suspicious events, etc.        15
Break in             Method and location of break-in                              26
Search strategy      How the residence was searched for goods                     3
Stolen goods         Categories of stolen goods, e.g., cash, gold, medicine       7
Trace evidence       Trace evidence secured, e.g., DNA, fingerprint, etc.         18
Miscellaneous        Witness, confidential hints, and searchable goods            4
Total:                                                                            114

Fig. 1. An example of the residential area section, depicting a residence located in an urban area with a single neighbor, located next to a forest or field.


The present work uses two different data sets, created by randomly sampling 100 burglary forms into each data set from the original data set of 226 burglary forms. The two data sets are denoted D1 and D2 henceforth and thus contain 100 offenses each. As can be seen in Table 2, the labeled cases contain repeat offenders accounting for series that include between two and five burglaries. However, the labeled cases also include single offenders that law enforcement officers could only connect to a single offense. The reason for including single offenders in the study is that they are used when calculating the Rand evaluation metric, further described in Sec. 4.3.

4. Method

This section describes the distance metrics and clustering algorithms that are evaluated using data from the burglary form introduced in the previous section.

Two distance metrics and a set of clustering algorithms are compared over two data sets. The two data sets are sampled using simple random sampling, where each data set has 100 instances. The two data sets are denoted D1 and D2 henceforth. Two distance metrics are evaluated. The first is based on spatial data and is considered the baseline; the second is based on a combination of crime characteristics. Both are explained further in Sec. 4.1.

A set of clustering algorithms is evaluated using the two distance functions on the two data sets. The clustering algorithms used are described in Sec. 4.2. Each clustering algorithm and distance function is evaluated on each data set 10 times, where each run is randomized so that the clustering method produces different clustering results each time, e.g., by changing the seed if applicable. The number of clusters is based on prior knowledge of the series. As such, the number of clusters, k, is set to the number of series available in each data set, as shown in Table 2.

It should be noted that a priori knowledge concerning k might not always be available, or, for some algorithms, necessary. The value of k affects the performance of the clustering and should be set appropriately; a number of methods are available for finding k.[24] Methods for investigating the optimal number of clusters, however, are considered outside the scope of this study.

Table 2. Summary of labeled crimes and series for data sets D1 and D2.

Crime series size   D1 count   Proportion D1 (%)   D2 count   Proportion D2 (%)
5                   1          5                   1          5
4                   5          20                  1          4
3                   6          18                  4          12
2                   16         32                  10         20
1*                  25         25                  59         59
Total:              100        100                 100        100

*Not actual series but crimes where burglars could only be tied to one single crime.


A set of evaluation metrics is recorded for each run. The evaluation metrics used in the experiments are described in Sec. 4.3. For the RI evaluation, the clustering solution is evaluated against the true clustering solution provided by law enforcement. This enables the comparison of the distance metrics as well as of the algorithms investigated.

4.1. Distance metric

Based on checkbox values within the 11 sections of the burglary form, it is possible to calculate pair-wise similarity measures between cases using the Jaccard index. Given two cases C1 and C2, the Jaccard index is calculated by comparing attributes, i.e., the checkbox values, between the two cases according to Eq. (1). Note that since a checkbox represents a binary value, the equation for calculating the similarity between binary asymmetric attributes is used instead of the traditional Jaccard index.

J(C_1, C_2) = \frac{A_{11}}{A_{10} + A_{01} + A_{11}}.   (1)

In Eq. (1), A_{11} represents attributes that are checked, i.e., given a value of 1, in both case C1 and case C2. A_{10} and A_{01} represent attributes that are checked in C1 but not in C2, and vice versa. In this study, it is rather the distance between cases that is of interest, and as such the Jaccard distance is used instead. The Jaccard distance is complementary to the Jaccard index and is calculated according to Eq. (2).

d_J(C_1, C_2) = \frac{A_{10} + A_{01}}{A_{10} + A_{01} + A_{11}}.   (2)
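A minimal sketch of Eqs. (1) and (2) for binary attribute vectors follows. The function names and the toy vectors are illustrative; note that the index is undefined when no attribute is checked in either case, a corner case the paper does not need to address since real forms always have ticked boxes.

```python
# Jaccard index and distance for binary asymmetric attributes (Eqs. (1)-(2)).
# Attributes that are 0 in both cases (A00) are deliberately ignored.
def jaccard_index(c1, c2):
    a11 = sum(1 for x, y in zip(c1, c2) if x == 1 and y == 1)
    a10 = sum(1 for x, y in zip(c1, c2) if x == 1 and y == 0)
    a01 = sum(1 for x, y in zip(c1, c2) if x == 0 and y == 1)
    # Undefined if a10 + a01 + a11 == 0 (no checked attribute in either case).
    return a11 / (a10 + a01 + a11)

def jaccard_distance(c1, c2):
    # Eq. (2) is the complement of Eq. (1).
    return 1.0 - jaccard_index(c1, c2)

c1 = [1, 1, 0, 0, 1]
c2 = [1, 0, 1, 0, 1]
# Here A11 = 2, A10 = 1, A01 = 1, so both index and distance equal 0.5.
```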

By calculating pair-wise Jaccard distances, it is possible to compare burglary cases with regard to the variables collected. Similarity analyses of burglaries have to a large extent focused on a single variable as the basis for estimating the similarity between cases. However, similarity between cases can also be measured using a combination of multiple variables, e.g., both spatial and MO similarity. Studies that have investigated linking crime pairs suggest that a combination of multiple variables performs better than single alternatives.

In this study, a multivariate distance metric is investigated as the basis for evaluating similarity between cases. Table 3 shows the mean distance between pairs of crime cases for the different variables. The table shows the mean for all pairs, not just linked pairs. Looking only at the linked pairs, the mean (and standard deviation) for the spatial characteristic is 27.149 (27.073) kilometers, temporal = 29.267 (27.943) days, target = 0.352 (0.133), entrance = 0.422 (0.102), stolen goods = 0.219 (0.132), victim behavior = 0.161 (0.169), and physical trace = 0.384 (0.132). Generally, the linked mean is lower than for all pairs. The target selection variable, however, is not lower for the linked pairs.

The multivariate distance metric is a weighted Euclidean distance calculated from the compounding variables shown in Table 3. The table also presents the number of parameters from the structured burglary form that are included within each of the seven categories. The weights of each variable, shown in Table 3, are based on the coefficients from a logistic regression analysis model previously developed based on the data, in accordance with previous research.[14] As such, the logistic regression analysis used the same feature-rich data as in this study, and the resulting regression coefficients from that model are used as weights in the proposed multivariate distance metric within this study. This is one way of deriving the weights based on prior knowledge. The weights are important because they factor in that the characteristics are not equally important.[5] They would also allow law enforcement to adjust weights according to other considerations, e.g., a specific MO.

The total weighted combined Euclidean distance, d_combined, is calculated according to Eq. (3), where D_spatial, D_temporal, D_target, D_entrance, D_goods, D_victim, and D_trace are the included variables, and w_1, w_2, w_3, w_4, w_5, w_6, and w_7 are the associated weights extracted from the regression model, as presented in Table 3.

d_{combined} = \sqrt{w_1 (D_{spatial})^2 + w_2 (D_{temporal})^2 + \cdots + w_7 (D_{trace})^2}.   (3)

The second distance metric used is the spatial distance metric, which is considered state of the art. It is based on the Euclidean distance, according to Eq. (4), and only comprises the spatial distance between two crime locations, i.e., D_spatial.

d_{spatial} = \sqrt{(D_{spatial})^2}.   (4)
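The two metrics can be sketched as follows. The function names are illustrative, and the weights used in the example are arbitrary placeholders for the mechanism of Eq. (3); the paper's actual weights are the regression coefficients reported in Table 3.

```python
import math

# Sketch of the combined weighted Euclidean distance (Eq. (3)).
# diffs:   per-characteristic distances between two cases (spatial km,
#          temporal days, and the five MO Jaccard distances).
# weights: the associated weights w1..w7 (regression coefficients in the paper).
def combined_distance(diffs, weights):
    return math.sqrt(sum(w * d * d for w, d in zip(weights, diffs)))

# Sketch of the spatial baseline (Eq. (4)): reduces to the absolute
# spatial distance between the two crime locations.
def spatial_distance(d_spatial):
    return math.sqrt(d_spatial ** 2)

# With unit weights, two characteristic distances of 3 and 4 combine to 5.
d = combined_distance([3.0, 4.0], [1.0, 1.0])
```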

4.2. Clustering algorithms

In this subsection, the four clustering algorithms used to evaluate the premise are presented. The algorithms were chosen either because they are widely used or because related studies have indicated their suitability. While the K-means clustering algorithm is one of the more popular algorithms, it does not function reliably on binary data. Consequently, the K-means clustering algorithm was excluded. The default implementations of the four clustering algorithms within the Weka machine-learning software suite (http://www.cs.waikato.ac.nz/ml/weka/) were used, except for the Cut-clustering algorithm, since it was not

Table 3. Data characteristics.

Variable          Metric       Par   Weight   Min   Max       Median   Mean (SD)
Spatial           Kilometers   3     1.025    0.0   558.140   197      248.061 (229)
Temporal          Days         4     1.072    0.0   462.0     150      121.215 (95)
Target selection  Jaccard      34    0.0      0.0   0.682     0.545    0.353 (0.135)
Entrance method   Jaccard      26    4.799    0.0   0.737     0.677    0.452 (0.134)
Stolen goods      Jaccard      10    2829     0.0   0.667     0.529    0.298 (0.157)
Victim behavior   Jaccard      15    15.899   0.0   0.695     0.631    0.357 (0.151)
Physical trace    Jaccard      22    2884     0.0   0.842     0.642    0.402 (0.181)

Par: shows how many parameters are used for the characteristic.

included in Weka. The Cut-clustering algorithm was therefore implemented according to its specification.[25] The default options were used for the Weka algorithms. All algorithms had access to a priori information regarding the number of clusters to use in the analysis. In a real-world setting, it would instead be possible to use methods for estimating the number of clusters; for instance, the self-tuning variant of the Spectral clustering algorithm could be used.[26] However, this was not investigated further in this study.

The Cut-clustering algorithm is a graph-based clustering algorithm relying on minimum cut tree algorithms to cluster the input data, which is represented by an undirected adjacency graph.[25] Each node in the graph is an instance, and two nodes are connected if the similarity between the corresponding instances is positive, in which case the edge is weighted by the corresponding similarity score. The algorithm works by adding an artificial node to the existing graph and connecting all nodes in the graph with it. Then a minimum cut tree is computed and the artificial node is removed. The clusters consist of the nodes that remain connected after the artificial node has been removed. A high value of the algorithm's connection parameter results in a higher number of clusters, and vice versa. Using a binary search approach, it is possible to find the value producing a specific number of clusters. The current implementation uses a distance function and converts the distance to a similarity according to the equation described for the spectral clustering algorithm (http://www.luigidragone.com/software/spectral-clusterer-for-weka/). Alternative conversion formulas were tested, e.g., 1/(1 + d(x, y)), but they did not impact performance. In order to be comparable against the spectral clustering algorithm, the same formula was used.
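Two common distance-to-similarity conversions are sketched below. The exact formula used by the Weka spectral clusterer is not reproduced in the text; a Gaussian kernel (with a hypothetical bandwidth parameter `sigma`) is a typical choice, shown here alongside the 1/(1 + d) alternative mentioned above.

```python
import math

# Gaussian-kernel conversion: similarity decays smoothly with distance.
# sigma is a hypothetical bandwidth parameter, not a value from the paper.
def gaussian_similarity(d, sigma=1.0):
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

# The alternative conversion mentioned in the text: 1 / (1 + d(x, y)).
def inverse_similarity(d):
    return 1.0 / (1.0 + d)
```

Both map a distance of zero to maximal similarity and are monotonically decreasing, which is the property the graph construction relies on.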

The Expectation-Maximization (EM) clustering algorithm is a probability-based clustering algorithm.[27,28] As such, it does not use a distance metric. Instead, a set of k probability distributions assigns instances to the a priori decided k clusters. The clustering process is two-fold. First, the initial values of the means and standard deviations for each of the k probability distributions are estimated, and the probability that an instance belongs to each cluster is calculated. Second, the means and standard deviations of each cluster distribution are recalculated based on the latest clustering result. This process continues until the classes that instances are assigned to remain unchanged, which means the EM clustering algorithm has converged to a maximum. Unfortunately, this might be a local instead of the global maximum. Therefore, the whole process is repeated multiple times, with different initial estimates of the means and standard deviations, to increase the chance of finding the global maximum. Finally, the largest maximum is selected and its related k probability distributions are used in any further clustering.
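The iterate-until-stable loop can be sketched with a deliberately minimal one-dimensional, two-component EM; this is not the Weka implementation used in the paper, only an illustration of the E-step/M-step cycle described above, with crude initialization and a fixed iteration budget standing in for a convergence test.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_1d(data, iters=50):
    """Minimal 1-D, two-component EM: estimate responsibilities (E-step),
    then re-estimate means and standard deviations (M-step)."""
    mu = [min(data), max(data)]      # crude initial means
    sigma = [1.0, 1.0]
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point.
        r = []
        for x in data:
            p0 = normal_pdf(x, mu[0], sigma[0])
            p1 = normal_pdf(x, mu[1], sigma[1])
            r.append(p1 / (p0 + p1))
        # M-step: re-estimate each component's mean and standard deviation.
        for k in (0, 1):
            w = [ri if k == 1 else (1 - ri) for ri in r]
            tot = sum(w)
            mu[k] = sum(wi * x for wi, x in zip(w, data)) / tot
            var = sum(wi * (x - mu[k]) ** 2 for wi, x in zip(w, data)) / tot
            sigma[k] = max(math.sqrt(var), 1e-3)  # floor avoids collapse
    return mu

# Two well-separated groups around 0 and 5; EM should recover both means.
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
means = sorted(em_1d(data))
```

A production run would add the random restarts described in the text and keep the solution with the largest likelihood.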

The Hierarchical clustering algorithm can be implemented using either a top-down or a bottom-up (agglomerative) approach.[28] The agglomerative approach begins by considering each instance as its own cluster. Next, the two clusters with the least distance between them are identified and merged into one new cluster. The process of finding the two closest clusters and merging them continues until only one final cluster exists. The output of the clustering is the sequence of mergings, which can be represented as a hierarchical clustering structure in the form of a binary tree (dendrogram). A key part of the Hierarchical clustering algorithm concerns the distance calculation between clusters. Several different methods are available, such as the single-linkage method, which uses the minimum distance between two clusters and is therefore sensitive to outliers. Another method is centroid-linkage, which calculates the centroid of a cluster based on its members' internal distances and then uses the distance between centroids to determine the closest clusters. The complete-linkage method computes the maximum distance between two clusters.[28] The adjusted complete-linkage method, similar to the complete-linkage method, computes the maximum distance between two nodes from two clusters; it then finds the largest distance between nodes within either of the two clusters and subtracts that from the maximum distance between the two clusters.[28] The Hierarchical clustering algorithm in this paper uses three different approaches to calculate the distance between clusters: single-link, complete-link, and adjusted complete-link.
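The agglomerative procedure with selectable linkage can be sketched as follows. This is a simplified illustration (adjusted complete-linkage and the dendrogram output are omitted), stopping when k clusters remain rather than merging down to one; the function names are illustrative.

```python
# Minimal agglomerative clustering sketch: repeatedly merge the two closest
# clusters under the chosen linkage until k clusters remain.
def linkage_distance(c1, c2, dist, method):
    pair_dists = [dist[i][j] for i in c1 for j in c2]
    # single-linkage: minimum pairwise distance; complete-linkage: maximum.
    return min(pair_dists) if method == "single" else max(pair_dists)

def agglomerate(dist, k, method="single"):
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(clusters[a], clusters[b], dist, method)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    return clusters

# Toy distance matrix: items 0 and 1 are close; item 2 is far away.
dist = [[0.0, 1.0, 9.0],
        [1.0, 0.0, 8.0],
        [9.0, 8.0, 0.0]]
result = agglomerate(dist, 2)
```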

Spectral clustering is a graph-based clustering algorithm that has been found to generally detect good clustering solutions.[29,30] The algorithm takes the number of clusters and a similarity matrix as input, and calculates an n × n affinity matrix, where n is the number of instances in the data set.[30] Using Principal Component Analysis, it is possible to identify relevant eigenvalues and their associated eigenvectors. Next, the eigenvectors with sufficiently large eigenvalues are extracted, and the number of extracted eigenvectors is equal to the number of dimensions in the data set. Finally, dimension reduction is carried out by mapping the extracted eigenvectors into a new space where the instances can be more efficiently clustered. The implementation used here is adapted to the Weka framework and, as such, uses a distance function and converts the distance to a similarity measure (http://www.luigidragone.com/software/spectral-clusterer-for-weka/).
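The eigenvector-embedding step can be sketched as below. This is a bare-bones illustration, not the Weka adaptation used in the paper: real spectral clustering implementations add graph-Laplacian normalization and a final k-means step over the embedded rows, both omitted here.

```python
import numpy as np

# Simplified spectral-embedding sketch: take the k leading eigenvectors of an
# affinity matrix; rows of the result are the instances in the reduced space.
def spectral_embed(affinity, k):
    vals, vecs = np.linalg.eigh(affinity)   # eigenvalues in ascending order
    return vecs[:, -k:]                     # keep the k leading eigenvectors

# Block-structured affinity: {0, 1} strongly connected, {2, 3} strongly
# connected, no similarity across the two groups.
A = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
emb = spectral_embed(A, 2)
# Rows 0 and 1 land at the same point in the embedding, far from rows 2 and 3,
# so a simple clustering of the rows recovers the two groups.
```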

4.3. Evaluation metrics

One of the most important aspects of cluster analysis is the validation of clustering results. Research into clustering has indicated that it is not reliable to use only a single cluster validation measure.[23] It is preferable to use multiple measures that reflect different aspects of a partitioning. In this study, five different validation measures are used. The quality of the clustering solution is estimated using two validity indices, Connectivity and the Silhouette index (SI). Connectivity is used for measuring connectedness.[31] The SI is used for assessing the compactness and separation properties of a partitioning.[32] For evaluating the stability of a clustering method, the Jaccard index is used.[33] The RI and the Series Rand index (SRI) are used for assessing accuracy.[34] These measures calculate the agreement between the clustering solution and the known clustering solution. The traditional RI is calculated using all instances, while the SRI is calculated using only the instances that belong to a series.

Connectivity captures the degree to which cases are connected within a cluster by keeping track of whether neighboring cases are put into the same cluster.[31] Let m_i(j) be the jth nearest neighbor of case i, and let x_{i, m_i(j)} be zero if i and m_i(j) are in the same cluster and 1/j otherwise. Then, for a particular clustering solution (partition) P = {C_1, C_2, ..., C_k} of data set M, which contains m instances (rows) with n different experimental conditions or attributes (columns), the Connectivity is defined according to Eq. (5). It takes a value between zero and infinity and should be minimized.

Conn(P) = \sum_{i=1}^{m} \sum_{j=1}^{n} x_{i, m_i(j)}.   (5)
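A direct implementation of Eq. (5) follows, parameterized by the number of neighbors to consider per case (the function name and the toy distance matrix are illustrative).

```python
# Connectivity (Eq. (5)): for each case i and its j-th nearest neighbor, add
# 1/j whenever that neighbor is NOT in i's cluster (0 otherwise). Lower totals
# indicate better-connected partitions.
def connectivity(dist, labels, n_neighbors):
    m = len(dist)
    total = 0.0
    for i in range(m):
        # Neighbors of i ordered by distance, excluding i itself.
        order = sorted((j for j in range(m) if j != i), key=lambda j: dist[i][j])
        for rank, j in enumerate(order[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total

# Toy example: cases 0 and 1 are mutual nearest neighbors, so a partition
# that keeps them together scores better (lower) than one that splits them.
dist = [[0, 1, 5], [1, 0, 5], [5, 5, 0]]
good = connectivity(dist, [0, 0, 1], n_neighbors=1)
bad = connectivity(dist, [0, 1, 1], n_neighbors=1)
```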

The Silhouette index reflects the compactness and separation of clusters.[32] Let P = {C_1, C_2, ..., C_k} be a clustering solution (partition) of data set M, which contains m cases. Then the SI is defined according to Eq. (6). In the equation, a_i represents the average distance of case i to the other cases of the cluster to which the case is assigned, and b_i represents the minimum of the average distances of case i to the cases of the other clusters. The SI varies between -1 and 1, and a higher value indicates better clustering results.

s(P) = \frac{1}{m} \sum_{i=1}^{m} \frac{b_i - a_i}{\max\{a_i, b_i\}}.   (6)
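Eq. (6) can be computed from a distance matrix as sketched below (illustrative function name; the sketch assumes every cluster has at least two members, since a_i is undefined for singleton clusters).

```python
# Silhouette index (Eq. (6)): a_i is the mean distance from case i to the
# other cases of its own cluster; b_i is the smallest mean distance from i
# to the cases of any other cluster. Assumes no singleton clusters.
def silhouette(dist, labels):
    m = len(dist)
    total = 0.0
    for i in range(m):
        own = [dist[i][j] for j in range(m) if j != i and labels[j] == labels[i]]
        a_i = sum(own) / len(own)
        b_i = min(
            sum(dist[i][j] for j in range(m) if labels[j] == c)
            / sum(1 for j in range(m) if labels[j] == c)
            for c in set(labels) - {labels[i]}
        )
        total += (b_i - a_i) / max(a_i, b_i)
    return total / m

# Two tight, well-separated pairs: a_i = 1 and b_i = 9 for every case,
# so the index is (9 - 1)/9 = 8/9, close to the ideal value of 1.
dist = [[0, 1, 9, 9], [1, 0, 9, 9], [9, 9, 0, 1], [9, 9, 1, 0]]
s = silhouette(dist, [0, 0, 1, 1])
```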

The Jaccard index is used to evaluate the stability of a clustering method.[33] The considered clustering method is randomized so that it produces different clustering results p = 10 times. The averaged Jaccard index is computed over all p(p-1)/2 pairs of the p outcomes for each of the data sets D1 and D2 individually. The Jaccard index is calculated as follows. Given a pair of clustering solutions, P1 and P2, of the same data set (M), a is defined as the number of pairs of cases that belong to the same cluster in P1 as well as in P2. Let b be the number of pairs that belong to the same cluster in P1 but not in P2. Further, c is defined as the number of pairs that belong to the same cluster in P2 but not in P1. The Jaccard index between P1 and P2 is then defined as in Eq. (7).

J(P_1, P_2) = \frac{a}{a + b + c}.   (7)
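The pair-counting in Eq. (7) can be sketched as follows, with partitions given as per-case cluster labels (the function name is illustrative).

```python
from itertools import combinations

# Jaccard stability index between two clustering solutions (Eq. (7)):
# a = pairs together in both partitions, b = together only in P1,
# c = together only in P2. Pairs separated in both are ignored.
def jaccard_between_partitions(p1, p2):
    a = b = c = 0
    for i, j in combinations(range(len(p1)), 2):
        same1 = p1[i] == p1[j]
        same2 = p2[i] == p2[j]
        if same1 and same2:
            a += 1
        elif same1:
            b += 1
        elif same2:
            c += 1
    return a / (a + b + c)

p1 = [0, 0, 1, 1]
p2 = [0, 0, 0, 1]
# Here a = 1 (pair 0-1), b = 1 (pair 2-3), c = 2 (pairs 0-2 and 1-2),
# giving J = 1/4.
```

Averaging this quantity over all p(p-1)/2 pairs of the p randomized runs yields the stability score reported in the results.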

The Rand index is used to calculate the accuracy of cluster solutions (partitions). It measures the agreement between two partitions, P1 and P2, of the same data set (M). Each partition is viewed as a collection of m(m-1)/2 pairwise decisions, where m is the number of cases. For each pair of cases g_i and g_j in M, the partition either assigns them to the same cluster or to different clusters. Let a be the number of decisions where g_i is in the same cluster as g_j in both P1 and P2. Let b be the number of decisions where the two cases are placed in different clusters in both partitions. Total agreement, and thus accuracy, can then be calculated using Eq. (8). The RI ranges between 0 and 1, where a higher value indicates higher accuracy. P2 is known beforehand and is based on the labeled data.

Rand(P_1, P_2) = \frac{a + b}{m(m-1)/2}.   (8)
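Eq. (8) translates directly into code (illustrative function name; partitions are given as per-case cluster labels):

```python
from itertools import combinations

# Rand index (Eq. (8)): the fraction of pairwise decisions on which two
# partitions agree -- together in both (a) or separated in both (b).
def rand_index(p1, p2):
    m = len(p1)
    a = b = 0
    for i, j in combinations(range(m), 2):
        same1 = p1[i] == p1[j]
        same2 = p2[i] == p2[j]
        if same1 and same2:
            a += 1
        elif not same1 and not same2:
            b += 1
    return (a + b) / (m * (m - 1) / 2)

predicted = [0, 0, 1, 1]
truth = [0, 0, 1, 2]
# Of the 6 pairwise decisions, the partitions agree on 5 (all except the
# pair 2-3), so the RI is 5/6.
```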

The Series Rand index is also used to calculate accuracy, but with emphasis on series. It is implemented similarly to the traditional RI, but only measures the agreement of two clustering solutions with regard to cases that are part of a series, i.e., disregarding crimes that do not belong to a series.
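The restriction to series members can be sketched as a filter followed by the ordinary Rand computation. The label value -1 for non-series cases is a hypothetical convention for this illustration, not the paper's encoding.

```python
from itertools import combinations

# Series Rand index sketch: the Rand index computed only over cases that
# belong to a true series. A truth label of -1 marks non-series cases here
# (a hypothetical convention for this illustration).
def series_rand_index(predicted, truth):
    keep = [i for i, t in enumerate(truth) if t != -1]
    p = [predicted[i] for i in keep]
    t = [truth[i] for i in keep]
    m = len(p)
    agree = 0
    for i, j in combinations(range(m), 2):
        # Agreement: the pair is together in both labelings or in neither.
        if (p[i] == p[j]) == (t[i] == t[j]):
            agree += 1
    return agree / (m * (m - 1) / 2)

predicted = [0, 0, 1, 2, 2]
truth = [5, 5, -1, 6, 6]   # case 2 is not part of any series
```

Because the non-series case is dropped before pair counting, a clustering that isolates it is neither rewarded nor penalized, which is exactly why the SRI reads lower than the RI on data with many singletons.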

5. Results

The results are presented in four m × n matrices (one for each metric) per algorithm and distance measure. The Cut-clustering algorithm failed to produce nontrivial clustering solutions when using the combined distance metric, and only produced nontrivial clustering solutions in 50% of the runs when using the spatial distance metric. As such, there are no metrics available for the Cut-clustering algorithm when using the combined distance metric. The Connectivity (Table 5) and SI (Table 4) values indicate the clustering quality. The measured SI can be seen in Table 4. While the Spectral clustering algorithm performs better using the combined metric, the Silhouette indexes of the other algorithms are quite similar.

Table 4. Mean SI for the algorithms and distance functions.

                                        Combined D1   Combined D2   Spatial D1   Spatial D2
Cut                                     -             -             0.46         0.18
EM                                      0.81          0.86          0.82         0.88
HierarchicalClusterer (Adj. Complete)   0.46          0.44          0.47         0.46
HierarchicalClusterer (Complete)        0.46          0.45          0.48         0.44
HierarchicalClusterer (Single)          0.46          0.45          0.48         0.47
Spectral                                0.66          0.62          0.50         0.44

Table 5. Mean Connectivity index for the algorithms and distance functions.

                                        Combined D1   Combined D2   Spatial D1   Spatial D2
Cut                                     -             -             49.50        99.00
EM                                      90.70         97.50         90.70        97.50
HierarchicalClusterer (Adj. Complete)   85.80         91.20         92.50        86.30
HierarchicalClusterer (Complete)        96.70         97.40         97.80        96.50
HierarchicalClusterer (Single)          87.10         94.20         82.60        95.60
Spectral                                98.20         97.90         97.40        96.80

The connectivity index does not show any distinct differences between the spatial and combined metric. In fact, for the Hierarchical clustering algorithm there are only minor differences between the two distance functions, as can be observed in Table 5. Tables 6 and 7 show the accuracy of the clustering solutions measured by the RI and SRI, respectively. For both metrics, there are only negligible differences between the combined and spatial metric, but the SRI shows a lower score than the RI. This is because the accuracy of the clustering solutions is not inflated by crimes that are not part of a series, as the SRI only includes crimes that are part of a series of residential burglaries.
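For illustration, the distinction between the two accuracy measures can be sketched as follows. This is a minimal reimplementation, not the authors' code; in particular, the encoding of series membership as a boolean flag is an assumption made here for clarity.

```python
from itertools import combinations

def rand_index(truth, pred):
    """Fraction of crime pairs on which the two labelings agree (Rand, 1971)."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)

def series_rand_index(truth, pred, in_series):
    """RI computed only over crimes flagged as belonging to a known series."""
    idx = [i for i, s in enumerate(in_series) if s]
    return rand_index([truth[i] for i in idx], [pred[i] for i in idx])

# Two series of two crimes each, plus three singleton crimes:
truth     = ["A", "A", "B", "B", "s1", "s2", "s3"]
in_series = [True, True, True, True, False, False, False]
pred      = [1, 1, 2, 3, 4, 5, 6]   # splits series B, gets everything else right
print(rand_index(truth, pred))                    # 20/21, inflated by singletons
print(series_rand_index(truth, pred, in_series))  # 5/6, only series crimes count
```

The singleton pairs are counted as agreements by the plain RI, which is exactly the inflation effect the SRI avoids.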

Table 8 shows the stability of the clustering algorithms for the different data sets using the Jaccard index, which is used to indicate the stability of the clustering solutions. The EM algorithm shows the best performance, with a Jaccard index of around 0.5. The Cut-clustering algorithm produced no nontrivial clustering solutions using the combined metric, and produced trivial clustering solutions in 50% of the cases when using the spatial metric. Therefore, the results of

Table 6. Mean RI for the algorithms and distance functions.

                                        Combined 1   Combined 2   Spatial 1   Spatial 2
Cut                                     --           --           0.04        0.09
EM                                      0.96         0.97         0.96        0.97
HierarchicalClusterer (Adj. Complete)   0.89         0.92         0.89        0.92
HierarchicalClusterer (Complete)        0.97         0.97         0.97        0.97
HierarchicalClusterer (Single)          0.91         0.95         0.91        0.95
Spectral                                0.98         0.98         0.98        0.98

Table 7. Mean SRI for the algorithms and distance functions.

                                        Combined 1   Combined 2   Spatial 1   Spatial 2
Cut                                     --           --           0.10        0.12
EM                                      0.92         0.93         0.92        0.93
HierarchicalClusterer (Adj. Complete)   0.85         0.87         0.85        0.87
HierarchicalClusterer (Complete)        0.92         0.93         0.92        0.93
HierarchicalClusterer (Single)          0.86         0.92         0.86        0.92
Spectral                                0.93         0.94         0.94        0.95

Table 8. Mean Jaccard index for the algorithms and distance functions.

                                        Combined 1   Combined 2   Spatial 1   Spatial 2
Cut                                     --           --           0.47        0.65
EM                                      0.45         0.59         0.45        0.59
HierarchicalClusterer (Adj. Complete)   0.10         0.11         0.10        0.10
HierarchicalClusterer (Complete)        0.21         0.19         0.21        0.19
HierarchicalClusterer (Single)          0.10         0.12         0.10        0.12
Spectral                                0.31         0.30         0.22        0.18

34 A. Borg & M. Boldt

the Cut-clustering algorithm for the Jaccard metric should be discarded. The Spectral algorithm produces more stable clustering solutions using the combined metric compared to the spatial metric, at around 0.3 and 0.2, respectively.
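A pair-counting Jaccard index of this kind can be sketched as follows. This is an illustrative version only; the exact stability protocol used in the study (e.g., which perturbed runs are compared) is not reproduced here.

```python
from itertools import combinations

def jaccard_stability(labels_a, labels_b):
    """Jaccard index over co-clustered pairs: pairs grouped together in both
    solutions, divided by pairs grouped together in at least one of them."""
    def together(labels):
        return {(i, j) for i, j in combinations(range(len(labels)), 2)
                if labels[i] == labels[j]}
    pa, pb = together(labels_a), together(labels_b)
    return len(pa & pb) / len(pa | pb)

# Two runs that only partially agree on how to group five crimes:
run1 = [1, 1, 2, 2, 3]
run2 = [1, 1, 1, 2, 3]
print(jaccard_stability(run1, run2))  # 0.25
```

An index near 1 means repeated runs group the same crimes together, i.e., a stable solution.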

6. Analysis

The evaluation of the results is two-fold. First, the difference between the algorithms' performance for the two distance functions is evaluated using Wilcoxon's test. Second, the performance of the different algorithms is evaluated using Friedman's test. The algorithm with the best mean performance over multiple evaluation metrics is investigated further using a Nemenyi post hoc test.

6.1. Distance metric comparison

For the Spectral clustering algorithm, the combined distance metric was significantly better than the spatial distance metric with regard to the SI (W = 12, p < 0.05), the RI (W = 80, p < 0.05), and the Jaccard index (W = 105, p < 0.05), but not with regard to Connectivity (W = 138.5, p > 0.05). With regard to the SRI (W = 400, p < 0.05), the spatial distance metric performed significantly better. This can be observed in Figs. 2–5, where the observations of the Spectral clustering algorithm for both data samples have been visualized using box-plots. While there are some outliers, the figures show that the two distance functions do not overlap. A significant difference was detected for the Hierarchical Clusterer (Single) clustering algorithm (W = 278, p < 0.05) with regard to the SI, but not for the other metrics. No significant differences were found between the distance functions for Hierarchical Clusterer (Adj. Complete) or Hierarchical

Fig. 2. SI per distance metric for the Spectral clustering algorithm, indicating cluster solution quality.

Fig. 3. RI per distance metric for the Spectral clustering algorithm, indicating cluster solution accuracy.


Clusterer (Complete) clustering algorithms. Since EM does not use a distance metric,

there was no reason to test this. As the Cut-clustering algorithm failed to produce

clustering solutions for the combined distance metric, it must be concluded that the

spatial distance metric is preferable in that case.
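The W statistics above come from Wilcoxon's signed-rank test over paired metric scores. A minimal sketch of the statistic follows, assuming no ties in the absolute differences; the score vectors are made-up placeholders, not the study's data.

```python
def wilcoxon_w(x, y):
    """Wilcoxon signed-rank statistic for paired samples: rank the nonzero
    differences by absolute value, then take the smaller of the positive-rank
    and negative-rank sums (assumes no ties in |difference|)."""
    diffs = sorted((a - b for a, b in zip(x, y) if a != b), key=abs)
    w_pos = sum(rank for rank, d in enumerate(diffs, start=1) if d > 0)
    w_neg = sum(rank for rank, d in enumerate(diffs, start=1) if d < 0)
    return min(w_pos, w_neg)

# Paired SI scores for the two distance functions (placeholder values):
combined = [0.66, 0.68, 0.64, 0.62, 0.65]
spatial  = [0.50, 0.48, 0.52, 0.44, 0.49]
print(wilcoxon_w(combined, spatial))  # 0: combined wins every pairing
```

A small W indicates that the differences consistently point in one direction; the p-value is then obtained from the distribution of W (in practice via a statistics library such as scipy.stats.wilcoxon, which also handles ties).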

6.2. Algorithm comparison

Friedman's test was applied to the different metrics to evaluate whether any algorithm performed significantly better than another algorithm. Friedman's test found significant differences between the algorithms for the RI (χ² = 14.428, df = 3, p < 0.05) and the SRI (χ² = 12.149, df = 3, p < 0.05). The test found no significant differences for the SI (χ² = 1.75, df = 3, p > 0.05) or the Connectivity index (χ² = 12.28, df = 3, p > 0.05). Friedman's test found no significant difference for the Jaccard index (χ² = 6.473, df = 3, p > 0.05).

The Nemenyi test for the RI shows that, in this case, the Spectral clustering algorithm performed significantly better than the Cut-clustering algorithm and the Hierarchical clustering algorithm (using an adjusted complete link approach), at p = 0.01 and p = 0.05, respectively (Table 9). The Hierarchical clustering algorithm (using a complete link approach) also performed significantly better than the Cut-clustering algorithm. For the SRI, the Nemenyi test found that the Spectral clustering algorithm performed significantly better than the Cut-clustering algorithm and the Hierarchical clustering algorithm (using an adjusted complete link approach), at p = 0.01 and p = 0.05, respectively (Table 10). No significant difference can be detected between the other algorithms.
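The critical differences reported in Tables 9 and 10 follow the standard Nemenyi formula CD = q_α · sqrt(k(k+1)/(6N)). The sketch below reproduces the reported values; the q quantiles are the standard two-tailed Nemenyi values for six groups, and N = 4 data sets is inferred here from the reported critical differences, not stated in the text.

```python
import math

def critical_difference(q_alpha, k, n):
    """Nemenyi critical difference for comparing k algorithms over n data sets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

cd_05 = critical_difference(2.850, k=6, n=4)  # ~3.77 (paper reports 3.769)
cd_01 = critical_difference(3.364, k=6, n=4)  # ~4.45 (paper reports 4.449)

# Two algorithms differ significantly when their average ranks differ by more
# than the critical difference, e.g., Spectral (rank 1) vs. Cut (rank 6):
print(abs(1 - 6) > cd_01)  # True
```

With rank difference 5 exceeding the p = 0.01 critical difference, the Spectral vs. Cut comparison in Table 9 is significant at the stricter level.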

Fig. 4. SRI per distance metric for the Spectral clustering algorithm, indicating cluster solution accuracy.

Fig. 5. Jaccard index per distance metric for the Spectral clustering algorithm, indicating cluster solution stability.


6.3. Evaluation metric analysis

A correlation matrix between the variables was investigated to see if any evaluation metric computed on unlabeled data could be used to indicate a higher RI. Tables 11 and 12 show how the different variables correlate with each other for the combined and spatial distance functions, respectively. As in the box-plots (Figs. 2–4), the data is limited to the observations for the Spectral clustering algorithm. As can be expected, the RI and SRI correlate closely with each other regardless of the distance metric. The

Table 9. Nemenyi test results for RI.

                                        Cut   EM   HC1   HC2   HC3   Spectral
Cut
EM
HierarchicalClusterer (Adj. Complete)
HierarchicalClusterer (Complete)        *
HierarchicalClusterer (Single)
Spectral                                **         *
Average rank                            6     3    5     2     4     1

Critical difference at p = 0.05: 3.769; critical difference at p = 0.01: 4.449.
* denotes a significant difference at p = 0.05; ** denotes a significant difference at p = 0.01.
HC1–HC3: HierarchicalClusterer (Adj. Complete), HierarchicalClusterer (Complete), and HierarchicalClusterer (Single).

Table 10. Nemenyi test results for SRI.

                                        Cut   EM    HC1   HC2   HC3   Spectral
Cut
EM
HierarchicalClusterer (Adj. Complete)
HierarchicalClusterer (Complete)
HierarchicalClusterer (Single)
Spectral                                **          *
Average rank                            6     2.5   5     2.5   4     1

Critical difference at p = 0.05: 3.769; critical difference at p = 0.01: 4.449.
* denotes a significant difference at p = 0.05; ** denotes a significant difference at p = 0.01.
HC1–HC3: HierarchicalClusterer (Adj. Complete), HierarchicalClusterer (Complete), and HierarchicalClusterer (Single).

Table 11. Correlation matrix for the Combined distance metric.

               Connectivity   SI     RI     SRI
Connectivity   1.00           0.10   0.07   0.07
SI             0.10           1.00   0.11   0.06
RI             0.07           0.11   1.00   0.97
SRI            0.07           0.06   0.97   1.00


connectivity correlates negatively with the RI and SRI, also independent of the distance metric. This correlation is not surprising, as a lower connectivity indicates a better cluster solution. For the combined distance metric, there is a positive correlation, albeit small, between the SI and RI. Surprisingly, there is a negative correlation between the SRI and SI. This would indicate that, for the spatial distance metric, a cluster solution that has problems separating clusters potentially has a higher accuracy.

No metric has a high correlation with either the RI or the SRI. As such, using an evaluation metric that relies on unlabeled data to indicate a high accuracy seems to be without basis.

7. Discussion

The results and the analysis showed that the combined distance metric performed as well as, or in certain cases better than, the spatial distance metric. While there were exceptions to this, the difference between the two in those cases was negligible. Further, the combined distance metric has advantages that are not available to a single-characteristic distance metric.

One advantage is the increased amount of information used. While the spatial distance metric performs similarly to the combined distance metric, it could be argued that increasing the amount of information the clustering solution is based on allows more robust decision-making support. Also, while spatial analysis of residential burglaries or other types of crimes, i.e., hotspot analysis, can be a good indicator of crimes being part of a series or of crime waves, it cannot identify series of crimes committed over a longer time period, or identify a series within a high-risk area where multiple criminals operate frequently. In these cases other information must be included, e.g., MO information. Whilst this can be done manually by law enforcement officers, manual analysis is often resource-demanding, often limited to, e.g., violent crimes, and subject to an increased risk of operator error.

A second advantage of the combined distance metric is that it would allow law enforcement officers to provide their own weights for the different characteristics based on their expert opinions, producing clustering solutions adapted to each individual investigation. However, default weights can be provided based on solved crimes. A drawback of basing the default weights on solved

Table 12. Correlation matrix for the Spatial distance metric.

               Connectivity   SI     RI     SRI
Connectivity   1.00           0.27   0.13   0.16
SI             0.27           1.00   0.52   0.53
RI             0.13           0.52   1.00   1.00
SRI            0.16           0.53   1.00   1.00


cases would be that they are biased towards the cases that law enforcement is able to solve; at the moment, these are cases with a close spatial and temporal distance. This could be remedied through organizational improvement, something that, e.g., Swedish law enforcement is currently working on.
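Such officer-adjustable weighting can be sketched as follows. This is a minimal illustration only: the characteristic names, the per-characteristic distances, and the weights are assumptions made here, not the metric used in the study.

```python
import math

def combined_distance(a, b, weights):
    """Weighted mix of per-characteristic distances between two crime reports."""
    # Spatial: Euclidean distance between projected coordinates.
    spatial = math.dist(a["xy"], b["xy"])
    # Temporal: absolute difference between crime days.
    temporal = abs(a["day"] - b["day"])
    # MO: Jaccard distance between binary modus operandi vectors.
    inter = sum(x & y for x, y in zip(a["mo"], b["mo"]))
    union = sum(x | y for x, y in zip(a["mo"], b["mo"]))
    mo = 1.0 - inter / union if union else 0.0
    parts = {"spatial": spatial, "temporal": temporal, "mo": mo}
    return sum(weights[k] * parts[k] for k in parts) / sum(weights.values())

c1 = {"xy": (0.0, 0.0), "day": 0,  "mo": [1, 1, 0, 1]}
c2 = {"xy": (3.0, 4.0), "day": 10, "mo": [1, 1, 0, 0]}
# An investigator who trusts MO evidence more can simply up-weight it:
print(combined_distance(c1, c2, {"spatial": 1, "temporal": 1, "mo": 5}))
```

In practice each characteristic distance would be normalized to a common scale before weighting; the raw units above are kept mixed only to keep the sketch short.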

Another reason for using weights is that not all data collected are indicative of a link between cases. In this case, the target selection characteristic does not seem to differ between linked and unlinked cases. It is questionable whether such data should be used in the clustering analysis. In certain cases it might be beneficial, according to law enforcement officers, and in such cases the weight for that characteristic should probably be increased. In other cases, it could be necessary to decrease the weight or remove the characteristic altogether. There are also quality aspects that might indicate that certain data should not be included. In this study, unstructured text is excluded, as it is difficult to translate it to structured form without data quality loss due to, e.g., the use of synonyms, spelling mistakes, etc. Such considerations must be made when analyzing the crime data.21

A potential drawback of the combined distance metric is that not all clustering algorithms can be used with it. This is due to the inclusion of binary data in the instances. Algorithms such as the K-means clustering algorithm require non-binary data. However, the Spectral clustering algorithm seems to be a good candidate: it performed significantly better than the Cut-clustering and Hierarchical clustering algorithms regardless of which distance metric was used.

When evaluating clustering solutions with multiple singletons, the True Negatives inflate the RI. This is also true of clustering solutions with multiple smaller clusters. The SRI provides accuracy based on how well the series have been clustered, without taking into account crimes that are not part of any series. The SRI, however, is also susceptible to the problem of multiple small clusters, albeit to a lesser extent than the RI. The F-measure might be an alternative to the RI.

It should be noted that the number of clusters affects the clustering solution. In this study, prior knowledge of the series in the data set was used to decide the number of clusters. This information, however, is not always available. As such, it could be that the value for k in this study is optimal and that the results should be interpreted as optimal. In practice, the value of k might not always be optimal and the results might be affected. It is worth noting that methods for finding the value for k have been investigated.24

No cluster evaluation metric has a high correlation with either the RI or the SRI. Consequently, it is not possible from the results to identify an evaluation metric that relies on unlabeled data and is capable of indicating a high accuracy. This is unfortunate, as the amount of labeled data for residential burglaries is likely to be sparse. However, it is our opinion that the SI is still a reasonable evaluation metric when labeled data is missing. The SI reflects the compactness and separation of clusters.32 Each series of residential burglaries should have a high intra-series similarity score and a low inter-series similarity score, which is similar to what the SI evaluates.2 The use of multiple evaluation metrics makes it possible to view the


clustering as a multiple criteria decision making (MCDM) problem. Methods exist for resolving disagreements among evaluation metrics.35,36
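The intra/inter-series intuition behind the SI can be sketched per crime as s(i) = (b − a) / max(a, b), following Rousseeuw's definition; the tiny distance matrix below is a made-up illustration, not study data.

```python
def silhouette(i, labels, dist):
    """Per-point silhouette: a = mean distance to the point's own cluster,
    b = smallest mean distance to any other cluster."""
    def mean_dist(cluster):
        members = [j for j, c in enumerate(labels) if c == cluster and j != i]
        return sum(dist[i][j] for j in members) / len(members)
    a = mean_dist(labels[i])
    b = min(mean_dist(c) for c in set(labels) if c != labels[i])
    return (b - a) / max(a, b)

# Two well-separated two-crime clusters: high silhouette for every crime.
dist = [[0, 1, 5, 5],
        [1, 0, 5, 5],
        [5, 5, 0, 1],
        [5, 5, 1, 0]]
labels = [0, 0, 1, 1]
print([silhouette(i, labels, dist) for i in range(4)])
```

Crimes tightly grouped with their own series (small a) and far from other series (large b) score close to 1, which mirrors the desired intra/inter-series similarity pattern.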

8. Conclusion

The contributions of this paper include, but are not limited to, investigating a method based on a combined distance metric for analyzing similarities between residential burglaries. Further, its effective use by multiple clustering algorithms to provide a decision based on several variables has also been investigated. Clustering residential burglaries based on different similarity aspects would potentially allow clustering solutions with a better accuracy and a broader decision basis than relying on individual characteristics, providing enhanced decision support for law enforcement officers.

A combined distance metric for clustering residential burglaries has been investigated. The performance was evaluated based on multiple evaluation metrics using five clustering algorithms, and the combined distance metric was compared against a spatial distance metric representing the baseline. Wilcoxon's test shows that the combined distance metric generally performed similarly to, or better than, the spatial distance metric, while in a few cases it performed negligibly worse. However, the combined distance metric has the advantage of using a more complete picture of the residential burglary as the basis for clustering. As such, it provides a better ground for clustering crime cases than single characteristics. If a burglary series extends both spatially and temporally, the additional MO information utilized in the present study could aid in linking crimes and thereby in creating useful crime clusters.

The choice of clustering algorithm impacts the performance as measured by the evaluation metrics. Multiple algorithms were investigated, and their evaluation metrics were compared using Friedman's test and the Nemenyi test. The Spectral clustering algorithm was the highest-ranking algorithm and performed with significantly better accuracy than the Cut-clustering and Hierarchical clustering algorithms. This suggests the feasibility of using the Spectral clustering algorithm in the criminology domain.

As knowledge of the perpetrators is not common, it is argued that the SI is a reasonable metric to use when evaluating cluster solutions of data without any knowledge of the perpetrators. However, no clear correlation could be found between the SI and the accuracy indices for the combined distance metric. This suggests that, for this domain, the SI cannot be used to indicate high-accuracy clustering solutions.

9. Future work

Two avenues for future work have been identified. First, a study based on more labeled data would allow the results to be more generalizable. Second, the approach should be investigated for other crime categories, such as vehicle theft or various frauds. Different crime categories have different behavioral characteristics, and it has not been investigated whether clustering based on MO characteristics can be used to group series of such crimes.

References

1. M. Tonkin, J. Woodhams, R. Bull, J. W. Bond and E. J. Palmer, Linking different types of crime using geographical and temporal proximity, Criminal Justice and Behavior 38(11) (2011) 1069–1088.
2. J. Woodhams, C. R. Hollin and R. Bull, The psychology of linking crimes: A review of the evidence, Legal and Criminological Psychology 12(2) (2010) 233–249.
3. J. H. Ratcliffe, The hotspot matrix: A framework for the spatio-temporal targeting of crime reduction, Police Practice and Research: An International Journal 5(1) (2004) 5–23.
4. J. E. Eck, Crime hot spots: What they are, why we have them, and how to map them, in Mapping Crime: Understanding Hot Spots (National Institute of Justice, Washington DC, 2004).
5. A. Borg, M. Boldt, N. Lavesson, U. Melander and V. Boeva, Detecting serial residential burglaries using clustering, Expert Systems with Applications 44(11) (2014) 5252–5266.
6. M. Maguire and T. John, Intelligence led policing, managerialism and community engagement: Competing priorities and the role of the national intelligence model in the UK, Policing and Society: An International Journal of Research and Policy 16(1) (2006) 67–85.
7. K. Bowers and S. Johnson, Who commits near repeats? A test of the boost explanation, Western Criminology Review 5(3) (2004) 12–24.
8. W. Bernasco, Them again? Same-offender involvement in repeat and near repeat burglaries, European Journal of Criminology 5(4) (2008) 411–431.
9. D. Johnson, The space/time behaviour of dwelling burglars: Finding near repeat patterns in serial offender data, Applied Geography 41 (2013) 139–146.
10. C. Bennell and D. V. Canter, Linking commercial burglaries by modus operandi: Tests using regression and ROC analysis, Science & Justice: Journal of the Forensic Science Society 42(3) (2002) 153.
11. C. Bennell, N. J. Jones and T. Melnyk, Addressing problems with traditional crime linking methods using receiver operating characteristic analysis, Legal and Criminological Psychology 14(2) (2010) 293–310.
12. C. Bennell, D. Gauthier, D. Gauthier, T. Melnyk and E. Musolino, The impact of data degradation and sample size on the performance of two similarity coefficients used in behavioural linkage analysis, Forensic Science International 199(1–3) (2010) 85–92.
13. C. Bennell, S. Bloomfield, B. Snook, P. Taylor and C. Barnes, Linkage analysis in cases of serial burglary: Comparing the performance of university students, police professionals, and a logistic regression model, Psychology, Crime & Law 16(6) (2010) 507–524.
14. L. Markson, J. Woodhams and J. W. Bond, Linking serial residential burglary: Comparing the utility of modus operandi behaviours, geographical proximity, and temporal proximity, Journal of Investigative Psychology and Offender Profiling 7(2) (2010) 91–107.
15. B. J. Reich and M. D. Porter, Partially supervised spatiotemporal clustering for burglary crime series identification, Journal of the Royal Statistical Society: Series A (Statistics in Society) 178(2) (2015) 465–480.
16. Y. Xue and D. E. Brown, A decision model for spatial site selection by criminals: A foundation for law enforcement decision support, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 33(1) (2003) 78–85.


17. S. Wang, X. Li, Y. Cai and J. Tian, Spatial and temporal distribution and statistic method applied in crime events analysis, 19th Int. Conf. Geoinformatics, Shanghai, China (2011), pp. 1–6.
18. G. Zhou, J. Lin and W. Zheng, A web-based geographical information system for crime mapping and decision support, Int. Conf. Computational Problem-Solving (ICCP), Leshan, China (2012), pp. 147–150.
19. P. Phillips and I. Lee, Crime analysis through spatial areal aggregated density patterns, Geoinformatica 15(1) (2011) 49–74.
20. G. Oatley, B. Ewart and J. Zeleznikow, Decision support systems for police: Lessons from the application of data mining techniques to "soft" forensic evidence, Artificial Intelligence and Law 14(1–2) (2006) 35–100.
21. S. Chainey and J. Ratcliffe, GIS and Crime Mapping (John Wiley & Sons, US, 2005).
22. C. Bennell and N. J. Jones, Between a ROC and a hard place: A method for linking serial burglaries by modus operandi, Journal of Investigative Psychology and Offender Profiling 2(1) (2005) 23–41.
23. A. Borg, N. Lavesson and V. Boeva, Comparison of clustering approaches for gene expression data, The 12th Scandinavian AI Conf. (SCAI), Aalborg, Denmark (2013), pp. 55–64.
24. C. A. Sugar and G. M. James, Finding the number of clusters in a dataset, Journal of the American Statistical Association 98(463) (2003) 750–763.
25. G. W. Flake et al., Graph clustering and minimum cut trees, Internet Mathematics 1(4) (2004) 385–408.
26. L. Zelnik-Manor and P. Perona, Self-Tuning Spectral Clustering (MIT Press, Cambridge MA, 2004).
27. R. Xu and D. Wunsch, Survey of clustering algorithms, IEEE Transactions on Neural Networks 16(3) (2005) 645–678.
28. I. H. Witten, E. Frank and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. (Elsevier Morgan Kaufmann, 2011).
29. S. E. Schaeffer, Graph clustering, Computer Science Review 1(1) (2007) 27–64.
30. J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000) 888–905.
31. J. Handl et al., Computational cluster validation in post-genomic data analysis, Bioinformatics 21 (2005) 3201–3212.
32. P. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics 20 (1987) 53–65.
33. P. Jaccard, The distribution of flora in the alpine zone, New Phytologist 11 (1912) 37–50.
34. W. M. Rand, Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association 66 (1971) 846–850.
35. G. Kou, Y. Lu, Y. Peng and Y. Shi, Evaluation of classification algorithms using MCDM and rank correlation, International Journal of Information Technology & Decision Making 11(1) (2012) 197–225.
36. G. Kou, Y. Peng and G. Wang, Evaluation of clustering algorithms for financial risk analysis using MCDM methods, Information Sciences 275 (2014) 1–12.
