
Detecting serial residential burglaries using clustering

Anton Borg a,⇑, Martin Boldt a, Niklas Lavesson a, Ulf Melander b, Veselka Boeva c

a Blekinge Institute of Technology, School of Computing, SE-371 79 Karlskrona, Sweden
b Blekinge County Police, Box 315, SE-371 25 Karlskrona, Sweden
c Computer Systems & Technologies Department, Technical University of Sofia, Bulgaria

Keywords: Cut clustering; Residential burglary analysis; Crime clustering; Decision support system

Abstract

According to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the reported residential burglaries in 2012. Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders. Law enforcement agencies are consequently required to detect series of crimes, or linked crimes. Comparing crime reports is difficult today, as no systematic or structured way of reporting crimes exists, and there is no ability to search across multiple crime reports.

This study presents a systematic data collection method for residential burglaries, together with a decision support system for comparing and analysing them. The decision support system consists of an advanced search tool and a plugin-based analytical framework. To find similar crimes, law enforcement officers have to review a large number of cases. We investigate the potential of the cut clustering algorithm for grouping crimes by their characteristics, in order to reduce the number of crimes to review in residential burglary analysis. The characteristics used are modus operandi, residential characteristics, stolen goods, spatial similarity, and temporal similarity.

Clustering quality is measured using the modularity index and accuracy is measured using the Rand index. The clustering solutions with the best quality scores were based on residential characteristics, spatial proximity, and modus operandi, suggesting that the choice of characteristic used to group crimes can positively affect the end result. The results indicate that a high-quality clustering solution performs significantly better than a random guesser. In terms of practical significance, the presented clustering approach is capable of reducing the number of cases to review while keeping most connected cases together. While the approach might miss some connections, it is also capable of suggesting new ones. The results also suggest that while crime series clustering is feasible, further investigation is needed.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Studies suggest that a large proportion of crimes are committed by a minority of offenders; in the USA, for example, researchers suggest that 5% of offenders are involved in 30% of the convictions (Tonkin, Woodhams, Bull, Bond, & Palmer, 2011). Law enforcement agencies are consequently required to detect series of crimes, or linked crimes. A series can be defined as multiple offences committed by a serial offender, and a serial offender can be defined as someone who has committed two or more crimes of the same type (Woodhams, Hollin, & Bull, 2010). Law enforcement in Sweden suggests that, similarly to the international findings, a large proportion of residential burglaries are committed by professional criminals who travel across large areas of Sweden. At the same time, according to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the 21,300 reported residential burglaries in 2012.

The detection of linked crimes is helpful to law enforcement for several reasons. Firstly, aggregating information from multiple crime scenes increases the amount of available evidence. Secondly, the joint investigation of multiple crimes enables a more efficient use of law enforcement resources (Woodhams et al., 2010). Law enforcement needs to handle a large number of reported crimes, and the detection of crime series is often carried out manually. A decision support system that lets law enforcement reduce the number of cases to review would therefore increase resource efficiency.

Forensic evidence, e.g. DNA and fingerprints, has been used to detect linked crimes (Bennell & Canter, 2002; Tonkin et al., 2011). The availability of forensic evidence is, however, limited

http://dx.doi.org/10.1016/j.eswa.2014.02.035
0957-4174/© 2014 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +46 455385854.
E-mail addresses: anton.borg@bth.se (A. Borg), martin.boldt@bth.se (M. Boldt), niklas.lavesson@bth.se (N. Lavesson), vboeva@tu-plovdiv.bg (V. Boeva).

Expert Systems with Applications 41 (2014) 5252–5266

(Tonkin et al., 2011). In the absence of forensic evidence, behavioural information can be used as an alternative data source (Bennell & Canter, 2002). A criminal committing a series of crimes has been found to show high behavioural similarity across the crimes (Woodhams et al., 2010). In contrast, behavioural consistency tends to be lower between different criminals in similar situations (Woodhams et al., 2010).

This article presents a new decision support system (DSS) that can be used to systematically collect burglary data and to perform visualisations, analyses, and interpretations of the collected data. The article evaluates a key component of the DSS: the use of clustering techniques to group burglaries based on different definitions of similarity between burglaries, illustrated in Fig. 1. Clustering has been used to group data according to similarity between data points, or to find communities in the data. Clustering residential burglaries based on different similarity aspects would potentially allow law enforcement to find series whilst reviewing a smaller number of residential burglaries, i.e. the system can be used as a case selection DSS. Consequently, the use of this DSS would allow law enforcement agencies to save resources, whilst providing individual investigators with increased support. The clustering is performed using the cut clustering algorithm (Flake, Tarjan, & Tsioutsiouliklis, 2004).

1.1. Purpose statement

The purpose of this study is twofold. First, a DSS for collecting, managing and analysing residential burglary information is presented. Second, the potential of minimum-cut-based graph clustering of crimes is investigated as a way to reduce the number of crimes to review when detecting series of residential burglaries. The impact of different edge representations and edge removal criteria on cluster quality and accuracy is investigated. Clustering quality is measured using the modularity index and accuracy is evaluated by applying the Rand index.

The data comprise residential burglary reports gathered from southern Sweden and the Stockholm area.

1.2. Outline

The remainder of this work is organized as follows: Section 2 presents a DSS for residential burglary analysis. In Section 3, the related work is reviewed. Section 4 then describes the minimum cut clustering algorithm. In Sections 5 and 6, the methodology and experimental procedure are described. The results of the experiments are presented in Section 7 and analysed in Section 8. Conclusions and future work are presented in Section 9.

2. Decision support system for residential burglary analysis

Since 2011, researchers from Blekinge Institute of Technology have collaborated with law enforcement officers and analysts from the Blekinge county police as well as four additional county police authorities from southern Sweden. The aim is to develop Information and Communication Technology (ICT) solutions for law enforcement. The scope is currently limited to solutions that target residential burglaries. The strategies, tactics, and overall organisational structure of the police vary between countries, but the main issues are shared by many countries.

In Sweden, the police are organised into 21 county police authorities, or regional units, each corresponding to a particular county. The National Police Board (NPB) is the central administrative and supervisory authority of the police service. The NPB comprises the National Bureau of Investigation and the Swedish Security Service. In addition, the Swedish police include the Swedish National Laboratory of Forensic Science. In 2015, the Swedish police will be re-organised into one national authority.

The collaboration between Blekinge Institute of Technology and the Swedish police was formed to improve the capability to solve residential burglary cases. In particular, the police are interested in ICT software, and organisational changes, that improve the data exchange and collaborative efforts of multiple county police authorities when addressing serial crime. Engineers and researchers at Blekinge Institute of Technology developed a prototype DSS for this purpose in 2012. Since then, the collaboration between academia and police has been extended to encompass authorities responsible for two thirds of the Swedish population.

Fig. 1. A view of local crimes with red markers denoting similar crimes in the suggested DSS.

The DSS uses a web-based graphical user interface, which is connected through program logic to a database with structured information about residential burglaries. The crime data are collected through a digital form, shown in Appendix B, which is being continuously developed in close collaboration between Blekinge Institute of Technology and the Swedish police. The form requires police officers at the crime scene to acquire specific pieces of information about the modus operandi, the physical location, and other types of information related to each crime. Before the introduction of this form, the data collected varied extensively between crime scenes with respect to quality, amount, personal bias, and perspective.

The program logic in the DSS is centered around a straightforward search engine interface, which makes it possible to search, filter, group, and compare crime scenes with respect to various properties related to modus operandi, location, and so on. This can be seen in Fig. 2. In addition to the comprehensive search engine, the DSS features a plugin-based analysis framework, which makes it possible to develop specific types of descriptive and inferential statistical analyses of the crime scene data.

This article focuses on an analysis component developed for the DSS. The component makes it possible to perform clustering on crime scene data for various purposes. The aim of this article can therefore be described as twofold: to introduce and describe the DSS and the structured collection of crime scene data, and to evaluate one particular type of analysis component.

3. Related work

The problem of linking reported crimes has mostly been investigated from a psychological or criminological perspective. The research has focused on crimes that can be considered violent, e.g. sexual offences, rapes, homicides, and different types of burglaries, including violent burglaries (Bennell & Canter, 2002; Bennell, Bloomfield, Snook, Taylor, & Barnes, 2010a; Bennell, Gauthier, Gauthier, Melnyk, & Musolino, 2010b; Bennell, Jones, & Melnyk, 2010c; Markson, Woodhams, & Bond, 2010; Woodhams et al., 2010).

The research conducted suggests that behavioural consistency is present among offenders and that there exists an inter-individual variation (Woodhams et al., 2010). The behavioural consistency between similar situations tends to increase with the experience of the perpetrator. More specifically, an individual tends to behave similarly in similar situations, whereas multiple individuals tend to behave differently, to a certain degree, in similar situations (Woodhams et al., 2010). A smaller temporal proximity between situations usually results in increased similarity for a perpetrator.

Different aspects of behaviour can be used for comparison, e.g. modus operandi (MO), spatial proximity, and temporal proximity. The MO can be further divided into three domains: entry behaviour, target characteristics, and goods stolen (Bennell & Jones, 2005). Entry behaviour describes the procedure used to enter the premises, e.g. broke and entered through a window on the second floor. Target characteristics denote characteristics of the residence being targeted, i.e. isolated location, two-story building, alarm, etc. Recent research on using MO characteristics has suggested that these characteristics are effective (Woodhams et al., 2010). Spatial proximity has been shown to increase the hit ratio, i.e. the number of detected linked cases, for some crime types, e.g. burglaries (Woodhams et al., 2010). Spatial proximity has also been investigated for use in grouping crimes to detect where crimes concentrate in space and time, e.g. to detect hotspots, or to predict future crime locations (Chainey & Ratcliffe, 2005; Oatley, Ewart, & Zeleznikow, 2006; Phillips & Lee, 2011; Xue & Brown, 2003; Wang, Li, Cai, & Tian, 2011; Zhou, Lin, & Zheng, 2012). Spatiotemporal correlations over longer time periods have been investigated to further enhance hotspot detection (Toole, Eagle, & Plotkin, 2011). These approaches differ from crime linkage in that they detect areas that are more likely to have crimes committed, whereas crime linkage finds connections between crimes over larger areas (Oatley et al., 2006). Different hotspot methods are used in DSSs for law enforcement agencies, e.g. to detect areas for resource prioritization (Chainey & Ratcliffe, 2005; Phillips & Lee, 2011).

Fig. 2. A view of local crimes for a specific search in the suggested DSS.

Some researchers have computed the similarity between pairs of crimes based on various behaviours. Many of these studies have used similarity coefficients between cases, such as the Jaccard coefficient, to represent behavioural consistency (Woodhams et al., 2010). The similarity scores have been used as input for logistic regression analysis as well as to plot receiver operating characteristics (ROC) curves for linked and unlinked cases (Bennell & Jones, 2005; Bennell et al., 2010c; Markson et al., 2010; Tonkin et al., 2011). The results have suggested that spatial proximity and temporal proximity are better indicators for determining linked crimes than the MO characteristics (Markson et al., 2010). The MO characteristics were, however, still found to be a significant indicator (Bennell & Jones, 2005; Markson et al., 2010). Using only temporal and spatial proximity, a model was created that was able to correctly classify 86.9% of crime pairs in the sample, compared to 80% for a model using spatial proximity and 75.6% for a model using temporal proximity (Markson et al., 2010). The MO characteristics-based selection achieved an accuracy between 54.4% and 58.1%.

The data used in many of the reviewed studies were extracted from law enforcement agencies, in some cases according to a checklist (Markson et al., 2010). Since the data extraction was done after the case information was reported, the case information might be incomplete, as law enforcement officers might not have reported crimes in a systematic way, e.g. because different aspects were considered important by different officers.

The overarching theme in the previous articles can be described as detecting crimes that are similar. This is close to the purpose of clustering, where the goal is to distribute objects into separate groups. Several investigations have tried to compute similarity scores between pairs of crimes. Such scores can easily be translated into a graph structure or an adjacency matrix. A graph can be described as a set of nodes connected by edges of different weights, e.g. a set of crimes as nodes with similarity scores as edge weights. A survey of the graph clustering domain was conducted in 2007 (Schaeffer, 2007). Graph clustering has successfully been used to identify communities/networks in other settings (Flake, Lawrence, & Giles, 2000; Fortunato, 2010; Newman, 2006). Community detection has been investigated extensively and the different methods have been summarized (Fortunato, 2010). A graph can be divided into clusters based on a split criterion, an approach denoted divisive clustering. The split criterion can be computed using several methods, e.g. maximum-flow or spectral methods (Schaeffer, 2007).

Previous work on the problem of linking residential burglaries has suggested that there is a difference between the similarities of linked and unlinked residential burglaries. The difference has been investigated using pairs of crimes (Bennell & Jones, 2005; Bennell et al., 2010c; Markson et al., 2010; Tonkin et al., 2011). While the pair-wise comparisons have suggested a possibility of detecting links between cases, the studies have not investigated approaches for detecting series of crimes. Each series of residential burglaries should have a high intra-series similarity score and a low inter-series similarity score, similar to the description of community detection in graphs (Fortunato, 2010). As such, clustering residential burglaries can be described as a problem of grouping instances or detecting communities within the data. One of the more recent graph clustering algorithms suitable for detecting communities in graphs is the cut clustering algorithm suggested by Flake et al. (2004); see also Fortunato (2010) and Görke, Hartmann, and Wagner (2009).

4. Cut clustering algorithm

The cut clustering algorithm is a graph-based clustering algorithm that uses minimum cut trees to cluster the input data (Flake et al., 2004). The input is an undirected graph where the edges between nodes represent a similarity or distance measure.

The algorithm can be described as follows: an artificial node is added to the existing graph and connected to all nodes in the graph with edge weight α. A tree is created from the graph using a minimum cut tree algorithm (Cohen et al., 2011). The artificial node is then removed from the tree, and the nodes that are still connected to each other are considered parts of the different clusters (Flake et al., 2004).

Algorithm 1. Cut clustering algorithm (Flake et al., 2004). Input is a graph G with nodes V and edge weights E.

1: function CutClustering(G(V,E), α)
2:   V′ ← V ∪ {t}
3:   for all nodes v ∈ V do
4:     Connect t to v with edge weight α
5:   end for
6:   G′(V′,E′) is the expanded graph after connecting t to V
7:   Calculate the min-cut tree T′ of G′
8:   Remove t from T′
9:   return all connected components as clusters of G
10: end function

The cut clustering algorithm (see Algorithm 1) is implemented according to the original description (Flake et al., 2004). The minimum cut tree algorithm is implemented according to Gusfield's specification (Cohen et al., 2011), described further in Section 4.2. To find the minimum cuts between two nodes in the adjacency matrix, the Edmonds-Karp maximum flow algorithm is used.

A property of the maximum flow algorithm is that complete graphs, or near-complete graphs, can result in trivial clustering solutions. This is because, in a complete graph, the minimum cut can be trivial, i.e. cutting off either the source or the target node. In such cases, the trees created will be either star-shaped, i.e. each node connected directly to the root node, or unary, i.e. each parent containing one node. The clustering produced from such a tree will be trivial. Consequently, this needs to be considered when creating the graph.
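The procedure described above can be sketched in pure Python. This is an illustrative implementation, not the authors' code: it uses a dense capacity matrix, the Edmonds-Karp algorithm for the maximum flows, and roots Gusfield's tree at the artificial node. Ties between minimum cuts can make the exact clusters implementation-dependent.

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Edmonds-Karp max flow on a dense capacity matrix.
    Returns (flow value, source-side node set of a minimum s-t cut)."""
    n = len(cap)
    res = [row[:] for row in cap]                  # residual capacities
    total = 0.0
    while True:
        parent = [-1] * n                          # BFS for an augmenting path
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        bottleneck, v = float("inf"), t            # capacity of the path
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                              # push flow along the path
            res[parent[v]][v] -= bottleneck
            res[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck
    side, q = {s}, deque([s])                      # residual-reachable nodes form
    while q:                                       # the source side of a min cut
        u = q.popleft()
        for v in range(n):
            if v not in side and res[u][v] > 1e-12:
                side.add(v)
                q.append(v)
    return total, side

def cut_clustering(sim, alpha):
    """Cut clustering sketch: attach an artificial node t to every node with
    edge weight alpha, build Gusfield's minimum cut tree rooted at t, remove
    t, and return the connected components as clusters."""
    n = len(sim)                                   # sim: symmetric, zero diagonal
    # index 0 is the artificial node t; real node i becomes index i + 1
    cap = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        cap[0][i + 1] = cap[i + 1][0] = alpha
        for j in range(n):
            cap[i + 1][j + 1] = sim[i][j]
    tree = [0] * (n + 1)                           # Gusfield parent pointers
    for s in range(1, n + 1):
        target = tree[s]
        _, side = edmonds_karp(cap, s, target)
        for u in range(s + 1, n + 1):              # reparent nodes on s's side
            if tree[u] == target and u in side:
                tree[u] = s
    parent = list(range(n + 1))                    # union-find over tree edges
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for s in range(1, n + 1):
        if tree[s] != 0:                           # drop edges into t
            parent[find(s)] = find(tree[s])
    groups = {}
    for v in range(1, n + 1):
        groups.setdefault(find(v), set()).add(v - 1)
    return sorted(groups.values(), key=min)
```

On a toy graph of two dense triangles joined by a single weak edge, an α between the weak and strong edge weights recovers the two triangles as clusters, while a very large α yields only singletons.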

4.1. The α value

The α value is used when the artificial node is attached to the other nodes. The outcome of the minimum cut clustering algorithm is determined by the α value (Flake et al., 2004). The behaviour of the α value can be predicted: given a high α value, many clusters will be produced, and as the α value decreases, fewer clusters will be produced.

When the number of desired clusters is known, the α value can be discovered using e.g. a binary search over α values until the wanted number of clusters is found (Flake et al., 2004). If the number of desired clusters is unknown, a binary search can iterate over the α value until trivial clusters are no longer produced or the number of clusters produced stabilizes. This has been implemented according to Algorithm 2 (Hamann, 2011). The boundary values are chosen so that the clustering solutions produced with the boundary values are trivial, i.e. either a single cluster or several singletons.

Algorithm 2. Binary search for iterating alpha values.

1: min ← Min(G(E))
2: max ← Max(G(E))
3: Cl ← 1
4: Cr ← |V_G|
5: while min < max − 1 and Cl < Cr do
6:   alpha ← (min + max)/2
7:   c ← |CutClustering(G(V,E), alpha)|    ▷ c gets the number of clusters in the clustering
8:   if c = Cl then
9:     min ← alpha
10:  else if c = Cr then
11:    max ← alpha
12:  else
13:    if c > Cl and c < (Cr/2) then
14:      Cl ← c
15:    else if c < Cr and c > (Cr/2) then
16:      Cr ← c
17:    else
18:      End Loop
19:    end if
20:  end if
21: end while
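The core idea of this search can be sketched in a simplified form: bisect the α interval until the clustering is no longer trivial. The sketch below assumes a `cluster_count(alpha)` callback (e.g. wrapping the cut clustering algorithm on a fixed graph); the stand-in step function used for illustration is not from the paper.

```python
def search_alpha(cluster_count, alpha_min, alpha_max, n_nodes, tol=1e-3):
    """Simplified bisection over alpha, in the spirit of Algorithm 2:
    a low alpha merges everything into one cluster, a high alpha yields
    all singletons; stop as soon as a non-trivial solution appears."""
    alpha = (alpha_min + alpha_max) / 2
    while alpha_max - alpha_min > tol:
        alpha = (alpha_min + alpha_max) / 2
        c = cluster_count(alpha)
        if c <= 1:                      # one big cluster: alpha too low
            alpha_min = alpha
        elif c >= n_nodes:              # all singletons: alpha too high
            alpha_max = alpha
        else:
            return alpha, c             # non-trivial clustering found
    return alpha, cluster_count(alpha)

# illustrative stand-in for |CutClustering(G, alpha)| on some fixed graph
def toy_cluster_count(alpha):
    return 1 if alpha < 0.2 else (4 if alpha < 0.8 else 10)
```

Unlike Algorithm 2, this sketch stops at the first non-trivial solution rather than tracking the non-trivial cluster counts seen from each side of the interval.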

4.2. Minimum cut tree

Minimum cut trees can be created using, for example, two well-known algorithms: the Gomory-Hu algorithm or Gusfield's algorithm (Cohen et al., 2011). In both, the maximum-flow algorithm is used n − 1 times. However, Gusfield's algorithm is considered simpler to implement, as it operates on an adjacency matrix representation and requires no contractions or expansions of the graph, contrary to the Gomory-Hu algorithm. Parallel implementations are supported by both algorithms. Gusfield's algorithm is presented in Algorithm 3.

In Gusfield's algorithm the parent of each node in the tree is tracked. Initially, all nodes in the graph point to the first node. In each iteration, the source node, s, is picked such that it has not been used before, and the target node, t, is the parent of s. Using the maximum flow algorithm, the minimum cut between s and t is then found. Any neighbour of t that is on the same side of the cut as s and has not been used as a source has its parent changed to s.

Algorithm 3. Gusfield's Minimum Cut Tree Algorithm (Cohen et al., 2011).

1: function MinCutTree(G(V,E), c)    ▷ A weighted, undirected graph
2:   for i = 1 → |V_G| do
3:     tree_i ← 1
4:   end for
5:   for s = 2 → |V_G| do    ▷ |V_G| − 1 maximum flow iterations
6:     t ← tree_s
7:     flow_s ← max-flow(s, t)
8:     {X, X̄} ← minimum s-t cut
9:     for u ∈ V_G, u > s do
10:      if u ∈ X then
11:        tree_u ← s
12:      end if
13:    end for
14:  end for
15:  V_T ← V_G    ▷ Build the minimum cut tree
16:  E_T ← ∅
17:  for s = 2 → |V_G| do
18:    E_T ← E_T ∪ {s, tree_s}
19:    f({s, tree_s}) ← flow_s
20:  end for
21:  return T = (V_T, E_T, f)
22: end function

5. Data and method

5.1. Data collection

The data consist of residential burglary incident reports collected in a systematic way by law enforcement officers over a period of six months. The incident reports are collected through a checkbox-based form, providing a common base of collected data.

Table 1. Mean clustering measurements for Experiment 1. d1-d5: Jaccard Goods, Jaccard Residence, Jaccard MO, Spatial Proximity, Temporal Proximity. Standard deviation within parentheses.

(a) Modularity
     1st-Quantile  2nd-Quantile   3rd-Quantile   Average        MeanTest       None           JTD
d1   0.250(0)      0.174(0.119)   0.133(0.123)   0.250(0.000)   0.088(0.112)   0.125(0.114)   0.102(0.094)
d2   0.250(0)      0.042(0.081)   0.074(0.097)   0.070(0.100)   0.011(0.019)   0.070(0.102)   0.108(0.083)
d3   0.250(0)      0.086(0.102)   0.110(0.121)   0.049(0.075)   0.077(0.096)   0.082(0.116)   0.087(0.076)
d4   0.250(0)      0.129(0.113)   0.025(0.038)   0.056(0.227)   0.041(0.079)   0.086(0.101)   0.082(0.072)
d5   0.250(0)      0.062(0.100)   0.131(0.121)   0.150(0.143)   0.082(0.102)   0.086(0.114)   0.128(0.090)

(b) Coverage
     1st-Quantile  2nd-Quantile   3rd-Quantile   Average        MeanTest       None           JTD
d1   0.000(0)      0.000(0.000)   0.001(0.002)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)
d2   0.000(0)      0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)
d3   0.000(0)      0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.001(0.002)
d4   0.000(0)      0.003(0.006)   0.001(0.001)   0.063(0.200)   0.000(0.000)   0.000(0.000)   0.001(0.002)
d5   0.000(0)      0.016(0.050)   0.004(0.012)   0.018(0.056)   0.000(0.000)   0.000(0.000)   0.000(0.000)

(c) Number of clusters
     1st-Quantile  2nd-Quantile   3rd-Quantile   Average        MeanTest       None           JTD
d1   250.0         180.8          147.6          250.0          142.0          164.2          165.7
d2   250.0         92.6           134.8          122.5          53.3           117.7          171.2
d3   250.0         136.1          152.8          114.6          135.2          126.9          159.4
d4   250.0         166.7          86.1           149.6          93.9           135.9          156.8
d5   250.0         132.5          181.9          193.5          141.6          136.0          186.2

The form used consists of eleven sections and 107 checkboxes. In addition to the checkboxes, information about the time, date, and geographical position (longitude, latitude, and street address) of the reported incident is also gathered. If required, a field for unstructured textual descriptions or observations also exists. This field allows law enforcement officers to enter additional information of importance.

The incident reports have been gathered from the southern part of Sweden and the Stockholm area. They comprise 2,416 reported residential burglaries. For 24 of the residential burglaries, law enforcement officers have provided anonymized information about suspects, allowing connections between cases to be established.

5.2. Data representation

The instances are inserted into an n × n adjacency matrix, and for each pair in the adjacency matrix a similarity index is computed as an edge representation. This process is repeated so that adjacency matrices exist for several similarity indices. The produced clustering solutions are saved using the DIMACS format.¹

5.2.1. Edge representation

The edges in the graph are represented by different similarity coefficients, making the edge weights a measure of similarity between nodes. The similarity coefficients have been chosen based on results suggested in previous research. First, the Jaccard index is computed between crime pairs based on three different MO characteristics: complete MO characteristics, residential characteristics, and stolen goods information. Secondly, spatial and temporal proximity are computed between crime pairs based on geodesic distance and temporal distance (measured in days), respectively. The Jaccard calculation is expanded upon in Appendix A.
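For binary checkbox data, the Jaccard index between two crimes is the number of shared checked boxes divided by the number of boxes checked in either crime. A minimal sketch, with illustrative checkbox labels (not the actual form's wording):

```python
def jaccard(a, b):
    """Jaccard index between two crimes, each represented as the set of
    checkboxes ticked on the incident report form."""
    if not a and not b:
        return 0.0          # convention: two empty reports share nothing
    return len(a & b) / len(a | b)

# illustrative checkbox labels (hypothetical, not the real form fields)
crime_a = {"entry:window", "entry:ground_floor", "goods:jewellery"}
crime_b = {"entry:window", "entry:ground_floor", "goods:electronics"}
```

Here the two crimes share two of four distinct characteristics, giving a Jaccard similarity of 0.5.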

5.2.2. Edge removal criteria

The minimum cut tree algorithm, when given complete or near-complete graphs, can produce trees that are star-shaped, i.e. each node connected directly to the root node, or unary (see Section 4). Consequently, it is possible that the clustering can be improved by converting complete graphs into incomplete graphs. Two approaches for this conversion are investigated.

In the first approach, several threshold values are computed and the graphs are pruned based on these values by keeping only the edges where the nodes are considered similar to a certain degree. Threshold edge removal for graph transformation can be considered a global approach, in that a single threshold value is computed and used for all edges in the graph. Only edges where the nodes are considered sufficiently similar, e.g. below the threshold value, are kept. The thresholds are defined as the mean and the quartile values.

The second approach uses time- and distance-based measures, and depending on the outcome the edge is either removed or its weight is changed to indicate lesser similarity. The distance-based edge removal can be considered local, i.e. only a single edge is investigated at a time. Given this, the criteria for removing an edge can be different for each edge. The measures are based on the Mantel cross-product adaptation and the Journey Time Distance (JTD) (Chainey & Ratcliffe, 2005). The JTD criterion removes edges between cases that are physically impossible to have been committed by the same burglars, i.e. the spatial distance is too large for the temporal span. The Mantel cross-product adaptation is based on the Mantel index, which is a correlation test between time and distance for pairs of instances (Levine, 2010). Both measures are expanded upon in Appendix A.
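A JTD-style local edge filter can be sketched as follows. The travel-speed cap and the edge data layout are illustrative assumptions for this sketch, not the paper's calibrated values:

```python
def jtd_keep(distance_km, time_gap_hours, max_speed_kmh=80.0):
    """Journey-time feasibility: keep an edge only if the same offender
    could plausibly travel between the two scenes in the elapsed time.
    The 80 km/h speed cap is an illustrative assumption."""
    return distance_km <= time_gap_hours * max_speed_kmh

def prune_edges(edges):
    """edges maps a crime pair (i, j) to (similarity, distance_km, gap_hours);
    returns the pruned pair -> similarity mapping."""
    return {pair: sim for pair, (sim, dist, gap) in edges.items()
            if jtd_keep(dist, gap)}
```

For example, two burglaries 40 km apart within one hour survive the filter, while a pair 500 km apart within two hours is removed as physically infeasible for one offender.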

5.3. Cluster validation measurements

The following cluster validation measurements are used to measure the quality and accuracy of the minimum cut clustering algorithm.

True Positive (TP) is a pair of nodes in the same cluster that are linked to each other. False Negative (FN) is a pair of nodes in different clusters that are linked to each other. True Negative (TN) is a pair of nodes in different clusters that are not linked to each other. False Positive (FP) is a pair of nodes in the same cluster that are not linked to each other.

Rand Index (RI) is the percentage of correct decisions, i.e. how well the clustering algorithm has grouped the residential burglaries. RI for clustering can also be denoted Accuracy. One problem with RI is that, in certain cases, the RI increases as the number of clusters increases (Santos & Embrechts, 2009). The RI is computed as:

RI = (TN + TP) / (TN + TP + FP + FN)    (1)
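The pair counts behind Eq. (1) can be computed by enumerating all case pairs; a minimal sketch (quadratic in the number of cases):

```python
from itertools import combinations

def rand_index(clusters, linked_pairs):
    """Rand index over all case pairs: a decision is correct when two truly
    linked cases share a cluster (TP) or two unlinked cases do not (TN).
    clusters: list of sets of case ids; linked_pairs: set of frozensets."""
    label = {case: i for i, members in enumerate(clusters) for case in members}
    correct, total = 0, 0
    for a, b in combinations(sorted(label), 2):
        same = label[a] == label[b]
        linked = frozenset((a, b)) in linked_pairs
        correct += (same == linked)     # TP or TN counts as a correct decision
        total += 1
    return correct / total
```

For instance, with clusters {0, 1} and {2, 3} and only the pair (0, 1) truly linked, five of the six pairs are decided correctly (the unlinked pair (2, 3) shares a cluster, an FP), giving RI = 5/6.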

Modularity is a cluster quality index that can be used to measure how well the clusters group and separate instances, i.e. intra-cluster density and inter-cluster sparsity. It is based on the premise that the fraction of edges between nodes in a cluster should be higher than the expected fraction of edges between nodes in a cluster to indicate significant group structure, see Eq. (2) (Brandes, Delling, Gaertler, Görke, & Hoefer, 2007; Newman, 2003, 2006). The modularity index maps onto [−1, 1].

Q = Σ_{c∈C} [ |E(c)|/|E| − ( Σ_{v∈c} deg(v) / (2|E|) )² ]    (2)

Coverage is a cluster quality index based on intra-cluster density. It is related to modularity, as modularity is in essence coverage minus the expected coverage. Coverage computes the number of edges within clusters divided by the total number of edges, see Eq. (3).

Cov = Σ_{c∈C} |E(c)| / |E|    (3)
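Eqs. (2) and (3) can be sketched for the unweighted case as follows (the paper's graphs are weighted, so this is an illustration of the formulas rather than the exact computation used):

```python
def _labels(clusters):
    """Map each node to the index of its cluster."""
    return {v: i for i, members in enumerate(clusters) for v in members}

def coverage(clusters, edges):
    """Eq. (3): fraction of edges whose endpoints share a cluster.
    edges: set of frozenset node pairs of an unweighted graph."""
    label = _labels(clusters)
    internal = sum(1 for e in edges if len({label[v] for v in e}) == 1)
    return internal / len(edges)

def modularity(clusters, edges):
    """Eq. (2), unweighted case: per-cluster internal edge fraction minus
    the squared fraction of edge endpoints falling in that cluster."""
    label = _labels(clusters)
    m = len(edges)
    deg = {v: 0 for v in label}
    for e in edges:
        for v in e:
            deg[v] += 1
    q = 0.0
    for members in clusters:
        internal = sum(1 for e in edges if all(v in members for v in e))
        deg_sum = sum(deg[v] for v in members)
        q += internal / m - (deg_sum / (2 * m)) ** 2
    return q
```

On the two-triangles-plus-bridge graph, with the triangles as clusters, six of seven edges are internal (coverage 6/7), and modularity evaluates to 2 × (3/7 − (7/14)²) = 5/14.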

6. Experiment design

The following two aspects of residential burglary clustering are investigated: first, the impact of different similarity indices as edge representations and of different edge removal criteria on the quality of the clusters produced; second, the performance with which the minimum cut algorithm is able to group residential burglaries without splitting series of crimes.

6.1. Hypotheses

The following hypotheses are investigated in this study.

Experiment 1. The hypothesis of Experiment 1 can be described as follows: the choice of edge representation and edge removal criteria can positively affect the quality of the clusters produced. If the null hypothesis is not supported, the alternate hypothesis states that the choice of edge representation affects the quality of the clustering.

Experiment 2. The hypothesis of Experiment 2 is that high-quality clustering solutions of residential burglaries can result in fewer crimes to analyze whilst keeping series intact.

¹ http://lpsolve.sourceforge.net/5.5/DIMACS_maxf.htm, 2013-02-24.

6.2. Experiment 1: cluster quality

The first experiment investigates how different edge representations and edge removal criteria affect the quality of the clusters created by the minimum cut clustering algorithm. The experiment consists of two independent variables: edge representation and edge removal criteria. Each variable has several levels, as described in Sections 5.2.1 and 5.2.2. As such, an X × Y factorial design, where X and Y correspond to the number of levels of the two variables, is used as the experimental design (Shadish, Cook, & Campbell, 2002).

The dependent variable of the experiment design is the modularity. Each combination of variable levels is tested 10 times. For each repetition, a subsample of the dataset is created using simple random sampling with replacement. The subsample consists of 250 instances.
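The subsampling step above can be sketched as follows; `subsample` is a hypothetical helper name, but the procedure (simple random sampling with replacement) matches the design described.

```python
import random

def subsample(dataset, n=250, seed=None):
    """Draw a simple random sample of n instances, with replacement,
    so each repetition works on a fresh subsample."""
    rng = random.Random(seed)
    return [rng.choice(dataset) for _ in range(n)]
```

Each of the 10 repetitions of a factor combination would call this once, e.g. `subsample(crimes, 250)`.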

A between-subjects factorial analysis of variance (ANOVA) is used to evaluate the factorial experiment design. The between-subjects factorial analysis of variance allows evaluation of possible interaction between variables, as well as evaluation of significant differences between variables and levels. Interaction occurs when the combination of two variables affects the dependent variable in an unpredictable way (Sheskin, 2007).

If there is a significant difference between the factorial combinations, a post hoc test is applied after the between-subjects factorial analysis of variance to detect which factorial combinations perform better. In this case, the post hoc test used is Fisher's LSD test (Sheskin, 2007). Fisher's LSD test is vulnerable to type I errors, i.e. incorrectly rejecting a true null hypothesis, but has a lower chance of making type II errors, i.e. incorrectly retaining a false null hypothesis. If the difference for a comparison is less than Fisher's LSD value (CD_LSD), the null hypothesis is supported. The statistical tests are conducted using R and the ez package (ezANOVA).
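The LSD decision rule can be sketched as below: two level means are declared significantly different when their absolute difference exceeds the least significant difference. The function name and its inputs (the error mean square and a critical t value looked up from a t table) are illustrative assumptions, not the study's code.

```python
import math

def fisher_lsd(mean_a, mean_b, mse, n_a, n_b, t_crit):
    """Fisher's LSD comparison of two group means.

    mse    -- error mean square from the ANOVA
    t_crit -- two-tailed critical t value for the error df
    Returns (significant, lsd_value).
    """
    lsd = t_crit * math.sqrt(mse * (1.0 / n_a + 1.0 / n_b))
    return abs(mean_a - mean_b) > lsd, lsd
```

If the ANOVA itself is not significant, this pairwise comparison is not applied.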

6.3. Experiment 2: crime distinction

The second experiment investigates whether residential burglaries can be clustered with high quality whilst keeping series of crimes intact. The accuracy of the clustering is measured using the RI. The experiment design is similar to Experiment 1 and consists of two independent variables: edge representation and edge removal criteria. Each variable has several levels, as described in Sections 5.2.1 and 5.2.2. Similar to Experiment 1, an X × Y factorial design is used. The dependent variables of the experiment design are the modularity and the RI.

This experiment uses the labeled instances. Labeled instances are instances for which law enforcement agencies have provided information on whether the instance is known to be part of a series or not. For each repetition, a subsample of the dataset is created using simple random sampling with replacement from the labeled instances. The subsample consists of 24 instances. The experiment uses a design identical to Experiment 1, with the exception of the additional dependent variable, the RI.

For the second experiment, the statistical tests outlined in Section 6.2 are carried out for both dependent variables. To detect relationships between the modularity and the RI, Pearson's correlation coefficient is used.
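The RI can be computed directly over pairs of instances: two labelings agree on a pair when both place it in the same cluster or both place it in different clusters. A minimal sketch (hypothetical helper, assuming flat label lists):

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Rand Index: fraction of instance pairs on which the two
    labelings agree (same-same or different-different)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

An RI of 1.0 means the clustering reproduces the known series labels exactly; 0.5 is roughly chance level for balanced data.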

7. Results

7.1. Experiment 1

According to the modularity cluster validation measure (see Table 1(a)), the 1st-Quantile has the worst performance of all the edge removal criteria. Similarly, the Jaccard Goods and Temporal proximity edge representations perform worse than the other representations. The performance of these edge representations indicates that Jaccard Goods and Temporal proximity are unsuitable for representing differences between crime cases. The goods recorded in the form are limited to a few general items, e.g. electronics.

It should be noted that none of the modularity results are positive, indicating that the cluster solutions produced have a lower fraction of edges within clusters than the expected fraction. As such, the clustering solutions cannot be said to have separated the clusters well. The clustering solutions have most likely created too many clusters, meaning that crimes that should be grouped together are not. This makes the analyst's job harder, in that the number of crimes is reduced too much and connections could be overlooked.

The coverage of several edge representation and edge removal criteria combinations scored 0, indicating that a high number of singleton clusters were produced (see Table 1(b)), i.e. that crimes are not considered to be connected to any other crimes. The coverage scores are unsurprising given the negative modularity scores, as the modularity index incorporates similar aspects. However, as the coverage score is mostly 0, the focus will henceforth be on the modularity score.

7.2. Experiment 2

In the clustering solutions produced for Experiment 2, a pattern can be observed: none of the solutions yield a high modularity index (see Table 2(a)). The highest mean modularity index that can be observed is -0.018, followed by -0.019. As the modularity index ranges from -1 to 1, and a positive index value indicates a higher number of edges within the clusters, the indices produced cannot be considered good. In fact, all pairings of edge representations and edge removal criteria produce a negative modularity index. Of the edge removal criteria, the 1st-Quantile and the JTD function have the worst modularity scores. Looking at the corresponding pairings in Table 2(d), these edge removal criteria have produced clustering solutions that consist almost entirely of singleton clusters, i.e. the number of clusters is close to the number of nodes. Looking at the number of clusters produced (see Table 2(d)), several factor combinations produce a high number of singleton cluster solutions, i.e. groups of crimes cannot be produced. The coverage cluster validation measurement (see Table 2(b)) indicates that most of the clustering solutions produced have a low intra-cluster density on average. This would indicate that the crimes have been separated into too many groups or individual crimes.

In Table 2(c), a high Rand Index can be observed for the pairings that produced singleton clustering solutions. This result, however, can be attributed to the data: in singleton solutions, the many instances that should not be connected are correctly separated, which inflates the Rand Index score. That is to say, when the algorithm creates singleton clusters, the fact that it successfully separates crimes that should not be connected outweighs its failure to connect crimes that should be connected. But the goal is to produce smaller groups of crimes for analysis, not single crimes. As such, when looking at the accuracy of the grouped crimes (i.e. the Rand Index), one must also consider secondary cluster validation measurements, e.g. the number of clusters produced or the modularity index. A low number of clusters and a high Rand Index can be considered positive, as this would mean that the number of crimes to analyse is reduced. Similarly, a high modularity and a high Rand Index can also be considered indicative of a good clustering solution.

A high number of clusters together with a high Rand Index indicates a clustering solution that has scattered known crimes, i.e. series of crimes are not grouped. An example of a clustering


solution can be seen in Fig. 3. The modularity and Rand Index of the cluster solution are -0.0103 and 0.5579, respectively. In this example, the cut clustering algorithm has not been able to keep the series intact, which is indicated by the low Rand Index.

8. Analysis

The results of the experiments are analysed using an ANOVA test to detect whether any statistically significant difference exists between the variables. Fisher's LSD test is used to detect between which variables statistically significant differences exist.

8.1. Experiment 1

The means and standard deviations are presented in Table 1. For the modularity cluster validation measurement, a two-factor analysis of variance showed a significant effect for the edge representation, F(4, 315) = 6.112, p < .05; a significant effect for the edge removal criteria, F(6, 315) = 20.001, p < .005; but not any

Table 2
Mean clustering measurements for Experiment 2.

                1st-Quantile    2nd-Quantile    3rd-Quantile    Average         MeanTest        None            JTD

(a) Modularity
d1              -0.250 (0)      -0.091 (0.107)  -0.046 (0.060)  -0.250 (0.000)  -0.053 (0.088)  -0.057 (0.088)  -0.225 (0.035)
d2              -0.250 (0)      -0.035 (0.024)  -0.045 (0.075)  -0.070 (0.092)  -0.018 (0.099)  -0.032 (0.052)  -0.207 (0.029)
d3              -0.250 (0)      -0.037 (0.089)  -0.020 (0.096)  -0.029 (0.012)  -0.019 (0.012)  -0.072 (0.095)  -0.224 (0.024)
d4              -0.250 (0)      -0.031 (0.068)  -0.021 (0.207)  -0.030 (0.094)  -0.081 (0.095)  -0.120 (0.103)  -0.235 (0.025)
d5              -0.250 (0)      -0.089 (0.103)  -0.048 (0.059)  -0.060 (0.092)  -0.072 (0.092)  -0.045 (0.075)  -0.246 (0.012)

(b) Coverage
d1              0.000 (0)       0.021 (0.067)   0.000 (0.000)   0.000 (0.000)   0.000 (0.000)   0.006 (0.019)   0.000 (0.000)
d2              0.000 (0)       0.007 (0.023)   0.000 (0.000)   0.005 (0.016)   0.018 (0.016)   0.000 (0.000)   0.000 (0.000)
d3              0.000 (0)       0.009 (0.029)   0.025 (0.079)   0.020 (0.065)   0.000 (0.065)   0.000 (0.000)   0.000 (0.000)
d4              0.000 (0)       0.005 (0.017)   0.079 (0.205)   0.000 (0.000)   0.000 (0.000)   0.007 (0.024)   0.000 (0.000)
d5              0.000 (0)       0.000 (0.000)   0.000 (0.000)   0.000 (0.000)   0.000 (0.000)   0.000 (0.000)   0.000 (0.000)

(c) Rand Index
d1              0.993 (0.006)   0.854 (0.144)   0.600 (0.346)   0.988 (0.014)   0.512 (0.348)   0.558 (0.309)   0.988 (0.006)
d2              0.986 (0.011)   0.706 (0.198)   0.634 (0.249)   0.690 (0.294)   0.464 (0.302)   0.605 (0.215)   0.986 (0.009)
d3              0.987 (0.011)   0.589 (0.255)   0.612 (0.244)   0.665 (0.282)   0.639 (0.103)   0.767 (0.148)   0.985 (0.010)
d4              0.984 (0.011)   0.535 (0.283)   0.809 (0.155)   0.522 (0.220)   0.619 (0.370)   0.759 (0.351)   0.984 (0.012)
d5              0.988 (0.011)   0.699 (0.276)   0.651 (0.263)   0.648 (0.306)   0.658 (0.301)   0.607 (0.259)   0.982 (0.013)

(d) Number of clusters
d1              24.0            17.3            10.9            24.0            10.0            10.0            23.3
d2              24.0            12.4            10.7            12.6             8.5             9.9            22.8
d3              24.0             9.7            10.6            11.8             9.5            13.8            23.3
d4              24.0             9.7            15.3             9.0            12.7            16.4            23.6
d5              24.0            14.0            11.9            11.7            12.3            10.8            23.9

d1-d5: Jaccard Goods, Jaccard Residence, Jaccard MO, Spatial Proximity, Temporal Proximity.
Standard deviation within parentheses.

Fig. 3. Cluster solution example for spatial proximity and 3rd-Quantile based on labeled data. Clusters are connected by edges. Legend: Series A; Series B; No known series.


significant effect for the interaction of the two factors, F(24, 315) = 2.733, non-significant. As such, there is a difference in performance between the edge representations and between the edge removal criteria, but they do not affect each other.

For the edge representation, Fisher's LSD post hoc test found that the Jaccard Residence edge representation scored significantly better than both the Temporal proximity and Jaccard Goods representations (see Table 3(a)). Similarly, Spatial proximity and Jaccard MO scored significantly better than the Jaccard Goods edge representation. This indicates that residential characteristics, followed by Spatial proximity, are preferable when representing similarity or distance between crimes.

For the edge removal criteria, Fisher's LSD test indicates that the MeanTest criterion performs significantly better than the other criteria. The best performance, however, was achieved when not applying any edge removal criterion. The 3rd-Quantile and 2nd-Quantile perform significantly better than the 1st-Quantile. The poor performance of the 1st-Quantile can be attributed to the high number of edges removed from the graph, with the consequence that the nodes cannot be connected. As the performance of not applying an edge removal criterion was similar to that of the best performing criteria, there is little reason to apply one; using edge removal criteria would also increase the number of computational steps required when grouping crimes.

8.2. Experiment 2

The means and standard deviations are presented in Table 2. For the modularity cluster validation measurement, a two-factor analysis of variance showed a significant effect for the edge representation, F(4, 315) = 4.339, p < .05; a significant effect for the edge removal criteria, F(6, 315) = 69.735, p < .001; and a significant effect for the interaction of the two factors, F(24, 315) = 2.733, p < .001. As such, there is a significant difference between edge removal criteria and between edge representations, and the two variables affect each other.

Fisher's LSD post hoc test for the edge representation found a difference between the Jaccard MO, Jaccard Residence and spatial proximity representations on the one hand, and the Jaccard Goods representation on the other. This is similar to the results in Experiment 1, suggesting similar clustering ability over the data sets. For the edge removal criteria, the LSD test found significant differences between multiple criteria. The different groups can be seen in Table 4(a). Criteria belonging to different groups are significantly different, with group a performing significantly better than groups b and c. For the interaction between the factors, Fisher's LSD test found that Jaccard Residence with the MeanTest criterion, Jaccard MO with the MeanTest and 3rd-Quantile criteria, and spatial proximity with the 3rd-Quantile criterion performed significantly better than factors paired with JTD and 1st-Quantile.

For RI, the two-factor analysis of variance showed no significant effect for edge representation, F(4, 315) = 0.663, non-significant; a significant effect for edge removal criteria, F(6, 315) = 27.729, p < .05; and a significant effect for the interaction of the two factors, F(24, 315) = 2.198, p < .05. As such, the two variables interact and affect each other's performance.

The results of Fisher's LSD post hoc test for the interaction between the two factors, edge representation and edge removal criteria, are shown in Table 5; interaction for both the modularity and the Rand Index is included. The best modularity means were found for the combinations of the Jaccard Residence or Jaccard MO edge representations with the MeanTest edge removal criterion, and the Jaccard MO or Spatial proximity edge representations with the 3rd-Quantile edge removal criterion. Together with the results of Experiment 1, this further supports the suggestion that crime similarity is best represented using Residence, MO or spatial proximity characteristics.

The modularity cluster validation results suggest that the JTD and 1st-Quantile edge removal criteria produce clustering solutions that are significantly worse than other factor combinations, which suggests that these criteria are unsuitable. Fisher's LSD post hoc test for the Rand Index validation measurement found significant differences between several groups of edge removal criteria (see Table 4(b)). Similar to the modularity index, this suggests that certain characteristics are better representations when the goal is accurate grouping of crimes. The 1st-Quantile and JTD performed significantly better than the other criteria. Fisher's LSD test for the interaction between the factors found that pairings with JTD and 1st-Quantile performed significantly better than other pairings, except for Jaccard Goods with the 2nd-Quantile edge removal criterion and spatial proximity with the 3rd-Quantile edge removal criterion. The resulting clustering solutions, however, were trivial or near trivial, i.e. crimes were separated into singletons or too small groups. If the clustering solutions consist of too small groups, or singletons, using them as a selection system is inadvisable.

Table 3
Fisher's LSD post hoc test for Experiment 1. The group column contains letters which denote group belonging; different letters represent groups that are statistically significantly different from each other.

(a) Edge representation (CD_LSD: 0.0383)
Jaccard Residence     -0.0893   a
Spatial proximity     -0.0955   ab
Jaccard MO            -0.1060   ab
Temporal Proximity    -0.1269   b
Jaccard Goods         -0.1603   c

(b) Edge removal criteria (CD_LSD: 0.0324)
MeanTest              -0.0597   a
None                  -0.0897   a
3rd-Quantile          -0.0947   b
2nd-Quantile          -0.0985   b
JTD                   -0.1014   bc
Average               -0.1151   bc
1st-Quantile          -0.2500   c

Table 4
Fisher's LSD post hoc test for edge removal criteria. The group column contains letters which denote group belonging; different letters represent groups that are statistically significantly different from each other.

(a) Modularity (CD_LSD: 0.0298)
3rd-Quantile          -0.0362   a
MeanTest              -0.0490   a
2nd-Quantile          -0.0570   a
None                  -0.0656   ab
Average               -0.0881   b
JTD                   -0.2279   c
1st-Quantile          -0.2500   c

(b) Rand Index (CD_LSD: 0.0875)
1st-Quantile          0.9880    a
JTD                   0.9854    a
Average               0.7032    b
2nd-Quantile          0.6771    b
3rd-Quantile          0.6617    bc
None                  0.6595    bc
MeanTest              0.5788    c


An interesting aspect to note is that there are factorial combinations that belong to high performing groups for both the modularity and the Rand Index. This would indicate that these clustering solutions provide non-trivial group separation with an above-average accuracy, and that these combinations are more suitable for crime separation than others. The Spatial proximity edge representation with the 3rd-Quantile edge removal criterion is one example where high cluster separation is found, as well as good accuracy in assigning instances to clusters. The same can be said of the Jaccard MO edge representation with no edge removal criterion applied (marked as None in Table 5).

The calculated Pearson's correlation coefficient between the Rand Index and the modularity cluster validation measurement is found to be -0.768. This can be seen in Fig. 4. This indicates that, for Experiment 2, the modularity score of the cluster solutions decreases as the Rand Index increases, i.e. as the crime groupings get worse, the accuracy increases. It should be noted that this holds across all factor combinations, including those creating trivial clustering solutions. A few outliers can be found with a modularity score of 0.2 or higher and a high Rand Index.
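The reported coefficient is the standard sample Pearson correlation between the two measurements across clustering solutions; a minimal stdlib-only sketch (hypothetical helper name):

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two sequences
    of equal length."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applied to the per-solution (Rand Index, modularity) pairs, a value near -1 captures the inverse relationship described above.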

8.3. Validity threats

Validity threats against this study are outlined and discussed in

this section. It should be noted that over time, most of these valid-

ity threats will be corrected with the help of larger data sets and

more labeled information about the data.

First, the accuracy can be questioned, as the number of labeled instances is quite small. Consequently, strong generalization cannot be concluded when it comes to accuracy. There is not much to be done about this, as the systematic gathering of crime reports in Sweden is still quite young and has not been adopted by all police counties. Also, the number of solved cases is quite low, three to five percent. As the amount of labeled data is quite small, it is quite possible that some cases should be part of a series but that the information is not yet available.

Second, the crime reports have been gathered during a rather

short time period of one year. It could be that during this time,

Table 5
Fisher's LSD post hoc test for combinations of edge representation and edge removal criteria. The group column contains letters which denote group belonging; different letters represent groups that are statistically significantly different from each other.

                                          Modularity          Rand Index
Edge Representation   Edge Removal Crit.  Means     Group     Means    Group
Jaccard Residence     1st-Quantile        -0.250    d         0.986    a
                      2nd-Quantile        -0.035    ab        0.706    bcde
                      3rd-Quantile        -0.045    ab        0.635    cdef
                      Average             -0.070    abc       0.691    bcde
                      MeanTest            -0.018    a         0.464    f
                      None                -0.032    ab        0.606    def
                      JTD                 -0.208    d         0.986    a
Jaccard MO            1st-Quantile        -0.250    d         0.987    a
                      2nd-Quantile        -0.037    ab        0.590    def
                      3rd-Quantile        -0.021    a         0.612    def
                      Average             -0.029    ab        0.666    bcde
                      MeanTest            -0.019    a         0.639    cdef
                      None                -0.073    abc       0.767    bcd
                      JTD                 -0.224    d         0.985    a
Jaccard Goods         1st-Quantile        -0.250    d         0.993    a
                      2nd-Quantile        -0.092    bc        0.855    ab
                      3rd-Quantile        -0.046    ab        0.600    def
                      Average             -0.250    d         0.988    a
                      MeanTest            -0.053    ab        0.512    ef
                      None                -0.057    abc       0.558    ef
                      JTD                 -0.226    d         0.988    a
Spatial Proximity     1st-Quantile        -0.250    d         0.984    a
                      2nd-Quantile        -0.032    ab        0.535    ef
                      3rd-Quantile        -0.021    a         0.810    abc
                      Average             -0.030    ab        0.523    ef
                      MeanTest            -0.081    abc       0.619    cdef
                      None                -0.121    c         0.759    bcd
                      JTD                 -0.235    d         0.984    a
Temporal Proximity    1st-Quantile        -0.250    d         0.988    a
                      2nd-Quantile        -0.089    bc        0.610    bcde
                      3rd-Quantile        -0.048    ab        0.651    cdef
                      Average             -0.060    abc       0.648    cdef
                      MeanTest            -0.072    abc       0.659    cdef
                      None                -0.045    ab        0.607    def
                      JTD                 -0.246    d         0.983    a

CD_LSD (Modularity): 0.066; CD_LSD (Rand Index): 0.195.

Fig. 4. Correlation plot between modularity (y-axis, -0.2 to 0.4) and Rand Index (x-axis, 0.0 to 1.0) for clustering solutions from Experiment 2.


crimes have occurred in a pattern that can be considered non-representative of criminal behaviour in general. The same can be said of the counties from which the crime reports have been collected. This is being rectified as data gathering continues and more counties choose to join the systematic process.

8.4. Discussion

The results of the experiments highlight two interesting things. First is the idea of edge removal criteria, which the results indicate have no benefit for the grouping of crimes. Consequently, when grouping residential burglaries there is little incentive to discard similarity information between cases that fulfil an edge removal criterion. While the use of edge removal criteria might have more impact on the cut-clustering adaptation than on the domain, it also suggests that retaining information between cases is important. Initially, an edge removal criterion such as Journey Time Distance was expected to remove edges, i.e. connections, between crimes that could not feasibly be connected. However, Journey Time Distance turned out to be one of the worst performing ways of removing edges between crimes, which is surprising. Intuitively, removing edges between cases that were impossible to connect should have improved the clustering algorithm's performance, but this was not the case.

The second interesting outcome is the varying suitability of the characteristics for representing similarity between crimes. The edge representations with the best performance were Residence characteristics, MO characteristics, and Spatial proximity. This is unsurprising, as similar results have been suggested in previous research. The goods representation has the worst performance. This is most likely due to the fact that the available goods attributes are quite limited and certain items tend to be more attractive to criminals. That the data are gathered during a limited time period might also affect the impact of the goods. The high performance of the MO and Residence characteristics suggests that there exists a certain behavioural consistency within crimes, and that this consistency differs between criminals to such an extent that it is possible to differentiate them.

Something to consider is that law enforcement officers are more likely to solve crimes committed by local criminals. Crimes committed by non-local criminals, who are active over a larger area during a limited time, are more complicated to solve. There might be a difference between local and non-local criminals in which representation of crime similarities has the best performance.

This also means that there will be an imbalance in the data, i.e. a series will not make up a large part of the crime cases. Imbalance will always exist, as serial residential burglars do not conduct all burglaries; of the approximately 2,000 residential burglaries a year, one group might only have committed a minority. As can be observed in Fig. 4, as the groupings get worse, the accuracy score increases. This is mostly due to the correct classification of singleton cases outweighing the false classification of connected cases. Consequently, one of the problems is to find a balance between group size and series quality.

Given the nature of the area, there will be crimes that are singletons. These cases could be singletons in the clustering solution, or be part of a larger group of crimes. As the purpose is to reduce the number of crimes an analyst has to analyse, as well as to suggest new connections among crimes, this is only a problem as long as clustering solutions are mostly singletons. This is not always the case (see Fig. 3). It is possible that an increased performance from the cut-clustering algorithm, or another clustering algorithm, can create better clustering solutions. Similarly, the method might also fail to cluster cases that should be connected. As of today, no clustering methods are employed for clustering crimes in Swedish law enforcement agencies, and if a specific investigator does not have a notion of a connection between cases, it will be missed entirely. So while an approach such as this might miss a crime when grouping, the systematic selection of cases is still preferable as a complement to the expertise of a single investigator, who otherwise keeps earlier cases and suspects in memory.

It should be noted that as the clustering uses distance-based metrics, e.g. the Jaccard index, when the MO of a perpetrator changes (which it most likely will over time), the distance between newer and older crimes will increase. This could be problematic and should be kept in mind by investigators. A recommendation is to limit the time window for crimes under investigation to 6 months, due to changes in MO and other variables (McCue, 2007). A similar issue is that the MO is often a consequence of circumstances. A criminal known for a certain way of entering a building might, when encountering an unlocked door, not use the same MO; as a result, that specific case will not share the same MO. This is a consequence of only using a single MO characteristic as a basis for the distance measurement. A potential remedy would be to use a measurement that combines multiple MO characteristics for calculating the distance between crimes. However, this problem is also mitigated by the fact that the implemented DSS has knowledge of physical evidence gathered at the scene. As such, the investigator might have a set of cases where shoe prints and fingerprints have been found, and can search for local cases with similar physical evidence and see if matches exist. Such an approach can detect other MOs used by the same burglars.

9. Conclusion

In this article, a DSS for managing and analysing systematically gathered residential burglary reports has been presented. The DSS allows law enforcement to easily search and compare residential burglary reports. The DSS contains, among other modules, an analytical framework. The use of clustering to group residential burglaries in the DSS has been investigated, using several similarity criteria.

While the results of the modularity cluster validation measurement indicate that the separation between clusters is poor, the Rand Index results still support further investigation into this area. The first experiment concerned which representation of residential burglaries to use with clustering, and whether edge removal methods increased performance. The results of the first experiment show that the choice of edge representation, but not the edge removal criteria, positively affected the modularity score. The second experiment concerned whether clustering solutions were able to correctly cluster crime series. The results of Experiment 2 suggest that, when excluding trivial clustering solutions, a high quality clustering solution results in an above-chance accuracy, i.e. 0.5 < RI < 0.8. The experiments have suggested that the choice of edge representation used when grouping crimes can positively affect the clustering solution. The best performance is found using Spatial proximity or Residential characteristics as a basis for comparing crimes. As such, this would indicate that these characteristics are preferable when law enforcement investigates related residential burglaries.

The clustering solutions without any edge removal criteria performed within the same group as the highest scoring edge removal criteria in most cases. The mean modularity scores of the experiments suggest, however, that the cut clustering algorithm is not optimal for this domain. An increase in the ability to correctly cluster crimes would allow law enforcement officers to investigate fewer criminal cases with a lower chance of missing cases in a series of crimes. The results suggest that while clustering crime series is feasible using cut clustering, further investigation is needed.

We have identified six aspects for future work. First, we only had access to a low number of labeled cases, and the experiment needs to be repeated using a larger labeled data set. Getting access to a larger labeled data set is not trivial, since identifying series of crimes is hard. Second, in this study only individual edge representations have been investigated. Researchers have found that, in some cases, combinations of edge representation scores have performed better than individual edge representations. A distance index using combined crime characteristics, such as spatial and MO characteristics, needs to be developed. Third, the performance of other clustering algorithms on this domain has not been investigated, and a comparison of algorithms should be conducted. Fourth, an accuracy index that takes into account the imbalance of the data would be better suited than the Rand Index.

Appendix A. Edge representation and removal criteria

A.1. Edge representation

The edges in the graph are represented by different similarity

coefﬁcients, making the edge weights a measure of similarity be-

tween nodes. The different similarity coefﬁcients are explained in

this section. The similarity coefﬁcients have been chosen based

on results suggested in previous research.

The Jaccard coefficient (otherwise known as the Jaccard index or Tanimoto coefficient) is a measure of similarity between two sets, A and B, based on the data shared and the data unique to each set, as shown in Eq. (A.1). A value between 0 and 1 is computed, where a value of 0 indicates that the two sets are identical.

Jaccard = 1 - \frac{|A \cap B|}{|A \cup B|}    (A.1)

The Jaccard coefficient is used to compute the similarity between incident reports based on the complete binary data available: data representing stolen goods and data representing the target, i.e. residential characteristics.
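A minimal sketch of Eq. (A.1), applied for example to sets of stolen-goods categories (the example sets are hypothetical):

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets (Eq. A.1):
    1 - |A ∩ B| / |A ∪ B|. 0 means identical, 1 means disjoint."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)
```

With binary crime-report attributes, each report is treated as the set of attributes that are present.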

Temporal proximity between instances is also used as a similarity measure. The data gathered by law enforcement officers contains information to compute the temporal proximity between residential burglaries. Due to the nature of these crimes, i.e. the crimes are often committed when the residents are away, the accuracy of the reported occurrence times and dates is often low. Consequently, the reported occurrence is often limited to a day of the week, whereas reporting would preferably describe when the crime occurred within a range of hours. The proximity in time between cases is computed as A_time - B_time if A occurred after B, and vice versa.
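Assuming each case carries a single reported occurrence timestamp (a simplification, given the coarse reporting described above), the computation can be sketched as:

```python
from datetime import datetime

def temporal_proximity_h(a_time, b_time):
    """Temporal proximity between two crimes: the absolute
    difference between their reported occurrence times, in hours."""
    delta = abs(a_time - b_time)
    return delta.total_seconds() / 3600.0
```

Crimes one day apart yield a proximity of 24 hours regardless of argument order.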

Similar to temporal proximity, the spatial proximity between instances is used as a similarity measure. The data gathered by law enforcement officers contains the address where the residential burglary took place, to a degree that allows us to find the longitude and latitude. From these coordinates, the distance between the two locations is computed and converted to meters. It should be noted that the computed distance is the shortest path between the two cases, i.e. the geodesic distance.
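One common way to approximate such a geodesic distance from two latitude/longitude pairs is the haversine formula. The paper does not specify the exact computation used, so the following is only an illustrative sketch:

```python
import math

def geodesic_distance_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in meters between two coordinates,
    using the haversine formula on a spherical Earth model."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lam = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

For example, two points one degree of latitude apart are roughly 111 km apart, which matches the formula's output.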

A.2. Edge removal criteria

As discussed in Section 4, the minimum cut tree algorithm, when given complete graphs or near-complete graphs, can produce trees that are star-shaped, i.e. each node is connected directly to the root node, or unary. Consequently, it is possible that the clustering can be improved by converting complete graphs into incomplete graphs. Two approaches for this conversion are investigated.

In the first approach, several threshold values are computed and the graphs are pruned with these values, only keeping edges where the nodes are considered similar to a certain degree. Threshold edge removal for graph transformation can be considered a global approach, in that a single threshold value is computed and used for all edges in the graph.

The second approach uses time- and distance-based measures; depending on the outcome, the edge is either removed or its weight is changed to indicate lesser similarity. Distance-based edge removal can be considered local, i.e. only a single edge is investigated at a time. Given this, the criterion for removing an edge can be different for each edge.

Thresholded edge removal. Thresholded edge removal for graph transformation can be considered a global approach, in that a single threshold value is computed and used for all edges in the graph. Only edges where the nodes are considered similar to a certain degree, i.e. at or below the threshold value, are kept. Three different threshold values, and their ability to produce quality clusters, are investigated.

The mean value is the sum of the similarity values of a set of pairs of instances divided by the number of pairs. Every edge in the graph whose value is above the threshold value is removed.

The quartile values are the three values that separate an ordered set into four equal groups; the median is the 2-quartile, i.e. the value separating an ordered set into two halves. The quartiles used here are the 2nd quartile (Q2), also known as the median, and the 3rd quartile (Q3).

The Q2 value² is computed as described in Eq. (A.2).

Q2(X) = X_{(N+1)/2}                      if N is odd
Q2(X) = (1/2)(X_{N/2} + X_{1+(N/2)})     if N is even    (A.2)

If the number of items in the set is odd, the Q2 value is the middle value of the set. If the number of items in the set is even, the Q2 value is the mean value of the two items in the middle of the set.
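As an illustrative sketch of global thresholded edge removal (Python; not the authors' implementation), the mean or the Q2/median of all edge weights can serve as the threshold. Edge weights are taken to be distance-like (0 = identical), so edges above the threshold are dropped:

```python
import statistics

def prune_edges(edges, method="median"):
    """Global thresholded edge removal over a weighted graph.

    edges: list of (node_u, node_v, weight) tuples, where weight is a
           distance-like similarity value (0 = identical).
    method: "mean" or "median" (the Q2 threshold).
    Returns the edges whose weight does not exceed the threshold.
    """
    weights = [w for _, _, w in edges]
    if method == "mean":
        threshold = statistics.mean(weights)
    else:
        threshold = statistics.median(weights)
    return [(u, v, w) for u, v, w in edges if w <= threshold]

edges = [("a", "b", 0.1), ("a", "c", 0.9), ("b", "c", 0.4), ("a", "d", 0.2)]
```

With these weights the median threshold (0.3) keeps two edges, while the more permissive mean threshold (0.4) keeps three.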

A.2.1. Distance-based edge removal

Distance-based edge removal differs from thresholded edge removal in that it is based on spatial and temporal proximities (independent of the underlying edge representation). An additional difference is that distance-based edge removal can be considered local. That is, only a single edge is investigated at a time. Given this, the criterion for removing an edge can be different for each edge.

The Mantel cross product adaptation is based on the Mantel index, which is a correlation test between time and distance for pairs of instances (Levine, 2010). The Mantel index was designed to alleviate some of the problems with previous indices, where cut-off points affect the result and results can be significant both if the time/space distance is short or long. It is used to detect correlations between two matrices, and as such needs to be adapted to compare between two instances only. The Mantel index cross product is defined as follows:

T = Σ_{i=0}^{N} Σ_{j=0}^{N} (X_{i,j} − Mean(X))(Y_{i,j} − Mean(Y))    (A.3)

The variables can be explained as follows: N is the number of instances, X is a set of similarities of one index (e.g. space) between two instances, and Y is a set of similarities of another index (e.g. time) between the same instances. The following equations are used as a basis to remove edges where time and space have exceeded a certain point in the dataset. The time and space proximity between two instances are compared against the mean time or space proximity. If one of the conditions is negative, the weight of the edge is increased by half. If both conditions are negative, the edge is removed.

² http://mathworld.wolfram.com/StatisticalMedian.html, 2013-02-18.

A. Borg et al. / Expert Systems with Applications 41 (2014) 5252–5266 5263

(X_{i,j} − Mean(X)) ≥ 1    (A.4)

(Y_{i,j} − Mean(Y)) ≥ 1    (A.5)
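Following the prose description literally, the local Mantel-style criterion can be sketched as follows (illustrative Python; the sign convention and how the edge weight is updated are assumptions based on the text, since the proximities may be encoded either as distances or as similarities):

```python
def mantel_edge_update(x_ij, y_ij, mean_x, mean_y, weight):
    """Local Mantel-style edge criterion for one edge at a time.

    x_ij, y_ij: spatial and temporal proximity of the pair (i, j).
    mean_x, mean_y: mean spatial and temporal proximity in the dataset.
    Returns None if the edge should be removed (both conditions negative),
    the weight increased by half if exactly one condition is negative,
    and the unchanged weight otherwise.
    """
    spatial_negative = (x_ij - mean_x) < 0
    temporal_negative = (y_ij - mean_y) < 0
    if spatial_negative and temporal_negative:
        return None          # remove the edge entirely
    if spatial_negative or temporal_negative:
        return weight * 1.5  # increase weight by half: lesser similarity
    return weight
```

Each edge is evaluated independently, which is what makes this a local criterion in contrast to the global thresholds above.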

Journey Time Distance (JTD) is a measure used in Geographical Information Systems (GIS) to determine whether time/space distances between cases are reasonable (Chainey & Ratcliffe, 2005). The measure used here is a simplified version that assumes a straight travel distance and a fixed speed. The JTD is determined by calculating time as distance divided by speed, i.e. whether a criminal can reasonably travel between cases. The equation used to investigate this is as follows:

X_{i,j} / 100,000 > (Y_{i,j} × 24)    (A.6)

The distance in meters (X_{i,j}) between two cases divided by 100,000 (100 km/h) gives the time it would take to travel between the two cases. If that is larger than the temporal proximity (Y_{i,j}), the cases are reasonably not connected and the edge is removed.
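The JTD check in Eq. (A.6) can be sketched as follows (illustrative Python; the assumption that the temporal proximity is given in days, implied by the factor 24, is ours):

```python
def jtd_remove_edge(distance_m, time_days, speed_kmh=100.0):
    """Simplified Journey Time Distance check, after Eq. (A.6).

    distance_m: spatial proximity X_ij in meters.
    time_days: temporal proximity Y_ij, assumed to be in days.
    speed_kmh: assumed fixed straight-line travel speed (100 km/h).
    Returns True if travelling between the two scenes would take longer
    than the temporal proximity allows, i.e. the edge should be removed.
    """
    travel_time_h = distance_m / (speed_kmh * 1000.0)  # meters -> hours
    return travel_time_h > time_days * 24.0
```

For example, two scenes 300 km apart with crimes only 2.4 hours apart cannot reasonably be reached at 100 km/h, so that edge would be removed.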

It has been argued that a temporal proximity of no more than 6 months is the longest period that a dataset should span, as over longer time periods the movement of people affects the outcome (McCue, 2007). This constraint on time span is accounted for by removing the edge if the temporal proximity is longer than 3 months.

Appendix B. Standardised complaint routine


References

Bennell, C., Bloomfield, S., Snook, B., Taylor, P., & Barnes, C. (2010a). Linkage analysis in cases of serial burglary: Comparing the performance of university students, police professionals, and a logistic regression model. Psychology, Crime & Law, 16(6), 507–524.

Bennell, C., & Canter, D. V. (2002). Linking commercial burglaries by modus operandi: Tests using regression and ROC analysis. Science & Justice: Journal of the Forensic Science Society, 42(3), 153.

Bennell, C., Gauthier, D., Gauthier, D., Melnyk, T., & Musolino, E. (2010b). The impact of data degradation and sample size on the performance of two similarity coefficients used in behavioural linkage analysis. Forensic Science International, 199(1–3), 85–92.

Bennell, C., & Jones, N. J. (2005). Between a ROC and a hard place: A method for linking serial burglaries by modus operandi. Journal of Investigative Psychology and Offender Profiling, 2(1), 23–41.

Bennell, C., Jones, N. J., & Melnyk, T. (2010c). Addressing problems with traditional crime linking methods using receiver operating characteristic analysis. Legal and Criminological Psychology, 14(2), 293–310.

Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., et al. (2007). On finding graph clusterings with maximum modularity. In Lecture notes in computer science: Graph-theoretic concepts in computer science (pp. 121–132). Berlin, Heidelberg.

Chainey, S., & Ratcliffe, J. (2005). GIS and crime mapping. John Wiley & Sons, Ltd.

Cohen, J., Rodrigues, L. A., Silva, F., Carmo, R., Guedes, A. L. P., & Duarte, E. P. (2011). Parallel implementations of Gusfield's cut tree algorithm. In 11th International conference on algorithms and architectures for parallel processing (ICA3PP 2011) (pp. 258–269). Berlin, Heidelberg.

Flake, G. W., Lawrence, S., & Giles, C. L. (2000). Efficient identification of web communities. In KDD '00: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM.

Flake, G. W., Tarjan, R. E., & Tsioutsiouliklis, K. (2004). Graph clustering and minimum cut trees. Internet Mathematics, 1(4), 385–408.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3–5), 75–174.

Görke, R., Hartmann, T., & Wagner, D. (2009). Dynamic graph clustering using minimum-cut trees. Berlin, Heidelberg: Springer.

Hamann, M. (2011). Complete hierarchical cut-clustering: An analysis of guarantee and quality. Karlsruhe Institute of Technology.


Levine, N. (2010). CrimeStat III: A spatial statistics program for the analysis of crime incident locations (version 3.3).

Markson, L., Woodhams, J., & Bond, J. W. (2010). Linking serial residential burglary: Comparing the utility of modus operandi behaviours, geographical proximity, and temporal proximity. Journal of Investigative Psychology and Offender Profiling, 91–107.

McCue, C. (2007). Data mining and predictive analysis: Intelligence gathering and crime analysis (1st ed.). Butterworth-Heinemann.

Newman, M. (2003). Mixing patterns in networks. Physical Review E, 67(2), 026126.

Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8573–8574.

Oatley, G., Ewart, B., & Zeleznikow, J. (2006). Decision support systems for police: Lessons from the application of data mining techniques to "soft" forensic evidence. Artificial Intelligence and Law, 14(1–2), 35–100.

Phillips, P., & Lee, I. (2011). Crime analysis through spatial areal aggregated density patterns. Geoinformatica, 15(1), 49–74.

Santos, J. M., & Embrechts, M. (2009). On the use of the adjusted rand index as a metric for evaluating supervised classification. In Artificial neural networks – ICANN 2009 (pp. 175–184). Berlin, Heidelberg: Springer.

Schaeffer, S. E. (2007). Graph clustering. Computer Science Review, 1(1), 27–64.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Wadsworth Pub Co.

Sheskin, D. (2007). Handbook of parametric and nonparametric statistical procedures. Chapman & Hall.

Tonkin, M., Woodhams, J., Bull, R., Bond, J. W., & Palmer, E. J. (2011). Linking different types of crime using geographical and temporal proximity. Criminal Justice and Behavior, 38(11), 1069–1088.

Toole, J. L., Eagle, N., & Plotkin, J. B. (2011). Spatiotemporal correlations in criminal offense records. Transactions on Intelligent Systems and Technology (TIST), 2(4), 1–18.

Wang, S., Li, X., Cai, Y., & Tian, J. (2011). Spatial and temporal distribution and statistic method applied in crime events analysis. In 2011 19th International conference on geoinformatics (pp. 1–6).

Woodhams, J., Hollin, C. R., & Bull, R. (2010). The psychology of linking crimes: A review of the evidence. Legal and Criminological Psychology, 12(2), 233–249.

Xue, Y., & Brown, D. E. (2003). A decision model for spatial site selection by criminals: A foundation for law enforcement decision support. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 33(1), 78–85.

Zhou, G., Lin, J., & Zheng, W. (2012). A web-based geographical information system for crime mapping and decision support. In 2012 International conference on computational problem-solving (ICCP) (pp. 147–150).
