
Detecting serial residential burglaries using clustering

Anton Borg a,⇑, Martin Boldt a, Niklas Lavesson a, Ulf Melander b, Veselka Boeva c

a Blekinge Institute of Technology, School of Computing, SE-371 79 Karlskrona, Sweden
b Blekinge County Police, Box 315, SE-371 25 Karlskrona, Sweden
c Computer Systems & Technologies Department, Technical University of Sofia, Bulgaria

Keywords: Cut clustering; Residential burglary analysis; Crime clustering; Decision support system

Abstract

According to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the reported residential burglaries in 2012. Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders. Law enforcement agencies are consequently required to detect series of crimes, or linked crimes. Comparing crime reports is difficult today, as no systematic or structured way of reporting crimes exists, and there is no ability to search across multiple crime reports.

This study presents a systematic data collection method for residential burglaries, together with a decision support system for comparing and analysing them. The decision support system consists of an advanced search tool and a plugin-based analytical framework. To find similar crimes, law enforcement officers have to review a large number of cases. We investigate the potential of the cut clustering algorithm for grouping crimes by their characteristics, in order to reduce the number of crimes to review in residential burglary analysis. The characteristics used are modus operandi, residential characteristics, stolen goods, spatial similarity, and temporal similarity.

Clustering quality is measured using the modularity index and accuracy is measured using the Rand index. The clustering solutions with the best quality scores were based on residential characteristics, spatial proximity, and modus operandi, suggesting that the choice of characteristic used to group crimes can positively affect the end result. The results indicate that a high-quality clustering solution performs significantly better than a random guesser. In terms of practical significance, the presented clustering approach is capable of reducing the number of cases to review while keeping most connected cases together. While the approach might miss some connections, it is also capable of suggesting new ones. The results also suggest that while crime series clustering is feasible, further investigation is needed.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Studies suggest that a large proportion of crimes are committed by a minority of offenders; in the USA, for example, researchers suggest that 5% of offenders are involved in 30% of the convictions (Tonkin, Woodhams, Bull, Bond, & Palmer, 2011). Law enforcement agencies are consequently required to detect series of crimes, or linked crimes. A series can be defined as multiple offences committed by a serial offender, and a serial offender can be defined as someone who has committed two or more crimes of the same type (Woodhams, Hollin, & Bull, 2010). Law enforcement in Sweden suggests that, similarly to the international findings, a large proportion of residential burglaries are committed by professional criminals who travel across large areas of Sweden. At the same time, according to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the 21,300 reported residential burglaries in 2012.

The detection of linked crimes is helpful to law enforcement for several reasons. Firstly, aggregating information from multiple crime scenes increases the amount of available evidence. Secondly, the joint investigation of multiple crimes enables a more efficient use of law enforcement resources (Woodhams et al., 2010). Law enforcement needs to handle a large number of reported crimes, and the detection of crime series is often carried out manually. A decision support system that lets law enforcement reduce the number of cases to review would therefore increase resource efficiency.

Forensic evidence, e.g. DNA and fingerprints, has been used to detect linked crimes (Bennell & Canter, 2002; Tonkin et al., 2011). The availability of forensic evidence is, however, limited

http://dx.doi.org/10.1016/j.eswa.2014.02.035
0957-4174/© 2014 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +46 455385854.
E-mail addresses: anton.borg@bth.se (A. Borg), martin.boldt@bth.se (M. Boldt), niklas.lavesson@bth.se (N. Lavesson), vboeva@tu-plovdiv.bg (V. Boeva).

Expert Systems with Applications 41 (2014) 5252–5266

(Tonkin et al., 2011). In the absence of forensic evidence, behavioural information can be used as an alternative data source (Bennell & Canter, 2002). A criminal committing a series of crimes has been found to show high behavioural similarity across the crimes (Woodhams et al., 2010). In contrast, behavioural consistency tends to be lower between different criminals in similar situations (Woodhams et al., 2010).

This article presents a new decision support system (DSS) that can be used to systematically collect burglary data and to perform visualisations, analyses, and interpretations of the collected data. The article evaluates a key component of the DSS: the use of clustering techniques to group burglaries based on different definitions of similarity between burglaries, illustrated in Fig. 1. Clustering has been used to group data according to similarity between data points, or to find communities in the data. Clustering residential burglaries based on different similarity aspects would potentially allow law enforcement to find series whilst reviewing a smaller number of residential burglaries, i.e. the system can be used as a case selection DSS. Consequently, the use of this DSS would allow law enforcement agencies to save resources, whilst providing individual investigators with increased support. The clustering is performed using the cut clustering algorithm (Flake, Tarjan, & Tsioutsiouliklis, 2004).

1.1. Purpose statement

The purpose of this study is twofold. First, a DSS for collecting, managing and analysing residential burglary information is presented. Second, the potential of minimum-cut-based graph clustering of crimes is investigated as a way to reduce the number of crimes to review when detecting series of residential burglaries. The impact of different edge representations and edge removal criteria on cluster quality and accuracy is investigated. Clustering quality is measured using the modularity index and accuracy is evaluated by applying the Rand index.

The data comprise residential burglary reports gathered from southern Sweden and the Stockholm area.

1.2. Outline

The remainder of this work is organized as follows: Section 2 presents a DSS for residential burglary analysis. In Section 3, the related work is reviewed. Section 4 then describes the minimum cut clustering algorithm. In Sections 5 and 6, the methodology and experimental procedure are described. The results of the experiments are presented in Section 7 and analysed in Section 8. Conclusions and future work are presented in Section 9.

2. Decision support system for residential burglary analysis

Since 2011, researchers from Blekinge Institute of Technology have collaborated with law enforcement officers and analysts from the Blekinge county police as well as four additional county police authorities from southern Sweden. The aim is to develop Information and Communication Technology (ICT) solutions for law enforcement. The scope is currently limited to solutions that target residential burglaries. The strategies, tactics, and overall organisational structure of the police vary between countries, but the main issues are shared by many countries.

In Sweden, the police are organised into 21 county police authorities, or regional units, each corresponding to a particular county. The National Police Board (NPB) is the central administrative and supervisory authority of the police service. The NPB comprises the National Bureau of Investigation and the Swedish Security Service. In addition, the Swedish police include the Swedish National Laboratory of Forensic Science. In 2015, the Swedish police will be re-organised into one national authority.

The collaboration between Blekinge Institute of Technology and the Swedish police was formed to improve the capability to solve residential burglary cases. In particular, the police are interested in ICT software, and organisational changes, that improve the data exchange and collaborative efforts of multiple county police authorities when addressing serial crime. Engineers and researchers at Blekinge Institute of Technology developed a prototype DSS for this purpose in 2012. Since then, the collaboration between academia and police has been extended to encompass authorities responsible for two thirds of the Swedish population.

Fig. 1. A view of local crimes with red markers denoting similar crimes in the suggested DSS.

The DSS uses a web-based graphical user interface, which is connected through program logic to a database with structured information about residential burglaries. The crime data are collected through a digital form, shown in Appendix B, which is being continuously developed in close collaboration between Blekinge Institute of Technology and the Swedish police. The form requires police officers at the crime scene to acquire specific pieces of information about the modus operandi, the physical location, and other types of information related to each crime. Before the introduction of this form, the data collected varied extensively between crime scenes with respect to quality, amount, personal bias, and perspective.

The program logic in the DSS is centered around a straightforward search engine interface, which makes it possible to search, filter, group, and compare crime scenes with respect to various properties related to modus operandi, location, and so on. This can be seen in Fig. 2. In addition to the comprehensive search engine, the DSS features a plugin-based analysis framework, which makes it possible to develop specific types of descriptive and inferential statistical analyses of the crime scene data.

This article focuses on an analysis component developed for the DSS. The component makes it possible to perform clustering on crime scene data for various purposes. The aim of this article can therefore be described as twofold: to introduce and describe the DSS and the structured collection of crime scene data, and to evaluate one particular type of analysis component.

3. Related work

The problem of linking reported crimes has mostly been investigated from a psychological or criminological perspective. The research has focused on crimes that can be considered violent, e.g. sexual offences, rapes, homicides, and different types of burglaries, including violent burglaries (Bennell & Canter, 2002; Bennell, Bloomfield, Snook, Taylor, & Barnes, 2010a; Bennell, Gauthier, Gauthier, Melnyk, & Musolino, 2010b; Bennell, Jones, & Melnyk, 2010c; Markson, Woodhams, & Bond, 2010; Woodhams et al., 2010).

The research conducted suggests that behavioural consistency is present among offenders and that there exists an inter-individual variation (Woodhams et al., 2010). The behavioural consistency between similar situations tends to increase with the experience of the perpetrator. More specifically, an individual tends to behave similarly in similar situations, whereas multiple individuals tend to behave differently, to a certain degree, in similar situations (Woodhams et al., 2010). A smaller temporal proximity between situations usually results in increased similarity for a perpetrator.

Different aspects of behaviour can be used for comparison, e.g. modus operandi (MO), spatial proximity, and temporal proximity. The MO can be further divided into three domains: entry behaviour, target characteristics, and goods stolen (Bennell & Jones, 2005). Entry behaviour describes the procedure used to enter the premises, e.g. broke and entered through a window on the second floor. Target characteristics denote characteristics of the residence being targeted, i.e. isolated location, two-story building, alarm, etc. Recent research on using MO characteristics has suggested that these characteristics are effective (Woodhams et al., 2010). Spatial proximity has been shown to increase the hit ratio, i.e. the number of detected linked cases, for some crime types, e.g. burglaries (Woodhams et al., 2010). Spatial proximity has also been investigated for use in grouping crimes to detect where crimes concentrate in space and time, e.g. to detect hotspots, or to predict future crime locations (Chainey & Ratcliffe, 2005; Oatley, Ewart, & Zeleznikow, 2006; Phillips & Lee, 2011; Xue & Brown, 2003; Wang, Li, Cai, & Tian, 2011; Zhou, Lin, & Zheng, 2012). Spatiotemporal correlations over longer time periods have been investigated to further enhance hotspot detection (Toole, Eagle, & Plotkin, 2011). These approaches differ from crime linkage in that they detect areas that are more likely to have crimes committed, whereas crime linkage finds connections between crimes over larger areas (Oatley et al., 2006). Different hotspot methods are used in DSSs for law enforcement agencies, e.g. to detect areas for resource prioritization (Chainey & Ratcliffe, 2005; Phillips & Lee, 2011).

Fig. 2. A view of local crimes for a specific search in the suggested DSS.

Some researchers have computed the similarity between pairs of crimes based on various behaviours. Many of these studies have used similarity coefficients between cases, such as the Jaccard coefficient, to represent behavioural consistency (Woodhams et al., 2010). The similarity scores have been used as input for logistic regression analysis as well as to plot receiver operating characteristics (ROC) curves for linked and unlinked cases (Bennell & Jones, 2005; Bennell et al., 2010c; Markson et al., 2010; Tonkin et al., 2011). The results have suggested that spatial proximity and temporal proximity are better indicators for determining linked crimes than the MO characteristics (Markson et al., 2010). The MO characteristics were, however, still found to be a significant indicator (Bennell & Jones, 2005; Markson et al., 2010). Using only temporal and spatial proximity, a model was created that was able to correctly classify 86.9% of crime pairs in the sample, compared to 80% for a model using spatial proximity and 75.6% for a model using temporal proximity (Markson et al., 2010). The MO characteristics-based selection achieved an accuracy between 54.4% and 58.1%.

The data used in many of the reviewed studies were extracted from law enforcement agencies, in some cases according to a checklist (Markson et al., 2010). Since the data extraction was done after the case information was reported, the case information might be incomplete, as law enforcement officers might not have reported crimes in a systematic way, e.g. because different aspects were considered important by different officers.

The overarching theme in the previous articles can be described as detecting crimes that are similar. This is close to the purpose of clustering, where the goal is to distribute objects into separate groups. Several investigations have tried to compute similarity scores between pairs of crimes. Such scores can easily be translated into a graph structure or an adjacency matrix. A graph can be described as a set of nodes connected by edges of different weights, e.g. a set of crimes as nodes with similarity scores as edge weights. A survey of the graph clustering domain was conducted in 2007 (Schaeffer, 2007). Graph clustering has successfully been used to identify communities/networks in other settings (Flake, Lawrence, & Giles, 2000; Fortunato, 2010; Newman, 2006). Community detection has been investigated extensively and the different methods have been summarized (Fortunato, 2010). A graph can be divided into clusters based on a split criterion, an approach denoted divisive clustering. The split criterion can be computed using several methods, e.g. maximum-flow or spectral methods (Schaeffer, 2007).

Previous work on the problem of linking residential burglaries has suggested that there is a difference between the similarities of linked and unlinked residential burglaries. The difference has been investigated using pairs of crimes (Bennell & Jones, 2005; Bennell et al., 2010c; Markson et al., 2010; Tonkin et al., 2011). While the pair-wise comparisons have suggested a possibility of detecting links between cases, the studies have not investigated approaches for detecting series of crimes. Each series of residential burglaries should have a high intra-series similarity score and a low inter-series similarity score, similar to the description of community detection in graphs (Fortunato, 2010). As such, clustering residential burglaries can be described as a problem of grouping instances or detecting communities within the data. One of the more recent graph clustering algorithms suitable for detecting communities in graphs is the cut clustering algorithm suggested by Flake et al. (2004); see also Fortunato (2010) and Görke, Hartmann, and Wagner (2009).

4. Cut clustering algorithm

The cut clustering algorithm is a graph-based clustering algorithm that uses minimum cut trees to cluster the input data (Flake et al., 2004). The input is an undirected graph where the edges between nodes represent a similarity or distance measure.

The algorithm can be described as follows: an artificial node is added to the existing graph and connected to all nodes in the graph with edge weight α. A tree is created from the graph using a minimum cut tree algorithm (Cohen et al., 2011). The artificial node is then removed from the tree, and the nodes that are still connected to each other are considered parts of the different clusters (Flake et al., 2004).

Algorithm 1. Cut clustering algorithm (Flake et al., 2004). Input is a graph G with nodes V and edge weights E.

1: function CutClustering(G(V,E), α)
2:   V′ ← V ∪ {t}
3:   for all nodes v ∈ V do
4:     Connect t to v with edge weight α
5:   end for
6:   G′(V′,E′) is the expanded graph after connecting t to V
7:   Calculate the min-cut tree T′ of G′
8:   Remove t from T′
9:   return all connected components as clusters of G
10: end function

The cut clustering algorithm (see Algorithm 1) is implemented according to the original description (Flake et al., 2004). The minimum cut tree algorithm is implemented according to Gusfield's specification (Cohen et al., 2011), described further in Section 4.2. To find the minimum cuts between two nodes in the adjacency matrix, the Edmonds-Karp maximum flow algorithm is used.

A property of the maximum flow algorithm is that complete graphs, or near-complete graphs, can result in trivial clustering solutions. This is because, in a complete graph, the minimum cut can be trivial, i.e. cutting off either the source or the target node. In such cases, the trees created will be either star-shaped, i.e. each node connected directly to the root node, or unary, i.e. each parent containing one node. The clustering produced from such a tree will be trivial. Consequently, this needs to be considered when creating the graph.
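The procedure described above can be sketched in pure Python. This is an illustrative implementation, not the authors' code: it uses a dense capacity matrix, the Edmonds-Karp algorithm for the maximum flows, and roots Gusfield's tree at the artificial node. Ties between minimum cuts can make the exact clusters implementation-dependent.

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Edmonds-Karp max flow on a dense capacity matrix.
    Returns (flow value, source-side node set of a minimum s-t cut)."""
    n = len(cap)
    res = [row[:] for row in cap]                  # residual capacities
    total = 0.0
    while True:
        parent = [-1] * n                          # BFS for an augmenting path
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        bottleneck, v = float("inf"), t            # capacity of the path
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                              # push flow along the path
            res[parent[v]][v] -= bottleneck
            res[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck
    side, q = {s}, deque([s])                      # residual-reachable nodes form
    while q:                                       # the source side of a min cut
        u = q.popleft()
        for v in range(n):
            if v not in side and res[u][v] > 1e-12:
                side.add(v)
                q.append(v)
    return total, side

def cut_clustering(sim, alpha):
    """Cut clustering sketch: attach an artificial node t to every node with
    edge weight alpha, build Gusfield's minimum cut tree rooted at t, remove
    t, and return the connected components as clusters."""
    n = len(sim)                                   # sim: symmetric, zero diagonal
    # index 0 is the artificial node t; real node i becomes index i + 1
    cap = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        cap[0][i + 1] = cap[i + 1][0] = alpha
        for j in range(n):
            cap[i + 1][j + 1] = sim[i][j]
    tree = [0] * (n + 1)                           # Gusfield parent pointers
    for s in range(1, n + 1):
        target = tree[s]
        _, side = edmonds_karp(cap, s, target)
        for u in range(s + 1, n + 1):              # reparent nodes on s's side
            if tree[u] == target and u in side:
                tree[u] = s
    parent = list(range(n + 1))                    # union-find over tree edges
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for s in range(1, n + 1):
        if tree[s] != 0:                           # drop edges into t
            parent[find(s)] = find(tree[s])
    groups = {}
    for v in range(1, n + 1):
        groups.setdefault(find(v), set()).add(v - 1)
    return sorted(groups.values(), key=min)
```

On a toy graph of two dense triangles joined by a single weak edge, an α between the weak and strong edge weights recovers the two triangles as clusters, while a very large α yields only singletons.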

4.1. The α value

The α value is used when the artificial node is attached to the other nodes. The outcome of the minimum cut clustering algorithm is determined by the α value (Flake et al., 2004). The behaviour of the α value can be predicted: given a high α value, many clusters will be produced, and as the α value decreases, fewer clusters will be produced.

When the number of desired clusters is known, the α value can be discovered using e.g. a binary search over α values until the wanted number of clusters is found (Flake et al., 2004). If the number of desired clusters is unknown, a binary search can iterate over the α value until trivial clusters are no longer produced or the number of clusters produced stabilizes. This has been implemented according to Algorithm 2 (Hamann, 2011). The boundary values are chosen so that the clustering solutions produced with the boundary values are trivial, i.e. either a single cluster or several singletons.

Algorithm 2. Binary search for iterating alpha values.

1: min ← Min(G(E))
2: max ← Max(G(E))
3: Cl ← 1
4: Cr ← |V_G|
5: while min < max − 1 and Cl < Cr do
6:   alpha ← (min + max)/2
7:   c ← |CutClustering(G(V,E), alpha)|    ▷ c gets the number of clusters in the clustering
8:   if c = Cl then
9:     min ← alpha
10:  else if c = Cr then
11:    max ← alpha
12:  else
13:    if c > Cl and c < (Cr/2) then
14:      Cl ← c
15:    else if c < Cr and c > (Cr/2) then
16:      Cr ← c
17:    else
18:      End Loop
19:    end if
20:  end if
21: end while
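The core idea of this search can be sketched in a simplified form: bisect the α interval until the clustering is no longer trivial. The sketch below assumes a `cluster_count(alpha)` callback (e.g. wrapping the cut clustering algorithm on a fixed graph); the stand-in step function used for illustration is not from the paper.

```python
def search_alpha(cluster_count, alpha_min, alpha_max, n_nodes, tol=1e-3):
    """Simplified bisection over alpha, in the spirit of Algorithm 2:
    a low alpha merges everything into one cluster, a high alpha yields
    all singletons; stop as soon as a non-trivial solution appears."""
    alpha = (alpha_min + alpha_max) / 2
    while alpha_max - alpha_min > tol:
        alpha = (alpha_min + alpha_max) / 2
        c = cluster_count(alpha)
        if c <= 1:                      # one big cluster: alpha too low
            alpha_min = alpha
        elif c >= n_nodes:              # all singletons: alpha too high
            alpha_max = alpha
        else:
            return alpha, c             # non-trivial clustering found
    return alpha, cluster_count(alpha)

# illustrative stand-in for |CutClustering(G, alpha)| on some fixed graph
def toy_cluster_count(alpha):
    return 1 if alpha < 0.2 else (4 if alpha < 0.8 else 10)
```

Unlike Algorithm 2, this sketch stops at the first non-trivial solution rather than tracking the non-trivial cluster counts seen from each side of the interval.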

4.2. Minimum cut tree

Minimum cut trees can be created using, for example, two well-known algorithms: the Gomory-Hu algorithm or Gusfield's algorithm (Cohen et al., 2011). In both, the maximum-flow algorithm is used n − 1 times. However, Gusfield's algorithm is considered simpler to implement, as it operates on an adjacency matrix representation and requires no contractions or expansions of the graph, contrary to the Gomory-Hu algorithm. Parallel implementations are supported by both algorithms. Gusfield's algorithm is presented in Algorithm 3.

In Gusfield's algorithm the parent of each node in the tree is tracked. Initially, all nodes in the graph point to the first node. In each iteration, the source node, s, is picked such that it has not been used before, and the target node, t, is the parent of s. Using the maximum flow algorithm, the minimum cut between s and t is then found. Any neighbour of t that is on the same side of the cut as s and has not been used as a source has its parent changed to s.

Algorithm 3. Gusfield's Minimum Cut Tree Algorithm (Cohen et al., 2011).

1: function MinCutTree(G(V,E), c)    ▷ A weighted, undirected graph
2:   for i = 1 → |V_G| do
3:     tree_i ← 1
4:   end for
5:   for s = 2 → |V_G| do    ▷ |V_G| − 1 maximum flow iterations
6:     t ← tree_s
7:     flow_s ← max-flow(s, t)
8:     {X, X̄} ← minimum s-t cut
9:     for u ∈ V_G, u > s do
10:      if u ∈ X then
11:        tree_u ← s
12:      end if
13:    end for
14:  end for
15:  V_T ← V_G    ▷ Build the minimum cut tree
16:  E_T ← ∅
17:  for s = 2 → |V_G| do
18:    E_T ← E_T ∪ {s, tree_s}
19:    f({s, tree_s}) ← flow_s
20:  end for
21:  return T = (V_T, E_T, f)
22: end function

5. Data and method

5.1. Data collection

The data consist of residential burglary incident reports collected in a systematic way by law enforcement officers over a period of six months. The incident reports are collected through a checkbox-based form, providing a common base of collected data.

Table 1. Mean clustering measurements for Experiment 1. d1-d5: Jaccard Goods, Jaccard Residence, Jaccard MO, Spatial Proximity, Temporal Proximity. Standard deviation within parentheses.

(a) Modularity
     1st-Quantile  2nd-Quantile   3rd-Quantile   Average        MeanTest       None           JTD
d1   0.250(0)      0.174(0.119)   0.133(0.123)   0.250(0.000)   0.088(0.112)   0.125(0.114)   0.102(0.094)
d2   0.250(0)      0.042(0.081)   0.074(0.097)   0.070(0.100)   0.011(0.019)   0.070(0.102)   0.108(0.083)
d3   0.250(0)      0.086(0.102)   0.110(0.121)   0.049(0.075)   0.077(0.096)   0.082(0.116)   0.087(0.076)
d4   0.250(0)      0.129(0.113)   0.025(0.038)   0.056(0.227)   0.041(0.079)   0.086(0.101)   0.082(0.072)
d5   0.250(0)      0.062(0.100)   0.131(0.121)   0.150(0.143)   0.082(0.102)   0.086(0.114)   0.128(0.090)

(b) Coverage
     1st-Quantile  2nd-Quantile   3rd-Quantile   Average        MeanTest       None           JTD
d1   0.000(0)      0.000(0.000)   0.001(0.002)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)
d2   0.000(0)      0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)
d3   0.000(0)      0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.000(0.000)   0.001(0.002)
d4   0.000(0)      0.003(0.006)   0.001(0.001)   0.063(0.200)   0.000(0.000)   0.000(0.000)   0.001(0.002)
d5   0.000(0)      0.016(0.050)   0.004(0.012)   0.018(0.056)   0.000(0.000)   0.000(0.000)   0.000(0.000)

(c) Number of clusters
     1st-Quantile  2nd-Quantile   3rd-Quantile   Average        MeanTest       None           JTD
d1   250.0         180.8          147.6          250.0          142.0          164.2          165.7
d2   250.0         92.6           134.8          122.5          53.3           117.7          171.2
d3   250.0         136.1          152.8          114.6          135.2          126.9          159.4
d4   250.0         166.7          86.1           149.6          93.9           135.9          156.8
d5   250.0         132.5          181.9          193.5          141.6          136.0          186.2

The form used consists of eleven sections and 107 checkboxes. In addition to the checkboxes, information about the time, date, and geographical position (longitude, latitude, and street address) of the reported incident is also gathered. If required, a field for unstructured textual descriptions or observations also exists. This field allows law enforcement officers to enter additional information of importance.

The incident reports have been gathered from the southern part of Sweden and the Stockholm area. They comprise 2,416 reported residential burglaries. For 24 of the residential burglaries, law enforcement officers have provided anonymized information about suspects, allowing connections between cases to be established.

5.2. Data representation

The instances are inserted into an n × n adjacency matrix, and for each pair in the adjacency matrix a similarity index is computed as an edge representation. This process is repeated so that adjacency matrices exist for several similarity indices. The produced clustering solutions are saved using the DIMACS format.¹

5.2.1. Edge representation

The edges in the graph are represented by different similarity coefficients, making the edge weights a measure of similarity between nodes. The similarity coefficients have been chosen based on results suggested in previous research. First, the Jaccard index is computed between crime pairs based on three different MO characteristics: complete MO characteristics, residential characteristics, and stolen goods information. Secondly, spatial and temporal proximity are computed between crime pairs based on geodesic distance and temporal distance (measured in days), respectively. The Jaccard calculation is expanded upon in Appendix A.
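For binary checkbox data, the Jaccard index between two crimes is the number of shared checked boxes divided by the number of boxes checked in either crime. A minimal sketch, with illustrative checkbox labels (not the actual form's wording):

```python
def jaccard(a, b):
    """Jaccard index between two crimes, each represented as the set of
    checkboxes ticked on the incident report form."""
    if not a and not b:
        return 0.0          # convention: two empty reports share nothing
    return len(a & b) / len(a | b)

# illustrative checkbox labels (hypothetical, not the real form fields)
crime_a = {"entry:window", "entry:ground_floor", "goods:jewellery"}
crime_b = {"entry:window", "entry:ground_floor", "goods:electronics"}
```

Here the two crimes share two of four distinct characteristics, giving a Jaccard similarity of 0.5.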

5.2.2. Edge removal criteria

The minimum cut tree algorithm, when given complete or near-complete graphs, can produce trees that are star-shaped, i.e. each node connected directly to the root node, or unary (see Section 4). Consequently, it is possible that the clustering can be improved by converting complete graphs into incomplete graphs. Two approaches for this conversion are investigated.

In the first approach, several threshold values are computed and the graphs are pruned based on these values by keeping only the edges where the nodes are considered similar to a certain degree. Threshold edge removal for graph transformation can be considered a global approach, in that a single threshold value is computed and used for all edges in the graph. Only edges where the nodes are considered sufficiently similar, e.g. below the threshold value, are kept. The thresholds are defined as the mean and the quartile values.

The second approach uses time- and distance-based measures, and depending on the outcome the edge is either removed or its weight is changed to indicate lesser similarity. The distance-based edge removal can be considered local, i.e. only a single edge is investigated at a time. Given this, the criteria for removing an edge can be different for each edge. The measures are based on the Mantel cross-product adaptation and the Journey Time Distance (JTD) (Chainey & Ratcliffe, 2005). The JTD criterion removes edges between cases that are physically impossible to have been committed by the same burglars, i.e. the spatial distance is too large for the temporal span. The Mantel cross-product adaptation is based on the Mantel index, which is a correlation test between time and distance for pairs of instances (Levine, 2010). Both measures are expanded upon in Appendix A.
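A JTD-style local edge filter can be sketched as follows. The travel-speed cap and the edge data layout are illustrative assumptions for this sketch, not the paper's calibrated values:

```python
def jtd_keep(distance_km, time_gap_hours, max_speed_kmh=80.0):
    """Journey-time feasibility: keep an edge only if the same offender
    could plausibly travel between the two scenes in the elapsed time.
    The 80 km/h speed cap is an illustrative assumption."""
    return distance_km <= time_gap_hours * max_speed_kmh

def prune_edges(edges):
    """edges maps a crime pair (i, j) to (similarity, distance_km, gap_hours);
    returns the pruned pair -> similarity mapping."""
    return {pair: sim for pair, (sim, dist, gap) in edges.items()
            if jtd_keep(dist, gap)}
```

For example, two burglaries 40 km apart within one hour survive the filter, while a pair 500 km apart within two hours is removed as physically infeasible for one offender.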

5.3. Cluster validation measurements

The following cluster validation measurements are used to measure the quality and accuracy of the minimum cut clustering algorithm.

True Positive (TP) is a pair of nodes in the same cluster that are linked to each other. False Negative (FN) is a pair of nodes in different clusters that are linked to each other. True Negative (TN) is a pair of nodes in different clusters that are not linked to each other. False Positive (FP) is a pair of nodes in the same cluster that are not linked to each other.

Rand Index (RI) is the percentage of correct decisions, i.e. how well the clustering algorithm has grouped the residential burglaries. RI for clustering can also be denoted Accuracy. One problem with RI is that, in certain cases, the RI increases as the number of clusters increases (Santos & Embrechts, 2009). The RI is computed as:

RI = (TN + TP) / (TN + TP + FP + FN)    (1)
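The pair counts behind Eq. (1) can be computed by enumerating all case pairs; a minimal sketch (quadratic in the number of cases):

```python
from itertools import combinations

def rand_index(clusters, linked_pairs):
    """Rand index over all case pairs: a decision is correct when two truly
    linked cases share a cluster (TP) or two unlinked cases do not (TN).
    clusters: list of sets of case ids; linked_pairs: set of frozensets."""
    label = {case: i for i, members in enumerate(clusters) for case in members}
    correct, total = 0, 0
    for a, b in combinations(sorted(label), 2):
        same = label[a] == label[b]
        linked = frozenset((a, b)) in linked_pairs
        correct += (same == linked)     # TP or TN counts as a correct decision
        total += 1
    return correct / total
```

For instance, with clusters {0, 1} and {2, 3} and only the pair (0, 1) truly linked, five of the six pairs are decided correctly (the unlinked pair (2, 3) shares a cluster, an FP), giving RI = 5/6.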

Modularity is a cluster quality index that can be used to measure how well the clusters group and separate instances, i.e. intra-cluster density and inter-cluster sparsity. It is based on the premise that the fraction of edges between nodes in a cluster should be higher than the expected fraction of edges between nodes in a cluster to indicate significant group structure, see Eq. (2) (Brandes, Delling, Gaertler, Görke, & Hoefer, 2007; Newman, 2003, 2006). The modularity index maps onto [−1, 1].

Q = Σ_{c∈C} [ |E(c)|/|E| − ( Σ_{v∈c} deg(v) / (2|E|) )² ]    (2)

Coverage is a cluster quality index based on intra-cluster density. It is related to modularity, as modularity is in essence coverage minus the expected coverage. Coverage computes the number of edges within clusters divided by the total number of edges, see Eq. (3).

Cov = Σ_{c∈C} |E(c)| / |E|    (3)
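Eqs. (2) and (3) can be sketched for the unweighted case as follows (the paper's graphs are weighted, so this is an illustration of the formulas rather than the exact computation used):

```python
def _labels(clusters):
    """Map each node to the index of its cluster."""
    return {v: i for i, members in enumerate(clusters) for v in members}

def coverage(clusters, edges):
    """Eq. (3): fraction of edges whose endpoints share a cluster.
    edges: set of frozenset node pairs of an unweighted graph."""
    label = _labels(clusters)
    internal = sum(1 for e in edges if len({label[v] for v in e}) == 1)
    return internal / len(edges)

def modularity(clusters, edges):
    """Eq. (2), unweighted case: per-cluster internal edge fraction minus
    the squared fraction of edge endpoints falling in that cluster."""
    label = _labels(clusters)
    m = len(edges)
    deg = {v: 0 for v in label}
    for e in edges:
        for v in e:
            deg[v] += 1
    q = 0.0
    for members in clusters:
        internal = sum(1 for e in edges if all(v in members for v in e))
        deg_sum = sum(deg[v] for v in members)
        q += internal / m - (deg_sum / (2 * m)) ** 2
    return q
```

On the two-triangles-plus-bridge graph, with the triangles as clusters, six of seven edges are internal (coverage 6/7), and modularity evaluates to 2 × (3/7 − (7/14)²) = 5/14.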

6. Experiment design

The following two aspects of residential burglary clustering are investigated: first, the impact of different similarity indices as edge representations and of different edge removal criteria on the quality of the clusters produced; second, the performance with which the minimum cut algorithm is able to group residential burglaries without splitting series of crimes.

6.1. Hypotheses

The following hypotheses are investigated in this study.

Experiment 1. The hypothesis of Experiment 1 can be described as follows: the choice of edge representation and edge removal criteria can positively affect the quality of the clusters produced. If the null hypothesis is not supported, the alternate hypothesis states that the choice of edge representation affects the quality of the clustering.

Experiment 2. The hypothesis of Experiment 2 is that high-quality clustering solutions of residential burglaries can result in fewer crimes to analyze whilst keeping series intact.

¹ http://lpsolve.sourceforge.net/5.5/DIMACS_maxf.htm, 2013-02-24.

6.2. Experiment 1: cluster quality

The first experiment investigates how different edge representations and edge removal criteria affect the quality of the clusters created by the minimum cut clustering algorithm. The experiment consists of two independent variables: edge representation and edge removal criteria. Each variable has several levels, as described in Sections 5.2.1 and 5.2.2. As such, an X × Y factorial design, where X and Y correspond to the number of levels of the two variables, is used as the experimental design (Shadish, Cook, & Campbell, 2002).

The dependent variable of the experiment design is the modularity. Each combination of variable levels is tested 10 times. For each repetition, a subsample of the dataset is created using simple random sampling with replacement. The subsample consists of 250 instances.
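The subsampling step above can be sketched as follows; `subsample` is a hypothetical helper name, but the procedure (simple random sampling with replacement) matches the design described.

```python
import random

def subsample(dataset, n=250, seed=None):
    """Draw a simple random sample of n instances, with replacement,
    so each repetition works on a fresh subsample."""
    rng = random.Random(seed)
    return [rng.choice(dataset) for _ in range(n)]
```

Each of the 10 repetitions of a factor combination would call this once, e.g. `subsample(crimes, 250)`.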

A between-subjects factorial analysis of variance (ANOVA) is used to evaluate the factorial experiment design. The between-subjects factorial analysis of variance allows evaluation of possible interaction between variables, as well as evaluation of significant differences between variables and levels. Interaction occurs when the combination of two variables affects the dependent variable in an unpredictable way (Sheskin, 2007).

If there is a significant difference between the factorial combinations, a post hoc test is applied after the between-subjects factorial analysis of variance to detect which factorial combinations perform better. In this case, the post hoc test used is Fisher's LSD test (Sheskin, 2007). Fisher's LSD test is vulnerable to type I errors, i.e. incorrectly rejecting a true null hypothesis, but has a lower chance of making type II errors, i.e. incorrectly retaining a false null hypothesis. If the difference for a comparison is less than Fisher's LSD value (CD_LSD), the null hypothesis is supported. The statistical tests are conducted using R and the ez package (ezANOVA).
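The LSD decision rule can be sketched as below: two level means are declared significantly different when their absolute difference exceeds the least significant difference. The function name and its inputs (the error mean square and a critical t value looked up from a t table) are illustrative assumptions, not the study's code.

```python
import math

def fisher_lsd(mean_a, mean_b, mse, n_a, n_b, t_crit):
    """Fisher's LSD comparison of two group means.

    mse    -- error mean square from the ANOVA
    t_crit -- two-tailed critical t value for the error df
    Returns (significant, lsd_value).
    """
    lsd = t_crit * math.sqrt(mse * (1.0 / n_a + 1.0 / n_b))
    return abs(mean_a - mean_b) > lsd, lsd
```

If the ANOVA itself is not significant, this pairwise comparison is not applied.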

6.3. Experiment 2: crime distinction

The second experiment investigates whether residential burglaries can be clustered with high quality whilst keeping series of crimes intact. The accuracy of the clustering is measured using the RI. The experiment design is similar to Experiment 1 and consists of two independent variables: edge representation and edge removal criteria. Each variable has several levels, as described in Sections 5.2.1 and 5.2.2. Similar to Experiment 1, an X × Y factorial design is used. The dependent variables of the experiment design are the modularity and the RI.

This experiment uses the labeled instances. Labeled instances are instances for which law enforcement agencies have provided information on whether the instance is known to be part of a series or not. For each repetition, a subsample of the dataset is created using simple random sampling with replacement from the labeled instances. The subsample consists of 24 instances. The experiment uses a design identical to Experiment 1, with the exception of the additional dependent variable, the RI.

For the second experiment, the statistical tests outlined in Section 6.2 are carried out for both dependent variables. To detect relationships between the modularity and the RI, Pearson's correlation coefficient is used.
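The RI can be computed directly over pairs of instances: two labelings agree on a pair when both place it in the same cluster or both place it in different clusters. A minimal sketch (hypothetical helper, assuming flat label lists):

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Rand Index: fraction of instance pairs on which the two
    labelings agree (same-same or different-different)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

An RI of 1.0 means the clustering reproduces the known series labels exactly; 0.5 is roughly chance level for balanced data.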

7. Results

7.1. Experiment 1

According to the modularity cluster validation measure (see Table 1(a)), the 1st-Quantile has the worst performance of all the edge removal criteria. Similarly, the Jaccard Goods and Temporal proximity edge representations perform worse than the other representations. The performance of these edge representations indicates that Jaccard Goods and Temporal proximity are unsuitable for representing differences between crime cases. The goods recorded in the form are limited to a few general items, e.g. electronics.

It should be noted that none of the modularity results are positive, indicating that the cluster solutions produced have a lower fraction of edges within clusters than the expected fraction. As such, the clustering solutions cannot be said to have separated the clusters well. The clustering solutions have most likely created too many clusters, meaning that crimes that should be grouped together are not. This makes the analyst's job harder, in that the number of crimes is reduced too much and connections could be overlooked.

The coverage of several edge representation and edge removal criteria combinations scored 0, indicating that a high number of singleton clusters were produced (see Table 1(b)), i.e. that crimes are not considered to be connected to any other crimes. The coverage scores are unsurprising given the negative modularity scores, as the modularity index incorporates similar aspects. However, as the coverage score is mostly 0, the focus will henceforth be on the modularity score.

7.2. Experiment 2

In the clustering solutions produced for Experiment 2, a pattern can be observed: none of the solutions yield a high modularity index (see Table 2(a)). The highest mean modularity index that can be observed is -0.018, followed by -0.019. As the modularity index ranges from -1 to 1, and a positive index value indicates a higher number of edges within the clusters, the indices produced cannot be considered good. In fact, all pairings of edge representations and edge removal criteria produce a negative modularity index. Of the edge removal criteria, the 1st-Quantile and the JTD function have the worst modularity scores. Looking at the corresponding pairings in Table 2(d), these edge removal criteria have produced clustering solutions that consist almost entirely of singleton clusters, i.e. the number of clusters is close to the number of nodes. Looking at the number of clusters produced (see Table 2(d)), several factor combinations produce a high number of singleton cluster solutions, i.e. groups of crimes cannot be produced. The coverage cluster validation measurement (see Table 2(b)) indicates that most of the clustering solutions produced have a low intra-cluster density on average. This would indicate that the crimes have been separated into too many groups or individual crimes.

In Table 2(c), a high Rand Index can be observed for the pairings that produced singleton clustering solutions. This result, however, can be attributed to the data: in singleton solutions, the many instances that should not be connected are correctly separated, which inflates the Rand Index score. That is to say, when the algorithm creates singleton clusters, the fact that it successfully separates crimes that should not be connected outweighs its failure to connect crimes that should be connected. But the goal is to produce smaller groups of crimes for analysis, not single crimes. As such, when looking at the accuracy of the grouped crimes (i.e. the Rand Index), one must also consider secondary cluster validation measurements, e.g. the number of clusters produced or the modularity index. A low number of clusters and a high Rand Index can be considered positive, as this would mean that the number of crimes to analyse is reduced. Similarly, a high modularity and a high Rand Index can also be considered indicative of a good clustering solution.

A high number of clusters together with a high Rand Index indicates a clustering solution that has scattered known crimes, i.e. series of crimes are not grouped. An example of a clustering


solution can be seen in Fig. 3. The modularity and Rand Index of the cluster solution are -0.0103 and 0.5579, respectively. In this example, the cut clustering algorithm has not been able to keep the series intact, which is indicated by the low Rand Index.

8. Analysis

The results of the experiments are analysed using an ANOVA test to detect whether any statistically significant difference exists between the variables. Fisher's LSD test is used to detect between which variables statistically significant differences exist.

8.1. Experiment 1

The means and standard deviations are presented in Table 1. For the modularity cluster validation measurement, a two-factor analysis of variance showed a significant effect for the edge representation, F(4, 315) = 6.112, p < .05; a significant effect for the edge removal criteria, F(6, 315) = 20.001, p < .005; but not any

Table 2
Mean clustering measurements for Experiment 2.

                1st-Quantile    2nd-Quantile    3rd-Quantile    Average         MeanTest        None            JTD

(a) Modularity
d1              -0.250 (0)      -0.091 (0.107)  -0.046 (0.060)  -0.250 (0.000)  -0.053 (0.088)  -0.057 (0.088)  -0.225 (0.035)
d2              -0.250 (0)      -0.035 (0.024)  -0.045 (0.075)  -0.070 (0.092)  -0.018 (0.099)  -0.032 (0.052)  -0.207 (0.029)
d3              -0.250 (0)      -0.037 (0.089)  -0.020 (0.096)  -0.029 (0.012)  -0.019 (0.012)  -0.072 (0.095)  -0.224 (0.024)
d4              -0.250 (0)      -0.031 (0.068)  -0.021 (0.207)  -0.030 (0.094)  -0.081 (0.095)  -0.120 (0.103)  -0.235 (0.025)
d5              -0.250 (0)      -0.089 (0.103)  -0.048 (0.059)  -0.060 (0.092)  -0.072 (0.092)  -0.045 (0.075)  -0.246 (0.012)

(b) Coverage
d1              0.000 (0)       0.021 (0.067)   0.000 (0.000)   0.000 (0.000)   0.000 (0.000)   0.006 (0.019)   0.000 (0.000)
d2              0.000 (0)       0.007 (0.023)   0.000 (0.000)   0.005 (0.016)   0.018 (0.016)   0.000 (0.000)   0.000 (0.000)
d3              0.000 (0)       0.009 (0.029)   0.025 (0.079)   0.020 (0.065)   0.000 (0.065)   0.000 (0.000)   0.000 (0.000)
d4              0.000 (0)       0.005 (0.017)   0.079 (0.205)   0.000 (0.000)   0.000 (0.000)   0.007 (0.024)   0.000 (0.000)
d5              0.000 (0)       0.000 (0.000)   0.000 (0.000)   0.000 (0.000)   0.000 (0.000)   0.000 (0.000)   0.000 (0.000)

(c) Rand Index
d1              0.993 (0.006)   0.854 (0.144)   0.600 (0.346)   0.988 (0.014)   0.512 (0.348)   0.558 (0.309)   0.988 (0.006)
d2              0.986 (0.011)   0.706 (0.198)   0.634 (0.249)   0.690 (0.294)   0.464 (0.302)   0.605 (0.215)   0.986 (0.009)
d3              0.987 (0.011)   0.589 (0.255)   0.612 (0.244)   0.665 (0.282)   0.639 (0.103)   0.767 (0.148)   0.985 (0.010)
d4              0.984 (0.011)   0.535 (0.283)   0.809 (0.155)   0.522 (0.220)   0.619 (0.370)   0.759 (0.351)   0.984 (0.012)
d5              0.988 (0.011)   0.699 (0.276)   0.651 (0.263)   0.648 (0.306)   0.658 (0.301)   0.607 (0.259)   0.982 (0.013)

(d) Number of clusters
d1              24.0            17.3            10.9            24.0            10.0            10.0            23.3
d2              24.0            12.4            10.7            12.6             8.5             9.9            22.8
d3              24.0             9.7            10.6            11.8             9.5            13.8            23.3
d4              24.0             9.7            15.3             9.0            12.7            16.4            23.6
d5              24.0            14.0            11.9            11.7            12.3            10.8            23.9

d1-d5: Jaccard Goods, Jaccard Residence, Jaccard MO, Spatial Proximity, Temporal Proximity.
Standard deviation within parentheses.

Fig. 3. Cluster solution example for spatial proximity and 3rd-Quantile based on labeled data. Clusters are connected by edges. Legend: Series A; Series B; No known series.


significant effect for the interaction of the two factors, F(24, 315) = 2.733, non-significant. As such, there is a difference in performance between the edge representations and between the edge removal criteria, but they do not affect each other.

For the edge representation, Fisher's LSD post hoc test found that the Jaccard Residence edge representation scored significantly better than both the Temporal proximity and Jaccard Goods representations (see Table 3(a)). Similarly, Spatial proximity and Jaccard MO scored significantly better than the Jaccard Goods edge representation. This indicates that residential characteristics, followed by Spatial proximity, are preferable when representing similarity or distance between crimes.

For the edge removal criteria, Fisher's LSD test indicates that the MeanTest criterion performs significantly better than the other criteria. The best performance, however, was achieved when not applying any edge removal criterion. The 3rd-Quantile and 2nd-Quantile perform significantly better than the 1st-Quantile. The poor performance of the 1st-Quantile can be attributed to the high number of edges removed from the graph, with the consequence that the nodes cannot be connected. As the performance of not applying an edge removal criterion was similar to that of the best performing criteria, there is little reason to apply one; using edge removal criteria would also increase the number of computational steps required when grouping crimes.

8.2. Experiment 2

The means and standard deviations are presented in Table 2. For the modularity cluster validation measurement, a two-factor analysis of variance showed a significant effect for the edge representation, F(4, 315) = 4.339, p < .05; a significant effect for the edge removal criteria, F(6, 315) = 69.735, p < .001; and a significant effect for the interaction of the two factors, F(24, 315) = 2.733, p < .001. As such, there is a significant difference between edge removal criteria and between edge representations, and the two variables affect each other.

Fisher's LSD post hoc test for the edge representation found a difference between the Jaccard MO, Jaccard Residence and spatial proximity representations on the one hand, and the Jaccard Goods representation on the other. This is similar to the results in Experiment 1, suggesting similar clustering ability over the data sets. For the edge removal criteria, the LSD test found significant differences between multiple criteria. The different groups can be seen in Table 4(a). Criteria belonging to different groups are significantly different, with group a performing significantly better than groups b and c. For the interaction between the factors, Fisher's LSD test found that Jaccard Residence with the MeanTest criterion, Jaccard MO with the MeanTest and 3rd-Quantile criteria, and spatial proximity with the 3rd-Quantile criterion performed significantly better than factors paired with JTD and 1st-Quantile.

For RI, the two-factor analysis of variance showed no significant effect for edge representation, F(4, 315) = 0.663, non-significant; a significant effect for edge removal criteria, F(6, 315) = 27.729, p < .05; and a significant effect for the interaction of the two factors, F(24, 315) = 2.198, p < .05. As such, the two variables interact and affect each other's performance.

The results of Fisher's LSD post hoc test for the interaction between the two factors, edge representation and edge removal criteria, are shown in Table 5; interaction for both the modularity and the Rand Index is included. The best modularity means were found for the combinations of the Jaccard Residence or Jaccard MO edge representations with the MeanTest edge removal criterion, and the Jaccard MO or Spatial proximity edge representations with the 3rd-Quantile edge removal criterion. Together with the results of Experiment 1, this further supports the suggestion that crime similarity is best represented using Residence, MO or spatial proximity characteristics.

The modularity cluster validation results suggest that the JTD and 1st-Quantile edge removal criteria produce clustering solutions that are significantly worse than other factor combinations, which suggests that these criteria are unsuitable. Fisher's LSD post hoc test for the Rand Index validation measurement found significant differences between several groups of edge removal criteria (see Table 4(b)). Similar to the modularity index, this suggests that certain characteristics are better representations when the goal is accurate grouping of crimes. The 1st-Quantile and JTD performed significantly better than the other criteria. Fisher's LSD test for the interaction between the factors found that pairings with JTD and 1st-Quantile performed significantly better than other pairings, except for Jaccard Goods with the 2nd-Quantile edge removal criterion and spatial proximity with the 3rd-Quantile edge removal criterion. The resulting clustering solutions, however, were trivial or near trivial, i.e. crimes were separated into singletons or too small groups. If the clustering solutions consist of too small groups, or singletons, using them as a selection system is inadvisable.

Table 3
Fisher's LSD post hoc test for Experiment 1. The group column contains letters which denote group belonging; different letters represent groups that are statistically significantly different from each other.

(a) Edge representation (CD_LSD: 0.0383)
Jaccard Residence     -0.0893   a
Spatial proximity     -0.0955   ab
Jaccard MO            -0.1060   ab
Temporal Proximity    -0.1269   b
Jaccard Goods         -0.1603   c

(b) Edge removal criteria (CD_LSD: 0.0324)
MeanTest              -0.0597   a
None                  -0.0897   a
3rd-Quantile          -0.0947   b
2nd-Quantile          -0.0985   b
JTD                   -0.1014   bc
Average               -0.1151   bc
1st-Quantile          -0.2500   c

Table 4
Fisher's LSD post hoc test for edge removal criteria. The group column contains letters which denote group belonging; different letters represent groups that are statistically significantly different from each other.

(a) Modularity (CD_LSD: 0.0298)
3rd-Quantile          -0.0362   a
MeanTest              -0.0490   a
2nd-Quantile          -0.0570   a
None                  -0.0656   ab
Average               -0.0881   b
JTD                   -0.2279   c
1st-Quantile          -0.2500   c

(b) Rand Index (CD_LSD: 0.0875)
1st-Quantile          0.9880    a
JTD                   0.9854    a
Average               0.7032    b
2nd-Quantile          0.6771    b
3rd-Quantile          0.6617    bc
None                  0.6595    bc
MeanTest              0.5788    c


An interesting aspect to note is that there are factorial combinations that belong to high performing groups for both the modularity and the Rand Index. This would indicate that these clustering solutions provide non-trivial group separation with an above-average accuracy, and that these combinations are more suitable for crime separation than others. The Spatial proximity edge representation with the 3rd-Quantile edge removal criterion is one example where high cluster separation is found, as well as good accuracy in assigning instances to clusters. The same can be said of the Jaccard MO edge representation with no edge removal criterion applied (marked as None in Table 5).

The calculated Pearson's correlation coefficient between the Rand Index and the modularity cluster validation measurement is found to be -0.768. This can be seen in Fig. 4. This indicates that, for Experiment 2, the modularity score of the cluster solutions decreases as the Rand Index increases, i.e. as the crime groupings get worse, the accuracy increases. It should be noted that this holds across all factor combinations, including those creating trivial clustering solutions. A few outliers can be found with a modularity score of 0.2 or higher and a high Rand Index.
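The reported coefficient is the standard sample Pearson correlation between the two measurements across clustering solutions; a minimal stdlib-only sketch (hypothetical helper name):

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two sequences
    of equal length."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applied to the per-solution (Rand Index, modularity) pairs, a value near -1 captures the inverse relationship described above.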

8.3. Validity threats

Validity threats against this study are outlined and discussed in

this section. It should be noted that over time, most of these valid-

ity threats will be corrected with the help of larger data sets and

more labeled information about the data.

First, the accuracy can be questioned, as the number of labeled instances is quite small. Consequently, strong generalization cannot be concluded when it comes to accuracy. There is not much to be done about this, as the systematic gathering of crime reports in Sweden is still quite young and has not been adopted by all police counties. Also, the number of solved cases is quite low, three to five percent. As the amount of labeled data is quite small, it is quite possible that some cases should be part of a series but that the information is not yet available.

Second, the crime reports have been gathered during a rather

short time period of one year. It could be that during this time,

Table 5
Fisher's LSD post hoc test for combinations of edge representation and edge removal criteria. The group column contains letters which denote group belonging; different letters represent groups that are statistically significantly different from each other.

                                          Modularity          Rand Index
Edge Representation   Edge Removal Crit.  Means     Group     Means    Group
Jaccard Residence     1st-Quantile        -0.250    d         0.986    a
                      2nd-Quantile        -0.035    ab        0.706    bcde
                      3rd-Quantile        -0.045    ab        0.635    cdef
                      Average             -0.070    abc       0.691    bcde
                      MeanTest            -0.018    a         0.464    f
                      None                -0.032    ab        0.606    def
                      JTD                 -0.208    d         0.986    a
Jaccard MO            1st-Quantile        -0.250    d         0.987    a
                      2nd-Quantile        -0.037    ab        0.590    def
                      3rd-Quantile        -0.021    a         0.612    def
                      Average             -0.029    ab        0.666    bcde
                      MeanTest            -0.019    a         0.639    cdef
                      None                -0.073    abc       0.767    bcd
                      JTD                 -0.224    d         0.985    a
Jaccard Goods         1st-Quantile        -0.250    d         0.993    a
                      2nd-Quantile        -0.092    bc        0.855    ab
                      3rd-Quantile        -0.046    ab        0.600    def
                      Average             -0.250    d         0.988    a
                      MeanTest            -0.053    ab        0.512    ef
                      None                -0.057    abc       0.558    ef
                      JTD                 -0.226    d         0.988    a
Spatial Proximity     1st-Quantile        -0.250    d         0.984    a
                      2nd-Quantile        -0.032    ab        0.535    ef
                      3rd-Quantile        -0.021    a         0.810    abc
                      Average             -0.030    ab        0.523    ef
                      MeanTest            -0.081    abc       0.619    cdef
                      None                -0.121    c         0.759    bcd
                      JTD                 -0.235    d         0.984    a
Temporal Proximity    1st-Quantile        -0.250    d         0.988    a
                      2nd-Quantile        -0.089    bc        0.610    bcde
                      3rd-Quantile        -0.048    ab        0.651    cdef
                      Average             -0.060    abc       0.648    cdef
                      MeanTest            -0.072    abc       0.659    cdef
                      None                -0.045    ab        0.607    def
                      JTD                 -0.246    d         0.983    a

CD_LSD (Modularity): 0.066; CD_LSD (Rand Index): 0.195.

Fig. 4. Correlation plot between modularity (y-axis, -0.2 to 0.4) and Rand Index (x-axis, 0.0 to 1.0) for clustering solutions from Experiment 2.


crimes have occurred in a pattern that can be considered non-representative of criminal behaviour in general. The same can be said of the counties from which the crime reports have been collected. This is being rectified as data gathering continues and more counties choose to join the systematic process.

8.4. Discussion

The results of the experiments highlight two interesting things. First is the idea of edge removal criteria, which the results indicate have no benefit for the grouping of crimes. Consequently, when grouping residential burglaries there is little incentive to discard similarity information between cases that fulfil an edge removal criterion. While the use of edge removal criteria might have more impact on the cut-clustering adaptation than on the domain, it also suggests that retaining information between cases is important. Initially, an edge removal criterion such as Journey Time Distance was expected to remove edges, i.e. connections, between crimes that could not feasibly be connected. However, Journey Time Distance turned out to be one of the worst performing ways of removing edges between crimes, which is surprising. Intuitively, removing edges between cases that were impossible to connect should have improved the clustering algorithm's performance, but this was not the case.

The second interesting outcome is the varying suitability of the characteristics for representing similarity between crimes. The edge representations with the best performance were Residence characteristics, MO characteristics, and Spatial proximity. This is unsurprising, as similar results have been suggested in previous research. The goods representation has the worst performance. This is most likely due to the fact that the available goods attributes are quite limited and certain items tend to be more attractive to criminals. That the data are gathered during a limited time period might also affect the impact of the goods. The high performance of the MO and Residence characteristics suggests that there exists a certain behavioural consistency within crimes, and that this consistency differs between criminals to such an extent that it is possible to differentiate them.

Something to consider is that law enforcement officers are more likely to solve crimes committed by local criminals. Crimes committed by non-local criminals, who are active over a larger area during a limited time, are more complicated to solve. There might be a difference between local and non-local criminals in which representation of crime similarities has the best performance.

This also means that there will be an imbalance in the data, i.e. a series will not make up a large part of the crime cases. Imbalance will always exist, as serial residential burglars do not conduct all burglaries; of the approximately 2,000 residential burglaries a year, one group might only have committed a minority. As can be observed in Fig. 4, as the groupings get worse, the accuracy score increases. This is mostly due to the correct classification of singleton cases outweighing the false classification of connected cases. Consequently, one of the problems is to find a balance between group size and series quality.

Given the nature of the area, there will be crimes that are singletons. These cases could be singletons in the clustering solution, or be part of a larger group of crimes. As the purpose is to reduce the number of crimes an analyst has to analyse, as well as to suggest new connections among crimes, this is only a problem as long as clustering solutions are mostly singletons. This is not always the case (see Fig. 3). It is possible that an increased performance from the cut-clustering algorithm, or another clustering algorithm, can create better clustering solutions. Similarly, the method might also fail to cluster cases that should be connected. As of today, no clustering methods are employed for clustering crimes in Swedish law enforcement agencies, and if a specific investigator does not have a notion of a connection between cases, it will be missed entirely. So while an approach such as this might miss a crime when grouping, the systematic selection of cases is still preferable as a complement to the expertise of a single investigator, who otherwise keeps earlier cases and suspects in memory.

It should be noted that as the clustering uses distance-based metrics, e.g. the Jaccard index, when the MO of a perpetrator changes (which it most likely will over time), the distance between newer and older crimes will increase. This could be problematic and should be kept in mind by investigators. A recommendation is to limit the time window for crimes under investigation to 6 months, due to changes in MO and other variables (McCue, 2007). A similar issue is that the MO is often a consequence of circumstances. A criminal known for a certain way of entering a building might, when encountering an unlocked door, not use the same MO; as a result, that specific case will not share the same MO. This is a consequence of only using a single MO characteristic as a basis for the distance measurement. A potential remedy would be to use a measurement that combines multiple MO characteristics for calculating the distance between crimes. However, this problem is also mitigated by the fact that the implemented DSS has knowledge of physical evidence gathered at the scene. As such, the investigator might have a set of cases where shoe prints and fingerprints have been found, and can search for local cases with similar physical evidence and see if matches exist. Such an approach can detect other MOs used by the same burglars.

9. Conclusion

In this article, a DSS for managing and analysing systematically gathered residential burglary reports has been presented. The DSS allows law enforcement to easily search and compare residential burglary reports. The DSS contains, among other modules, an analytical framework. The use of clustering to group residential burglaries in the DSS has been investigated, using several similarity criteria.

While the results of the modularity cluster validation measurement indicate that the separation between clusters is poor, the Rand Index results still support further investigation into this area. The first experiment concerned which representation of residential burglaries to use with clustering, and whether edge removal methods increased performance. The results of the first experiment show that the choice of edge representation, but not the edge removal criteria, positively affected the modularity score. The second experiment concerned whether clustering solutions were able to correctly cluster crime series. The results of Experiment 2 suggest that, when excluding trivial clustering solutions, a high quality clustering solution results in an above-chance accuracy, i.e. 0.5 < RI < 0.8. The experiments have suggested that the choice of edge representation used when grouping crimes can positively affect the clustering solution. The best performance is found using Spatial proximity or Residential characteristics as a basis for comparing crimes. As such, this would indicate that these characteristics are preferable when law enforcement investigates related residential burglaries.

The clustering solutions without any edge removal criteria performed within the same group as the highest scoring edge removal criteria in most cases. The mean modularity scores of the experiments suggest, however, that the cut clustering algorithm is not optimal for this domain. An increase in the ability to correctly cluster crimes would allow law enforcement officers to investigate fewer criminal cases with a lower chance of missing cases in a series of crimes. The results suggest that while clustering crime series is feasible using cut clustering, further investigation is needed.

We have identified six aspects for future work. First, we only had access to a low number of labeled cases, and the experiment needs to be repeated using a larger labeled data set. Getting access to a larger labeled data set is not trivial, since identifying series of crimes is hard. Second, in this study only individual edge representations have been investigated. Researchers have found that, in some cases, combinations of edge representation scores have performed better than individual edge representations. A distance index using combined crime characteristics, such as spatial and MO characteristics, needs to be developed. Third, the performance of other clustering algorithms on this domain has not been investigated, and a comparison of algorithms should be conducted. Fourth, an accuracy index that takes into account the imbalance of the data would be better suited than the Rand Index.

Appendix A. Edge representation and removal criteria

A.1. Edge representation

The edges in the graph are represented by different similarity

coefﬁcients, making the edge weights a measure of similarity be-

tween nodes. The different similarity coefﬁcients are explained in

this section. The similarity coefﬁcients have been chosen based

on results suggested in previous research.

The Jaccard coefficient (otherwise known as the Jaccard index or Tanimoto coefficient) is a measure of similarity between two sets, A and B, based on the data shared and the data unique to each set, as shown in Eq. (A.1). A value between 0 and 1 is computed, where a value of 0 indicates that the two sets are identical.

Jaccard = 1 - \frac{|A \cap B|}{|A \cup B|}    (A.1)

The Jaccard coefficient is used to compute the similarity between incident reports based on the complete binary data available: data representing stolen goods and data representing the target, i.e. residential characteristics.
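A minimal sketch of Eq. (A.1), applied for example to sets of stolen-goods categories (the example sets are hypothetical):

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets (Eq. A.1):
    1 - |A ∩ B| / |A ∪ B|. 0 means identical, 1 means disjoint."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)
```

With binary crime-report attributes, each report is treated as the set of attributes that are present.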

Temporal proximity between instances is also used as a similarity measure. The data gathered by law enforcement officers contains information to compute the temporal proximity between residential burglaries. Due to the nature of these crimes, i.e. the crimes are often committed when the residents are away, the accuracy of the reported occurrence times and dates is often low. Consequently, the reported occurrence is often limited to a day of the week, whereas reporting would preferably describe when the crime occurred within a range of hours. The proximity in time between cases is computed as A_time - B_time if A occurred after B, and vice versa.
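Assuming each case carries a single reported occurrence timestamp (a simplification, given the coarse reporting described above), the computation can be sketched as:

```python
from datetime import datetime

def temporal_proximity_h(a_time, b_time):
    """Temporal proximity between two crimes: the absolute
    difference between their reported occurrence times, in hours."""
    delta = abs(a_time - b_time)
    return delta.total_seconds() / 3600.0
```

Crimes one day apart yield a proximity of 24 hours regardless of argument order.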

Similar to temporal proximity, the spatial proximity between instances is used as a similarity measure. The data gathered by law enforcement officers contains the address where the residential burglary took place, to a degree that allows us to find the longitude and latitude. From these coordinates, the distance between the two locations is computed and converted to meters. It should be noted that the computed distance is the shortest path between the two cases, i.e. the geodesic distance.
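One common way to approximate such a geodesic distance from two latitude/longitude pairs is the haversine formula. The paper does not specify the exact computation used, so the following is only an illustrative sketch:

```python
import math

def geodesic_distance_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in meters between two coordinates,
    using the haversine formula on a spherical Earth model."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lam = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

For example, two points one degree of latitude apart are roughly 111 km apart, which matches the formula's output.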

A.2. Edge removal criteria

As discussed in Section 4, the minimum cut tree algorithm, when given complete graphs or near-complete graphs, can produce trees that are star-shaped, i.e. each node is connected directly to the root node, or unary. Consequently, it is possible that the clustering can be improved by converting complete graphs into incomplete graphs. Two approaches for this conversion are investigated.

In the first approach, several threshold values are computed and the graphs are pruned with these values, only keeping edges where the nodes are considered similar to a certain degree. Threshold edge removal for graph transformation can be considered a global approach, in that a single threshold value is computed and used for all edges in the graph.

The second approach uses time- and distance-based measures; depending on the outcome, the edge is either removed or its weight is changed to indicate lesser similarity. Distance-based edge removal can be considered local, i.e. only a single edge is investigated at a time. Given this, the criterion for removing an edge can be different for each edge.

Thresholded edge removal. Thresholded edge removal for graph transformation can be considered a global approach, in that a single threshold value is computed and used for all edges in the graph. Only edges where the nodes are considered similar to a certain degree, i.e. at or below the threshold value, are kept. Three different threshold values, and their ability to produce quality clusters, are investigated.

The mean value is the sum of the similarity values of a set of pairs of instances divided by the number of pairs. Every edge in the graph whose value is above the threshold value is removed.

The quartile values are the three values that separate an ordered set into four equal groups; the median is the 2-quartile, i.e. the value separating an ordered set into two halves. The quartiles used here are the 2nd quartile (Q2), also known as the median, and the 3rd quartile (Q3).

The Q2 value² is computed as described in Eq. (A.2).

Q2(X) = X_{(N+1)/2}                      if N is odd
Q2(X) = (1/2)(X_{N/2} + X_{1+(N/2)})     if N is even    (A.2)

If the number of items in the set is odd, the Q2 value is the middle value of the set. If the number of items in the set is even, the Q2 value is the mean value of the two items in the middle of the set.
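As an illustrative sketch of global thresholded edge removal (Python; not the authors' implementation), the mean or the Q2/median of all edge weights can serve as the threshold. Edge weights are taken to be distance-like (0 = identical), so edges above the threshold are dropped:

```python
import statistics

def prune_edges(edges, method="median"):
    """Global thresholded edge removal over a weighted graph.

    edges: list of (node_u, node_v, weight) tuples, where weight is a
           distance-like similarity value (0 = identical).
    method: "mean" or "median" (the Q2 threshold).
    Returns the edges whose weight does not exceed the threshold.
    """
    weights = [w for _, _, w in edges]
    if method == "mean":
        threshold = statistics.mean(weights)
    else:
        threshold = statistics.median(weights)
    return [(u, v, w) for u, v, w in edges if w <= threshold]

edges = [("a", "b", 0.1), ("a", "c", 0.9), ("b", "c", 0.4), ("a", "d", 0.2)]
```

With these weights the median threshold (0.3) keeps two edges, while the more permissive mean threshold (0.4) keeps three.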

A.2.1. Distance-based edge removal

Distance-based edge removal differs from thresholded edge removal in that it is based on spatial and temporal proximities (independent of the underlying edge representation). An additional difference is that distance-based edge removal can be considered local. That is, only a single edge is investigated at a time. Given this, the criterion for removing an edge can be different for each edge.

The Mantel cross product adaptation is based on the Mantel index, which is a correlation test between time and distance for pairs of instances (Levine, 2010). The Mantel index was designed to alleviate some of the problems with previous indices, where cut-off points affect the result and results can be significant both if the time/space distance is short or long. It is used to detect correlations between two matrices, and as such needs to be adapted to compare between two instances only. The Mantel index cross product is defined as follows:

T = Σ_{i=0}^{N} Σ_{j=0}^{N} (X_{i,j} − Mean(X))(Y_{i,j} − Mean(Y))    (A.3)

The variables can be explained as follows: N is the number of instances, X is a set of similarities of one index (e.g. space) between two instances, and Y is a set of similarities of another index (e.g. time) between the same instances. The following equations are used as a basis to remove edges where time and space have exceeded a certain point in the dataset. The time and space proximity between two instances are compared against the mean time or space proximity. If one of the conditions is negative, the weight of the edge is increased by half. If both conditions are negative, the edge is removed.

² http://mathworld.wolfram.com/StatisticalMedian.html, 2013-02-18.

A. Borg et al. / Expert Systems with Applications 41 (2014) 5252–5266 5263

(X_{i,j} − Mean(X)) ≥ 1    (A.4)

(Y_{i,j} − Mean(Y)) ≥ 1    (A.5)
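Following the prose description literally, the local Mantel-style criterion can be sketched as follows (illustrative Python; the sign convention and how the edge weight is updated are assumptions based on the text, since the proximities may be encoded either as distances or as similarities):

```python
def mantel_edge_update(x_ij, y_ij, mean_x, mean_y, weight):
    """Local Mantel-style edge criterion for one edge at a time.

    x_ij, y_ij: spatial and temporal proximity of the pair (i, j).
    mean_x, mean_y: mean spatial and temporal proximity in the dataset.
    Returns None if the edge should be removed (both conditions negative),
    the weight increased by half if exactly one condition is negative,
    and the unchanged weight otherwise.
    """
    spatial_negative = (x_ij - mean_x) < 0
    temporal_negative = (y_ij - mean_y) < 0
    if spatial_negative and temporal_negative:
        return None          # remove the edge entirely
    if spatial_negative or temporal_negative:
        return weight * 1.5  # increase weight by half: lesser similarity
    return weight
```

Each edge is evaluated independently, which is what makes this a local criterion in contrast to the global thresholds above.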

Journey Time Distance (JTD) is a measure used in Geographical Information Systems (GIS) to determine whether time/space distances between cases are reasonable (Chainey & Ratcliffe, 2005). The measure used here is a simplified version that assumes a straight travel distance and a fixed speed. The JTD is determined by calculating time as distance divided by speed, i.e. whether a criminal can reasonably travel between cases. The equation used to investigate this is as follows:

X_{i,j} / 100,000 > (Y_{i,j} × 24)    (A.6)

The distance in meters (X_{i,j}) between two cases divided by 100,000 (100 km/h) gives the time it would take to travel between the two cases. If that is larger than the temporal proximity (Y_{i,j}), the cases are reasonably not connected and the edge is removed.
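The JTD check in Eq. (A.6) can be sketched as follows (illustrative Python; the assumption that the temporal proximity is given in days, implied by the factor 24, is ours):

```python
def jtd_remove_edge(distance_m, time_days, speed_kmh=100.0):
    """Simplified Journey Time Distance check, after Eq. (A.6).

    distance_m: spatial proximity X_ij in meters.
    time_days: temporal proximity Y_ij, assumed to be in days.
    speed_kmh: assumed fixed straight-line travel speed (100 km/h).
    Returns True if travelling between the two scenes would take longer
    than the temporal proximity allows, i.e. the edge should be removed.
    """
    travel_time_h = distance_m / (speed_kmh * 1000.0)  # meters -> hours
    return travel_time_h > time_days * 24.0
```

For example, two scenes 300 km apart with crimes only 2.4 hours apart cannot reasonably be reached at 100 km/h, so that edge would be removed.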

It has been argued that a temporal proximity of no more than 6 months is the longest period that a dataset should span, as over longer time periods the movement of people affects the outcome (McCue, 2007). This constraint on time span is accounted for by removing the edge if the temporal proximity is longer than 3 months.

Appendix B. Standardised complaint routine


References

Bennell, C., Bloomfield, S., Snook, B., Taylor, P., & Barnes, C. (2010a). Linkage analysis in cases of serial burglary: Comparing the performance of university students, police professionals, and a logistic regression model. Psychology, Crime & Law, 16(6), 507–524.

Bennell, C., & Canter, D. V. (2002). Linking commercial burglaries by modus operandi: Tests using regression and ROC analysis. Science & Justice: Journal of the Forensic Science Society, 42(3), 153.

Bennell, C., Gauthier, D., Gauthier, D., Melnyk, T., & Musolino, E. (2010b). The impact of data degradation and sample size on the performance of two similarity coefficients used in behavioural linkage analysis. Forensic Science International, 199(1–3), 85–92.

Bennell, C., & Jones, N. J. (2005). Between a ROC and a hard place: A method for linking serial burglaries by modus operandi. Journal of Investigative Psychology and Offender Profiling, 2(1), 23–41.

Bennell, C., Jones, N. J., & Melnyk, T. (2010c). Addressing problems with traditional crime linking methods using receiver operating characteristic analysis. Legal and Criminological Psychology, 14(2), 293–310.

Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., et al. (2007). On finding graph clusterings with maximum modularity. In Lecture notes in computer science: Graph-theoretic concepts in computer science (pp. 121–132). Berlin, Heidelberg.

Chainey, S., & Ratcliffe, J. (2005). GIS and crime mapping. John Wiley & Sons, Ltd.

Cohen, J., Rodrigues, L. A., Silva, F., Carmo, R., Guedes, A. L. P., & Duarte, E. P. (2011). Parallel implementations of Gusfield's cut tree algorithm. In 11th International conference on algorithms and architectures for parallel processing (ICA3PP 2011) (pp. 258–269). Berlin, Heidelberg.

Flake, G. W., Lawrence, S., & Giles, C. L. (2000). Efficient identification of web communities. In KDD '00: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM.

Flake, G. W., Tarjan, R. E., & Tsioutsiouliklis, K. (2004). Graph clustering and minimum cut trees. Internet Mathematics, 1(4), 385–408.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3–5), 75–174.

Görke, R., Hartmann, T., & Wagner, D. (2009). Dynamic graph clustering using minimum-cut trees. Berlin, Heidelberg: Springer.

Hamann, M. (2011). Complete hierarchical cut-clustering: An analysis of guarantee and quality. Karlsruhe Institute of Technology.


Levine, N. (2010). CrimeStat III: A spatial statistics program for the analysis of crime incident locations (version 3.3).

Markson, L., Woodhams, J., & Bond, J. W. (2010). Linking serial residential burglary: Comparing the utility of modus operandi behaviours, geographical proximity, and temporal proximity. Journal of Investigative Psychology and Offender Profiling, 91–107.

McCue, C. (2007). Data mining and predictive analysis: Intelligence gathering and crime analysis (1st ed.). Butterworth-Heinemann.

Newman, M. (2003). Mixing patterns in networks. Physical Review E, 67(2), 026126.

Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8573–8574.

Oatley, G., Ewart, B., & Zeleznikow, J. (2006). Decision support systems for police: Lessons from the application of data mining techniques to "soft" forensic evidence. Artificial Intelligence and Law, 14(1–2), 35–100.

Phillips, P., & Lee, I. (2011). Crime analysis through spatial areal aggregated density patterns. Geoinformatica, 15(1), 49–74.

Santos, J. M., & Embrechts, M. (2009). On the use of the adjusted rand index as a metric for evaluating supervised classification. In Artificial neural networks – ICANN 2009 (pp. 175–184). Berlin, Heidelberg: Springer.

Schaeffer, S. E. (2007). Graph clustering. Computer Science Review, 1(1), 27–64.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Wadsworth Pub Co.

Sheskin, D. (2007). Handbook of parametric and nonparametric statistical procedures. Chapman & Hall.

Tonkin, M., Woodhams, J., Bull, R., Bond, J. W., & Palmer, E. J. (2011). Linking different types of crime using geographical and temporal proximity. Criminal Justice and Behavior, 38(11), 1069–1088.

Toole, J. L., Eagle, N., & Plotkin, J. B. (2011). Spatiotemporal correlations in criminal offense records. Transactions on Intelligent Systems and Technology (TIST), 2(4), 1–18.

Wang, S., Li, X., Cai, Y., & Tian, J. (2011). Spatial and temporal distribution and statistic method applied in crime events analysis. In 2011 19th International conference on geoinformatics (pp. 1–6).

Woodhams, J., Hollin, C. R., & Bull, R. (2010). The psychology of linking crimes: A review of the evidence. Legal and Criminological Psychology, 12(2), 233–249.

Xue, Y., & Brown, D. E. (2003). A decision model for spatial site selection by criminals: A foundation for law enforcement decision support. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 33(1), 78–85.

Zhou, G., Lin, J., & Zheng, W. (2012). A web-based geographical information system for crime mapping and decision support. In 2012 International conference on computational problem-solving (ICCP) (pp. 147–150).
