Clustering Residential Burglaries Using Modus
Operandi and Spatiotemporal Information
Anton Borg* and Martin Boldt
Department of Computer Science and Engineering
Blekinge Institute of Technology
371 79, Karlskrona, Sweden
*anton.borg@bth.se
martin.boldt@bth.se
Published 17 December 2015
To identify series of residential burglaries, detecting linked crimes performed by the same constellations of criminals is necessary. Comparison of crime reports today is difficult as crime reports traditionally have been written as unstructured text and often lack a common information-basis. Based on a novel process for collecting structured crime scene information, the present study investigates the use of clustering algorithms to group similar crime reports based on combined crime characteristics from the structured form. Clustering quality is measured using Connectivity and Silhouette index (SI), stability using Jaccard index, and accuracy is measured using Rand index (RI) and a Series Rand index (SRI). The performance of clustering using combined characteristics was compared with spatial characteristic. The results suggest that the combined characteristics perform better or similar to the spatial characteristic. In terms of practical significance, the presented clustering approach is capable of clustering cases using a broader decision basis.
Keywords: Crime clustering; residential burglary analysis; decision support system; combined
distance metric.
1. Introduction
Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders, e.g., in the USA research suggests that 5% of offenders are involved in 30% of the convictions [1]. This is echoed by the Swedish law enforcement agencies. Law enforcement agencies, consequently, are required to detect whether a connection exists between crimes, i.e., whether crimes are linked. In this study a link exists between residential burglaries that share one or more suspects. The detection of linked crimes is helpful to law enforcement for several reasons. First, the aggregation of information from crime scenes allows for an increase in available evidence. Second, the joint investigation of multiple crimes enables a more efficient use of law enforcement resources [2]. Third, crime linkage is also beneficial for crime prevention, community safety and other general policing functions.
International Journal of Information Technology & Decision Making
Vol. 15, No. 1 (2016) 23–42
© World Scientific Publishing Company
DOI: 10.1142/S0219622015500339
Previously, clustering has been investigated as a method to group crimes based on characteristics, often spatial and temporal characteristics [3,4]. Recently, other characteristics have been investigated as well, on an individual basis [5]. Research into estimating linkage using regression analysis has suggested that a combination of characteristics provides a higher accuracy in linkage estimation. This study investigates a combined characteristics distance metric for use in clustering residential burglaries. Clustering residential burglaries based on different similarity aspects would potentially allow clustering solutions with a better accuracy and a broader decision basis than individual characteristics. Similarly, it would potentially allow law enforcement to find series whilst reviewing a smaller number of residential burglaries, i.e., used as a case selection decision support system (DSS). Consequently, the use of a combined distance metric would allow law enforcement agencies to save resources, whilst providing individual investigators with increased support.
1.1. Purpose statement
The purpose of this study is to investigate the effectiveness of a combined distance metric compared to a spatial distance metric. The effectiveness of different clustering algorithms is also investigated. The clustering quality is measured using multiple evaluation metrics and evaluated using statistical tests. A modified version of the Rand index (RI) is used to better reflect the clustering solution's accuracy with regard to series of residential burglaries. The data comprises residential burglaries from southern Sweden and the Stockholm area.
1.2. Outline
Section 2 presents the related work. Sections 3 and 4 explain the data and the methodology. The results are presented in Sec. 5 and analyzed in Sec. 6. Finally, the results are discussed in Sec. 7 and the conclusions of the paper are presented in Sec. 8.
2. Related Work
Intelligence-led policing and predictive policing are about making law enforcement less reactive and more proactive [6]. An important aspect of predictive policing is to link related crimes into series. Much research has focused on estimating series based on spatiotemporal characteristics as well as investigating the effects concerning repeat and near-repeat victimization [3,4,7–9]. Linking crime cases has been investigated before, primarily by estimating whether pairs of crime cases are connected. The pair estimation has mostly been conducted for violent crimes with a high possibility for series [2,10–14]. But research has also been conducted into clustering crime cases as a means of reducing the number of cases law enforcement officers have to analyze when looking for possible series of crimes [5,15]. The clustering has been investigated for, e.g., residential burglaries. Hotspot detection is a commonly used technique that groups cases based on spatial information in order to predict, based on density, future crime locations [16–21]. The research into clustering and pair-wise link estimation, however, has also investigated other crime characteristics besides spatial information.
There exist multiple crime characteristics which can be used for comparison, e.g., modus operandi (MO), spatial proximity, and temporal proximity. The MO can be further divided into three domains: entry behavior, target characteristics, and goods stolen [22]. Entry behavior describes the procedure used to enter the premises. Target characteristics describe characteristics of the residence being targeted.
Studies have computed the similarity between pairs of crimes based on various crime characteristics. Many of these studies have used similarity coefficients between cases, such as the Jaccard coefficient [2]. Previous research has suggested that there is a difference between the similarities of linked and unlinked residential burglaries, when investigating pairs of crimes [1,11,14,22].
Earlier clustering research has investigated clustering using the cut-clustering algorithm based on single, independent, crime characteristics [5]. Pair-wise link estimation found that there are reasons to combine multiple characteristics [14]. This has been suggested to increase the accuracy of clustering-based solutions for grouping residential burglaries. Initial research has investigated model-based clustering to combine different aspects of crime data [15].
The cut-clustering algorithm investigated previously did not produce clustering solutions with a high accuracy. The choice of clustering algorithm affects the clustering solution and is dependent on the data investigated [23]. As such, multiple clustering algorithms should be investigated to suggest an algorithm more suitable to the domain.
While previous research clustered crimes based on spatial data, temporal data, or single MO characteristics, this work extends that approach by also incorporating the additional MO data into the proposed combined distance metric that is used for clustering crimes. This makes it possible to group burglaries based on MO characteristics as well.
3. Data
The data set consists of residential burglary reports, collected by law enforcement officers according to a two-page structured digital form that has been developed in close cooperation between law enforcement and academia. The content of the form is based on collected knowledge from crime analysts as well as relevant theory in the field. In total, the form consists of 114 binary parameters that capture specifics about the burglar's MO. All 114 parameters are represented as checkboxes in the form, and as such the values are either 1 or 0 depending on whether checkboxes are ticked or not. Each form is divided into 11 subsections, as described in Table 1. As an example, one of the sections, including its parameters, is shown in Fig. 1. In addition to the binary parameters, the form also includes input fields concerning temporal and spatial data, i.e., date and time intervals as well as geographical position (latitude, longitude, and address).
The form is integrated with a structured data collection process that increases the quality of the collected data compared to traditional open text reports. This is mainly because the form works as a checklist that guides the law enforcement officers through mandatory questions to ask. Another positive effect of using the form comes from the tick-based checkbox layout, which instantly discretizes the collected data, making it more easily interpreted by suitable analysis algorithms.
Once a form is filled out, it is automatically verified and the law enforcement officer is notified of any inconsistencies. When the automatic verification process is passed, and any inconsistencies have been addressed, the form is registered in a database and made accessible through a custom developed software-based analysis system. In June 2015, there were approximately 12,000 residential burglary forms stored in the database, all collected in the southern part of Sweden and the Stockholm area. More details regarding both the form and the associated analysis system are available elsewhere [5].
In addition to the data collected in the form, law enforcement officers have provided anonymized data about suspects connected to the residential burglary forms. Using these labeled burglary forms, it is possible to connect cases that share one or more suspects, i.e., linking cases together into series. As such, a linked crime pair is a pair of residential burglaries that share one or more suspects.
Table 1. Summary of parameters collected from crime scenes using the digital form.

Name of subsection  Description                                                   #Parameters
Time and place      Date and time range as well as residence address              7
Residential area    Rural or urban, number of neighbors, etc.                     7
Type of residency   {Villa, townhouse, apartment, farm}, number of flats, etc.    12
Burglary alarm      If alarm existed, if it was enabled, activated, sabotaged     5
Object description  Lights lit in/outside, member in neighborhood watch, etc.     10
Plaintiff           Plaintiff away or home, prior suspicious events, etc.         15
Break in            Method and location of break in                               26
Search strategy     How the residence was searched for goods                      3
Stolen goods        Categories of stolen goods, e.g., cash, gold, medicine, etc.  7
Trace evidence      Trace evidence secured, e.g., DNA, fingerprint, etc.          18
Miscellaneous       Witness, confidential hints, and searchable goods             4
Total:                                                                            114
Fig. 1. An example of the residential area section depicting a residence located in an urban area with a single neighbor and located next to a forest or field.
The present work uses two different data sets, each created by randomly sampling 100 burglary forms from the original data set with 226 burglary forms. The two data sets are denoted D1 and D2 henceforth and, thus, contain 100 offenses each. As can be seen in Table 2, the labeled cases contain repeat offenders accounting for series that include between two and five burglaries. However, the labeled cases also include single offenders that law enforcement officers could only connect to a single offense. Single offenders are included in the study because they are used when calculating the Rand evaluation metric described further in Sec. 4.3.
4. Method
This section describes the distance metrics and clustering algorithms that are evaluated using data from the burglary form introduced in the previous section.
Two distance metrics and a set of clustering algorithms are compared over two data sets. The two data sets are sampled using simple random sampling, where each data set has 100 instances. The two data sets are denoted D1 and D2 henceforth.
Two distance metrics are evaluated. The first is based on spatial data and is considered the baseline, and the second is based on a combination of spatial, temporal, and MO characteristics. Both are explained further in Sec. 4.1.
A set of clustering algorithms is evaluated using the two distance functions on the two data sets. The clustering algorithms used are described in Sec. 4.2. Each combination of clustering algorithm and distance function is evaluated on each data set 10 times, where each run is randomized so the clustering method produces different clustering results across the 10 runs. This is done, e.g., by changing the seed if applicable. The number of clusters is based on prior knowledge of the series. As such, the number of clusters, $k$, is set to the number of series available in each data set, as shown in Table 2.
It should be noted that a priori knowledge concerning $k$ might not always be available, or for some algorithms not necessary. The value of $k$ affects the performance of the clustering and should be set appropriately, and a number of methods are available for finding $k$ [24]. Methods for investigating the optimal number of clusters, however, are considered outside the scope of this study.
Table 2. Summary of labeled crimes and series for data sets D1 and D2.

Crime series size   D1 count   Proportion D1 (%)   D2 count   Proportion D2 (%)
5                   1          5                   1          5
4                   5          20                  1          4
3                   6          18                  4          12
2                   16         32                  10         20
1 (a)               25         25                  59         59
Total:              100        100                 100        100

(a) Not actual series but crimes where burglars could only be tied to one single crime.
A set of evaluation metrics is recorded for each run. The evaluation metrics used in
the experiments are described in Sec. 4.3. For the RI evaluation, the clustering solution
is evaluated against the true clustering solution provided by law enforcement. This
enables the comparison of the distance metrics as well as the algorithms investigated.
4.1. Distance metric
Based on checkbox values within the 11 sections of the burglary form, it is possible to calculate pair-wise similarity measures between cases using the Jaccard index. Given two cases $C_1$ and $C_2$, it is possible to calculate the resulting Jaccard index by comparing attributes, i.e., the checkbox values, between the two cases according to Eq. (1). Note that since a checkbox represents a binary value, the equation for calculating the similarity between binary asymmetric attributes is used instead of the traditional Jaccard index.

\[ J(C_1, C_2) = \frac{A_{11}}{A_{10} + A_{01} + A_{11}}. \tag{1} \]

In Eq. (1), $A_{11}$ represents the number of attributes that are checked, i.e., given a value of 1, in both case $C_1$ and $C_2$. $A_{10}$ and $A_{01}$ represent the number of attributes that are checked in $C_1$ but not in $C_2$, and vice versa. In this study, it is rather the distance between cases that is of interest, and as such the Jaccard distance is used instead. The Jaccard distance is complementary to the Jaccard index and is calculated according to Eq. (2).

\[ d_J(C_1, C_2) = \frac{A_{10} + A_{01}}{A_{10} + A_{01} + A_{11}}. \tag{2} \]
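As an illustration of Eqs. (1) and (2), the following minimal sketch computes the Jaccard distance between two cases represented as lists of 0/1 checkbox values. The function name and the handling of two cases with no ticked boxes (distance 0) are assumptions made for the example, not part of the study.

```python
def jaccard_distance(case_a, case_b):
    """Jaccard distance between two equal-length binary (0/1) checkbox vectors."""
    if len(case_a) != len(case_b):
        raise ValueError("cases must have the same number of checkboxes")
    # A11: ticked in both; A10: ticked only in the first; A01: ticked only in the second.
    a11 = sum(1 for x, y in zip(case_a, case_b) if x == 1 and y == 1)
    a10 = sum(1 for x, y in zip(case_a, case_b) if x == 1 and y == 0)
    a01 = sum(1 for x, y in zip(case_a, case_b) if x == 0 and y == 1)
    denominator = a10 + a01 + a11
    if denominator == 0:
        # Assumption: two cases without any ticked box are treated as identical.
        return 0.0
    return (a10 + a01) / denominator


c1 = [1, 1, 0, 0, 1]
c2 = [1, 0, 0, 1, 1]
print(jaccard_distance(c1, c2))  # A11 = 2, A10 = 1, A01 = 1, so 2 / 4 = 0.5
```

Boxes left unticked in both cases do not enter the calculation, which is what makes the measure asymmetric with respect to 0 and 1.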
By calculating pair-wise Jaccard distances, it is possible to compare burglary
cases with regard to the variables collected. Similarity analyses of burglaries have to
a large extent focused on a single variable as the basis for estimating the similarity
between cases. However, similarity between cases can also be measured using a
combination of multiple variables, e.g., both spatial and MO similarity. Studies that
have investigated linking crime pairs suggested that a combination of multiple
variables performed better than single alternatives.
In this study, a multivariate distance metric is investigated as basis for evaluating similarity between cases. Table 3 shows the mean distance between pairs of crime cases for the different variables. The table shows the mean for all pairs, not just linked pairs. If just looking at the linked pairs, the mean (and standard deviation) for the spatial characteristic is 27.149 (27.073) kilometers, temporal = 29.267 (27.943) days, target = 0.352 (0.133), entrance = 0.422 (0.102), stolen goods = 0.219 (0.132), victim behavior = 0.161 (0.169), and physical trace = 0.384 (0.132). Generally, the linked mean is lower than for all pairs. The target selection variable, however, is not meaningfully lower for the linked pairs (0.352 versus 0.353 for all pairs).
The multivariate distance metric is a weighted Euclidean distance that is calculated from the constituent variables shown in Table 3. The table also presents the number of parameters from the structured burglary form that are included within each of the seven categories. The weights of each variable, shown in Table 3, are based on the coefficients from a logistic regression analysis model previously developed based on the data, but in accordance with previous research [14]. As such, the logistic regression analysis used the same feature-rich data as in this study, and the resulting regression coefficients from that model are used as weights in the proposed multivariate distance metric. This is one way of deriving the weights based on prior knowledge. The weights are important because they account for the fact that the characteristics are not equally important [5]. It would also allow law enforcement to adjust weights according to other considerations, e.g., a specific MO.
The total weighted combined Euclidean distance, $d_{\mathrm{combined}}$, is calculated according to Eq. (3), where $D_{\mathrm{spatial}}$, $D_{\mathrm{temporal}}$, $D_{\mathrm{target}}$, $D_{\mathrm{entrance}}$, $D_{\mathrm{goods}}$, $D_{\mathrm{victim}}$, and $D_{\mathrm{trace}}$ are the included variables, and $w_1$, $w_2$, $w_3$, $w_4$, $w_5$, $w_6$, and $w_7$ are the associated weights extracted from the regression model, as presented in Table 3.

\[ d_{\mathrm{combined}} = \sqrt{w_1 (D_{\mathrm{spatial}})^2 + w_2 (D_{\mathrm{temporal}})^2 + \cdots + w_7 (D_{\mathrm{trace}})^2}. \tag{3} \]
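The weighted combination in Eq. (3) can be sketched as follows. The distances and weights in the example are hypothetical values chosen only to make the arithmetic easy to follow; they are not the regression weights used in the study, and the check for a negative weighted sum is an added safeguard rather than something the paper describes.

```python
import math


def combined_distance(distances, weights):
    """Weighted Euclidean combination of per-characteristic distances, as in Eq. (3)."""
    if len(distances) != len(weights):
        raise ValueError("one weight is required per distance")
    total = sum(w * d ** 2 for d, w in zip(distances, weights))
    if total < 0:
        # Added safeguard: negative weights could make the sum negative.
        raise ValueError("weighted sum of squares is negative")
    return math.sqrt(total)


# Order: spatial, temporal, target, entrance, goods, victim, trace.
# Hypothetical illustration values, not taken from Table 3.
example_distances = [3.0, 2.0, 0.4, 0.5, 0.0, 0.0, 0.0]
example_weights = [1.0, 3.0, 0.0, 16.0, 1.0, 1.0, 1.0]

# 1*9 + 3*4 + 0*0.16 + 16*0.25 = 25, so the distance is 5.0
print(combined_distance(example_distances, example_weights))
```

With a single spatial component and a weight of 1, the same function reduces to the plain spatial distance between two crime locations.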
The second distance metric used is the spatial distance metric, which is considered state of the art. It is based on the Euclidean distance, according to Eq. (4), and comprises only the spatial distance between two crime locations, i.e., $D_{\mathrm{spatial}}$.

\[ d_{\mathrm{spatial}} = \sqrt{(D_{\mathrm{spatial}})^2}. \tag{4} \]
4.2. Clustering algorithms
In this subsection, the four clustering algorithms used to evaluate the premise are presented. The algorithms are chosen either because they are widely used or because related studies have indicated their suitability. Whilst the K-means clustering algorithm is one of the more popular algorithms, it does not function reliably on binary data. Consequently, the K-means clustering algorithm was excluded. The default implementations of the four clustering algorithms within the Weka machine-learning software suite [a] were used, except for the Cut-clustering algorithm since it was not included in Weka. The Cut-clustering algorithm was therefore implemented according to the specification [25]. The default options were used for the Weka algorithms. All algorithms had access to a priori information regarding the number of clusters to use in the analysis. In a real world setting, it would instead be possible to rely on methods for estimating the number of clusters. For instance, the self-tuning variant of the Spectral clustering algorithm could be used [26]. However, this was not investigated further in this study.

[a] http://www.cs.waikato.ac.nz/ml/weka/.

Table 3. Data characteristics.

Variable          Metric      Par  Weight  Min  Max      Median  Mean (SD)
Spatial           Kilometers  3    1.025   0.0  558.140  197     248.061 (229)
Temporal          Days        4    1.072   0.0  462.0    150     121.215 (95)
Target selection  Jaccard     34   0.0     0.0  0.682    0.545   0.353 (0.135)
Entrance method   Jaccard     26   4.799   0.0  0.737    0.677   0.452 (0.134)
Stolen goods      Jaccard     10   2829    0.0  0.667    0.529   0.298 (0.157)
Victim behavior   Jaccard     15   15.899  0.0  0.695    0.631   0.357 (0.151)
Physical trace    Jaccard     22   2884    0.0  0.842    0.642   0.402 (0.181)

Par: shows how many parameters are used for the characteristic.
The Cut-clustering algorithm is a graph-based clustering algorithm relying on minimum cut tree algorithms to cluster the input data, which is represented by an undirected adjacency graph [25]. Each node in the graph is an instance, and two nodes are connected if the similarity between the corresponding instances is positive, in which case the edge is weighted by the corresponding similarity score. The algorithm works by adding an artificial node to the existing graph and then connecting all nodes in the graph to it. Then a minimum cut tree is computed and the artificial node is removed. The clusters consist of the nodes that remain connected after the artificial node has been removed. A high value of the algorithm's parameter α results in a higher number of clusters produced, and vice versa. Using a binary search approach, it is possible to find the α value producing a specific number of clusters. The current implementation uses a distance function and converts the distance to a similarity according to the equation described for the spectral clustering algorithm [b]. Alternative conversion formulas were tested, e.g., 1/(1 + d(x, y)), but did not impact performance. In order to be comparable against the spectral clustering algorithm, the same formula was used.
The Expectation-Maximization (EM) clustering algorithm is a probability-based clustering algorithm [27,28]. As such it does not use a distance metric. Instead, a set of $k$ probability distributions models the attribute values of the instances within the $k$ clusters decided a priori. The clustering process is two-fold. First, the initial values of the means and standard deviations for each of the $k$ probability distributions are estimated, and the probability that each instance belongs to each cluster is calculated. Second, the means and standard deviations of each cluster distribution are recalculated based on the latest clustering result. This process is continued until the clusters that instances are assigned to remain unchanged, which means the EM clustering algorithm has converged to a maximum. Unfortunately, this might be a local instead of the global maximum. Therefore, the whole process is repeated multiple times, with different initial estimates of the means and standard deviations, to increase the chance of finding the global maximum. Finally, the largest maximum is selected and its related $k$ probability distributions are used in any further clustering.
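The restart strategy described above can be illustrated with a small one-dimensional Gaussian mixture. This is a simplified sketch and not the Weka implementation used in the study: it assumes a single numeric attribute, unit initial standard deviations, a fixed number of iterations instead of a convergence check, and hypothetical function names.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_once(values, k, seed, iterations=100, min_std=1e-3):
    """One EM run on 1-D data; returns (log-likelihood, hard cluster labels)."""
    rng = random.Random(seed)
    means = rng.sample(values, k)   # initial means: k randomly chosen data points
    stds = [1.0] * k                # assumption: unit initial standard deviations
    priors = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # Expectation: probability that each instance belongs to each cluster.
        resp = []
        for x in values:
            dens = [priors[c] * normal_pdf(x, means[c], stds[c]) for c in range(k)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # Maximization: re-estimate means, standard deviations, and cluster priors.
        for c in range(k):
            weight = sum(r[c] for r in resp)
            if weight < 1e-12:
                continue  # empty cluster: keep its previous parameters
            means[c] = sum(r[c] * x for r, x in zip(resp, values)) / weight
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, values)) / weight
            stds[c] = max(math.sqrt(var), min_std)
            priors[c] = weight / len(values)
    log_lik = sum(
        math.log(sum(priors[c] * normal_pdf(x, means[c], stds[c]) for c in range(k)) or 1e-300)
        for x in values
    )
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return log_lik, labels


def em_with_restarts(values, k, restarts=10):
    """Repeat EM from different seeds and keep the run with the largest likelihood."""
    return max((em_once(values, k, seed) for seed in range(restarts)), key=lambda run: run[0])


values = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
best_log_lik, labels = em_with_restarts(values, k=2)
print(labels)  # the first three values share one label, the last three the other
```

Keeping the run with the highest log-likelihood corresponds to selecting the largest maximum among the repeated runs.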
[b] http://www.luigidragone.com/software/spectral-clusterer-for-weka/.

The Hierarchical clustering algorithm is implemented using either a top-down or a bottom-up (agglomerative) approach [28]. The agglomerative approach begins by considering each instance as its own cluster. Next, the two clusters with the least distance between them are identified and merged together into one new cluster. Then, the process of finding the two closest clusters and merging them is continued until only one final cluster exists. The output of the clustering is the sequence of mergings, which can be represented as a hierarchical clustering structure in the form of a binary tree (dendrogram). A key part of the Hierarchical clustering algorithm concerns the distance calculation between clusters. Several different methods are available, such as the single-linkage method that makes use of the minimum distance between two clusters, which also makes it sensitive to outliers. Another method is the centroid-linkage that calculates the centroid of a cluster based on its members' internal distances, and then uses the distance between centroids to determine the closest clusters. The complete-linkage method computes the maximum distance between two clusters [28]. The adjusted complete-linkage method, similar to the complete-linkage method, computes the maximum distance between two nodes from two clusters. The method then finds the largest distance between nodes within either of the two clusters and subtracts that from the maximum distance between the two clusters [28]. The Hierarchical clustering algorithm in this paper uses three different approaches to calculate the distance between clusters: single-link, complete-link, and adjusted complete-link.
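The agglomerative procedure can be sketched on a precomputed distance matrix as follows. This is an illustrative brute-force version, not the Weka HierarchicalClusterer: the function name is hypothetical, merging stops once k clusters remain instead of building the full dendrogram, and only single and complete linkage are covered (the adjusted complete-linkage variant is left out).

```python
def agglomerative(dist, k, linkage="single"):
    """Merge the two closest clusters until k clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    """
    # Single linkage uses the minimum pairwise distance, complete linkage the maximum.
    pick = {"single": min, "complete": max}[linkage]
    clusters = [[i] for i in range(len(dist))]  # every instance starts as its own cluster
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Five hypothetical cases placed on a line; distances are absolute differences.
positions = [0.0, 1.0, 2.5, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]

print(agglomerative(dist, k=2, linkage="single"))    # [[0, 1, 2], [3, 4]]
print(agglomerative(dist, k=2, linkage="complete"))  # [[0, 1, 2], [3, 4]]
```

Because the function only needs a distance matrix, either of the two distance metrics from Sec. 4.1 could be supplied to it.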
Spectral clustering is a graph-based clustering algorithm that has been found to generally detect good clustering solutions [29,30]. The algorithm takes the number of clusters and a similarity matrix as input, and calculates an $n \times n$ affinity matrix, where $n$ is the number of instances in the data set [30]. Using Principal Component Analysis it is possible to identify relevant eigenvalues and their associated eigenvectors. Next, the eigenvectors with sufficiently large eigenvalues are extracted, and the number of extracted eigenvectors is equal to the number of dimensions in the data set. Finally, dimension reduction is carried out by mapping the extracted eigenvectors into a new space where the instances can be more efficiently clustered. The currently used implementation is adapted to the Weka framework and, as such, uses a distance function and converts the distance to a similarity measure [c].
4.3. Evaluation metrics
One of the most important aspects of cluster analysis is the validation of clustering results. Research into clustering has indicated that it is not reliable to use only a single cluster validation measure [23]. It is preferable to use multiple measures that reflect different aspects of a partitioning. In this study, five different validation measures are implemented. The quality of the clustering solution is estimated using two validity indices, Connectivity and Silhouette index (SI). The Connectivity is used for measuring connectedness [31]. The SI is used for assessing compactness and separation properties of a partitioning [32]. For evaluating the stability of a clustering method, the Jaccard index is used [33]. RI and Series Rand index (SRI) are used for assessing accuracy [34]. These measures are applied to calculate the agreement between the clustering solution and the known clustering solution. The traditional RI is calculated using all instances and the SRI is calculated using only the instances that belong in a series.

[c] http://www.luigidragone.com/software/spectral-clusterer-for-weka/.
Connectivity captures the degree to which cases are connected within a cluster by keeping track of whether the neighboring cases are put into the same cluster [31]. Let $m_{i(j)}$ be the $j$th nearest neighbor of case $i$, and let $x_{i,m_{i(j)}}$ be zero if $i$ and $m_{i(j)}$ are in the same cluster and $1/j$ otherwise. Then for a particular clustering solution (partition) $P = \{C_1, C_2, \ldots, C_k\}$ of data set $M$, which contains $m$ instances (rows) with $n$ different experimental conditions or attributes (columns), the Connectivity is defined according to Eq. (5). It has a value between zero and infinity that should be minimized.

\[ \mathrm{Conn}(P) = \sum_{i=1}^{m} \sum_{j=1}^{n} x_{i,m_{i(j)}}. \tag{5} \]
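A minimal sketch of the Connectivity computation on a precomputed distance matrix is given below. The function name is hypothetical, and the number of neighbors examined per case is exposed as a parameter (`n_neighbors`), which is an assumption of this sketch standing in for the upper limit of the inner sum in Eq. (5).

```python
def connectivity(dist, labels, n_neighbors):
    """Connectivity of a partition: penalty 1/j when the j-th nearest neighbor
    of a case is assigned to a different cluster. Lower is better."""
    m = len(dist)
    total = 0.0
    for i in range(m):
        # All other cases ordered from nearest to farthest.
        neighbors = sorted((j for j in range(m) if j != i), key=lambda j: dist[i][j])
        for rank, j in enumerate(neighbors[:n_neighbors], start=1):
            if labels[i] != labels[j]:
                total += 1.0 / rank
    return total


# Four hypothetical cases on a line forming two well separated pairs.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]

print(connectivity(dist, [0, 0, 1, 1], n_neighbors=1))  # 0.0: nearest neighbors share a cluster
print(connectivity(dist, [0, 1, 1, 1], n_neighbors=1))  # 2.0: cases 0 and 1 are split apart
```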
Silhouette index reflects the compactness and separation of clusters [32]. Let $P = \{C_1, C_2, \ldots, C_k\}$ be a clustering solution (partition) of data set $M$, which contains $m$ cases. Then the SI is defined according to Eq. (6). In the equation, $a_i$ represents the average distance of case $i$ to the other cases of the cluster to which the case is assigned, and $b_i$ represents the minimum of the average distances of case $i$ to cases of the other clusters. The SI varies between $-1$ and $1$, and a higher value indicates better clustering results.

\[ s(P) = \frac{1}{m} \sum_{i=1}^{m} \frac{b_i - a_i}{\max\{a_i, b_i\}}. \tag{6} \]
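Eq. (6) can be sketched directly on a distance matrix. The function name is hypothetical, and the treatment of singleton clusters (contributing zero to the sum) is an assumption of this sketch, since the text does not specify it.

```python
def silhouette_index(dist, labels):
    """Mean silhouette over all cases for a partition given as a label list."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")
    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # assumption: a singleton cluster contributes 0
        # a_i: average distance to the other cases in the same cluster.
        a_i = sum(dist[i][j] for j in own) / len(own)
        # b_i: smallest average distance to the cases of any other cluster.
        b_i = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a_i, b_i)
        if denominator > 0:
            total += (b_i - a_i) / denominator
    return total / len(labels)


positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]

good = silhouette_index(dist, [0, 0, 1, 1])  # compact, well separated clusters
bad = silhouette_index(dist, [0, 1, 0, 1])   # each cluster mixes both groups
print(good, bad)
```

The well separated labeling yields a value close to 1, while the mixed labeling yields a negative value.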
The Jaccard index is used to evaluate the stability of a clustering method [33]. The considered clustering method is randomized so it produces different clustering results $p = 10$ times. The averaged Jaccard index is computed over all $p(p-1)/2$ pairs of the $p$ outcomes for each of the data sets D1 and D2 individually. The Jaccard index is calculated as follows. Given a pair of clustering solutions of the same data set ($M$), $P_1$ and $P_2$, $a$ is defined as the number of pairs that belong to the same cluster in $P_1$ as well as in $P_2$. Let $b$ be the number of pairs that belong to the same cluster in $P_1$ but not in $P_2$. Further, $c$ is defined to be the number of pairs that belong to the same cluster in $P_2$ but not in $P_1$. The Jaccard index between $P_1$ and $P_2$ is then defined as in Eq. (7).

\[ J(P_1, P_2) = \frac{a}{a + b + c}. \tag{7} \]
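The stability computation can be sketched by counting case pairs, as in Eq. (7), and averaging over all pairs of runs. The function names are hypothetical, three runs are used instead of ten to keep the example short, and returning 1.0 when no pair is co-clustered in either solution is an assumption of this sketch.

```python
from itertools import combinations


def pair_jaccard(labels_1, labels_2):
    """Jaccard index between two clustering solutions of the same cases."""
    a = b = c = 0
    for i, j in combinations(range(len(labels_1)), 2):
        same_1 = labels_1[i] == labels_1[j]
        same_2 = labels_2[i] == labels_2[j]
        if same_1 and same_2:
            a += 1  # pair is co-clustered in both solutions
        elif same_1:
            b += 1  # co-clustered only in the first solution
        elif same_2:
            c += 1  # co-clustered only in the second solution
    denominator = a + b + c
    # Assumption: two solutions without any co-clustered pair count as identical.
    return a / denominator if denominator else 1.0


def stability(runs):
    """Average Jaccard index over all p(p-1)/2 pairs of p clustering runs."""
    scores = [pair_jaccard(p, q) for p, q in combinations(runs, 2)]
    return sum(scores) / len(scores)


runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels swapped
    [0, 0, 0, 1],
]
print(stability(runs))  # (1.0 + 0.25 + 0.25) / 3 = 0.5
```

Since only co-membership of pairs is compared, the index does not depend on how the clusters happen to be numbered in each run.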
The Rand index is used to calculate the accuracy of cluster solutions (partitions). This allows for a measure of agreement between two partitions, $P_1$ and $P_2$, of the same data set ($M$). Each partition is viewed as a collection of $m(m-1)/2$ pairwise decisions, where $m$ is the number of cases. For each pair of cases $g_i$ and $g_j$ in $M$, the partition either assigns them to the same cluster or to different clusters. Let $a$ be the number of decisions where $g_i$ is in the same cluster as $g_j$ in $P_1$ and in $P_2$. Let $b$ be the number of decisions where the two cases are placed in different clusters in both partitions. Total agreement, thus accuracy, can then be calculated using Eq. (8). The RI ranges between 0 and 1, where a higher value indicates a higher accuracy. $P_2$ is known beforehand and is based on labeled data.

\[ \mathrm{Rand}(P_1, P_2) = \frac{a + b}{m(m-1)/2}. \tag{8} \]
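As a worked illustration of Eq. (8), the sketch below counts the pairwise decisions on which a clustering solution and the known solution agree. The function name and the label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_1, labels_2):
    """Fraction of case pairs on which two partitions make the same decision."""
    pairs = list(combinations(range(len(labels_1)), 2))
    if not pairs:
        raise ValueError("at least two cases are required")
    # A pair agrees when it is co-clustered in both partitions (a)
    # or separated in both partitions (b).
    agreements = sum(
        (labels_1[i] == labels_1[j]) == (labels_2[i] == labels_2[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


predicted = [0, 0, 1, 1, 2]
known = [0, 0, 1, 2, 2]
print(rand_index(predicted, known))  # 8 of the 10 pairs agree, so 0.8
```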
The Series Rand index is used to calculate the accuracy, but with emphasis on series. It is implemented in the same way as the traditional RI, but only measures the agreement of two clustering solutions with regard to cases that are part of a series, i.e., disregarding crimes that do not belong to a series.
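One way to read this definition is sketched below: cases whose known label occurs only once are dropped, and the ordinary Rand index is computed on the remaining cases. This filtering rule, the function names, and the example labels are assumptions made for illustration; the text does not spell out the implementation.

```python
from collections import Counter
from itertools import combinations


def rand_index(labels_1, labels_2):
    """Fraction of case pairs on which two partitions make the same decision."""
    pairs = list(combinations(range(len(labels_1)), 2))
    if not pairs:
        raise ValueError("at least two cases are required")
    agreements = sum(
        (labels_1[i] == labels_1[j]) == (labels_2[i] == labels_2[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def series_rand_index(predicted, known):
    """Rand index restricted to cases that belong to a known series (size >= 2)."""
    sizes = Counter(known)
    keep = [i for i, label in enumerate(known) if sizes[label] >= 2]
    return rand_index([predicted[i] for i in keep], [known[i] for i in keep])


# Two known series (s1, s2) and two single crimes (x, y); hypothetical labels.
known = ["s1", "s1", "s2", "s2", "x", "y"]
predicted = [0, 0, 1, 0, 2, 2]

print(rand_index(predicted, known))         # 11 of 15 pairs agree
print(series_rand_index(predicted, known))  # 3 of 6 pairs agree, so 0.5
```

In this example the single crimes are easy to separate from the series, so they raise the ordinary RI, while the SRI only reflects how well the series themselves are recovered.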
5. Results
The results are presented in four $m \times n$ matrices (one for each metric) per algorithm and distance measure. The Cut-clustering algorithm failed to produce nontrivial clustering solutions when using the combined distance metric, and only produced nontrivial clustering solutions in 50% of the runs when using the spatial distance metric. As such, there are no metrics available for the Cut-clustering algorithm when using the combined distance metric. The Connectivity (Table 5) and SI (Table 4) indicate the clustering quality. It seems that while the Spectral clustering algorithm performs better using the combined metric, the Silhouette indexes of the other algorithms are quite similar for the two metrics.
Table 4. Mean SI for the algorithms and distance functions.

Algorithm                              Combined 1  Combined 2  Spatial 1  Spatial 2
Cut                                    -           -           0.46       0.18
EM                                     0.81        0.86        0.82       0.88
HierarchicalClusterer (Adj. Complete)  0.46        0.44        0.47       0.46
HierarchicalClusterer (Complete)       0.46        0.45        0.48       0.44
HierarchicalClusterer (Single)         0.46        0.45        0.48       0.47
Spectral                               0.66        0.62        0.50       0.44
Table 5. Mean connectivity index for the algorithms and distance functions.

Algorithm                              Combined 1  Combined 2  Spatial 1  Spatial 2
Cut                                    -           -           49.50      99.00
EM                                     90.70       97.50       90.70      97.50
HierarchicalClusterer (Adj. Complete)  85.80       91.20       92.50      86.30
HierarchicalClusterer (Complete)       96.70       97.40       97.80      96.50
HierarchicalClusterer (Single)         87.10       94.20       82.60      95.60
Spectral                               98.20       97.90       97.40      96.80
The connectivity index does not show any distinct differences between the spatial
and combined metrics. In fact, for the Hierarchical clustering algorithm there is only
a minor difference between the two distance functions, as can be observed in Table 5.
Tables 6 and 7 show the accuracy of the clustering solutions measured by the RI and
SRI, respectively. For both metrics, there are only negligible differences between the
combined and spatial metrics, but the SRI shows a lower score than the RI. This is
because the accuracy of the clustering solutions is not inflated by crimes that are not
part of a series, as the SRI only includes crimes that are part of a series of
residential burglaries.
Table 8 shows the stability of the clustering solutions for the different data sets,
measured using the Jaccard index. The EM algorithm shows the best performance, with a
Jaccard index of around 0.5. The Cut-clustering algorithm only produced trivial
clustering solutions using the combined metric, and it produced trivial clustering
solutions in 50% of the cases when using the spatial metric. Therefore, the results of
the Cut-clustering algorithm for the Jaccard metric should be discarded. The Spectral
algorithm produces more stable clustering solutions using the combined metric
than using the spatial metric, around 0.3 and 0.2, respectively.

Table 6. Mean RI for the algorithms and distance functions.
Algorithm                              Combined 1  Combined 2  Spatial 1  Spatial 2
Cut                                    -           -           0.04       0.09
EM                                     0.96        0.97        0.96       0.97
HierarchicalClusterer (Adj. Complete)  0.89        0.92        0.89       0.92
HierarchicalClusterer (Complete)       0.97        0.97        0.97       0.97
HierarchicalClusterer (Single)         0.91        0.95        0.91       0.95
Spectral                               0.98        0.98        0.98       0.98

Table 7. Mean SRI for the algorithms and distance functions.
Algorithm                              Combined 1  Combined 2  Spatial 1  Spatial 2
Cut                                    -           -           0.10       0.12
EM                                     0.92        0.93        0.92       0.93
HierarchicalClusterer (Adj. Complete)  0.85        0.87        0.85       0.87
HierarchicalClusterer (Complete)       0.92        0.93        0.92       0.93
HierarchicalClusterer (Single)         0.86        0.92        0.86       0.92
Spectral                               0.93        0.94        0.94       0.95

Table 8. Mean Jaccard index for the algorithms and distance functions.
Algorithm                              Combined 1  Combined 2  Spatial 1  Spatial 2
Cut                                    -           -           0.47       0.65
EM                                     0.45        0.59        0.45       0.59
HierarchicalClusterer (Adj. Complete)  0.10        0.11        0.10       0.10
HierarchicalClusterer (Complete)       0.21        0.19        0.21       0.19
HierarchicalClusterer (Single)         0.10        0.12        0.10       0.12
Spectral                               0.31        0.30        0.22       0.18
6. Analysis
The evaluation of the results is two-fold. First, the difference in each algorithm's
performance between the two distance functions is evaluated using Wilcoxon's test.
Second, the performance of the different algorithms is evaluated using Friedman's
test. The algorithm that has the best mean performance over multiple evaluation
metrics is investigated further using a Nemenyi post hoc test.
6.1. Distance metric comparison
For the Spectral clustering algorithm, the combined distance metric was significantly
better than the spatial distance metric with regard to SI (W = 12, p < 0.05), RI
(W = 80, p < 0.05), and Jaccard index (W = 105, p < 0.05), but not for Connectivity
(W = 138.5, p > 0.05). With regard to SRI (W = 400, p < 0.05), the spatial distance
metric performed significantly better. This can be observed in Figs. 2-5, where the
observations of the Spectral clustering algorithm for both data samples have been
visualized using box-plots. While there are some outliers, the figures show that the
distributions for the two distance functions do not overlap. A significant difference
was detected for the Hierarchical Clusterer (Single) clustering algorithm
(W = 278, p < 0.05) with regard to the SI, but not for the other metrics. There were
no significant differences found between the distance functions for the Hierarchical
Clusterer (Single) or Hierarchical Clusterer (Complete) clustering algorithms. Since
EM does not use a distance metric, there was no reason to test it. As the
Cut-clustering algorithm failed to produce clustering solutions for the combined
distance metric, it must be concluded that the spatial distance metric is preferable
in that case.

[Box-plot of Silhouette Index by distance function (Combined, Spatial).]
Fig. 2. SI per distance metric for the Spectral clustering algorithm, indicating cluster solution quality.

[Box-plot of Rand Index by distance function (Combined, Spatial).]
Fig. 3. RI per distance metric for the Spectral clustering algorithm, indicating cluster solution accuracy.
6.2. Algorithm comparison
Friedman's test was applied to the different metrics to evaluate whether any
algorithm performed significantly better than another algorithm. Friedman's test found
significant differences between the algorithms for the RI (χ² = 14.428, df = 3,
p < 0.05) and the SRI (χ² = 12.149, df = 3, p < 0.05). The test found no significant
differences for the SI (χ² = 1.75, df = 3, p > 0.05), the Connectivity index
(χ² = 12.28, df = 3, p > 0.05), or the Jaccard index (χ² = 6.473, df = 3, p > 0.05).
The Nemenyi test for the RI shows that, in this case, the Spectral clustering
algorithm performed significantly better than the Cut-clustering algorithm and the
Hierarchical Clustering algorithm (using an adjusted complete link approach) at
p = 0.01 and p = 0.05, respectively (Table 9): the average-rank differences of 5 and 4
exceed the critical differences of 4.449 and 3.769, respectively. The Hierarchical
clustering algorithm (using a complete link approach) also performed significantly
better than the Cut-clustering algorithm. For the SRI, the Nemenyi test found that the
Spectral clustering algorithm performed significantly better than the Cut-clustering
algorithm and the Hierarchical Clustering algorithm (using an adjusted complete link
approach) at p = 0.01 and p = 0.05, respectively (Table 10). No significant difference
can be detected between the other algorithms.
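The pairwise decisions of the Nemenyi post hoc test can be reproduced from the average ranks and the critical differences reported with Tables 9 and 10. The sketch below is illustrative only: the function name and the shortened algorithm labels are assumptions, and the critical differences are taken as given from the table notes rather than recomputed.

```python
from itertools import combinations


def nemenyi_pairs(avg_ranks, cd_05, cd_01):
    """Flag algorithm pairs whose average-rank gap reaches a critical difference.

    Returns {(better, worse): "*" or "**"}, where "*" marks significance at
    p = 0.05 and "**" at p = 0.01. A lower average rank is better.
    """
    flagged = {}
    for x, y in combinations(avg_ranks, 2):
        gap = abs(avg_ranks[x] - avg_ranks[y])
        if gap >= cd_05:
            better, worse = (x, y) if avg_ranks[x] < avg_ranks[y] else (y, x)
            flagged[(better, worse)] = "**" if gap >= cd_01 else "*"
    return flagged


# Average ranks and critical differences as reported with Tables 9 and 10.
CD_05, CD_01 = 3.769, 4.449
RI_RANKS = {"Cut": 6, "EM": 3, "HC (Adj. Complete)": 5,
            "HC (Complete)": 2, "HC (Single)": 4, "Spectral": 1}
SRI_RANKS = {"Cut": 6, "EM": 2.5, "HC (Adj. Complete)": 5,
             "HC (Complete)": 2.5, "HC (Single)": 4, "Spectral": 1}

for name, ranks in (("RI", RI_RANKS), ("SRI", SRI_RANKS)):
    for (better, worse), mark in nemenyi_pairs(ranks, CD_05, CD_01).items():
        print(f"{name}: {better} > {worse} {mark}")
```

With these inputs, the RI ranks flag three pairs (Spectral over Cut, Spectral over the adjusted complete link variant, and the complete link variant over Cut), while the SRI ranks flag only the two pairs involving Spectral.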
[Box-plot of Series Rand Index by distance function (Combined, Spatial).]
Fig. 4. SRI per distance metric for the Spectral clustering algorithm, indicating cluster solution accuracy.

[Box-plot of Jaccard Index by distance function (Combined, Spatial).]
Fig. 5. Jaccard index per distance metric for the Spectral clustering algorithm, indicating cluster solution
stability.
6.3. Evaluation metric analysis
A correlation matrix between the variables was investigated to see if there were any
unlabeled evaluation metrics that could be used to indicate a higher RI. Tables 11
and 12 show how the different variables correlate to each other for the combined and
spatial distance functions, respectively. Similar to the box-plots (Figs. 2-4), the
data is limited to the observations for the Spectral clustering algorithm. As can be
expected, the RI and SRI closely correlate to each other regardless of the distance
metric. The connectivity correlates negatively to the RI and SRI, also independent of
distance metric. This correlation is not surprising as a lower connectivity indicates
a better cluster solution. For the combined distance metric, there is a positive
correlation, albeit small, between the SI and RI. Surprisingly, there is a negative
correlation between the SRI and SI. This would indicate that, for the spatial distance
metric, a cluster solution which has problems separating clusters potentially has a
higher accuracy.
There is no clear metric that has a high correlation with either the RI or the
SRI. As such, using an evaluation metric which relies on unlabeled data to indicate a
high accuracy seems to be without basis.

Table 9. Nemenyi test results for RI.
                                       Cut  EM   HC1  HC2  HC3  Spectral
Cut                                    -    -    -    -    -    -
EM                                     -    -    -    -    -    -
HierarchicalClusterer (Adj. Complete)  -    -    -    -    -    -
HierarchicalClusterer (Complete)       *    -    -    -    -    -
HierarchicalClusterer (Single)         -    -    -    -    -    -
Spectral                               **   -    *    -    -    -
Average Rank                           6    3    5    2    4    1
Critical difference at p = 0.05: 3.769. Critical difference at p = 0.01: 4.449.
* denotes significant difference at p = 0.05, ** denotes significant difference at p = 0.01.
HC1-3: HierarchicalClusterer (Adj. Complete), HierarchicalClusterer (Complete), and
HierarchicalClusterer (Single).

Table 10. Nemenyi test results for SRI.
                                       Cut  EM   HC1  HC2  HC3  Spectral
Cut                                    -    -    -    -    -    -
EM                                     -    -    -    -    -    -
HierarchicalClusterer (Adj. Complete)  -    -    -    -    -    -
HierarchicalClusterer (Complete)       -    -    -    -    -    -
HierarchicalClusterer (Single)         -    -    -    -    -    -
Spectral                               **   -    *    -    -    -
Average Rank                           6    2.5  5    2.5  4    1
Critical difference at p = 0.05: 3.769. Critical difference at p = 0.01: 4.449.
* denotes significant difference at p = 0.05, ** denotes significant difference at p = 0.01.
HC1-3: HierarchicalClusterer (Adj. Complete), HierarchicalClusterer (Complete), and
HierarchicalClusterer (Single).

Table 11. Correlation matrix for the Combined distance metric.
              Connectivity  SI     RI     SRI
Connectivity  1.00          0.10   -0.07  -0.07
SI            0.10          1.00   0.11   0.06
RI            -0.07         0.11   1.00   0.97
SRI           -0.07         0.06   0.97   1.00
7. Discussion
The results and the analysis showed that the combined distance metric performed
as well as or, in certain cases, better than the spatial distance metric. While
there were exceptions to this, the difference between the two in those cases was
negligible. There are advantages to the combined distance metric that are not
available to a single-characteristic distance metric.
One advantage is the increased amount of information used. While the spatial
distance metric performs with similar results to the combined distance metric, it
could be argued that increasing the amount of information the clustering solution is
based on allows more robust decision-making support. Also, while spatial analysis of
residential burglaries or other types of crimes, i.e., hotspot analysis, can be a good
indicator of crimes that are part of a series or of crime waves, it cannot identify
series of crimes committed over a longer time period or a series within a high-risk
area where multiple criminals operate frequently. In these cases other information
must be included, e.g., MO information. Whilst this can be done manually by law
enforcement officers, manual analysis is often resource demanding, often limited to,
e.g., violent crimes, and subject to increased risk of operator error.
A second advantage of the combined distance metric is that it would allow law
enforcement officers to provide their own weights for the different characteristics
based on their expert opinions, providing clustering solutions adapted to each
individual investigation. Default weights can, however, be provided based on solved
crimes. A drawback of basing the default weights on solved cases would be that they
are biased towards the cases that law enforcement is able to solve. At the moment,
those are cases that have a close spatial and temporal distance. This could be
remedied using organizational improvement, something that, e.g., Swedish law
enforcement is currently working on.

Table 12. Correlation matrix for the Spatial distance metric.
              Connectivity  SI     RI     SRI
Connectivity  1.00          0.27   -0.13  -0.16
SI            0.27          1.00   0.52   0.53
RI            -0.13         0.52   1.00   1.00
SRI           -0.16         0.53   1.00   1.00
Another reason for using weights is that not all data collected are indicative of a
link between cases. In this case, the target selection characteristics do not seem to
differ between linked and unlinked cases. It is questionable whether such data
should be used in the clustering analysis. In certain cases, it might be beneficial,
according to law enforcement officers, and in such cases the weight for that
characteristic should probably be increased. In other cases, it could be necessary to
decrease the weight or remove the characteristic altogether. There are also quality
aspects that might indicate that certain data should not be included. In this study,
unstructured text is excluded as it is difficult to translate it to structured form
without data quality loss, due to, e.g., use of synonyms and spelling mistakes. Such
considerations must be made when analyzing the crime data [21].
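To make the role of the weights concrete, the sketch below shows one way a weighted combined distance could be formed. It is a hypothetical illustration, not the distance function used in the study: it assumes each characteristic yields a distance in [0, 1], that binary MO features are compared with a Jaccard distance, and that the combination is a weighted mean. The characteristic names and all values are invented for the example.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of binary features that are present."""
    union = a | b
    if not union:
        return 0.0  # two empty feature sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def combined_distance(distances, weights):
    """Weighted mean of per-characteristic distances (each assumed in [0, 1]).

    A weight of 0 removes a characteristic; a larger weight increases its
    influence on the combined distance.
    """
    active = {name: w for name, w in weights.items() if w > 0}
    total = sum(active.values())
    if total == 0:
        raise ValueError("at least one characteristic needs a positive weight")
    return sum(w * distances[name] for name, w in active.items()) / total


# Hypothetical pair of burglaries: MO features present at each crime scene,
# plus already normalized spatial, temporal, and target-selection distances.
distances = {
    "mo": jaccard_distance({"window_entry", "tools_used"},
                           {"window_entry", "door_forced"}),
    "spatial": 0.2,
    "temporal": 0.5,
    "target": 0.9,
}

equal = combined_distance(distances, dict.fromkeys(distances, 1.0))
no_target = combined_distance(
    distances, {"mo": 1.0, "spatial": 1.0, "temporal": 1.0, "target": 0.0}
)
print(round(equal, 3), round(no_target, 3))
```

Setting the weight of the target selection characteristic to zero corresponds to removing it altogether, as discussed above for characteristics that do not differ between linked and unlinked cases.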
A potential drawback to the combined distance metric is that not all clustering
algorithms can be used with it. This is due to the inclusion of binary data in the
instances. Algorithms such as the K-means clustering algorithm require non-binary
data. However, the Spectral clustering algorithm seems to be a good candidate. The
Spectral clustering algorithm performed significantly better than the Cut-clustering
and Hierarchical clustering algorithms regardless of which distance metric was used.
When evaluating clustering solutions with multiple singletons, the True Negatives
inflate the RI. This is also true of clustering solutions with multiple smaller
clusters. The SRI provides accuracy based on how well the series has been clustered,
without taking into account crimes not part of any series. The SRI, however, is also
susceptible to the problem of multiple small clusters, albeit to a lesser extent than the
RI. The f-measure might be an alternative to the RI.
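The inflation effect can be illustrated with a small worked example. The data below are hypothetical (100 crimes, two series of five crimes each, all other crimes unlinked), and the helper function is an illustrative assumption rather than the study's implementation. A trivial solution that places every crime in its own cluster still obtains a very high RI, whereas the same computation restricted to series cases gives a much lower, though not zero, score.

```python
from itertools import combinations
from math import comb


def rand_index(p1, p2, cases):
    """Share of case pairs on which two partitions agree (same/same or different/different)."""
    agree = sum(
        (p1[i] == p1[j]) == (p2[i] == p2[j])
        for i, j in combinations(cases, 2)
    )
    return agree / comb(len(cases), 2)


# Hypothetical labeled data: every crime starts as unlinked (unique label) ...
truth = {i: f"single-{i}" for i in range(100)}
# ... and two series of five crimes each are then marked.
for i in range(5):
    truth[i] = "series-A"
for i in range(5, 10):
    truth[i] = "series-B"
series_cases = list(range(10))

# Trivial clustering solution: every crime in its own cluster.
singletons = {i: i for i in range(100)}

ri = rand_index(truth, singletons, list(truth))
sri = rand_index(truth, singletons, series_cases)
print(f"RI  = {ri:.3f}")   # RI  = 0.996
print(f"SRI = {sri:.3f}")  # SRI = 0.556
```

Only the 20 within-series pairs are misjudged by the trivial solution, so the RI is 4930/4950. Restricted to the ten series cases, 25 of the 45 pairs are still true negatives, which is why the SRI is affected by the same problem to a lesser extent.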
It should be noted that the number of clusters affects the clustering solution. In
this study, prior knowledge of the series in the data set was used to decide the
number of clusters. This information, however, is not always available. As such, the
value for k in this study may be optimal, and the results should be interpreted as
obtained under an optimal k. In practice, the value of k might not always be optimal
and the results might be affected. It is worth noting that methods for finding the
value for k have been investigated [24].
There is no cluster evaluation metric that has a high correlation with either the RI
or the SRI. Consequently, it is not possible from the results to identify an evaluation
metric that relies on unlabeled data and is capable of indicating a high accuracy. This
is unfortunate as the amount of labeled data for residential burglaries is likely to be
sparse. However, it is our opinion that the SI is still a reasonable evaluation metric
when labeled data is missing. The SI reflects the compactness and separation of
clusters [32]. Each series of residential burglaries should have a high intra-series
similarity score and a low inter-series similarity score, which is similar to what the
SI evaluates [2]. The use of multiple evaluation metrics makes it possible to view the
clustering as a multiple criteria decision making (MCDM) problem. Methods exist
for resolving disagreements among evaluation metrics [35,36].
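As a brief illustration of how the SI captures compactness and separation without labeled data, the sketch below computes the mean silhouette width from a precomputed distance matrix, following the standard definition by Rousseeuw [32]. The function name and the toy data are assumptions made for the example.

```python
def silhouette_index(dist, labels):
    """Mean silhouette width for a symmetric distance matrix and cluster labels.

    For each case i: a = mean distance to the other cases in its own cluster,
    b = smallest mean distance to the cases of any other cluster, and
    s(i) = (b - a) / max(a, b). Cases in singleton clusters get s(i) = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy data: four cases on a line, forming two tight, well-separated groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

compact = silhouette_index(dist, [0, 0, 1, 1])  # groups match the structure
mixed = silhouette_index(dist, [0, 1, 0, 1])    # groups cut across it
print(round(compact, 3), round(mixed, 3))
```

The grouping that matches the structure of the data scores close to 1, while the grouping that mixes the two groups scores below 0. This is the sense in which a high intra-series and low inter-series similarity would be rewarded.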
8. Conclusion
The contributions of this paper include, but are not limited to, investigating a
method based on combined distance metrics for analyzing similarities between
residential burglaries. Further, its effective use by multiple clustering algorithms to
provide a decision based on several variables has also been investigated. Clustering
residential burglaries based on different similarity aspects would potentially
allow clustering solutions with a better accuracy and a broader decision basis than
relying on individual characteristics, providing enhanced decision support for law
enforcement officers.
A combined distance metric for clustering residential burglaries has been
investigated. The performance was evaluated based on multiple evaluation metrics using
five clustering algorithms. The combined distance metric was compared against a
spatial distance metric representing the baseline. Wilcoxon's test shows that the
combined distance metric generally performed similarly to or better than the spatial
distance metric, but in a few cases it performed negligibly worse.
However, the combined distance metric has the advantage of using a more complete
picture of the residential burglary as the basis for the clustering. As
such, it provides a better ground for clustering crime cases than single characteristics.
If a burglary series extends both spatially and temporally, the additional MO
information utilized in the present study could aid in linking crimes and thereby
creating useful crime clusters.
The choice of clustering algorithms impacts the performance as measured by the
evaluation metrics. Multiple algorithms were investigated. The evaluation metrics of
the algorithms were evaluated using Friedman's test and the Nemenyi test. The
Spectral clustering algorithm was the highest ranking algorithm and performed with
significantly better accuracy than the Cut-clustering algorithm and hierarchical
clustering algorithm. This suggests the feasibility of using the spectral clustering
algorithm in the criminology domain.
As knowledge of perpetrators is not common, it is argued that the SI is a rea-
sonable metric to use when evaluating cluster solutions of data without any
knowledge of the perpetrators. However, no clear correlation could be found between
the SI and the accuracy indices for the combined distance metric. This suggests that
for this domain the SI cannot be used to indicate high accuracy clustering solutions.
9. Future work
Two avenues for future work have been identified. First, a study based on more
labeled data would allow the results to be more generalizable. Second, the approach
should be investigated for other crime categories, such as vehicle theft or various
frauds. Different crime categories have different behavioral characteristics, and
whether clustering can be used to group series of crimes has not been investigated
using MO characteristics.
References
1. M. Tonkin, J. Woodhams, R. Bull, J. W. Bond and E. J. Palmer, Linking different types of crime using geographical and temporal proximity, Criminal Justice and Behavior 38(11) (2011) 1069-1088.
2. J. Woodhams, C. R. Hollin and R. Bull, The psychology of linking crimes: A review of the evidence, Legal and Criminological Psychology 12(2) (2010) 233-249.
3. J. H. Ratcliffe, The hotspot matrix: A framework for the spatio-temporal targeting of crime reduction, Police Practice and Research: An International Journal 5(1) (2004) 5-23.
4. J. E. Eck, Crime hot spots: What they are, why we have them, and how to map them, in Mapping Crime: Understanding Hot Spots (National Institute of Justice, Washington DC, 2004).
5. A. Borg, M. Boldt, N. Lavesson, U. Melander and V. Boeva, Detecting serial residential burglaries using clustering, Expert Systems with Applications 44(11) (2014) 5252-5266.
6. M. Maguire and T. John, Intelligence led policing, managerialism and community engagement: Competing priorities and the role of the national intelligence model in the UK, Policing and Society: An International Journal of Research and Policy 16(1) (2006) 67-85.
7. K. Bowers and S. Johnson, Who commits near repeats? A test of the boost explanation, Western Criminology Review 5(3) (2004) 12-24.
8. W. Bernasco, Them again?: Same-offender involvement in repeat and near repeat burglaries, European Journal of Criminology 5(4) (2008) 411-431.
9. D. Johnson, The space/time behaviour of dwelling burglars: Finding near repeat patterns in serial offender data, Applied Geography 41 (2013) 139-146.
10. C. Bennell and D. V. Canter, Linking commercial burglaries by modus operandi: Tests using regression and ROC analysis, Science & Justice: Journal of the Forensic Science Society 42(3) (2002) 153.
11. C. Bennell, N. J. Jones and T. Melnyk, Addressing problems with traditional crime linking methods using receiver operating characteristic analysis, Legal and Criminological Psychology 14(2) (2010) 293-310.
12. C. Bennell, D. Gauthier, D. Gauthier, T. Melnyk and E. Musolino, The impact of data degradation and sample size on the performance of two similarity coefficients used in behavioural linkage analysis, Forensic Science International 199(1-3) (2010) 85-92.
13. C. Bennell, S. Bloomfield, B. Snook, P. Taylor and C. Barnes, Linkage analysis in cases of serial burglary: Comparing the performance of university students, police professionals, and a logistic regression model, Psychology, Crime & Law 16(6) (2010) 507-524.
14. L. Markson, J. Woodhams and J. W. Bond, Linking serial residential burglary: Comparing the utility of modus operandi behaviours, geographical proximity, and temporal proximity, Journal of Investigative Psychology and Offender Profiling 7(2) (2010) 91-107.
15. B. J. Reich and M. D. Porter, Partially supervised spatiotemporal clustering for burglary crime series identification, Journal of the Royal Statistical Society: Series A (Statistics in Society) 178(2) (2015) 465-480.
16. Y. Xue and D. E. Brown, A decision model for spatial site selection by criminals: A foundation for law enforcement decision support, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 33(1) (2003) 78-85.
17. S. Wang, X. Li, Y. Cai and J. Tian, Spatial and temporal distribution and statistic method applied in crime events analysis, 19th Int. Conf. Geoinformatics, 2011, Shanghai, China (2011), pp. 1-6.
18. G. Zhou, J. Lin and W. Zheng, A web-based geographical information system for crime mapping and decision support, Int. Conf. Computational Problem-Solving (ICCP), 2012, Leshan, China (2012), pp. 147-150.
19. P. Phillips and I. Lee, Crime analysis through spatial areal aggregated density patterns, Geoinformatica 15(1) (2011) 49-74.
20. G. Oatley, B. Ewart and J. Zeleznikow, Decision support systems for police: Lessons from the application of data mining techniques to "soft" forensic evidence, Artificial Intelligence and Law 14(1-2) (2006) 35-100.
21. S. Chainey and J. Ratcliffe, GIS and Crime Mapping (John Wiley & Sons, US, 2005).
22. C. Bennell and N. J. Jones, Between a ROC and a hard place: A method for linking serial burglaries by modus operandi, Journal of Investigative Psychology and Offender Profiling 2(1) (2005) 23-41.
23. A. Borg, N. Lavesson and V. Boeva, Comparison of clustering approaches for gene expression data, The 12th Scandinavian AI Conf. (Scai), Aalborg, Denmark (2013), pp. 55-64.
24. C. A. Sugar and G. M. James, Finding the number of clusters in a dataset, Journal of the American Statistical Association 98(463) (2003) 750-763.
25. G. W. Flake et al., Graph clustering and minimum cut trees, Internet Mathematics 1(4) (2004) 385-408.
26. L. Zelnik-Manor and P. Perona, Self-Tuning Spectral Clustering (MIT Press, Cambridge MA, 2004).
27. R. Xu and D. Wunsch, Survey of clustering algorithms, IEEE Transactions on Neural Networks 16(3) (2005) 645-678.
28. I. H. Witten, E. Frank and M. A. Hall, Data Mining Practical Machine Learning Tools and Techniques, 3rd ed. (Elsevier Morgan Kaufman, 2011).
29. S. E. Schaeffer, Graph clustering, Computer Science Review 1(1) (2007) 27-64.
30. J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000) 888-905.
31. J. Handl et al., Computational cluster validation in post-genomic data analysis, Bioinformatics 21 (2005) 3201-3212.
32. P. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational Applied Mathematics 20 (1987) 53-65.
33. P. Jaccard, The distribution of flora in the alpine zone, New Phytologist 11 (1912) 37-50.
34. W. M. Rand, Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association 66 (1971) 846-850.
35. G. Kou, Y. Lu, Y. Peng and Y. Shi, Evaluation of classification algorithms using MCDM and rank correlation, International Journal of Information Technology & Decision Making 11(1) (2012) 197-225.
36. G. Kou, Y. Peng and G. Wang, Evaluation of clustering algorithms for financial risk analysis using MCDM methods, Information Sciences 275 (2014) 1-12.
42 A. Borg & M. Boldt
... Machine learning approaches to crime linkage often include a number of methods to measure similarity and compare their performance (e.g., Borg et al., 2017;Borg & Boldt, 2016;Li & Qi, 2019). This can include similarity metrics such as Jaccard's coefficient (as discussed A. Burrell et al. ...
... This can include similarity metrics such as Jaccard's coefficient (as discussed A. Burrell et al. Aggression and Violent Behavior 79 (2024) 102014 above) and vectors (e.g., Bolle & Casey, 2018;Borg & Boldt, 2016;Li & Qi, 2019). For example, Li and Qi (2019) computed a similarity value for numeric, categorical, and keyword attributes for each crime pair. ...
... The similarity of the criminal process was also included by extracting key information from the crime narrative and matching changes (e.g., taking into account when a behaviour occurred during the offence). Borg and Boldt (2016) used Jaccard's co-efficient as a distance metric (i. e., a measure of pairwise similarity). ...
Article
Abstract This paper reviews the crime linkage literature to identify how data were pre-processed for analysis, methods used to predict linkage status/series membership, and methods used to assess the accuracy of linkage predictions. Thirteen databases were searched, with 77 papers meeting the inclusion/exclusion criteria. Methods used to pre-process data were human judgement, similarity metrics (including machine learning approaches), spatial and temporal measures, and Mokken Scaling. Jaccard's coefficient and other measures of similarity (e.g., temporal proximity, inter-crime distance, similarity vectors) are the most common ways of pre-processing data. Methods for predicting linkage status were varied and included human (expert) judgement, logistic regression, multi-dimensional scaling, discriminant function analysis, principal component analysis and multiple correspondence analysis, Bayesian methods, fuzzy logic, and iterative classification trees. A common method used to assess linkage-prediction accuracy was to calculate the hit rate, although position on a ranked list was also used, and receiver operating characteristic (ROC) analysis has emerged as a popular method of assessing accuracy. The article has been published open access and is free to download from https://www.sciencedirect.com/science/article/pii/S1359178924001046
... It is worth noting that there is a growing literature utilising machine learning approaches for linking. See Bollé and Casey (2018), Borg and Boldt (2016), and Li and Qi (2019) for examples of such work. ...
Chapter
Research has shown that the majority of offences are committed by a minority of offenders. Therefore, any method to help identify prolific/serial offenders is of benefit to the police. Behavioural Crime Linkage (BCL) is a method of identifying series of offences committed by the same person(s) using the behaviour displayed during the offence. This can include, but is not limited to, target selection, control and weapon use, approach, property stolen, and temporal and spatial trends. This chapter will explain the theoretical framework for BCL and common methods for testing the accuracy of this method (e.g. logistic regression, Receiver Operating Characteristic ). The chapter will then outline how BCL has been applied in robbery. It will discuss how the success of BCL is influenced by factors such as type of location (e.g. urban versus rural) and group offending (e.g. can you link offences committed by groups?). This chapter will draw heavily on the PhD research of the author but will cite other literature (e.g. evidence to support the theoretical framework for BCL) where relevant.KeywordsBehavioural crime linkageCrime linkageRobbery
... The unsupervised method mainly finds similar cases to form clusters. Borg and Boldt used four clustering algorithms to identify crime series based on behavioural similarity [21]. Zhu and Xie employed the Restricted Boltzmann Machine (RBM) to obtain the cooccurrence pattern in criminal behaviour to link crimes [22]. ...
Article
Full-text available
Detecting serial crimes is to find criminals who have committed multiple crimes. A classification technique is often used to process serial crime detection, but the pairwise comparison of crimes is of quadratic complexity, and the number of nonserial case pairs far exceeds the number of serial case pairs. The blocking method can play a role in reducing pairwise calculation and eliminating nonserial case pairs. But the limitation of previous studies is that most of them use a single criterion to select blocks, which is difficult to guarantee an excellent blocking result. Some studies integrate multiple criteria into one comprehensive index. However, the performance is easily affected by the weighting method. In this paper, we propose a combined blocking (CB) approach. Each criminal behaviour is defined as a behaviour key (BHK) and used to form a block. CB learns several weak blocking schemes by different blocking criteria and then combines them to form the final blocking scheme. The final blocking scheme consists of several BHKs. Because rare behaviour can better identify crime series, each BHK is assigned a score according to its rarity. BHKs and their scores are used to determine whether a case pair need to be compared. After comparing with multiple blocking methods, CB can effectively guarantee the number of serial case pairs while greatly reducing unnecessary nonserial case pairs. The CB is embedded in a supervised machine learning framework. Experiments on real-world robbery cases demonstrate that it can effectively reduce pairwise comparison, alleviate the class imbalance problem and improve detection performance.
... Several supervised learning algorithms have been applied to crime linkage, including neural networks [11], logistic regression [29,30], decision trees [31], Bayesian classification [32], etc. Researchers use unsupervised methods to identify all serial crimes rather than serial crime pairs, various clustering algorithms [33], outlier detection [34] and Restricted Boltzmann Machine (RBM) [35], etc. In addition, some scholars applied semi-supervised algorithms [13] and fuzzy multi-criteria decision making [36,37] to associate crimes. ...
Article
Crime linkage is a challenging task in crime analysis, which is to find serial crimes committed by the same offenders. It can be regarded as a binary classification task detecting serial case pairs. However, most case pairs in the real world are nonserial, so there is a serious class imbalance in the crime linkage. In this paper, we propose a novel random forest based on the information granule. The approach doesn’t resample the minority class or the majority class but concentrates on indistinguishable case pairs at the classification boundary. The information granule is used to identify case pairs that are difficult to distinguish in the dataset and constructs a nearly balanced dataset in the uncertainty region to deal with the imbalanced problem. In the proposed approach, random trees come from the original dataset and the above mentioned nearly balanced dataset. A real-world robbery dataset and some public imbalanced datasets are employed to measure the performance of the approach. The results show that the proposed approach is effective in dealing with class imbalances, and it can be extended to combine with other methods solving class imbalances.
... Jaccard index compares two sets and calculates the similarity by dividing the size of the intersection with the size of the union of the two sets [3], i.e. as in Equation 1: ...
Conference Paper
Full-text available
For any corporation the interaction with its customers is an important business process. This is especially the case for resolving various business-related issues that customers encounter. Classifying the type of such customer service e-mails to provide improved customer service is thus important. The classification of e-mails makes it possible to direct them to the most suitable handler within customer service. We have investigated the following two aspects of customer e-mail classification within a large Swedish corporation. First, whether a multi-label classifier can be introduced that performs similarly to an already existing multi-class classifier. Second, whether conformal prediction can be used to quantify the certainty of the predictions without loss in classification performance. Experiments were used to investigate these aspects using several evaluation metrics. The results show that for most evaluation metrics, there is no significant difference between multi-class and multi-label classifiers, except for Ham-ming loss where the multi-label approach performed with a lower loss. Further, the use of conformal prediction did not introduce any significant difference in classification performance for neither the multi-class nor the multi-label approach. As such, the results indicate that conformal prediction is a useful addition that quantifies the certainty of predictions without negative effects on the classification performance, which in turn allows detection of statistically significant predictions.
... Clustering (Jain, 2010;Hartigan, 1975;Khemchandani and Pal, 2019) is an unsupervised learning process to partition a given data set into clusters based on similarity/dissimilarity functions, such that the data objects partitioned in the same cluster are as similar as possible, while those in different clusters are dissimilar at the same time. Currently, there have been various clustering methods that were proposed and applied in many areas (Olde Keizer et al., 2016;Benati et al., 2017;Truong et al., 2017;Pham et al., 2018;Motlagh et al., 2019;Borg and Boldt, 2016;Mokhtari and Salmasnia, 2015). ...
Article
Fuzzy c-means (FCM) is a well-known and widely applied fuzzy clustering method. Although considerable research has focused on the selection of better fuzzifier values in FCM, there is still no single widely accepted criterion. Also, in practical applications, the distributions of many data sets are not uniform. Hence, it is necessary to understand the impact of cluster size distribution on the selection of fuzzifier value. In this paper, the coefficient of variation (CV) is used to measure the variation of cluster sizes in a data set, and the difference of coefficient of variation (DCV) is the change of variation in cluster sizes after FCM clustering. Then, considering that the fuzzifier value with which FCM clustering produces minor change in cluster variation is better, a criterion for fuzzifier selection in FCM is presented from a cluster size distribution perspective, followed by a fuzzifier selection algorithm called CSD-m (cluster size distribution for fuzzifier selection). Also, we developed an indicator called Influence Coefficient of Fuzzifier (ICF) to measure the influence of fuzzifier values on FCM clustering results. Finally, experimental results on 8 synthetic data sets and 4 real-world data sets illustrate the effectiveness of the proposed criterion and CSD-m algorithm. The results also demonstrate that the widely used fuzzifier value m = 2 is not optimal for many data sets with large variation in cluster sizes. Based on the relationship between CV_0 and ICF, we further found that there is a linear correlation between the extent of fuzzifier value influence and the original cluster size distributions.
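As a rough illustration of the two quantities named in the abstract above, the following sketch computes a coefficient of variation over cluster sizes and the change in that value after clustering. The abstract does not give exact formulas, so the use of the population standard deviation, the sign convention of the difference, and the names `coefficient_of_variation` and `dcv` are all assumptions; the cluster sizes are invented.

```python
from statistics import mean, pstdev


def coefficient_of_variation(cluster_sizes):
    """Standard deviation of the cluster sizes divided by their mean.

    Assumption: the population standard deviation is used; the cited
    paper may define CV differently.
    """
    sizes = list(cluster_sizes)
    return pstdev(sizes) / mean(sizes)


def dcv(sizes_before, sizes_after):
    """Change in CV after clustering (sign convention assumed: before minus after)."""
    return coefficient_of_variation(sizes_before) - coefficient_of_variation(sizes_after)


# Hypothetical sizes: the true classes are unbalanced, while the
# clustering result has equalised them somewhat.
true_sizes = [50, 100, 150]
clustered_sizes = [90, 100, 110]

print(coefficient_of_variation(true_sizes))
print(dcv(true_sizes, clustered_sizes))
```

Under the criterion described in the abstract, a fuzzifier value that yields a small absolute change in CV would be preferred.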
Article
Detecting serial crimes is one of the most challenging tasks in crime analysis. Linking crimes committed by the same criminal can improve the work efficiency of police offices and maintain public safety. Previous crime linkage studies have focused on the crime features of modus operandi (M.O.) but did not address the crime process. In this paper, we propose an approach for detecting serial robbery crimes based on understanding offender M.O. by integrating crime process information. A natural language processing method is used to extract the action and object characteristics of the crime process from the crime narrative text, a dynamic time warping method is introduced to measure the similarity of these characteristics, and an information entropy method is used to weight the similarities of the action and object characteristics to obtain a comprehensive similarity of the criminals’ crime processes. A real-world robbery dataset is employed to measure the performance of finding serial crimes after adding the crime process information. According to the results, information about the crime process obtained from the case narrative text has significant separability and can better characterize the offender’s M.O. Five machine learning algorithms are used to classify the case pairs and identify serial cases and nonserial cases. Based on the crime features, the results show that the addition of crime process information can substantially improve the detection of serial crimes.
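The dynamic time warping step mentioned in the abstract above can be illustrated with the textbook dynamic-programming formulation. This is a generic sketch, not the cited authors' method: the 0/1 mismatch cost for categorical actions, the function name `dtw_distance`, and the action sequences are assumptions chosen for the example.

```python
def dtw_distance(seq_a, seq_b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Classic O(n*m) dynamic time warping distance between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # dp[i][j] holds the minimal cumulative cost of aligning
    # the first i items of seq_a with the first j items of seq_b.
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(seq_a[i - 1], seq_b[j - 1])
            dp[i][j] = step + min(
                dp[i - 1][j],      # seq_b[j-1] is matched to several items of seq_a
                dp[i][j - 1],      # seq_a[i-1] is matched to several items of seq_b
                dp[i - 1][j - 1],  # one-to-one match
            )
    return dp[n][m]


# Hypothetical action sequences extracted from two case narratives.
case_1 = ["approach", "threaten", "grab", "flee"]
case_2 = ["approach", "threaten", "threaten", "grab", "flee"]
case_3 = ["follow", "push", "grab", "flee"]

print(dtw_distance(case_1, case_2))  # repeated action is absorbed by warping
print(dtw_distance(case_1, case_3))
```

Because warping lets one element align with several, the repeated "threaten" in the second case adds no cost, which is the property that makes DTW suitable for comparing sequences of different lengths.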
Article
This paper revisits the earlier claim of one of its authors that a fundamental shift is taking place in policing towards a strategic, future-oriented and targeted approach to crime control—broadly represented in the concept of “intelligence led policing” (ILP)—built around analysis and management of problems and risks, rather than reactive responses to individual crimes. Some doubt may be cast on this view by recent government promotion in the UK of “reassurance” and “neighbourhood” policing, which prioritise responses to community fears and perceptions (rather than analysis of “objective” crime data), and through drives to improve detection rates in reactive investigations. However, ILP need not be understood narrowly in terms of proactive operational methods based on police intelligence, and is not necessarily incompatible with these new concerns. The National Intelligence Model (NIM), now adopted by all police forces in England and Wales, offers a framework of business processes for the management of policing priorities of all kinds: it can incorporate the perspectives of partner agencies and local communities, and can set parameters for reactive as well as proactive responses to crime. The structured use of analysis within the Model potentially takes full account of these factors, yet retains an essentially evidence based process of decision making and prioritisation, as well as a “forward looking” focus on threats to community safety. It may also in time facilitate closer integration of police and Community Safety Partnership processes. This represents an ideal rather than a present reality, and there are major risks to its realisation, including police cultural attitudes and misunderstanding, over-dominance of centrally set targets, and “silo thinking”.
Article
According to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the reported residential burglaries in 2012. Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders. Law enforcement agencies, consequently, are required to detect series of crimes, or linked crimes. Comparison of crime reports today is difficult as no systematic or structured way of reporting crimes exists, and no ability to search multiple crime reports exists.
Conference Paper
Timely mapping of crime locations and accurate detection of spatial concentrations of crime help to identify where crime tends to concentrate in space and time and thus provide important information for law enforcement crime reduction efforts. The main objective of this work is to design and implement a Web-based Geographic Information System (GIS) for crime mapping and decision support. Four hotspot mapping techniques, i.e., choropleth mapping, grid mapping, spatial ellipse mapping and kernel density mapping, are implemented in the system. The system is a rich Internet application and is entirely based on open source software, making it affordable and efficient for many small and medium-sized police departments in developing countries. Results from the prototype development demonstrate that for a Web-based crime hotspot mapping system, rich Internet application technology in combination with open source software is an effective solution.
Article
Classification algorithm selection is an important issue in many disciplines. Since it normally involves more than one criterion, the task of algorithm selection can be modeled as multiple criteria decision making (MCDM) problems. Different MCDM methods evaluate classifiers from different aspects and thus they may produce divergent rankings of classifiers. The goal of this paper is to propose an approach to resolve disagreements among MCDM methods based on Spearman's rank correlation coefficient. Five MCDM methods are examined using 17 classification algorithms and 10 performance criteria over 11 public-domain binary classification datasets in the experimental study. The rankings of classifiers are quite different at first. After applying the proposed approach, the differences among MCDM rankings are largely reduced. The experimental results prove that the proposed approach can resolve conflicting MCDM rankings and reach an agreement among different MCDM methods.
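To make the agreement measure named in the abstract above concrete, the sketch below computes Spearman's rank correlation coefficient between two rankings using the standard formula for rankings without ties. It does not reproduce the paper's reconciliation procedure; the function name `spearman_rho` and the two rankings are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two rankings of the same items.

    Assumption: both inputs are rankings 1..n without ties, so the
    closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have the same length of at least 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks assigned to five classifiers by two MCDM methods.
method_1 = [1, 2, 3, 4, 5]
method_2 = [2, 1, 3, 5, 4]

print(spearman_rho(method_1, method_2))
```

A value near 1 would indicate that the two methods rank the classifiers almost identically, while a value near -1 would indicate opposite rankings.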
Book
Book description: The growing potential of GIS for supporting policing and crime reduction is now being recognised by a broader community. GIS can be employed at different levels to support operational policing, tactical crime mapping, detection, and wider-ranging strategic analyses. With the use of GIS for crime mapping increasing, this book provides a definitive reference. GIS and Crime Mapping provides essential information and reference material to support readers in developing and implementing crime mapping. Relevant case studies help demonstrate the key principles, concepts and applications of crime mapping. This book combines the topics of theoretical principles, GIS, analytical techniques, data processing solutions, information sharing, problem-solving approaches, map design, and organisational structures for using crime mapping for policing and crime reduction. Delivered in an accessible style, topics are covered in a manner that underpins crime mapping use in the three broad areas of operations, tactics and strategy.
* Provides a complete start-to-finish coverage of crime mapping, including theory, scientific methodologies, analysis techniques and design principles.
* Includes a comprehensive presentation of crime mapping applications for operational, tactical and strategic purposes.
* Includes global case studies and examples to demonstrate good practice.
* Co-authored by Spencer Chainey, a leading researcher and consultant on GIS and crime mapping, and Jerry Ratcliffe, a renowned professor and former police officer.
This book is essential reading for crime analysts and other professionals working in intelligence roles in law enforcement or crime reduction, at the local, regional and national government levels. It is also an excellent reference for undergraduate and Masters students taking courses in GIS, Geomatics, Crime Mapping, Crime Science, Criminal Justice and Criminology.
Article
In this survey we overview the definitions and methods for graph clustering, that is, finding sets of "related" vertices in graphs. We review the many definitions for what is a cluster in a graph and measures of cluster quality. Then we present global algorithms for producing a clustering for the entire vertex set of an input graph, after which we discuss the task of identifying a cluster for a specific seed vertex by local computation. Some ideas on the application areas of graph clustering algorithms are given. We also address the problematics of evaluating clusterings and benchmarking cluster algorithms.
Conference Paper
Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and expectation-maximization. The algorithms are benchmarked against each other. The performance of the four clustering algorithms is studied on time series expression data using Dynamic Time Warping distance in order to measure similarity between gene expression profiles. Four different cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a cluster method and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.
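Of the validation measures listed in the abstract above, the Rand index is the simplest to show in code. The sketch below uses the standard pair-counting definition (the fraction of item pairs on which two partitions agree); it is not taken from the cited paper, and the function name `rand_index` and the label vectors are made up for illustration.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs treated consistently by two partitions.

    A pair counts as an agreement when both partitions put the two items
    in the same cluster, or both put them in different clusters.
    """
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("label lists must have the same length of at least 2")
    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agreements += same_true == same_pred
        total_pairs += 1
    return agreements / total_pairs


# Hypothetical ground-truth groups and a clustering result for four items.
truth = [0, 0, 1, 1]
clustering = [0, 1, 1, 1]

print(rand_index(truth, clustering))
```

The index depends only on which items are grouped together, not on the label values, so relabelling the clusters leaves the score unchanged.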
Article
Statistical clustering of criminal events can be used by crime analysts to create lists of potential suspects for an unsolved crime, to identify groups of crimes that may have been committed by the same individuals or group of individuals, for offender profiling and for predicting future events. We propose a Bayesian model-based clustering approach for criminal events. Our approach is semisupervised, because the offender is known for a subset of the events, and utilizes spatiotemporal crime locations as well as crime features describing the offender's modus operandi. The hierarchical model naturally handles complex features that are often seen in crime data, including missing data, interval-censored event times and a mix of discrete and continuous variables. In addition, our Bayesian model produces posterior clustering probabilities which allow analysts to act on model output only as warranted. We illustrate the approach by using a large data set of burglaries in 2009–2010 in Baltimore County, Maryland.
Article
Whilst analysis of crime for tactical and strategic reasons within the criminal justice arena has now become an established need, predictive analysis of crime remains, and probably always will be, a goal to be desired. Opening a window on this over the last two decades, prominent research from academia has focused on the phenomenon of repeat victimisation and more recently ‘near repeat’ victimisation, both firmly grounded in the geography of crime. Although somewhat limited to the establishment of near repeat behavioural patterns in whole area data, these can be utilised for crime prevention responses on a local scale. Research reported here, however, explores the phenomenon through the examination of serial offending by individual offenders to establish if such spatio-temporal patterns are apparent in the spatial behavioural patterns of the individual burglar, and if so how they may be defined and therefore utilised on a micro rather than macro scale. It is hypothesised that offenders responsible for more than one series of offences will display consistency across their crime series within time and distance parameters for their closest offences in space. Results improve upon current knowledge concerning near repeat offending being the actions of common offenders. Testing of the extracted data indicates that offenders maintain personal boundaries of ‘closeness’ in time and space even when actions are separated by significant time spans, creating stylised behavioural signatures appertaining to their use of and movement through space when offending.