Available via license: CC0
ADVANCING CRIME LINKAGE ANALYSIS WITH MACHINE
LEARNING: A COMPREHENSIVE REVIEW AND FRAMEWORK FOR
DATA-DRIVEN APPROACHES
Vinicius Lima
Computer and Information Technology
Purdue University
West Lafayette
vlima@email
Umit Karabiyik
Computer and Information Technology
Purdue University
West Lafayette
ukarabik@email
ABSTRACT
Crime linkage is the process of analyzing criminal behavior data to determine whether a pair or group
of crime cases are connected or belong to a series of offenses. This domain has been extensively
studied by researchers in sociology, psychology, and statistics. More recently, it has drawn interest
from computer scientists, especially with advances in artificial intelligence. Despite this, the literature
indicates that work in this latter discipline is still in its early stages. This study aims to understand
the challenges faced by machine learning approaches in crime linkage and to support foundational
knowledge for future data-driven methods. To achieve this goal, we conducted a comprehensive
survey of the main literature on the topic and developed a general framework for crime linkage
processes, thoroughly describing each step. Our goal was to unify insights from diverse fields into a
shared terminology to enhance the research landscape for those intrigued by this subject.
Keywords Crime Linkage · Criminalistics · Criminal Behavior · Crime Patterns · Artificial Intelligence · Machine Learning
1 Introduction
Crime Linkage (CL) [1] is a multidisciplinary field that has garnered significant attention from sociologists, psychologists,
statisticians, and computer scientists [2, 3, 4, 5, 6, 7]. In essence, researchers aim to connect pairs of crimes by creating
crime link associations [3]. CL analysis is typically based on Modus Operandi (MO) or criminal behavior, which
can be inferred from evidence found at crime scenes or other sources of police data [8]. These MOs/behaviors are
systematically compared to assess the similarities between different crimes [9]. High similarity scores often indicate
potential connections, providing valuable insight into shared offender characteristics or underlying patterns within these
criminal acts.
The literature on crime linkage encompasses a wide array of methods and evaluation metrics, alongside various data sets
used to establish connections between criminal incidents. Numerous studies have showcased CL outcomes, particularly
in cases involving property crimes and sexual offenses [5]. CL is in essence a classification problem, where the
goal is to either assess if a pair of crimes is linked or not, or to associate a case to a series of offenses, based on pattern
characteristics.
There are many methodologies for modeling and evaluating crime linkage. Data-driven approaches in CL have been
explored using statistical methods, fuzzy logic, and machine learning techniques. Although recent years have seen a
surge in artificial intelligence (AI) research, studies applying machine learning to crime linkage are still in the early
stages. We sought to understand the reasons behind this. To gain insight, we surveyed key data-driven work, including
both statistical and machine learning approaches, and analyzed the challenges - whether implicit or explicit - associated
with handling large amounts of data.
arXiv:2411.00864v1 [cs.LG] 30 Oct 2024
Previous literature surveys have not included machine learning approaches [3, 4, 5]; our aim is therefore to extend the
existing research assessment on the topic. While analyzing the collected studies, we observed a general framework
in the CL process, which is also discussed in this paper.
In summary, the contributions of this study are twofold:
• (1) A comprehensive survey of the main literature on data-driven crime linkage and an outline of a general
framework of the linkage process to support further research on the topic.
• (2) An analysis of the main challenges faced by crime linkage, especially when dealing with machine learning and
larger datasets, to support future work.
The structure of this paper is as follows. First, we discuss the main literature on crime linkage, outlining the
concepts and definitions involved. Then, in Section 3, we describe the general framework typically used in
CL, along with a discussion of each step and the techniques involved. Section 4 elaborates on the challenges facing
the topic, along with potential solutions where applicable. Finally, we wrap up with the main conclusions of this work.
The Appendix (Section 6) lists all the papers surveyed and details of each study.
2 Related Work
Crime Linkage, also known as Case Linkage or Behavioral Crime Linking, has been a subject of extensive study, particularly
within domains such as Criminology and Psychology. Its origins trace back to the 1970s [1], with a notable surge in
practical research observed in the 2000s [4]. Although related to criminal profiling [5], crime linkage specifically refers
to “the process of linking two or more crimes together on the basis of the crime scene behavior exhibited by an offender”
[10]. In other words, this discipline is interested in understanding pattern characteristics in criminal behavior that might
support connections between seemingly unrelated crimes, often in the context of serial offenders.
Crime Linkage is underpinned by two assumptions in criminal behavior: consistency and distinctiveness. The former
posits that serial criminals exhibit patterned actions akin to a signature behavior [11], while the latter suggests that this
signature is unique enough to differentiate their criminal activity from others [10]. However, in terms of data-driven CL,
Bennell et al. [12] introduced three other assumptions that challenge research in this area: the reliability of coded data
within systems, the accuracy of the data, and analysts’ ability to accurately link crimes using this data. Basically, the
reliability and suitability of the data to draw conclusions on links between crimes remain the subject of debate [13].
The primary focus of this topic lies in identifying pairs or series of crimes committed by the same perpetrator. This
field is of significant importance, and CL has already been used as evidence in court proceedings to implicate suspects
[14, 15, 16]. In particular, Keppel documented the first case in Canada where crime linkage was used as evidence,
a result that was later corroborated through DNA analysis [17]. However, authors have debated whether
and how behavioral patterns can be used as evidence in court. Labuschagne [18] points out that a linkage decision should
not be based only on computerized approaches, while Canter et al. [19] do not recommend exclusively experience-based
decisions.
In 1993, the U.S. Supreme Court established guidelines for the incorporation of scientific evidence into legal proceedings,
now enshrined in Rule 702, commonly known as the Daubert standard [20]. This landmark decision provided a
framework to bolster the credibility of forensic science evidence. The standards outlined for admissible scientific
evidence include factors such as reliability, peer review and publication, error rates, general acceptance within the
scientific community, the existence of known standards, and applicability to the specific case at hand. In the context of
Data-Driven Crime Linkage and its conformity with the Daubert standard, Pakkanen et al. [20] proposed an evaluation
framework centered on three key aspects:
• Consistent Behavior: This pertains to assessing the probability of consistency in linking one crime to another
and the associated margin of error. It also involves determining to what extent the findings from specific
samples can be generalized to broader populations. A comprehensive examination of the core attributes linking
ostensibly distinct crimes becomes imperative.
• Reliable Database: This aspect revolves around the meticulous coding of variables in databases. Given the
subjective nature of variables, careful selection methods are essential yet challenging. In addition, determining
which variables are crucial in establishing crime linkages warrants further investigation. Furthermore, databases
should encompass not only solved cases but also unsolved ones. Thus, preliminary studies are needed to
discern when a crime constitutes a part of a serial set to ensure ecological validity.
• Frequency: This involves quantifying the similarity between linked crimes. The mere presentation of similarities
and differences in variables between crime pairs is insufficient; it is equally vital to ascertain the likelihood
of attributing them to the same offender, particularly in cases involving serial and one-off occurrences.
The utilization of databases to facilitate crime linkage traces back to the inception of the FBI’s Violent Criminal
Apprehension Program (ViCAP) in 1985 [12]. ViCAP was specifically designed to serve as a centralized intelligence
hub aimed at identifying potential connections among seemingly disparate crimes, avoiding the problem known as
linkage blindness [21]. The program was developed to mitigate issues stemming from communication gaps between
police stations and to facilitate the identification of links that span across different jurisdictions, especially those related
to homicide, sexual offenses, and missing persons [22]. However, ViCAP relies primarily on the manual entry of relevant
crime scene and MO data, requiring training and expertise to potentially provide links effectively. Another significant
database employed for CL purposes is the Violent Crime Linkage Analysis System (ViCLAS), which has been used in
many countries such as Canada, New Zealand, Germany, Belgium, Great Britain, and some American states [23].
However, the coding process for entering information into these databases has been scrutinized. To demonstrate this,
Martineau and Corey [13] conducted a study testing the level of agreement among Canadian police officers in coding
crime linkage scenarios, yielding a 38% agreement for homicide cases and 25% for sexual assault cases. This highlights
the subjectivity of human evaluation on criminal matters, which can greatly impact the assessment of when a crime can
be considered linked or not, potentially leading to false accusations. Moreover, the process of decision-making for crime
linkage itself has sparked discussion due to the absence of standardized training or evaluation methods for this intricate
and subjective decision-making process. For instance, Bennell et al. [24] demonstrated that students outperformed
police personnel in associating crimes with serial offenders, and a logistic regression model outperformed humans.
The field of CL encompasses various methodologies, and some have divided them into Comparative Case Analysis
(CCA) and Case Linkage Analysis (CLA) [4]. CCA involves comparing features of a crime against a database to identify
similar cases. A survey conducted by Burrell and Bull (2011) with crime analysts from the UK police highlighted
the challenges in identifying series of crimes by analyzing elements of crime scene behavior, including time, location,
modus operandi, crime evidence, and characteristics of offenders and victims [25]. However, gathering forensic data,
particularly for CCA purposes, was noted as especially challenging. Within CCA, two distinct sub-methods exist:
reactive and proactive [10]. Reactive Linkage involves comparing a specific index crime with many or all crimes in a
database, often providing a similarity score to quantify the resemblance between two criminal activities or the likelihood
of them being perpetrated by the same offender. Proactive Linkage, on the other hand, clusters crimes together to
identify potential series of crimes.
CLA focuses on determining whether pairs of crimes or a small subset are linked [26]. Typically, experts offer opinions
based on their own methodologies to establish links between cases, relying on specific signature behaviors repeatedly
exhibited by the criminal [27, 28]. While experts may provide detailed information on MO characteristics, these are
often specific to particular cases and cannot be generalized [4]. Notably, famous serial killers have been studied and
analyzed based on their signature behaviors, enabling the connection of their crimes [29, 30, 15].
2.1 Modus Operandi in the Crime Linkage Context
As highlighted, understanding signature criminal behavior is pivotal to crime linkage. Commonly known as modus
operandi, it refers to the patterned behavior exhibited by a perpetrator during criminal activities, aimed at safeguarding
their identity, selecting a victim, ensuring the success of the crime, facilitating escape, and evading detection [31, 32, 33,
34, 35]. In the context of crime linkage, the aim is to gather sufficient evidence to discern these behavioral patterns and
encapsulate them as a signature behavior. Assuming they are consistent and “individualizable” (distinctiveness), these
patterns can be utilized to identify serial offenders or establish connections between related crimes (for example, using
MOs to cluster a network of offenders). Data-driven studies typically deal with a predefined set of MO attributes, usually
coded by experts or directly inputted into databases. However, this approach somewhat contradicts the assumptions
posited by Woodhams et al. (2007) [3] since a finite and discrete number of MO characteristics may limit the diverse
possibilities of distinctiveness within offender behavior. It is expected that the signature behavior lies within the nuances
of the crime scene rather than in a generalized combination of attributes. This will be further discussed in the challenges
section.
Whether MO is sufficient to establish links between crimes has been the topic of much research discussion. In
fact, certain studies have indicated that spatial analysis of crime (i.e., proximity of crimes) is a more effective feature for
linking crimes compared to traditional MO [36]. Indeed, due to the dynamic nature of MO, extracting its key elements
is a hard task that necessitates extensive experience [8]. This difficulty in delineating MO may contribute to the scarcity
of data-driven studies in crime linkage, particularly in homicides [5], as will be discussed later. It can be inferred that
burglaries and robberies can often exhibit a well-defined set of MO attributes, while the same cannot be said for other
types of crime.
2.2 Data-driven studies
This section briefly outlines the main applied studies on Crime Linkage, with a particular emphasis on
data-driven methods, i.e., those falling under the CCA approach. Four literature review papers were identified concerning
crime linkage in practice. Woodhams et al. [37] provided an early review, discussing fundamental concepts and
the primary challenges faced by CL at that time. Bennell et al. (2012) delved into the underlying assumptions of
computerized crime linkage systems. In a subsequent work, the authors of [38] analyzed evaluation measures of crime
linkage, particularly focusing on the AUC (Area Under the Curve), and identified crime type, behavior domain, and
distance as key factors affecting performance. While predominantly focused on criminal profiling, Fox et al. [5]
dedicated a section to discussing the 17 papers compiled by Bennell et al. (2014), with a focus on the effect size of the
studied samples. Davies et al. [39] conducted the most recent literature review on the topic, describing the challenges
of achieving crime linkage in practice. Their focus extended beyond the methodology for crime linkage outcomes to
encompass studies exploring the overall practice and usage of crime linkage, yielding 30 relevant papers. Similarly, this
study concentrates specifically on practical methods for deriving crime linkage, particularly focusing on studies utilizing
crime datasets. However, unlike previous works, this study emphasizes statistical/machine learning methods and the
nuances of applying these techniques.
In this context, much of the research has focused on property crimes such as burglary [40, 36, 41, 42, 43, 44, 45, 6, 46,
47, 48, 49, 50, 2, 51], robbery [52, 53, 54, 45, 55, 56, 57], car theft [58, 59, 44], and arson [60]. In terms of crimes against
individuals, there is a notable focus on sexual crimes (not necessarily involving murder) [61, 62, 63, 64, 65, 66, 67, 68],
with three studies delving into homicide [42, 69, 70]. Additionally, there are studies where researchers have examined
crime linkage across various crime types, including a variety of crime categories [71], burglaries, robberies, and car
thefts [72, 73], and burglaries, robberies, and assaults in general [47]. Refer to the Appendix table (Section 6) for
other crime types related to the topic of this paper.
Within purely statistical approaches, researchers have explored various methodologies to ascertain the probability of
linking one crime to another, employing techniques such as logistic regression (LR), Naive Bayes, and decision trees
(DT). Naive Bayes is based on Bayes’ Theorem, which calculates the probability of a particular event occurring given
the occurrence of another event. Its "naive" assumption is that features (variables) are
independent of one another given the class, which rarely holds in real-world scenarios but still often yields good results.
Porter (2016) took this a step further by utilizing the Bayes factor to inform the decision-making process regarding
which model to employ [2]. The main advantage of Bayesian models is that their results are more easily interpreted
compared to more advanced machine learning techniques.
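To make the "naive" independence assumption concrete, the following is a minimal sketch of how a posterior linkage probability could be computed for one crime pair. It is an illustration under stated assumptions, not the method of any surveyed study: the function name, the prior, and the per-feature likelihoods are all hypothetical values chosen for the example.

```python
def naive_bayes_link_posterior(prior_linked, lik_linked, lik_unlinked):
    """Posterior P(linked | features), treating features as independent given the class.

    prior_linked  -- assumed prior probability that a pair is linked (hypothetical)
    lik_linked    -- P(feature_i | linked) for each observed feature (hypothetical)
    lik_unlinked  -- P(feature_i | unlinked) for each observed feature (hypothetical)
    """
    if len(lik_linked) != len(lik_unlinked):
        raise ValueError("likelihood lists must have the same length")
    joint_linked = prior_linked
    joint_unlinked = 1.0 - prior_linked
    # Independence assumption: the joint likelihood is a simple product.
    for p_l, p_u in zip(lik_linked, lik_unlinked):
        joint_linked *= p_l
        joint_unlinked *= p_u
    evidence = joint_linked + joint_unlinked
    if evidence == 0.0:
        raise ValueError("observed features have zero probability under both classes")
    return joint_linked / evidence


# Two observed MO features, each more likely under the "linked" hypothesis.
posterior = naive_bayes_link_posterior(0.1, [0.9, 0.8], [0.3, 0.2])
```

Because each feature contributes one multiplicative factor, an analyst can inspect which features pushed the posterior up or down, which is one reason such models are considered easy to interpret.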
The main methodological preference among many researchers for CL is logistic regression [74]. This approach is
suitable for CL, as the task typically involves binary classification: deciding whether crimes are linked or not. LR
estimates the probability of crimes being linked based on a linear combination of input features, using the logistic
(sigmoid) function to constrain the output between 0 and 1, which can be interpreted as a probability. During
training, the model learns the optimal weights for the features; a decision threshold is then chosen, which can later
be evaluated for performance. One limitation of this method is its assumption of a linear relationship between the
input features (crime features) and the log-odds of the output (linkage decision), which may not capture more complex or
non-linear patterns in real-world crime data. Decision Trees have also been used in the CL context, but mainly as a comparison
with LR approaches. For example, Tonkin et al. [73] demonstrated the superior performance of regression models over
tree-based models and showcased that incorporating specific crime type behaviors could further enhance performance.
The Appendix table (Section 6) provides more examples of how these models have been used.
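The scoring step of logistic regression described above can be sketched as follows. This is only an illustration: the weights, bias, and threshold are hypothetical placeholders that would normally be learned from labeled crime pairs and tuned on validation data, and the function names are not taken from any surveyed work.

```python
import math


def link_probability(similarities, weights, bias):
    """Map a linear combination of pairwise similarity features to (0, 1)."""
    z = bias + sum(w * s for w, s in zip(weights, similarities))
    # Numerically stable logistic (sigmoid) function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def is_linked(similarities, weights, bias, threshold=0.5):
    """Apply a decision threshold (0.5 here is an assumed default)."""
    return link_probability(similarities, weights, bias) >= threshold


# Hypothetical example: two similarity features with assumed weights.
p = link_probability([0.9, 0.7], weights=[2.0, 1.5], bias=-1.0)
```

The linear term `z` is the log-odds of linkage, which is where the linearity assumption discussed above enters the model.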
This study places greater emphasis on more advanced machine learning techniques, particularly because these
approaches have not been covered in previous literature surveys. We also focus on CCA reactive linkage, as this
method is preferred among researchers. However, other studies have explored machine learning clustering techniques
to evaluate crime linkage decisions [48, 75, 76]. Additionally, while not strictly classified as machine learning, some
authors have utilized fuzzy logic to assess crime linkage scenarios [77, 78, 79, 49].
2.3 Machine Learning studies
With the advancements in Artificial Intelligence, numerous studies have harnessed these technologies for applications
within the justice system. It is important to distinguish between crime prediction and crime linkage. Crime prediction,
also known as Predictive Policing, has been the focus of numerous authors in Computer Science, employing a wide
array of methodologies [80]. Essentially, crime prediction involves the creation of a model trained on big data, typically
to detect crime locations, although it has also been used to predict the likelihood of an individual committing a crime
[81]. In contrast, Crime Linkage is concerned with connecting one crime to another, potentially identifying the same
offender. Within the realm of Machine Learning, there also exists a diverse range of methodologies to address the crime
linkage problem.
Machine Learning is the science behind Artificial Intelligence focused on discovering patterns and insights in large and
complex datasets. While statistics and ML share common foundations, they usually differ in their applications
[82]. Statistics typically deals with well-defined models and smaller datasets, aiming to make inferences and understand
relationships. In contrast, machine learning often handles larger, more intricate datasets, using advanced algorithms to
uncover patterns and make predictions. Notably, Natural Language Processing (NLP) techniques, a branch of ML,
have been employed on textual data from police reports to discern relationships among incidents and thereby establish
case links [6, 46, 83, 84, 85, 86, 54]. Most NLP work has focused on techniques for retrieving information
and extracting MO features from these police narratives. Additionally, some studies have utilized text vectorization
techniques to compute similarity scores. Commonly referred to as embeddings, this approach involves transforming
textual information into high-dimensional vectors, thereby capturing latent textual features that can be compared against
other reports.
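As a rough sketch of how two report embeddings might be compared, the snippet below computes cosine similarity, which is one common choice for vector comparison; the surveyed studies may use other measures. The short vectors are hypothetical stand-ins for real embeddings, which typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional "embeddings" of two police reports.
report_a = [0.2, 0.7, 0.1, 0.4]
report_b = [0.1, 0.8, 0.0, 0.5]
score = cosine_similarity(report_a, report_b)
```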
A summary of data-driven studies on crime linkage can be found in the Appendix (Section 6). However, below we
describe the main works that can be considered purely based on ML.
Chi et al. [57] implemented a simple three-layer neural network, with the middle layer comprising two nodes designed
to calculate the similarity within input crime features (MOs). The third layer generated the final output similarity
score between the two crimes. While the neural network automatically calculated weights to fit the model, the authors
proposed the involvement of an experienced human in the loop to adjust the weights as needed. Additionally, they
introduced a separability index to prune input features that did not contribute to the linkage output, thus enhancing the
process. This approach not only improved the accuracy of linkage analysis but also revealed to analysts which input
features were actually related to the crime linkages. The authors noted that this technique was successfully incorporated
into a government security office in China.
Y.-S. Li and Qi [54] introduced a methodology to compare robbery cases in China by employing various steps to
measure differences between two crimes. Their approach involved using absolute distance for numerical attributes,
Jaccard’s coefficient for categorical attributes, similarity between Word2Vec [87] embeddings of selected keywords,
and Dynamic Time Warping (DTW) [88] and Information Entropy [89] for assessing dissimilarities between the narratives
of two crimes, also referred to as “crime processes”.
Zhu and Xie (2021) proposed employing TF-IDF and a Restricted Boltzmann Machine (RBM) [90] with regularization
on a textual dataset of 911 calls to extract location, time, and MOs [45], an approach they initially presented in their
earlier work [91]. This unsupervised technique enabled them to determine if a crime series had been detected within
a pool of crime incidents. However, personal narratives of crimes can carry contextual bias [92, 93], and their method
utilizing 911 calls and a bag-of-words approach may not be the best choice for detecting crime linkage. For instance, the
word “black” was among the keywords identified by their method to be used as what they considered MO. Moreover,
TF-IDF does not take into account the context in which the retrieved keywords appear, and it can be
biased towards longer documents [94]. More on textual embedding techniques will be discussed later.
Another comparable study was conducted by Solomon et al. [46], in which the authors leveraged burglary police reports to
extract 40 predefined MO details using fastText [95] and smooth inverse frequency (SIF) [96] embedding techniques.
They then assessed the similarity between the two crimes by incorporating the distance and time difference between the
two cases into the MO vector. Subsequently, they input these MO vectors, along with the spatial-temporal differences,
into a Siamese neural network [97] to determine the probability of these two inputs being identical. The authors
demonstrated the efficacy of their methods in a different language (Hebrew) using narrative text from victims and
dialogue between victims and police officers. While their techniques exhibited high performance in burglary cases,
where a predefined set of MO may suffice to characterize the crime, it remains unclear whether these methods would
also be applicable to crimes against persons [98].
In a pragmatic approach, Chohlas-Wood and Levine [99] from the New York Police Department (NYPD) developed an
application called Patternizr. This tool utilizes a blend of structured information, including location, crime subcategory,
modus operandi details (weapon, victim count, etc.), suspect information (weight, height, etc.), and unstructured data
such as crime narrative complaints. The primary objective is not to compute the final crime linkage, akin to the objective
of this research, but rather to furnish a list of the most similar cases to glean insights and enhance investigations.
However, while the authors assert fairness by excluding the race attribute, other studies have demonstrated that this
approach does not eliminate bias [83, 100]. Furthermore, they utilized Word2Vec, which has also been demonstrated to
carry potential bias in its embeddings [101, 102].
A significant challenge in crime linkage arises from the imbalance existing in the gold standard [103]. Typically, the
datasets used to test models exhibit considerable imbalance, with a surplus of non-linked cases compared to linked
cases. Given that most methods involve comparing each pair of crimes, the pool of potential linkages is notably small.
Consequently, machine learning models tend to favor the majority class, rendering accuracy evaluations problematic, as
they predominantly yield correct predictions for the abundant non-linked cases. Using accuracy as the sole evaluation
metric can lead to misinterpretation, particularly in such scenarios where there is a significant discrepancy between the
two binary classes. While some authors have proposed methods to address this imbalance issue [55, 7], their efficacy
has been limited, with only marginal improvements observed, and they primarily addressed robbery cases. We
discuss this further in Section 4.
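The effect of class imbalance on accuracy can be illustrated with a small, entirely hypothetical example: 1,000 crime pairs of which only 10 are truly linked, scored by a trivial "model" that always predicts unlinked. The counts and function names are assumptions made for this sketch.

```python
def accuracy(y_true, y_pred):
    """Fraction of pairs whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of truly linked pairs (label 1) that were predicted as linked."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical gold standard: 10 linked pairs, 990 unlinked pairs.
y_true = [1] * 10 + [0] * 990
# A degenerate classifier that always answers "unlinked".
y_pred = [0] * 1000

majority_accuracy = accuracy(y_true, y_pred)  # high, despite finding no links
majority_recall = recall(y_true, y_pred)      # zero linked pairs recovered
```

The classifier looks excellent by accuracy while recovering none of the linked pairs, which is why metrics such as recall, precision, or AUC are usually reported alongside or instead of accuracy.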
3 Crime Linkage Framework
Overall, the crime linkage assessment process follows a similar structure among the studies reviewed. The pairwise
comparison is illustrated in Figure 1. The primary goal is to extract variables that characterize each crime case
(C1, C2, ..., Cn), which in our context are referred to as MO attributes. Next, the idea is to calculate similarity
measurements between these attributes. Finally, all these similarity scores (SS) are averaged (or combined using other
methods, such as weighted averaging) to determine a final score (FS). A score above a certain threshold is considered
linked, while a score below that threshold is considered unlinked. However, just averaging the scores might not take into
consideration other complexities of the criminal information, and thus a machine learning method might be necessary at
this stage.
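The averaging-and-threshold baseline described in this paragraph can be sketched as below. The threshold of 0.5, the optional weights, and the function names are assumptions for illustration; the surveyed studies choose these differently, and whether a score exactly at the threshold counts as linked is a design decision made here, not a convention from the literature.

```python
def final_score(similarity_scores, weights=None):
    """Combine per-attribute similarity scores (SS) into a final score (FS).

    With no weights this is a plain average; otherwise a weighted average.
    """
    if not similarity_scores:
        raise ValueError("at least one similarity score is required")
    if weights is None:
        weights = [1.0] * len(similarity_scores)
    if len(weights) != len(similarity_scores):
        raise ValueError("weights and scores must have the same length")
    total_weight = sum(weights)
    return sum(w * s for w, s in zip(weights, similarity_scores)) / total_weight


def classify_pair(similarity_scores, threshold=0.5, weights=None):
    """Label a crime pair by comparing FS with an assumed threshold."""
    fs = final_score(similarity_scores, weights)
    return "linked" if fs >= threshold else "unlinked"


# Hypothetical similarity scores for three MO attributes of one crime pair.
label = classify_pair([0.9, 0.8, 0.7])
```

Replacing `classify_pair` with a trained classifier that takes the individual similarity scores as input is the machine learning variant of the same step.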
Figure 1: Typical pipeline of Crime Linkage assessment.
In summary, we can observe three steps in the CL process:
1. Identifying MO attributes from each criminal case.
2. Calculating similarity measures between each pair of crimes.
3. Using Machine Learning to classify whether this pair is considered linked or unlinked.
More details on each step are provided in the following subsections:
3.1 MO variables
In this first step, the MO variables are either pre-defined and pulled from a database or extracted from crime narratives or
reports. Although techniques vary, most approaches treat the MO as a set of finite attributes that characterize a particular
crime. This may explain why most of the work has focused on property crimes, as these types of crimes tend to have
more standardized criminal behaviors compared to crimes against persons [5]. This retrieval step is crucial because
some level of expertise is necessary to understand which variables influence crime linkage analysis. Human coders, who
either insert variables into the database (such as ViCAP [104]) or extract them from police data, are typically subjected
to inter-rater reliability assessments to measure agreement in their results. As mentioned in Section 2, this has been a
persistent issue in the domain, but it is important to address the significance of reliability in data-driven approaches for
crime linkage systems [12]. In Machine Learning methods, the extraction can be done automatically, typically with the
support of Natural Language Processing techniques that retrieve variables from reports, based on textual context or
keywords. Lin et al. [105] proposed a feature selection process that calculates a separability index in order to prune data
and improve crime linkage results. Solomon et al. [46] converted typical police questions into sentence embeddings to
find MOs in massive amounts of crime narratives.
Common attributes include the location and time of the event, as well as behavioral characteristics such as methods
of trespassing, the weapon used, the number of victims, the type of target location, and others. These variables can
be numerical, binary, categorical, or based on descriptive words. This step is important because the type of variable
dictates the type of similarity measure used in the following step. It is worth noting that a categorical attribute will need
to be converted to one-hot encoding or another numerical format in order to serve as input to an ML algorithm.
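A minimal sketch of one-hot encoding for a single categorical MO attribute is shown below. The attribute (method of entry) and its category list are hypothetical examples, not a coding scheme from any database discussed here.

```python
# Hypothetical, fixed list of categories for a "method of entry" attribute.
ENTRY_METHODS = ["door", "window", "roof"]


def one_hot(value, categories):
    """Return a binary vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


encoded = one_hot("window", ENTRY_METHODS)
```

Keeping the category list fixed and ordered matters, because every crime must be encoded against the same positions for the resulting vectors to be comparable.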
As illustrated in the Appendix table (Section 6), a wide range of MO choices are available for selection in a given
model. Nevertheless, location and time, or more precisely, spatiotemporal differences, emerged as some of the most
consistent MO characteristics in the literature. Bennell et al. [40] demonstrated that even using just distance as a factor,
without incorporating additional MO features, resulted in high accuracy scores. Similarly, Tonkin et al. [73] showed that
both time and location are sufficient predictors for linking crimes, showing that residential burglaries and commercial
robberies tend to happen in regions familiar to the criminal in short lapses of time. Beyond these, there appears to be
little consistency among other MO attributes, as they tend to vary significantly depending on the type of crime and the
data available.
Many authors thus consider a crime as a vector of MOs. In this way, each crime case can be described as a vector C =
{X1, X2, ..., Xk}, where X represents a specific MO or crime feature, and k is the total number of features or dimensions.
Following this approach, it is essential to map and order every MO attribute correctly. This includes having policies to
handle missing values or when certain behaviors are not present. As it will be shown in the next step, each MO needs to
be compared pairwise, making the order of the vector crucial, as a similarity cannot be calculated between different
types of MOs. For example, it is illogical to compare the location of an event with the time of the event. Some studies
have mapped MOs based on the degree of a particular behavior. For instance, a crime analyst might evaluate the level
of violence in a particular case. This approach is common when working with fuzzy logic models, where a degree of
membership is required to characterize sets of MOs [49, 79, 78].
3.2 Similarity measures
As mentioned earlier, similarity scores are used to compare how similar one crime is to another. Most studies use a
single method to contrast each feature in the crime vector, while others employ a combination of methods depending on
the feature type. The most common measurements are as follows:
3.2.1 Numerical attributes
One of the simplest methods for measuring similarity between features, particularly with quantitative attributes, is
to use the absolute distance or difference between pairs of attributes. This is rather a dissimilarity score and thus a
value close to zero indicates greater similarity. In crime linkage studies, common numerical features are time and
location. Researchers often calculate similarity based on the distance between two points in a map (using Euclidean
distance, for example) or the difference in time (measured in days, hours, or exact values). This approach is supported
by criminology literature, which suggests that crimes committed by the same offender are likely to occur in close
proximity both spatially and temporally [58, 106, 107].
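The two numerical dissimilarities mentioned above can be sketched as follows. This is an illustration only: it assumes that locations are already projected onto planar coordinates (for example, kilometers on a local grid) and that time differences are expressed in days; both choices are assumptions, not prescriptions from the reviewed studies.

```python
import math
from datetime import datetime


def spatial_distance(p1, p2):
    """Euclidean distance between two (x, y) points in the same planar units."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


def temporal_distance_days(t1, t2):
    """Absolute time difference between two events, expressed in days."""
    return abs((t2 - t1).total_seconds()) / 86400.0


# Hypothetical pair of crimes: values close to zero indicate greater similarity.
d_space = spatial_distance((0.0, 0.0), (3.0, 4.0))
d_time = temporal_distance_days(datetime(2024, 1, 1), datetime(2024, 1, 3, 12))
print(d_space, d_time)
```

For raw latitude and longitude, a geodesic distance would be more appropriate than the planar Euclidean distance used here.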
3.2.2 Categorical attributes
There are many different ways to calculate similarity between categorical values [108]. The most common technique
among CL studies is Jaccard’s coefficient (J) [109]. Here the coefficient measures not only a single behavior but a
collection of them and calculates the ratio between common features and the total number of features (Equation 1).
The range of the Jaccard coefficient spans from 0, indicating no common features, to 1, indicating identical vector
dimensions.
$$J(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1 \cup C_2|} \tag{1}$$
where $C_1$ and $C_2$ are the sets being compared.
As an example, imagine that we have a case where MOs are “Entrance by Window”, “House without Fences”, and
“Criminal Record = Yes”, and another where we have “Entrance by Door”, “House without Fences”, and “Criminal
Record = Yes”. Comparing these two cases, two MOs are shared out of four distinct MOs overall, so the Jaccard measurement between them would be 2/4 = 0.5.
Studies have often focused on the joint presence of features to characterize behavioral consistency, meaning
the same MO variable is found in both crimes. This consideration is important when working with police data, as
the absence of a feature does not necessarily indicate it did not occur; it may simply not have been reported due
to investigative limitations [58]. However, alternative methods for applying the Jaccard coefficient in crime linkage
have been explored. In [68], the authors tested Bayesian analysis and introduced two additional methods to evaluate
performance. One method considered joint absence (Jaccard = 1 if the behavior is either present or absent in both
crimes), while the other method treated absence as a new similarity evaluation (Jaccard = 1 if the behavior is present
in both crimes, Jaccard = 2 if absent in both crimes, and 0 otherwise). The authors demonstrated that the latter approach achieved better
performance than traditional methods.
Although Jaccard’s method is straightforward, it does not account for variations in the levels or degrees within a feature.
For example, in burglary scenarios, a “breaking window” MO is likely more similar to “entering through the window”
than to “forcing door entry”. However, Jaccard’s method does not consider this nuance, treating all features with equal
weight regardless of their specific characteristics. Furthermore, police reports may not always provide objective feature
selection, which can impact the effectiveness of the method.
Similar to Jaccard, the Dice coefficient (D) (also known as the Sørensen-Dice index) was also found in our pool of studies.
The main difference here is that it places more emphasis on the intersection of a pair of crimes. In [47], Dice was used
to measure the similarity between two documents by comparing root node attributes. The Dice formula is given by:
$$D(C_1, C_2) = \frac{2\,|C_1 \cap C_2|}{|C_1| + |C_2|} \tag{2}$$
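Equations 1 and 2 can be sketched directly with set operations. The example below treats each case as a set of MO labels, using the burglary example from the text; the function names and the convention of returning 0 for two empty sets are assumptions made for illustration.

```python
def jaccard(c1, c2):
    """Jaccard coefficient: size of the intersection over size of the union."""
    c1, c2 = set(c1), set(c2)
    union = c1 | c2
    if not union:
        return 0.0  # assumed convention when both cases have no recorded MOs
    return len(c1 & c2) / len(union)


def dice(c1, c2):
    """Dice coefficient: twice the intersection over the sum of the set sizes."""
    c1, c2 = set(c1), set(c2)
    total = len(c1) + len(c2)
    if total == 0:
        return 0.0  # same assumed convention as above
    return 2 * len(c1 & c2) / total


case_a = {"entrance by window", "house without fences", "criminal record = yes"}
case_b = {"entrance by door", "house without fences", "criminal record = yes"}

print(jaccard(case_a, case_b), dice(case_a, case_b))
```

With these two sets the intersection has two elements and the union has four, so Dice (4/6) is higher than Jaccard (2/4), which reflects the extra weight Dice places on shared MOs.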
Another method for calculating similarity between categorical variables involves constructing a hierarchical tree with
domain-specific categories. This approach captures the subtleties of similarity by grouping behaviors that fall under the
same domain. For instance, Chi et al. [57] developed hierarchical features by adjusting the weights of the leaves based
on expert input. By taking into account hierarchical levels of crime behavior, some authors have utilized the taxonomic
index ($\Delta_s$). This approach treats MOs as a tree of parameters, and thus the similarity is calculated considering this tree
structure. In fact, Melnyk et al. [110] compared the two main coefficients for behavioral linkage analysis: Jaccard’s
coefficient (J) and the taxonomic similarity index ($\Delta_s$). They found that while the taxonomic similarity index provides
good interpretability of features, Jaccard’s coefficient, due to its simplicity and sensitivity to the data, is generally more
suitable for crime linkage analysis. One disadvantage of hierarchical methods is that they require domain
knowledge to build the tree and group attributes.
Another interesting method for calculating categorical similarity is based on the frequency of specific features. Goodall’s
similarity measure (G) operates on the principle that the similarity between two items with the same categorical value
should be higher if that value is rare in the dataset. The rationale is that items sharing a rare characteristic are more
likely to be similar [111]. This technique was utilized in NYPD’s Patternizr to analyze structured categorical data [99].
According to Wang et al. [111], a simplified version of the Goodall coefficient for attribute $j$ can be given by:
$$G_j(C_1, C_2) = \begin{cases} 1 - \sum_{q \in Q} p_j^2(x) & \text{if } C_{1j} = C_{2j} = x \\ 0 & \text{if } C_{1j} \neq C_{2j} \end{cases} \tag{3}$$
where
$$p_j^2(x) = \frac{n_x (n_x - 1)}{N (N - 1)},$$
with $n_x$ the number of times $x$ is observed in the collection of $N$ crimes.
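The following sketch illustrates the idea behind Equation 3 for a single attribute. The text does not define the set Q, so the code assumes, following Goodall's original measure, that the sum runs over the values q that are no more frequent than the shared value x, each contributing its own p_j^2(q). The function names and the toy data are hypothetical.

```python
from collections import Counter


def p_squared(value, counts, n_total):
    """Estimated probability of drawing `value` twice without replacement."""
    n_x = counts[value]
    return n_x * (n_x - 1) / (n_total * (n_total - 1))


def goodall_similarity(a, b, column):
    """Simplified Goodall similarity for one attribute, given all observed values."""
    if a != b:
        return 0.0
    counts = Counter(column)
    n_total = len(column)
    # Assumption: Q holds the values that are no more frequent than the shared one.
    q_set = [q for q in counts if counts[q] <= counts[a]]
    return 1.0 - sum(p_squared(q, counts, n_total) for q in q_set)


# Hypothetical entry-method attribute over ten crimes: "window" is the rare value.
entry_methods = ["door"] * 8 + ["window"] * 2

rare_match = goodall_similarity("window", "window", entry_methods)
common_match = goodall_similarity("door", "door", entry_methods)
print(rare_match, common_match)
```

Under these assumptions a match on the rare value scores higher than a match on the common value, which is the behavior the measure is designed to capture.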
3.2.3 Cosine Similarity
Considering NLP studies on crime linkage, a typical similarity measure is the cosine similarity (also found in the
literature as normalized inner product or word-to-word similarity), which has been calculated between word or
sentence embeddings. Embeddings are the product of the conversion of textual data into multidimensional vectors, using
techniques such as Word2Vec [87]. These vectors capture the semantic relationships between words, and thus similar
words are likely to be close to each other in this multidimensional space. Cosine similarity is then a suitable method
to calculate the similarity between two words. This approach can account for word variations existing in categorical
attributes, potentially offering a better solution than the Jaccard coefficient in some situations. The cosine similarity
formula between two vectors, A and B, can be seen in Equation 4.
$$\text{Cosine Similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}} \tag{4}$$
As an example, consider the words “gun”, “firearm” and “knife”. Using the fastText embedding technique [95], we can
plot the three words and calculate the cosine similarity between them (see Figure 2). It is expected that the first two
words will be more similar to each other than to the third one, since their meanings are semantically closer.
Figure 2: Three-dimensional representation of the words “gun”, “firearm”, and “knife”. Similarity between ’gun’ and
’knife’: 0.5895; Similarity between ’gun’ and ’firearm’: 0.7925; Similarity between ’knife’ and ’firearm’: 0.5644.
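Equation 4 can be sketched in a few lines. The three-dimensional vectors below are made-up toy values chosen only to illustrate the computation; they are not fastText embeddings and do not reproduce the similarities reported in Figure 2.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumed values, for illustration only).
toy_embeddings = {
    "gun": [0.9, 0.8, 0.1],
    "firearm": [0.8, 0.9, 0.2],
    "knife": [0.2, 0.3, 0.9],
}

sim_gun_firearm = cosine_similarity(toy_embeddings["gun"], toy_embeddings["firearm"])
sim_gun_knife = cosine_similarity(toy_embeddings["gun"], toy_embeddings["knife"])
print(sim_gun_firearm, sim_gun_knife)
```

In practice the vectors would come from a pretrained embedding model, and the same function would be applied to vectors with hundreds of dimensions.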
Word embeddings serve as suitable inputs for machine learning models, which inherently operate on numerical inputs.
Various techniques exist for embedding words, including fundamental methods like Bag of Words (BoW) [112] and
Term Frequency-Inverse Document Frequency (TF-IDF) [113]. However, the popularity of word embeddings surged
following Google’s introduction of Word2Vec [87]. BoW and TF-IDF are well-established techniques in NLP used
for analyzing textual documents. BoW represents words in fixed-size dimensions based on their frequency within a
corpus, making it the simplest form of embeddings [112]. On the other hand, TF-IDF gives more weight to words that are frequent
within a document but rare across the entire corpus, effectively highlighting important terms while mitigating the impact
of common “stopwords” such as pronouns, prepositions, and connectors [113]. However, these methods primarily focus
on word frequency and lack the ability to capture contextual relationships within the generated embeddings [114].
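To make the TF-IDF weighting concrete, the sketch below uses one common unsmoothed variant (relative term frequency multiplied by the logarithm of N over the document frequency). This variant, the function name, and the toy narratives are assumptions for illustration; libraries typically apply smoothed formulas.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, heavily simplified crime narratives.
narratives = [
    "suspect entered through the window".split(),
    "suspect forced the door".split(),
    "suspect fled the scene".split(),
]

weights = tf_idf(narratives)
print(weights[0])
```

Terms that occur in every narrative, such as "suspect" and "the", receive a weight of zero, while terms specific to one narrative, such as "window", receive a positive weight.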
As mentioned, a significant breakthrough in representing words as numerical vectors came with Word2Vec, developed
by Google researchers [115]. This technique, trained on vast corpora including Wikipedia, Google News Articles,
the Web, and Books and Literature, utilizes a shallow neural network. The output of the hidden layer provides word
embeddings that capture semantic meaning. Remarkably, this allows for arithmetic operations between word vectors, such
as “KING - MAN + WOMAN ≈ QUEEN” [116]. Being a pretrained model, Word2Vec can be used as an index of words and
their embeddings. However, despite its revolutionary impact on natural language processing tasks, it is important to note
that words embedded using Word2Vec may inherit biases from the texts on which they were trained [102]. Manzini et
al. [102] showed how Word2Vec can make racial associations such as Black-Criminal, Asian-Laborer, and Jew-Greedy.
Our survey analysis revealed that much of the research on CL relies on outdated embedding techniques. While more
advanced methods like BERT [117] and large language models (LLMs) [118] are now considered state-of-the-art
for encoding textual data, we found that many authors still primarily use traditional approaches such as TF-IDF and
Word2Vec.
3.3 Machine Learning Classification
The use of machine learning techniques in crime linkage is primarily aimed at classifying whether certain features of a
crime indicate a linked crime or associating a crime with a specific crime series, as can be seen in Figure 3. These
predictions are made after training a chosen algorithm on a large dataset. When data was scarce, simpler algorithms
such as linear regression or decision trees were often used. In our study, we observed that most CL approaches input
similarity scores between pairs of crimes into the algorithms, rather than using the MO attributes directly. The choice of
input features depends on the specific elements the researcher deems relevant for capturing crime patterns. For instance,
Chohlas-Wood et al. [99] incorporated historical arrest data alongside similarity scores, allowing the model to learn
from this additional information as well.
Figure 3: Typical methodology used in Crime Linkage with Machine Learning algorithms.
Below is a brief summary of the most common algorithms found in our literature:
• Logistic Regression (LR): Logistic Regression is a statistical model primarily used for binary classification
tasks. It predicts the probability that a given input belongs to a particular class using the logistic function
(sigmoid function). This results in outputs that range between 0 and 1, representing class probabilities. It is
simple, interpretable, and works well for linearly separable data, making it a popular choice for problems like
spam detection and medical diagnosis.
• Decision Tree (DT): A Decision Tree is a non-parametric supervised learning method used for classification
and regression. It splits the data into subsets based on feature value tests, creating a tree-like model of
decisions and their possible consequences. This model is easy to understand and visualize, as it mimics human
decision-making processes. However, decision trees can be prone to overfitting, especially with complex
datasets, although they handle both numerical and categorical data effectively.
• Random Forest (RF): Random Forest is an ensemble learning method that constructs multiple decision trees
during training and outputs the mode of the classes for classification or the mean prediction for regression of
the individual trees. This approach reduces overfitting compared to individual decision trees and enhances
model robustness and accuracy. It is particularly effective in handling large datasets and high-dimensional
spaces, making it suitable for a variety of applications, from financial modeling to image classification.
• Support Vector Machine (SVM): Support Vector Machine is a supervised learning algorithm used
for both classification and regression tasks. It finds the hyperplane that best separates the data into classes in
high-dimensional space, with an emphasis on maximizing the margin between different classes. SVMs are
effective in high-dimensional spaces and can handle non-linear data through the use of kernel tricks. They are
particularly useful in applications like text classification, image recognition, and bioinformatics.
• k-Nearest Neighbors (k-NN): k-Nearest Neighbors is an instance-based learning algorithm used for
classification and regression. It classifies a data point based on the majority class of its k nearest neighbors in
the feature space. The simplicity of k-NN makes it easy to implement, but it is sensitive to the choice of k
and the distance metric used. It is particularly useful for pattern recognition tasks, such as handwritten digit
classification and recommendation systems.
• Neural Networks (NN): Neural Networks are a set of algorithms modeled after the human brain, used for a
wide range of tasks, including classification, regression, and complex tasks like image and speech recognition.
They consist of layers of interconnected nodes (neurons) that can capture and model complex patterns and
non-linear relationships in data. Neural Networks are highly flexible and powerful, but they require large
amounts of data and significant computational resources, making them suitable for applications in artificial
intelligence and deep learning.
• Gradient Boosting Decision Tree (GBDT): Gradient Boosting Decision Tree is an ensemble learning
method that builds multiple decision trees sequentially. Each new tree is trained to correct the errors made by
the previous ones, and the trees are combined to make a final prediction. This sequential approach progressively reduces
the remaining error and increases model accuracy, making GBDT highly effective for a variety of tasks, including
ranking, classification, and regression. Careful tuning of parameters like learning rate and the number of trees
is essential for optimal performance and to avoid overfitting.
In supervised machine learning, the training process requires labeled data. In crime linkage studies, this means that it
must be known in advance whether crimes are linked. In practice, this requires that an offender responsible for multiple
crimes has already been convicted and that this information is recorded in a relevant database. However, in many cases,
a significant portion of the dataset consists of unlabeled data—crimes that are still under investigation. While this
unlabeled data can be used to predict whether a crime is linked or belongs to a known offender series, it cannot be used
for training. Another major challenge in supervised learning is the class imbalance. Ideally, the model needs examples
of both linked and unlinked crimes for training, but in reality, there is often a significant disparity between these two
classes, with far more unlinked cases than linked ones. This imbalance can cause the model to favor the majority class,
leading to biased predictions. We will explore this issue in greater detail in the following section.
While crime linkage is typically considered a binary classification problem, Li and Shao [84] demonstrated the
suitability of a three-way analysis. In addition to the linked and unlinked classes, they introduced a third category called
“boundary”. This category encompasses cases that are difficult to classify definitively, which are then referred to a
human crime analyst for further evaluation. The challenge lies in minimizing the number of cases that fall into this
boundary region, as overloading analysts with too many cases is undesirable, while still ensuring that the genuinely
uncertain cases end up within it. Enlarging the boundary region captures more of the uncertain cases
but also places greater demands on human analysis.
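The three-way idea can be sketched as two thresholds applied to a predicted linkage probability. The threshold values and the function name below are hypothetical and do not reproduce the procedure used by Li and Shao; the sketch only shows how a boundary region defers uncertain pairs to an analyst.

```python
def three_way_decision(p_linked, lower=0.3, upper=0.7):
    """Assign a crime pair to 'linked', 'unlinked', or 'boundary'.

    `lower` and `upper` are assumed thresholds; widening the gap between them
    sends more pairs to human review.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower < upper <= 1")
    if p_linked >= upper:
        return "linked"
    if p_linked <= lower:
        return "unlinked"
    return "boundary"


# Hypothetical model outputs for five crime pairs.
probabilities = [0.95, 0.65, 0.50, 0.20, 0.05]
decisions = [three_way_decision(p) for p in probabilities]
print(decisions)
```

Narrowing the interval between the two thresholds reduces the analyst workload at the cost of forcing more uncertain pairs into an automatic decision.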
Although less common, some studies have approached the crime linkage problem using unsupervised methods. This
approach is appealing in the crime linkage context because it sidesteps the inherent issue of class imbalance and
removes the burden of labeling data. Some studies have explored clustering methods to analyze associations between
different crime series [76, 75, 77, 48]. However, designing a model to identify connections between crimes without
knowing which cases are actually linked presents a major challenge. For instance, Zhu et al. [6] modeled crime
linkage using a Hawkes process, where incidents are governed by an intensity function that decays over time. This
approach, inspired by the behavior of earthquakes, posits that crimes can trigger other crimes, creating a sort of network
of linked or associated events. This is somewhat analogous to the predictive policing software PredPol, which employs
the Epidemic-Type Aftershock Sequence (ETAS) model [119].
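As a rough illustration of a self-exciting intensity that decays over time, the sketch below implements a purely temporal Hawkes intensity with an exponential kernel. The exponential form and the parameter values (mu, alpha, beta) are assumptions chosen for illustration; they are not the model specification used by Zhu et al. or by ETAS-based tools.

```python
import math


def hawkes_intensity(t, past_events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity at time t for a temporal Hawkes process.

    mu is the background rate, alpha the jump added by each past event, and
    beta the decay rate of that excitation (all assumed values).
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in past_events if t_i < t
    )
    return mu + excitation


# Hypothetical event times (for example, in days).
events = [0.0]
soon_after = hawkes_intensity(1.0, events)
much_later = hawkes_intensity(5.0, events)
print(soon_after, much_later)
```

The intensity is highest shortly after an event and decays back toward the background rate, which is how the model expresses that one crime may temporarily raise the likelihood of related crimes.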
3.4 Evaluation Metrics
Within machine learning studies, it is common to hold out part of the data to test the trained model. As a consequence, the researcher
needs to select an evaluation method to confirm whether the model performs well. The thresholds that
determine good performance will depend on the type of crime and the metrics used. There
are several evaluation metrics for machine learning and statistical algorithms, and this study presents the most common ones
found in the literature.
The Area Under the Curve (AUC) is the most common evaluation metric in purely statistical CL approaches [120].
It is the area under the Receiver Operating Characteristic (ROC) curve, which plots the True Positive Rate (TPR) against
the False Positive Rate (FPR) and is particularly useful for imbalanced datasets [61, 68]. TPR, also known as sensitivity or
recall, is the proportion of actual positives that are correctly identified by the model. It is calculated as TPR = TP / (TP
+ FN), where TP represents true positives and FN represents false negatives. FPR, on the other hand, is the proportion
of actual negatives that are incorrectly identified as positives by the model. It is calculated as FPR = FP / (FP + TN),
where FP represents false positives and TN represents true negatives. According to Swets (1988), an AUC over 0.70 (70%) is
considered sufficient for good performance; however, this will most likely depend on the number of data points and
type of crime [121]. Although AUC/ROC is the preferred choice among authors, many agree that the evaluation
should rely on more than one technique [73].
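The AUC can be computed without drawing the ROC curve, since it equals the probability that a randomly chosen linked pair receives a higher score than a randomly chosen unlinked pair (ties counted as one half). The sketch below uses this pairwise formulation; the function name and the toy scores are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = linked, 0 = unlinked) and model scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of cases, so for large datasets a rank-based implementation from a statistics library would be preferable.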
Moving on to machine learning methods, other common evaluation metrics are accuracy, precision,
recall, and F1-score, used especially in supervised settings. Accuracy is one of the most straightforward metrics
used to evaluate the performance of a classification model. It is the ratio of correctly predicted instances (both positives
and negatives) to the total number of instances. Precision is the ratio of correctly predicted positive observations to the
total predicted positives, which thus takes the formula TP / (TP + FP). The F1-score is the harmonic mean of precision
and recall, providing a balance between the two, which can be written as 2 x (Precision x Recall) / (Precision + Recall).
The choice among these metrics will depend on how the researcher wants to measure performance. For instance, in
cases where the classes are imbalanced, accuracy can be misleading. The F1-score considers both false positives and
false negatives, providing a more informative measure of a model’s performance.
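The formulas above can be sketched from the four confusion-matrix counts. The counts used below are hypothetical, and returning 0 when a denominator is zero is an assumed convention; the second example shows why accuracy can be misleading on imbalanced data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumed 0 when undefined
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical classifier on 100 crime pairs, 5 of which are truly linked.
model = classification_metrics(tp=3, fp=1, fn=2, tn=94)

# Baseline that labels every pair as unlinked.
all_unlinked = classification_metrics(tp=0, fp=0, fn=5, tn=95)

print(model)
print(all_unlinked)
```

The baseline reaches 95% accuracy while its recall and F1-score are zero, whereas the hypothetical model is only slightly more accurate but actually identifies linked pairs.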
It is also common to use k-fold cross validation when evaluating a model. In this method, the model is trained and
evaluated k times, with each iteration using a different fold as the test set and the remaining k-1 folds as the training set.
The performance metrics from each iteration are then averaged to provide a final evaluation measure. This approach
aims to balance performance by addressing both bias and variance. A specific case of k-fold cross-validation is
Leave-One-Out Cross-Validation (LOOCV), where k is equal to the number of data points in the dataset. While LOOCV
can be particularly useful for small datasets, it is computationally expensive due to the high number of iterations
required.
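The splitting scheme behind k-fold cross-validation can be sketched as a generator of train and test indices. The function name is hypothetical, and the sketch assigns contiguous folds without shuffling; in practice the data would usually be shuffled (and often stratified by class) first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
loocv_folds = list(k_fold_indices(5, 5))  # LOOCV: k equals the number of samples
print([len(test) for _, test in folds])
```

Each sample appears in exactly one test fold, and setting k to the number of samples yields the leave-one-out case described above.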
4 Challenges in Crime Linkage
Applying machine learning to large-scale crime data is valuable for uncovering the nuances and complexities of criminal
behavior patterns. However, this field presents several challenges that require careful consideration. We have identified
four key challenges associated with working with these approaches. A related work comes from Yang et al. [122],
although their survey is based on a very small pool of papers.
4.1 Imbalanced Data
Probably the most common issue encountered in the literature when applying large amounts of data to crime linkage is
the problem of data imbalance. In crime series analysis, most available data points are typically unlinked. Although
recidivism is not rare in criminal behavior, this imbalance can arise for various reasons, such as ongoing investigations
where no offender has been identified yet or limitations in police infrastructure, such as jurisdictional constraints, which
hinder the ability to connect offenders. Consequently, when preparing the data, a decision often has to be made to treat
some cases as labeled (linked or not) and others as unlabeled. It is anticipated that there will be fewer serial crimes
compared to one-off cases.
In these imbalanced datasets, machine learning models tend to favor the majority class, typically the non-linked cases,
due to several factors. First, most algorithms aim to minimize overall error during training, which can lead them to
focus on the majority class to achieve higher accuracy. When the vast majority of cases are non-linked, the model can
achieve a seemingly high accuracy rate simply by predicting all cases as non-linked, as the model is trained to minimize
error across the dataset. This tendency to favor the majority class can result in poor performance when predicting the
minority class (linked cases). For example, a model that labels every case as non-linked achieves an accuracy of 95% in a dataset where 95% of the
cases are non-linked, yet it fails to identify any linked cases at all. This scenario highlights a critical issue: accuracy
alone is an inadequate metric for evaluating model performance in imbalanced contexts. Instead, other metrics such as
precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC) provide a more
nuanced understanding of model performance, especially for the minority class.
Based on our review, it appears that earlier studies utilizing logistic regression (LR) models evaluated their methods
using the AUC/ROC metric, addressing the imbalance issue by adjusting their false alarm thresholds according to expert
opinion [68]. However, these studies typically involved significantly fewer data points compared to more advanced
machine learning methods, leaving the problem of data imbalance an open area for further research.
There are simple approaches to addressing the data imbalance problem, such as undersampling the dataset by discarding
the surplus of non-linked cases to match the number of linked samples [123]. However, this method can result in
a dataset so small that the use of machine learning becomes unjustifiable. Other authors have proposed alternative
methods to handle the imbalance issue. These include oversampling techniques, such as SMOTE (Synthetic Minority
Over-sampling Technique), which generate synthetic samples for the minority class, and cost-sensitive learning, where
different misclassification costs are assigned to the majority and minority classes to mitigate the bias toward the majority
class [124]. These techniques aim to improve the performance and reliability of machine learning models in the context
of imbalanced datasets.
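Two of the strategies above can be sketched briefly: random undersampling of the majority class, and class weights for cost-sensitive learning. The weight formula below (total samples divided by the number of classes times the class count) is one common heuristic and is an assumption here, as are the function names and toy data. SMOTE is not implemented in this sketch.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly discard unlinked samples (label 0) to match the linked count."""
    rng = random.Random(seed)
    linked = [i for i, label in enumerate(labels) if label == 1]
    unlinked = [i for i, label in enumerate(labels) if label == 0]
    kept_unlinked = rng.sample(unlinked, min(len(linked), len(unlinked)))
    keep = sorted(linked + kept_unlinked)
    return [samples[i] for i in keep], [labels[i] for i in keep]


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency (assumed heuristic)."""
    classes = sorted(set(labels))
    return {c: len(labels) / (len(classes) * labels.count(c)) for c in classes}


# Hypothetical dataset: 5 linked pairs and 95 unlinked pairs.
labels = [1] * 5 + [0] * 95
samples = list(range(len(labels)))  # placeholders standing in for feature vectors

balanced_samples, balanced_labels = undersample(samples, labels)
weights = balanced_class_weights(labels)
print(len(balanced_labels), weights)
```

Undersampling leaves only ten samples in this toy case, which illustrates the data-loss concern raised above, whereas class weights keep all samples and instead penalize errors on linked pairs more heavily.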
Li et al. [86] demonstrated that using the Information Granule Random Forest (IGRF) method can reduce the number
of non-serial pairs in the dataset. Their approach involves altering the granularity of the data by clustering similar
cases using the k-nearest neighbors algorithm. This allows the model to train on a smaller dataset without significant
loss of information from the original data. However, the improvements in performance are modest, with an increase in
the true positive rate ranging from 2% to 5%.
The same authors later proposed a new solution using combined blocking to address the imbalance problem [7]. This
approach involves creating behavioral keys, which are blocks of cases sharing similar and rare behaviors or MOs. This
method results in a new dataset with fewer non-serial crimes. They demonstrated that by retaining 98% of the serial
cases and eliminating 65% of the pairs, the accuracy, recall, and F1 score improved compared to not using combined
blocking, although the gain in accuracy was small while recall and precision showed substantial improvement.
4.2 Bias
Although the issue of bias is not extensively discussed in the crime linkage literature, particularly regarding machine learning
techniques, we found evidence that this challenge still demands significant attention. Bias within machine learning is a
major concern, especially with supervised methods, which can amplify bias during the labeling process. Studies have
revealed biases in police data, particularly against minority groups [125]. Additionally, police underreporting can result
in data that does not accurately reflect the true situation [126]. In the crime linkage process, it is crucial to evaluate
whether crime associations are driven by ethnicity or other biased attributes, ensuring that the analysis remains fair and
unbiased. In particular, attention must be paid to which types of data are used as input to ML models.
Bias can be introduced into an ML application at any or all of three stages: preprocessing (training data), in-processing
(within the algorithm itself), and post-processing (after training) [127, 128]. If the police data used to train models for
crime linkage is biased, then the result will be a biased model. An example of potential preprocessing bias can be found
in [45], where the authors used 911 call narratives to train their model, using TF-IDF to extract keywords from the texts.
The word “black” was among the keywords identified by their method, which would later be used as a particular
attribute to describe that crime. Although extracting keywords seems a good approach to characterize a
document, in a police context it is important to either debias the dataset first or evaluate how potentially biased keywords
affect the results (and work around them).
Chohlas-Wood and Levine [99] from the New York Police Department (NYPD) developed an application called
Patternizr, which they claim is bias-free. This tool utilizes a blend of structured information, including location, crime
subcategory, modus operandi details (weapon, victim count, etc.), suspect information (weight, height, etc.), and
unstructured data such as crime narrative complaints. However, while the authors assert fairness by excluding the
race attribute, other studies have demonstrated that this approach does not eliminate bias [83, 100]. Furthermore, they
utilized Word2Vec, which has also been demonstrated to carry potential bias in its embeddings [101, 102]. Although the
tool is not publicly available, it can be considered an example of both pre- and in-processing bias.
The best way to build ML models to support criminal justice decisions is to make sure they are interpretable and explainable
[129]. This way the nuances of how the model operates can be assessed and thus corrected, mitigated, or even discarded
if necessary. In the era of LLMs and Generative AI, it is becoming increasingly challenging to explain some of the
models’ behaviors [130]. However, for practical usage in criminal justice, it is important to guarantee interpretability
in order to maintain fair and court-acceptable results.
4.3 Labeled Data
We mentioned that labeling data can add bias into the model. Besides dealing with this problem, labeling data can
also be quite cumbersome. In CL scenarios, labeling is required when determining the MO attributes and whether cases are
connected or not. The stage where MOs are extracted to create MO vectors has relied on either machine techniques
or purely manual human work. The latter case requires some level of expertise to appropriately code the behavior that
will feed the model. The evaluation and reliability of coding in data-driven approaches for crime linkage systems has been
a topic of extensive research [12].
Martineau and Corey [13] conducted a study to assess the agreement level among Canadian police officers in coding
crime linkage scenarios, revealing a 38% agreement for homicide cases and 25% for sexual assault cases. This
underscores the subjective nature of human evaluation in criminal investigations. Furthermore, the decision-making
process for crime linkage itself has sparked debate due to the absence of standardized training or evaluation methods for
this complex and subjective task. For example, Bennell et al. [24] found that students outperformed police personnel in
associating crimes with serial offenders, with a logistic regression model outperforming humans. The Belgian version
of ViCLAS was also examined in a study by Davies et al. [131], revealing ongoing improvements in the coding process
but highlighting the need for further enhancements, such as standardization of inter-rater evaluations. In another study,
Pakkanen et al. [132] found that student subjects correctly linked 61% of cases, showing a slight difference from
the work by Santtila et al. [70], where the model achieved 63% accuracy. On a positive note, the former study did not
find evidence of bias influencing the coding process.
There is also the problem of which variables to select. Some argue that individual behaviors more accurately map MOs
for crime linkage. The authors in [69] advocated for the use of individual behaviors in homicide data, but their study
suggested that determining which behaviors are pivotal for crime linkage poses a significant challenge. Interestingly,
their findings revealed that nearly identical accuracy levels could be achieved using just 15 out of the 90 behavior
categories. Other studies have used grouping techniques to aggregate variables and feed the model with these clusters.
Another example with homicides is given in [9]. The authors hierarchically categorized 39 "crime scenes behaviors"
into five distinct categories: Plan, Control, Ritual, Impulsivity, and Ritual (later subdivided into two groups of organized
and disorganized). Interestingly, with this approach Melnyk et al.’s work in [5] achieved a high accuracy rate (96%) for
detecting crime linkage.
4.4 Crime Type
The literature on crime linkage encompasses a wide array of methods and evaluation metrics, alongside varied datasets
employed to establish connections between criminal incidents. Numerous studies have showcased CL outcomes,
particularly in cases involving property crimes and sexual offenses (not necessarily sexual homicides). However, as
mentioned, it is notable that the research community has predominantly focused on linking burglaries and robberies,
with comparatively less attention directed towards other types of crimes [5]. How crime linkage insights can be
achieved for other types of crime, especially violent crimes such as murders, remains an open research question.
It is reasonable to expect that different crime types will have different MO variables. Nevertheless, which variables best
map or characterize a crime is still under investigation. Reducing a crime description to a finite and defined set of
MO attributes is a challenging process, which remains an open area of study. Researchers first need to identify attributes
that characterize each individual crime case and then use a pool of them for the crime linkage goal. As mentioned,
most CL studies used proprietary crime data, which, due to its nature, can be considered suitable for mapping
behavior into a set of standardized variables. However, this approach imposes constraints on the breadth of potential
criminal behaviors, posing significant challenges, especially in the analysis of violent crimes like homicides [133]. For
example, religious ritual killings may display unique MOs that diverge from standardized lists of MO characteristics
[34]. In a way, reducing MOs to a finite and discrete multidimensional vector somewhat contradicts the assumptions
adopted for crime linkage, since very rare attributes might not be captured when extracting MOs from criminal data.
In [46], for example, a specific set of 40 MOs was searched for when applying cosine similarity to their narrative
datasets. Further studies are required to capture the dynamics and flexibility that can shape criminal behavior.
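The limitation of a fixed MO list can be illustrated with a small Python sketch that encodes crimes as binary vectors over a predefined vocabulary and compares them with cosine similarity. The vocabulary, the behavior names, and the example crimes are hypothetical and much smaller than the 40-MO set mentioned above; the sketch only shows how a rare behavior outside the vocabulary is silently lost.

```python
import math

# Hypothetical fixed MO vocabulary; a real study would define its own list.
MO_VOCABULARY = [
    "forced_entry", "weapon_used", "victim_bound", "night_time", "property_taken",
]


def encode(observed_behaviors):
    """Return a binary vector over MO_VOCABULARY and the behaviors it cannot hold."""
    vector = [1 if mo in observed_behaviors else 0 for mo in MO_VOCABULARY]
    dropped = sorted(set(observed_behaviors) - set(MO_VOCABULARY))
    return vector, dropped


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# crime_a and crime_b share a rare behavior that is not in the vocabulary.
crime_a = {"forced_entry", "night_time", "ritual_symbol_left"}
crime_b = {"forced_entry", "night_time", "ritual_symbol_left"}
crime_c = {"forced_entry", "night_time"}

vec_a, dropped_a = encode(crime_a)
vec_b, dropped_b = encode(crime_b)
vec_c, dropped_c = encode(crime_c)

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)

print(dropped_a)        # behaviors lost during encoding
print(sim_ab, sim_ac)   # both pairs look equally similar after encoding
```

Because the rare behavior never enters the vector, the pair that shares it receives the same score as the pair that does not, which is the information loss the paragraph above describes.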
While this flexibility may add complexity to the CL task, applying machine learning methods could help capture
the nuances of how various MO segments influence crime associations. It is important to use models that allow for
interpretability, enabling us to understand the impact of different inputs on the outputs. Although domain knowledge
remains important, machine learning has the potential to break down the complexity and better understand criminal
behavior patterns.
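One simple form of interpretability is available in linear models such as logistic regression, where each input's contribution to the log-odds of a link is its weight times its value. The sketch below assumes an already fitted model; the feature names, weights, and intercept are invented for illustration and do not come from any study surveyed here.

```python
import math

# Hypothetical weights of an already fitted logistic model over pairwise features.
WEIGHTS = {"mo_similarity": 3.0, "distance_km": -0.2, "days_apart": -0.01}
INTERCEPT = -1.0


def explain(pair_features):
    """Return per-feature log-odds contributions and the predicted link probability."""
    contributions = {
        name: WEIGHTS[name] * value for name, value in pair_features.items()
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return contributions, probability


# One illustrative crime pair described by three pairwise features.
pair = {"mo_similarity": 0.8, "distance_km": 2.0, "days_apart": 30.0}
contributions, probability = explain(pair)

# List the inputs from most to least influential for this pair.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f}")
print(f"link probability: {probability:.3f}")
```

Reading the contributions side by side shows which MO segment drives a particular linkage decision, which is the kind of inspection that opaque models do not offer directly.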
5 Conclusions
Our survey highlights how crime linkage studies have evolved with the increasing availability of data. We observed a
clear shift from traditional statistical models to more advanced machine learning approaches, driven by the need for
more sophisticated techniques to handle the growing data volume. Notably, there has been a recent surge in interest in
basing crime linkage decisions on police textual data, such as crime narratives, police reports, and 911 calls. This
shift has introduced NLP techniques into the field, attracting the attention of computer scientists to an area previously
dominated by criminologists, psychologists, and statisticians.
This study analyzed key works on CL that utilized machine learning methods, exploring how different authors
approached the topic. In doing so, we identified a clear framework that was consistently applied across most studies.
We broke this framework down into distinct steps, each of which was further detailed to enhance understanding.
This structure can help establish a common language among researchers from different fields and encourage broader
contributions to the discipline, particularly when applying data-driven methods.
We also observed that the application of machine learning methods to CL is still far from being fully adopted in
real-world scenarios. Several key challenges in the literature must be addressed to enable ethical and practical
implementation. Four primary challenges stood out: imbalanced data, biased data, the availability and reliability of
labeled data, and the influence of crime type. While there are certainly more obstacles, these were the most prominent in
the context of artificial intelligence approaches. Moving forward, we believe researchers need to be mindful of these
challenges and take them into account in their studies to advance the field ethically.
Despite the challenges, our survey of studies shows that machine learning holds great promise for enhancing CL
processes. ML offers an efficient way to break down complex information and uncover patterns of criminal behavior
that would be difficult or too labor-intensive to identify through human effort alone. As a result, both law enforcement
and researchers can benefit from these models by gaining valuable insights for CL decisions and integrating them into
police investigations. The diverse approaches observed suggest that collaboration between experts across domains
(criminologists, computer scientists, and law enforcement) has the potential to yield more impactful results.
References
[1] Edward J Green, Carl E Booth, and Michael D Biderman. Cluster analysis of burglary m/os. Journal of Police
Science & Administration, 1976.
[2] Michael D. Porter. A statistical approach to crime linkage. The American Statistician, 70(2):152–165, 2016.
[3] Jessica Woodhams, Clive R. Hollin, and Ray Bull. The psychology of linking crimes: A review of the evidence.
Legal and Criminological Psychology, 12(2):233–249, 2007.
[4] Kari Davies and Jessica Woodhams. The practice of crime linkage: A review of the literature. Journal of
Investigative Psychology and Offender Profiling, 16(3):169–200, 2019.
[5] Bryanna Fox and David P. Farrington. What have we learned from offender profiling? A systematic review and
meta-analysis of 40 years of research. Psychological Bulletin, 144(12):1247–1274, 2018.
[6] Shixiang Zhu and Yao Xie. Spatiotemporal-textual point processes for crime linkage detection. The Annals of
Applied Statistics, 16(2):1151–1170, 2022.
[7] Yusheng Li and Xueyan Shao. A supervised machine learning framework with combined blocking for detecting
serial crimes. Applied Intelligence, 52(10):11517–11538, 2022.
[8] John E. Douglas and Corinne Munn. Violent crime scene analysis: Modus operandi, signature, and staging. FBI
L. Enforcement Bull., 61:1, 1992.
[9] Tamara Melnyk, Craig Bennell, Donna J. Gauthier, and Donald Gauthier. Another look at across-crime
similarity coefficients for use in behavioural linkage analysis: an attempt to replicate Woodhams, Grant,
and Price (2007). Psychology, Crime & Law, 17(4):359–380, May 2011.
[10] Jessica Woodhams, Ray Bull, and Clive R. Hollin. Case Linkage. In Richard N. Kocsis, editor, Criminal
Profiling: International Theory, Research, and Practice, pages 117–133. Humana Press, Totowa, NJ, 2007.
[11] David Canter. Psychology of offender profiling. In Handbook of psychology in legal contexts. John Wiley & Sons,
1995.
[12] Craig Bennell, Brent Snook, Sarah MacDonald, John C. House, and Paul J. Taylor. Computerized crime linkage
systems: A critical review and research agenda. Criminal Justice and Behavior, 39(5):620–634, 2012.
[13] Melissa M. Martineau and Shevaun Corey. Investigating the reliability of the violent crime linkage analysis
system (ViCLAS) crime report. Journal of Police and Criminal Psychology, 23:51–60, 2008.
[14] Dario Bosco, Angelo Zappalà, and Pekka Santtila. The admissibility of offender profiling in courtroom: A review
of legal issues and court opinions. International Journal of Law and Psychiatry, 33(3):184–191, July 2010.
[15] Gérard N. Labuschagne. The use of a linkage analysis as evidence in the conviction of the Newcastle serial
murderer, South Africa. Journal of Investigative Psychology and Offender Profiling, 3(3):183–191, 2006.
[16] Caroline B. Meyer. Criminal Profiling as Expert Evidence? In Richard N. Kocsis, editor, Criminal Profiling:
International Theory, Research, and Practice, pages 207–247. Humana Press, Totowa, NJ, 2007.
[17] R. D. Keppel. Signature Murders: A Report of the 1984 Cranbrook, British Columbia Cases. J. Forensic Sci.,
45(2):500–503, March 2000.
[18]
GN Labuschagne. The use of linkage analysis evidence in serial offense trials. Crime linkage: Theory, research,
and practice, pages 197–224, 2014.
[19]
David V Canter, Laurence J Alison, Emily Alison, and Natalia Wentink. The organized/disorganized typology of
serial murder: Myth or model? Psychology, Public Policy, and Law, 10(3):293, 2004.
[20]
Tom Pakkanen, Pekka Santtila, and Dario Bosco. Crime linkage as expert evidence. Crime linkage: Theory,
research, and practice, page 225, 2014.
[21]
Steven A Egger. A working definition of serial murder and the reduction of linkage blindness. Journal of Police
Science & Administration, 1984.
[22] Steven A. Egger. A working definition of serial murder and the reduction of linkage blindness. Journal of Police
Science & Administration, 12(3):348–357, 1984.
[23] Peter I. Collins, Gregory F. Johnson, Alberto Choy, Keith T. Davidson, and Ronald E. Mackay. Advances in
violent crime analysis and law enforcement: The canadian violent crime linkage analysis system. Journal of
Government Information, 25(3):277–284, May 1998.
[24] Craig Bennell, Sarah Bloomfield, Brent Snook, Paul Taylor, and Carolyn Barnes. Linkage analysis in cases of
serial burglary: Comparing the performance of university students, police professionals, and a logistic regression
model. Psychology, Crime & Law, 16(6):507–524, 2010.
[25] Amy Burrell and Ray Bull. A Preliminary Examination of Crime Analysts’ Views and Experiences of Comparative
Case Analysis. International Journal of Police Science & Management, 13(1):2–15, March 2011.
[26] Lee Rainbow. A practitioner’s perspective. In Crime linkage: Theory, research, and practice, page 173. CRC Press,
2014.
[27] Louis B. Schlesinger. Serial offenders: Current thought, recent findings. CRC Press, 2000.
[28] Robert D. Keppel and William J. Birnes. Serial violence: Analysis of modus operandi and signature characteristics
of killers. CRC Press, 2008.
[29] Robert D. Keppel, Joseph G. Weis, Katherine M. Brown, and Kristen Welch. The Jack the Ripper murders:
A modus operandi and signature analysis of the 1888–1891 Whitechapel murders. Journal of Investigative
Psychology and Offender Profiling, 2(1):1–21, 2005.
[30] Kaeko Yokota, Hiroki Kuraishi, Taeko Wachi, Yusuke Otsuka, Kazuki Hirama, and Kazumi Watanabe. Practice
of offender profiling in Japan. International Journal of Police Science & Management, 19(3):187–194, 2017.
[31] Derek B. Cornish and Ronald V. Clarke. Understanding crime displacement: An application of rational choice
theory. In Crime opportunity theories, pages 197–211. Routledge, 2017.
[32] John E. Douglas, Ann W. Burgess, Allen G. Burgess, and Robert K. Ressler. Crime classification manual. 2012.
[33]
Dion Gee and Aleksandra Belofastov. Sex crime linkage: Sexual fantasy and offense plasticity. In Crime Linkage,
pages 60–81. Routledge, 2014.
[34]
Robert R. Hazelwood and Janet I. Warren. The relevance of fantasy in serial sexual crimes investigation. In
Practical aspects of rape investigation, pages 67–78. CRC press, 2016.
[35] Benoit Leclerc, Jean Proulx, and Eric Beauregard. Examining the modus operandi of sexual offenders against
children and its practical implications. Aggression and violent behavior, 14(1):5–12, 2009.
[36] Craig Bennell and Natalie J. Jones. Between a ROC and a hard place: a method for linking serial burglaries
by modus operandi. Journal of Investigative Psychology and Offender Profiling, 2(1):23–41, 2005.
[37]
Craig Bennell, Brent Snook, Sarah MacDonald, John C House, and Paul J Taylor. Computerized crime linkage
systems: A critical review and research agenda. Criminal Justice and Behavior, 39(5):620–634, 2012.
[38]
Craig Bennell, Rebecca Mugford, Holly Ellingwood, and Jessica Woodhams. Linking crimes using behavioural
clues: Current levels of linking accuracy and strategies for moving forward. Journal of Investigative Psychology
and Offender Profiling, 11(1):29–56, 2014.
[39]
Kari Davies and Jessica Woodhams. The practice of crime linkage: A review of the literature. Journal of
Investigative Psychology and Offender Profiling, 16(3):169–200, 2019.
[40]
Craig Bennell and David V Canter. Linking commercial burglaries by modus operandi: Tests using regression
and roc analysis. Science & Justice, 42(3):153–164, 2002.
[41]
Lucy Markson, Jessica Woodhams, and John W Bond. Linking serial residential burglary: Comparing the
utility of modus operandi behaviours, geographical proximity, and temporal proximity. Journal of Investigative
Psychology and Offender Profiling, 7(2):91–107, 2010.
[42]
Tamara Melnyk, Craig Bennell, Donna J Gauthier, and Donald Gauthier. Another look at across-crime similarity
coefficients for use in behavioural linkage analysis: An attempt to replicate woodhams, grant, and price (2007).
Psychology, Crime & Law, 17(4):359–380, 2011.
[43]
M Tonkin, P Santtila, and R Bull. The linking of burglary crimes using offender behaviour: Testing research
cross-nationally and exploring methodology. Legal and Criminological Psychology, 17(2):276–293, 2012.
[44]
Matthew Tonkin, Jessica Woodhams, Ray Bull, John W Bond, and Pekka Santtila. A comparison of logistic
regression and classification tree analysis for behavioural case linkage. Journal of Investigative Psychology and
Offender Profiling, 9(3):235–258, 2012.
[45]
Shixiang Zhu and Yao Xie. Spatial-Temporal-Textual Point Processes for Crime Linkage Detection, August
2021. arXiv:1902.00440 [cs, stat].
[46]
Adir Solomon, Amit Magen, Simo Hanouna, Mor Kertis, Bracha Shapira, and Lior Rokach. Crime linkage based
on textual hebrew police reports utilizing behavioral patterns. In Proceedings of the 29th ACM international
conference on information & knowledge management, pages 2749–2756, 2020.
[47]
Chih-Hao Ku and Gondy Leroy. A decision support system: Automated crime report analysis and classification
for e-government. Government Information Quarterly, 31(4):534–544, 2014.
[48]
Brian J Reich and Michael D Porter. Partially supervised spatiotemporal clustering for burglary crime series
identification. Journal of the Royal Statistical Society Series A: Statistics in Society, 178(2):465–480, 2015.
[49]
Fabrizio Albertetti, Paul Cotofrei, Lionel Grossrieder, Olivier Ribaux, and Kilian Stoffel. The crilim methodology:
crime linkage with a fuzzy mcdm approach. In 2013 European Intelligence and Security Informatics Conference,
pages 67–74. IEEE, 2013.
[50]
Nadeem Qazi and BL William Wong. An interactive human centered data science approach towards crime
pattern analysis. Information Processing & Management, 56(6):102066, 2019.
[51]
Kaeko Yokota and Shoichi Watanabe. Computer-based retrieval of suspects using similarity of modus operandi.
International Journal of police science & management, 4(1):5–15, 2002.
[52]
Jessica Woodhams and Kirsty Toye. An empirical test of the assumptions of case linkage and offender profiling
with serial commercial robberies. Psychology, Public Policy, and Law, 13(1):59, 2007.
[53]
Amy Burrell, Ray Bull, and John Bond. Linking personal robbery offences using offender behaviour. Journal of
Investigative psychology and offender profiling, 9(3):201–222, 2012.
[54]
Yu-Sheng Li and Ming-Liang Qi. An approach for understanding offender modus operandi to detect serial
robbery crimes. Journal of Computational Science, 36:101024, September 2019.
[55] Yu-Sheng Li, Hong Chi, Xue-Yan Shao, Ming-Liang Qi, and Bao-Guang Xu. A novel random forest approach
for imbalance problem in crime linkage. Knowledge-Based Systems, 195:105738, 2020.
[56] Yusheng Li and Xueyan Shao. A supervised machine learning framework with combined blocking for detecting
serial crimes. Applied Intelligence, 52(10):11517–11538, 2022.
[57]
Hong Chi, Zhihong Lin, Huidong Jin, Baoguang Xu, and Mingliang Qi. A decision support system for detecting
serial crimes. Knowledge-Based Systems, 123:88–101, May 2017.
[58]
Matthew Tonkin, Tim Grant, and John W Bond. To link or not to link: A test of the case linkage principles using
serial car theft data. Journal of Investigative Psychology and Offender Profiling, 5(1-2):59–77, 2008.
[59]
Kari Davies, Matthew Tonkin, Ray Bull, and John W Bond. The course of case linkage never did run smooth: A
new investigation to tackle the behavioural changes in serial car theft. Journal of Investigative Psychology and
Offender Profiling, 9(3):274–295, 2012.
[60]
H Ellingwood, R Mugford, T Melnyk, C Bennell, and K Fritzon. Linking serial arson: Comparing the simple
matching index to jaccard’s coefficient. Journal of Investigative Psychology and Offender Profiling, 10(1):1–27,
2013.
[61]
Craig Bennell, Natalie J Jones, and Tamara Melnyk. Addressing problems with traditional crime linking methods
using receiver operating characteristic analysis. Legal and Criminological Psychology, 14(2):293–310, 2009.
[62]
Craig Bennell, Donna Gauthier, Donald Gauthier, Tamara Melnyk, and Evanya Musolino. The impact of data
degradation and sample size on the performance of two similarity coefficients used in behavioural linkage
analysis. Forensic science international, 199(1-3):85–92, 2010.
[63]
Jessica Woodhams and Gerard Labuschagne. A test of case linkage principles with solved and unsolved serial
rapes. Journal of Police and Criminal Psychology, 27:85–98, 2012.
[64]
Jan Martin Winter, Jan Lemeire, Stijn Meganck, Jo Geboers, Gina Rossi, and Andreas Mokros. Comparing the
predictive accuracy of case linkage methods in serious sexual assaults. Journal of Investigative Psychology and
Offender Profiling, 10(1):28–56, 2013.
[65]
Sandra Oziel, Alasdair Goodwill, and Eric Beauregard. Variability in behavioural consistency across temporal
phases in stranger sexual offences. Journal of Police and Criminal Psychology, 30:176–190, 2015.
[66]
Kaeko Yokota, Kazumi Watanabe, Taeko Wachi, Yusuke Otsuka, Kazuki Hirama, and Goro Fujita. Crime linkage
of sex offences in japan by multiple correspondence analysis. Journal of investigative psychology and offender
profiling, 14(2):109–119, 2017.
[67]
Chelsea Slater, Jessica Woodhams, and Catherine Hamilton-Giachritsis. Testing the assumptions of crime
linkage with stranger sex offenses: A more ecologically-valid study. Journal of Police and Criminal Psychology,
30:261–273, 2015.
[68]
Matthew Tonkin, Tom Pakkanen, Jukka Siren, Craig Bennell, Jessica Woodhams, Amy Burrell, Hanne Imre,
Jan M Winter, Eva Lam, G ten Brinke, et al. Using offender crime scene behavior to link stranger sexual assaults:
A comparison of three statistical approaches. Journal of Criminal Justice, 50:19–28, 2017.
[69]
Benny Salo, Jukka Sirén, Jukka Corander, Angelo Zappalà, Dario Bosco, Andreas Mokros, and Pekka Santtila.
Using bayes’ theorem in behavioural crime linking of serial homicide. Legal and Criminological Psychology,
18(2):356–370, 2013.
[70]
Pekka Santtila, Tom Pakkanen, Angelo Zappala, Dario Bosco, Maria Valkama, and Andreas Mokros. Behavioural
crime linking in serial homicide. Psychology, Crime & Law, 14(3):245–265, 2008.
[71]
Matthew Tonkin, Jessica Woodhams, Ray Bull, John W Bond, and Emma J Palmer. Linking different types of
crime using geographical and temporal proximity. Criminal Justice and Behavior, 38(11):1069–1088, 2011.
[72]
Matthew Tonkin, Jessica Woodhams, Ray Bull, and John W Bond. Behavioural case linkage with solved and
unsolved crimes. Forensic Science International, 222(1-3):146–153, 2012.
[73] Matthew Tonkin, Jan Lemeire, Pekka Santtila, and Jan M. Winter. Linking property crime using offender
crime scene behaviour: A comparison of methods. Journal of Investigative Psychology and Offender Profiling,
16(2):75–90, 2019.
[74]
David R Cox. The regression analysis of binary sequences. Journal of the Royal Statistical Society Series B:
Statistical Methodology, 20(2):215–232, 1958.
[75]
Song Lin and Donald E Brown. An outlier-based data association method for linking criminal incidents. Decision
Support Systems, 41(3):604–615, 2006.
[76]
Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri. Finding patterns with a rotten core: Data mining
for crime series with cores. Big Data, 3(1):3–21, 2015.
[77]
Kilian Stoffel, Paul Cotofrei, and Dong Han. Fuzzy clustering based methodology for multidimensional data
analysis in computational forensic domain. International Journal of Computer Information Systems and Industrial
Management Applications, 4:11–11, 2012.
[78]
Soumendra Goala, Palash Dutta, and Pranjal Talukdar. Intuitionistic fuzzy multi criteria decision making
approach to crime linkage using resemblance function. International Journal of Applied and Computational
Mathematics, 5:1–17, 2019.
[79] Soumendra Goala and Palash Dutta. A fuzzy multicriteria decision-making approach to crime linkage.
International Journal of Information Technologies and Systems Approach (IJITSA), 11(2):31–50, 2018.
[80]
Varun Mandalapu, Lavanya Elluri, Piyush Vyas, and Nirmalya Roy. Crime Prediction Using Machine Learning
and Deep Learning: A Systematic Review and Future Directions. IEEE Access, 11:60153–60170, 2023.
arXiv:2303.16310 [cs].
[81]
Albert Meijer and Martijn Wessels. Predictive policing: Review of benefits and drawbacks. International Journal
of Public Administration, 42(12):1031–1039, 2019.
[82] Danilo Bzdok, Naomi Altman, and Martin Krzywinski. Statistics versus machine learning. Nat Methods, 15(4):233, 2018.
[83]
Molly Griffard. A bias-free predictive policing tool: An evaluation of the nypd’s patternizr. Fordham Urb. LJ,
47:43, 2019.
[84]
Yusheng Li and Xueyan Shao. Thresholds learning of three-way decisions in pairwise crime linkage. Applied
Soft Computing, 120:108638, 2022.
[85]
Yusheng Li and Xueyan Shao. A supervised machine learning framework with combined blocking for detecting
serial crimes. Applied Intelligence, 52(10):11517–11538, 2022.
[86]
Yu-Sheng Li, Hong Chi, Xue-Yan Shao, Ming-Liang Qi, and Bao-Guang Xu. A novel random forest approach
for imbalance problem in crime linkage. Knowledge-Based Systems, 195:105738, 2020.
[87]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in
Vector Space, September 2013. arXiv:1301.3781 [cs].
[88]
Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition.
IEEE transactions on acoustics, speech, and signal processing, 26(1):43–49, 1978.
[89]
Claude Elwood Shannon. A mathematical theory of communication. The Bell system technical journal,
27(3):379–423, 1948.
[90]
Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. Restricted boltzmann machines for collaborative
filtering. In Proceedings of the 24th international conference on Machine learning, pages 791–798, 2007.
[91]
Shixiang Zhu and Yao Xie. Crime Event Embedding with Unsupervised Feature Selection. In ICASSP 2019 -
2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3922–3926,
Brighton, United Kingdom, May 2019. IEEE.
[92]
Brian C Renauer and Emma Covelli. Examining the relationship between police experiences and perceptions of
police bias. Policing: An International Journal of Police Strategies & Management, 34(3):497–514, 2011.
[93] Shea W. Cronin, Jack McDevitt, Amy Farrell, and James J. Nolan III. Bias-crime reporting: Organizational
responses to ambiguity, uncertainty, and infrequency in eight police departments. American behavioral scientist,
51(2):213–231, 2007.
[94] Inbal Yahav, Onn Shehory, and David Schwartz. Comments mining with tf-idf: the inherent bias and its removal.
IEEE Transactions on Knowledge and Data Engineering, 31(3):437–450, 2018.
[95] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomás Mikolov. Bag of tricks for efficient text
classification. CoRR, abs/1607.01759, 2016.
[96]
Sanjeev Arora, Yingyu Liang, and Tengyu Ma. A simple but tough-to-beat baseline for sentence embeddings. In
International conference on learning representations, 2017.
[97]
Gregory Koch, Richard Zemel, Ruslan Salakhutdinov, et al. Siamese neural networks for one-shot image
recognition. In ICML deep learning workshop, volume 2. Lille, 2015.
[98]
Enzo Yaksic. Addressing the challenges and limitations of utilizing data to study serial homicide. In Reviewing
Crime Psychology, pages 353–379. Routledge, 2020.
[99] Alex Chohlas-Wood and E. S. Levine. A recommendation engine to aid in identifying crime patterns. INFORMS
Journal on Applied Analytics, 49(2):154–166, 2019.
[100]
Fernando Martínez-Plumed, Cèsar Ferri, David Nieves, and José Hernández-Orallo. Fairness and Missing Values,
May 2019. arXiv:1905.12728 [cs, stat].
[101]
Marc-Etienne Brunet, Colleen Alkalay-Houlihan, Ashton Anderson, and Richard Zemel. Understanding the
origins of bias in word embeddings. In International conference on machine learning, pages 803–811. PMLR,
2019.
[102]
Thomas Manzini, Yao Chong Lim, Yulia Tsvetkov, and Alan W. Black. Black is to Criminal as Caucasian is to
Police: Detecting and Removing Multiclass Bias in Word Embeddings, July 2019. arXiv:1904.04047 [cs, stat].
[103]
Anton Borg, Martin Boldt, Niklas Lavesson, Ulf Melander, and Veselka Boeva. Detecting serial residential
burglaries using clustering. Expert Systems with Applications, 41(11):5252–5266, September 2014.
[104]
James B Howlett, Kenneth A Hanfland, and Robert K Ressler. The violent criminal apprehension program:
Vicap: A progress report. FBI L. Enforcement Bull., 55:14, 1986.
[105]
Zhihong Lin, Hong Chi, Mengyi Sha, Baoguang Xu, and Mingang Gao. The application of separability analysis
in feature selection of the serial crime linkage problem. In Presented at the The 45th International Conference
on Computers & Industrial Engineering, Metz/France, 2015.
[106]
Paul Brantingham and Patricia Brantingham. Crime pattern theory. In Environmental criminology and crime
analysis, pages 100–116. Willan, 2013.
[107]
David Weisburd. The law of crime concentration and the criminology of place. Criminology, 53(2):133–157,
2015.
[108]
Shyam Boriah, Varun Chandola, and Vipin Kumar. Similarity measures for categorical data: A comparative
evaluation. In Proceedings of the 2008 SIAM international conference on data mining, pages 243–254. SIAM,
2008.
[109] Paul Jaccard. Nouvelles recherches sur la distribution florale. Bull. Soc. Vaud. Sci. Nat., 44:223–270, 1908.
[110] C. Izsak and A. R. G. Price. Measuring β-diversity using a taxonomic similarity index, and its relation to spatial
scale. Marine Ecology Progress Series, 215:69–77, 2001.
[111]
Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri. Learning to detect patterns of crime. In Machine
Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech
Republic, September 23-27, 2013, Proceedings, Part III 13, pages 515–530. Springer, 2013.
[112] Yin Zhang, Rong Jin, and Zhi-Hua Zhou. Understanding bag-of-words model: a statistical framework.
International journal of machine learning and cybernetics, 1:43–52, 2010.
[113]
Juan Ramos et al. Using tf-idf to determine word relevance in document queries. In Proceedings of the first
instructional conference on machine learning, volume 242, pages 29–48. Citeseer, 2003.
[114]
Shahzad Qaiser and Ramsha Ali. Text mining: use of tf-idf to examine the relevance of words to documents.
International Journal of Computer Applications, 181(1):25–29, 2018.
[115]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in
vector space. arXiv preprint arXiv:1301.3781, 2013.
[116] Kenneth Ward Church. Word2vec. Natural Language Engineering, 23(1):155–162, 2017.
[117] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[118] Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, and Siva
Reddy. Llm2vec: Large language models are secretly powerful text encoders. arXiv preprint arXiv:2404.05961,
2024.
[119]
George O Mohler, Martin B Short, Sean Malinowski, Mark Johnson, George E Tita, Andrea L Bertozzi, and
P Jeffrey Brantingham. Randomized controlled field trials of predictive policing. Journal of the American
statistical association, 110(512):1399–1411, 2015.
[120]
Logan Ewanation, Craig Bennell, Matthew Tonkin, and Pekka Santtila. Receiver operating characteristic curves
in the crime linkage context: Benefits, limitations, and recommendations. Applied Cognitive Psychology,
37(6):1277–1289, 2023.
[121] John A Swets. Measuring the accuracy of diagnostic systems. Science, 240(4857):1285–1293, 1988.
[122]
Haopeng Yang, Chengling Huang, Huang Liang, Weiwei Ding, and Xiaojian Li. A survey of property crime
incident links and their discovery techniques. In 2021 International Conference on Mechanical, Aerospace and
Automotive Engineering, pages 93–97, 2021.
[123] Aida Ali, Siti Mariyam Shamsuddin, and Anca L Ralescu. Classification with class imbalance problem. Int. J. Advance Soft Compu. Appl, 5(3):176–204, 2013.
[124] Sohrab Hossain, Ahmed Abtahee, Imran Kashem, Mohammed Moshiul Hoque, and Iqbal H Sarker. Crime prediction using spatio-temporal data. In Computing Science, Communication and Security: First International Conference, COMS2 2020, Gujarat, India, March 26–27, 2020, Revised Selected Papers 1, pages 277–289. Springer, 2020.
[125] P Jeffrey Brantingham. The logic of data bias and its impact on place-based predictive policing. Ohio St. J. Crim. L., 15:473, 2017.
[126] David Buil-Gil, Angelo Moretti, and Samuel H Langton. The accuracy of crime statistics: Assessing the impact of police data bias on geographic crime analysis. Journal of Experimental Criminology, pages 1–27, 2021.
[127] Max Hort, Zhenpeng Chen, Jie M Zhang, Mark Harman, and Federica Sarro. Bias mitigation for machine learning classifiers: A comprehensive survey. ACM Journal on Responsible Computing, 1(2):1–52, 2024.
[128] Sorelle A Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P Hamilton, and Derek Roth. A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, pages 329–338, 2019.
[129] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 1(5):206–215, 2019.
[130] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, et al. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Information Fusion, 106:102301, 2024.
[131] Kari Davies. The practice of crime linkage. PhD thesis, University of Birmingham, 2018.
[132] Tom Pakkanen, Angelo Zappalà, Caroline Grönroos, and Pekka Santtila. The effects of coding bias on estimates of behavioural similarity in crime linking research of homicides. Journal of Investigative Psychology and Offender Profiling, 9(3):223–234, 2012.
[133] John E Douglas and Lauren K Douglas. Modus operandi and the signature aspects of violent crime. 2006.
[134] Adir Solomon, Amit Magen, Simo Hanouna, Mor Kertis, Bracha Shapira, and Lior Rokach. Crime linkage based on textual Hebrew police reports utilizing behavioral patterns. In Proceedings of the 29th ACM international conference on information & knowledge management, pages 2749–2756, 2020.
[135] Yu-Sheng Li and Ming-Liang Qi. An approach for understanding offender modus operandi to detect serial robbery crimes. Journal of Computational Science, 36:101024, 2019.
[136] Jessica Woodhams, Matthew Tonkin, Amy Burrell, Hanne Imre, Jan M Winter, Eva KM Lam, Gert Jan ten Brinke, Mark Webb, Gerard Labuschagne, Craig Bennell, et al. Linking serial sexual offences: Moving towards an ecologically valid test of the principles of crime linkage. Legal and Criminological Psychology, 24(1):123–140, 2019.
[137] Hong Chi, Zhihong Lin, Huidong Jin, Baoguang Xu, and Mingliang Qi. A decision support system for detecting serial crimes. Knowledge-Based Systems, 123:88–101, 2017.
6 Appendix
Table 1: Overview of the main data-driven studies on Crime Linkage.
Pub. Year | Title | Data used | Type of Crime | MOs used | Similarity Measures | ML technique | Summary
Pub. Year: 2022
Title: Thresholds learning of three-way decisions in pairwise crime linkage [84]
Data used: 364 cases with 111 serial offenses from Zhengzhou City, Henan Province, China.
Type of Crime: Robbery
MOs used: 11 MOs: number of criminals, tools used, way criminals disguise, way victims are harmed, way criminals rob property, way criminals threaten victims, part of victim being harmed, item robbed, actions to control victim, way criminals break obstacles, actions taken by criminals.
Similarity Measures: Absolute Distance, Jaccard, and Cosine Similarity.
ML technique: RF
Summary: The authors proposed a ternary classification system for linked crimes, rather than the traditional binary approach. In addition to the ‘linked’ and ‘not linked’ outcomes, they introduced a middle category where the linkage decision remains uncertain, necessitating further evaluation by criminal experts. Their goal was to optimize the decision boundary, minimizing the number of cases with high uncertainty.

Pub. Year: 2022
Title: A supervised machine learning framework with combined blocking for detecting serial crimes [7]
Data used: 364 cases with 111 serial offenses from Zhengzhou City, Henan Province, China.
Type of Crime: Robbery
MOs used: 11 MOs: number of criminals, tools used, way criminals disguise, way victims are harmed, way criminals rob property, way criminals threaten victims, part of victim being harmed, item robbed, actions to control victim, way criminals break obstacles, actions taken by criminals.
Similarity Measures: Absolute Distance, Jaccard, and Cosine Similarity.
ML technique: LR, KNN, GBDT, NN, and RF.
Summary: The authors addressed the class imbalance problem by reducing the number of case pair assessments. Using behavioral key features and combined blocking methods, they first evaluated whether a case pair possessed sufficiently strong characteristics to be considered in the CL decision process.

Pub. Year: 2022
Title: Spatiotemporal-textual point processes for crime linkage detection [6]
Data used: 349 burglary and 333 robbery crime incidents, collected from 911 calls from the Atlanta Police Department. 56 incidents were identified as being from serial offenses.
Type of Crime: Robbery and Burglary.
MOs used: 280 TF-IDF keywords (for example: home, door, window, stolen), along with time of the event and police beat location.
Similarity Measures: Cosine Similarity.
ML technique: Unsupervised learning, using RBM and EM algorithm.
Summary: They reformulate the CL problem as a Hawkes process, where an incident follows an intensity function that decays over time. Drawing an analogy with earthquake behavior, the authors considered that crimes influence other crimes, which can then be considered linked or somehow associated. However, the distance between crimes is not used as a decay effect; rather, the linkage between crimes is correlated with similar narrative descriptions.

Pub. Year: 2020
Title: A novel random forest approach for imbalance problem in crime linkage [86]
Data used: 364 cases with 111 serial offenses from Zhengzhou City, Henan Province, China.
Type of Crime: Robbery
MOs used: 11 MOs: number of criminals, tools used, way criminals disguise, way victims are harmed, way criminals rob property, way criminals threaten victims, part of victim being harmed, item robbed, actions to control victim, way criminals break obstacles, actions taken by criminals.
Similarity Measures: Absolute Distance, Jaccard, and Cosine Similarity.
ML technique: IGRF
Summary: They addressed the imbalance problem by adjusting the data granularity. By applying IGRF, they reduced the number of non-serial pairs through clustering of similar case pairs. This approach led to improvements of around 2-5% compared to traditional RF methods.

Pub. Year: 2020
Title: Crime linkage based on textual Hebrew police reports utilizing behavioral patterns [134]
Data used: 65,990 Hebrew reports of incidents that occurred in Israel between 2005 and 2018. 9,622 of the reports are labeled with the criminal identity.
Type of Crime: Burglary
MOs used: Location, time, and 40 MOs (not clearly specified, but the authors "focused on the burglary characteristics (e.g., source of entry) rather than on criminals’ attributes (e.g., visual description, ethnicity)").
Similarity Measures: Spatio-temporal difference and GBM to classify how likely an MO is present in a report.
ML technique: Siamese Neural Network
Summary: They presented a solution that extracts MOs in a language-independent fashion. They used fastText to create the embeddings, and their MO extraction resembles how police interview a victim or suspect. They compared cosine similarities between predefined questions and the words and sentences generated by the embeddings. The cosine similarities are fed into a GBM that classifies how likely an MO is present in a report. These probabilities and the spatio-temporal difference are fed into an SNN that compares two reports and gives a score of how close they are to each other. Their solution achieved an F1-score of 92%.

Pub. Year: 2019
Title: An approach for understanding offender modus operandi to detect serial robbery crimes [135]
Data used: 334 cases with 86 serial offenses from Zhengzhou City, Henan Province, China.
Type of Crime: Robbery
MOs used: 9 MOs: number of criminals, tools used, way criminals disguise, way victims are harmed, way criminals rob property, way criminals threaten victims, way criminals break obstacles, actions taken by criminals, crime process.
Similarity Measures: Absolute distance, Jaccard, Cosine Similarity, and DTW.
ML technique: LR, SVM, KNN, NN, and RF
Summary: In addition to using numeric, categorical, and keyword features, they incorporated crime process data to enhance the model’s input. The crime process consists of two sequences: one containing only nouns (objects) and the other only verbs (actions), extracted from narrative reports. The similarity between these sequences is calculated using DTW, and the overall process similarity is determined by the weighted sum of these two similarity measures. The weights are computed using information entropy to optimize the results. They demonstrated that adding process information significantly improved performance, with all classifiers achieving over 90% accuracy.

Pub. Year: 2019
Title: Linking serial sexual offences: Moving towards an ecologically valid test of the principles of crime linkage [136]
Data used: 3,364 sexual offences from 5 different countries (UK, South Africa, Finland, Netherlands, Belgium), where 2,081 were solved serial crimes, 1,191 were solved apparent one-off crimes, and 91 were unsolved serial crimes that were linked by DNA.
Type of Crime: Sexual offenses
MOs used: 166 MOs related to gaining and maintaining control over the victim, exiting the crime scene or evading capture, sexual behaviours, target selection variables, and behaviours thought to reflect the offence ‘style’ of the offender and that ‘are not directly necessary for the success of the attack’.
Similarity Measures: Jaccard.
ML technique: LR.
Summary: They demonstrated that incorporating one-off and unsolved crime data points does not negatively impact crime linkage predictions. However, they acknowledged that the method may produce a notable number of false positives. Despite this, the overall accuracy was strong, particularly with a larger sample size compared to previous studies.

Pub. Year: 2019
Title: Linking property crime using offender crime scene behaviour: A comparison of methods [73]
Data used: 160 residential burglaries committed by 80 serial offenders in Greater Helsinki, Finland. 376 vehicle theft crimes committed by 188 serial offenders in Northamptonshire, UK. 118 commercial robberies committed by 59 offenders in Greater Helsinki, Finland.
Type of Crime: Burglary, Car Theft, and Robbery.
MOs used: Burglary MOs: location of the crime, time and day of the week, type of property, method of entry, the offender’s search behaviour, and the type and cost of property stolen. Car theft MOs: location, type of car that was stolen, age of the vehicle, time and day of the week, how the vehicle was entered and started, and the physical state in which the vehicle was recovered. Commercial robbery MOs: location, type of business robbed, time of day and day of the week, whether a disguise was worn, weapon use, number of offenders, use of violence, language used, and the type and cost of property stolen.
Similarity Measures: Jaccard.
ML technique: LR, DT, and Bayes Model.
Summary: They demonstrated better performance using Bayesian and regression models, with distance and temporal proximity serving as the primary predictors. They also discussed the limitations of relying solely on AUC for evaluation, suggesting that this could be misleading. As a solution, they proposed using ranked lists of matched crimes to provide more accurate insights.

Pub. Year: 2019
Title: A Recommendation Engine to Aid in Identifying Crime Patterns [99]
Data used: Approximately 30,000 complaints gathered from 10 years of data from the NYPD.
Type of Crime: Burglary, Robbery, and Grand Larceny.
MOs used: 39 MOs: location features, time features, categorical features (premise type, crime classification, the M.O. itself, weapon type, and details about the crime’s location), suspect features, and keywords from complaint narratives (TF-IDF).
Similarity Measures: Mix of similarity measures (including cosine similarity, Goodall’s similarity, difference in numerical features, count of rare words, count of matches in categorical features).
ML technique: RF.
Summary: This work presents the NYPD solution called Patternizr, which scores how similar crimes are to each other. The model is trained on a mixture of police complaints and structured data (including arrest information). When a complaint is entered into the application, it returns a list of complaints that are similar to the one in question.

Pub. Year: 2017
Title: A decision support system for detecting serial crimes [137]
Data used: 92 solved cases, committed by 45 offenders (or groups of offenders).
Type of Crime: Robbery
MOs used: 22 MOs: categorical features included number of gang members, somatotype of the suspect, gender of the victim; numerical features included height of the suspect, age of the suspect, time, etc.; hierarchical features included actions taken by the suspects, tools used by the suspects, state of the victim before the crime was committed, etc.
Similarity Measures: Expert evaluation (categorical attributes), Euclidean distance, and weighted tree distance (hierarchical attributes).
ML technique: NN.
Summary: They proposed a three-layer neural network that outputs a score indicating the likelihood of the input crimes being committed by the same offender. By incorporating a human-in-the-loop approach to adjust the weights for building feature vectors, they achieved a precision of 76%. Additionally, they used a separability index to prune the input data for better performance.

Pub. Year: 2017
Title: Using offender crime scene behavior to link stranger sexual assaults: A comparison of three statistical approaches [68]
Data used: 3,364 sexual offences from 5 different countries (UK, South Africa, Finland, Netherlands, Belgium), where 2,081 were solved serial crimes, 1,191 were solved apparent one-off crimes, and 91 were unsolved serial crimes that were linked by DNA.
Type of Crime: Sexual offenses
MOs used: 166 MOs related to gaining and maintaining control over the victim, exiting the crime scene or evading capture, sexual behaviours, target selection variables, and behaviours thought to reflect the offence ‘style’ of the offender and that ‘are not directly necessary for the success of the attack’.
Similarity Measures: Jaccard.
ML technique: LR, DT, and Bayes Model.
Summary: Their goal was to compare the three methods, with the Iterative Classification Tree showing slightly better performance, although all methods produced good AUC scores. They also evaluated performance by setting a false alarm cut-off threshold at 15%, which led to a decrease in AUC. Additionally, they demonstrated that the choice of Jaccard’s design method significantly influences performance.

Pub. Year: 2016
Title: A statistical approach to crime linkage [2]
Data used: 4681 solved breaking and entering crime reports provided by the Baltimore County Police Department, where 772 were considered serial.
Type of Crime: Burglary
MOs used: 9 MOs: distance, temporal proximity, property type, point of entry, method of entry.
Similarity Measures: Numerical differences and matching attributes.
ML technique: LR, Naive Bayes, Boosted tree.
Summary: This study demonstrated how to apply the Bayes Factor (BF) to evaluate crime linkages. The key idea is that the BF is the ratio of the probability of the observed evidence under the linked hypothesis to its probability under the unlinked hypothesis, so larger values favor linkage. They also employed BF along with hierarchical clustering to identify crime series. Additionally, BF was used to rank suspects within crime series, showcasing its versatility in crime analysis.

Pub. Year: 2015
Title: Testing the assumptions of crime linkage with stranger sex offenses: A more ecologically-valid study [67]
Data used: 50 serial and 50 one-off cases from the UK.
Type of Crime: Sexual offenses
MOs used: 217 MOs, which included crime scene location descriptions, how the offender approached the victim, verbal themes, and sexual acts performed.
Similarity Measures: Jaccard.
ML technique: LR.
Summary: The primary contribution of this work at the time was the inclusion of one-off cases. Surprisingly, this enhancement led to improved model performance compared to previous studies.

Pub. Year: 2014
Title: A decision support system: Automated crime report analysis and classification for e-government [47]
Data used: 100 crime reports that were synthetically generated by non-experts after watching videos. The same crime case could have been written up by different people.
Type of Crime: Burglary, Robbery, Theft, and Assault.
MOs used: MOs are derived from the reports using an ontology created in previous work, containing 20 semantic trees, which include 38,000+ keywords and phrases.
Similarity Measures: Jaccard and Dice.
ML technique: LR and Naive Bayes.
Summary: Similar to CL, their goal was to determine whether one report matches another. Their motivation stemmed from anonymous crime reports, which provide the police with an additional source of crime data but often contain duplicates. They demonstrated that the Naive Bayes algorithm achieved an accuracy of around 94% in identifying duplicate reports.

Pub. Year: 2013
Title: Using Bayes’ theorem in behavioural crime linking of serial homicide [69]
Data used: 116 serial cases collected from Italian newspapers, the Internet, and microfilms of library journals. This dataset had a total of 19 series.
Type of Crime: Homicide
MOs used: 92 MOs, not specifically