Co-offending Network Mining
Patricia L. Brantingham², Martin Ester¹, Richard Frank², Uwe Glässer¹ and
Mohammad A. Tayebi¹
¹ School of Computing Science
² School of Criminology
Simon Fraser University
British Columbia
Canada
{pbrantin,rfrank}@sfu.ca
{ester,glaesser,tayebi}@cs.sfu.ca
Abstract. We propose here a computational framework for co-offending
network mining defined in terms of a process that combines formal data
modeling with data mining of large crime and terrorism data sets as
gathered and maintained by law enforcement and intelligence agencies.
Our crime data analysis aims at exploring relevant properties of criminal
networks in arrest-data and is based on five years of real-world crime data
that was made available for research purposes. This data was retrieved
from a large database system with several million data records holding
information for the regions of the Province of British Columbia. Beyond the
application of innovative data mining techniques for the analysis of the
crime data set, we also provide a comprehensive data model applicable to
any such data set and link the data model to the analysis techniques. We
contend that central aspects considered in the work presented here carry
over to a wide range of large data sets studied in intelligence and security
informatics to better serve law enforcement and intelligence agencies.
Keywords: Computational criminology, Crime data model, Crime data
analysis, Social network analysis, Network-level analysis, Group-level
analysis, Node-level analysis, Network evolution, Network visualization
1 Introduction
Mathematical methods and computational tools are increasingly gaining momentum
in advanced studies of social phenomena, not only in the social sciences but also in
emerging interdisciplinary research fields like Computational Criminology [1,3].
Innovative research in criminology and counterterrorism shows promising
results [7,2,8] that underscore the enormous potential for serving practical needs
in crime analysis and prevention, namely as instruments in crime investigations,
as an experimental platform for supporting evidence-based policy making, and
in experimental studies to analyze and validate theories of crime [4]. The work
presented here has been inspired by practical experience in using mathematical
modeling combined with computational analysis techniques in the study of crime
2 Lecture Notes in Social Networks: Co-offending Network Mining
events spanning a wide range of criminal activities, including opportunistic and
violent serial crimes [6,3,5].
We propose here a comprehensive computational framework for co-offending
network mining [43,9,10] defined in terms of a process that combines formal data
modeling with data mining of large crime and terrorism data sets as gathered and
maintained by law enforcement and intelligence agencies. Aiming at identifying
common and potentially useful patterns in those data sets, the framework, as
illustrated in Figure 1, comprises three major phases: crime data preparation,
co-offending network mining and knowledge discovery.
[Figure 1: three phases, Data Preparation → Network Mining → Knowledge Discovery (Data → Information → Knowledge); components: Crime Data, Crime Data Preparation, Co-offending Network Extraction, Co-offending Network Analysis, Visualization & Interpretation, Domain Expert Evaluation, New Requirements]
Fig. 1. Computational framework for the process of co-offending network mining
The first phase of the framework, Data Preparation, is required to clean and
transform the collected data into a format that is suitable for the co-offending
network mining algorithms. Specifically, this phase includes data understanding
and specification, protecting offenders’ data privacy, data selection, detecting
and handling missing values and semantic errors (where possible), and applying
data preprocessing techniques that overall improve the efficiency of the mining
algorithms. Unstructured text contained in the crime data records is not considered.
The output of the preparation phase feeds into the Network Mining phase.
In the first step of this phase, the Co-offending Network Extraction component
extracts the network from the ‘cleaned’ crime data set. Subsequently, different
network analysis tasks are performed by the Co-offending Network Analysis
Component, including network-level analysis, group-level analysis, node-level
analysis and network evolution analysis.
Finally, in the Knowledge Discovery phase, the information obtained in the
Network Mining phase is first interpreted and then visualized as a basis for its
evaluation by domain experts. Network visualization provides a concise and
simple description of the important features by representing information about
interrelationships of actors, groups and their connection patterns in the network.
The Visualization & Interpretation component generates a visual representation
of the co-offending network that facilitates comprehension of the obtained results
and enables innovative methods for analytical reasoning using visual analytics.¹
The extracted information is then evaluated by criminologists. Knowledge
discovery is naturally a continuous and highly iterative process rather than a linear
waterfall process. To this end, the Requirements Component connects
all three phases to enable a data-information-knowledge continuum. That is, new
problems can be defined based on evaluations and feedback by experts, resulting
in new tasks that are passed to one of the components in the first or second
phase. The mining process then starts again from that component.
Crime data analysis as proposed here aims at exploring relevant properties
of criminal networks in arrest data. As a result of a research memorandum of
understanding between ICURS² and the “E” Division of the Royal Canadian Mounted
Police (RCMP) and the Ministry of Public Safety and the Solicitor General,
five years of real-world crime data was made available for research purposes.
This data was retrieved from the RCMP’s Police Information Retrieval Sys-
tem (PIRS), a large database system keeping information for the regions of the
Province of British Columbia which are policed by the RCMP. PIRS contains
information about all reported crime events (4.4 million) and all persons asso-
ciated with a crime (9 million), from complainant to charged. In addition, PIRS
also contains information about vehicles used in crimes (1.4 million), and busi-
nesses which were involved in crimes (1.1 million). Of this dataset, only those
offenders that were charged, chargeable, or had a charge recommended were
extracted and used for the following analysis. Being in one of these categories
implies that the police considered the person's involvement in a crime serious
enough to warrant calling them ‘offenders’. In total, there are over 50 groups
of crime types. For the purposes of this study, however, only the four most
important groups were considered:
Serious Crimes: crimes against a person, such as homicide and attempted
homicide, assault, abduction;
Property Crimes: crimes against property, such as burglary (break and enter
onto premises or real property) and theft;
Moral Crimes: such as prostitution, arson, child pornography, gaming, breach;
Drug Crimes: such as trafficking, possession, import/export.
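The selection of offenders and crime groups described above amounts to a simple filter over person-event records. The following Python sketch assumes a hypothetical record layout (`PersonRecord` with `status` and `crime_group` fields) and hypothetical status labels; the actual PIRS schema and coding are not described here.

```python
from dataclasses import dataclass


# Hypothetical record layout; the actual PIRS schema is not described in the text.
@dataclass(frozen=True)
class PersonRecord:
    person_id: str
    event_id: str
    status: str       # role of the person in the event, e.g. "charged", "witness"
    crime_group: str  # e.g. "serious", "property", "moral", "drug", "traffic"


# Statuses under which a person is treated as an offender in this study.
OFFENDER_STATUSES = {"charged", "chargeable", "charge_recommended"}
# The four crime groups retained out of the more than 50 available groups.
STUDY_GROUPS = {"serious", "property", "moral", "drug"}


def select_offender_records(records):
    """Keep offender records whose event belongs to one of the four study groups."""
    return [
        r for r in records
        if r.status in OFFENDER_STATUSES and r.crime_group in STUDY_GROUPS
    ]


sample = [
    PersonRecord("p1", "e1", "charged", "property"),
    PersonRecord("p2", "e1", "complainant", "property"),
    PersonRecord("p3", "e2", "chargeable", "traffic"),
    PersonRecord("p4", "e3", "charge_recommended", "drug"),
]
print([r.person_id for r in select_offender_records(sample)])  # ['p1', 'p4']
```

Complainants are dropped because of their role, and the chargeable person in a traffic event is dropped because the crime group lies outside the four studied groups.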
¹ Visual analytics is an emerging field using computers to analyze and visually convey
massive amounts of data in a form that human experts can more readily understand.
² The Institute for Canadian Urban Research Studies (ICURS) is a university research
centre at Simon Fraser University.
Beyond the application of innovative data mining techniques for the analysis
of the crime data set, we also provide a comprehensive data model applicable
to any such data set and link the data model to the analysis techniques. We
contend that central aspects considered in the work discussed here carry over to
a wide range of large data sets studied in intelligence and security informatics
to better serve law enforcement and intelligence agencies.
The remainder of this manuscript is organized as follows. Section 2 provides
some general background and discusses related work. Section 3 first presents the
crime data model and then the co-offending network model, and also explains the
extraction of the co-offending network from the crime data set. Section 4 focuses
on the analysis of the resulting network. The visualization and interpretation of
the results obtained in the network mining phase are illustrated and discussed in
some detail in Section 5. Section 6 concludes this manuscript.
2 Background and Related Work
Social networks represent relationships among social entities, such as
interactions between members of a group (like family, friends or neighbors) or
economic relationships between businesses. Social networks are important in many
respects. Social influence may motivate someone to buy a product or to commit a
crime, and many other decisions can likewise be interpreted and modeled in terms
of a social network structure. The spread of diseases such as AIDS and the
diffusion of information and word of mouth also strongly depend on the topology
of social contacts. In
the following, we first provide some background on social network analysis and
mining and then discuss related work on mining co-offending networks.
Social Network Analysis Social network analysis (SNA) focuses on struc-
tural aspects of networks to detect and interpret the patterns of social entities
[23]. SNA essentially takes a network with nodes and edges and identifies
distinctive properties of the network through formal analysis. Data mining is the
process of finding patterns and knowledge hidden in large databases [31]. Data
mining methods are increasingly being applied to social networks, and there is
substantial overlap and synergy with SNA.
New techniques for the analysis and mining of social networks are being developed
for a broad range of domains, including health [41] and criminology [43]. These
methods can be categorized by the level of granularity at which the
network is analyzed [11]: (1) methods that determine properties of the social
network as a whole, (2) methods that discover important subnetworks, (3) methods
that analyze individual network nodes, and (4) methods that characterize net-
work evolution. In the following, we list the tasks that are most relevant for
co-offending networks:
Centrality analysis [23] aims at determining the most important actors of a social
network so as to understand their prestige, importance or influence in the
network.
Community detection [32] methods identify groups of actors that are more
densely connected among each other than with the rest of the network.
Information diffusion [33] studies the flow of information through networks
and proposes abstract models of that diffusion such as the Independent Cas-
cade model.
Link prediction [34] aims at predicting for a given social network how its
structure evolves over time, that is, what new links will likely form.
Generative models [35] are probabilistic models which simulate the topology,
temporal dynamics and patterns of large real-world networks.
SNA also benefits greatly from visual analysis techniques. Visualizing structural
information in social networks enables SNA experts to intuitively draw
conclusions about social networks that might remain hidden even after obtaining
SNA results. Different methods of visualizing the information in a social network,
with examples of how spatial position, color, size and shape
can be used to represent information, are discussed in [26].
Mining Co-offending Networks A co-offending network is a network of of-
fenders who have committed crimes together [20]. With increasing attention to
social network analysis, law enforcement and intelligence agencies have come
to realize the importance of detailed knowledge about co-offending networks.
Groups and organizations that engage in conspiracies, terroristic activities and
crimes like drug trafficking typically do this in a concealed fashion, trying to
hide their illegal activities. In analyzing such activities, investigations do not
only focus on individual suspects but also examine criminal groups and illegal
organizations and their behavior. Thus, it is critical to identify and study criminal
networks using information resources such as police arrest data and court
data so as to apply social network analysis algorithms to these networks. Social
network analysis can provide very useful information about individuals as
well. Investigators can determine those who play a more important role and
subject them to closer inspection. In general, knowledge about
co-offending network structures provides a basis for law enforcement agencies to
make strategic or tactical decisions.
There have been some empirical studies that use SNA methods to analyze
co-offending or terrorist networks. Reiss [20] concludes that the majority of co-
offending groups are unstable and their relationships short-lived. However, he also
states that high-frequency offenders are ‘active recruiters to delinquent groups
and can be important targets for law enforcement’. Reiss et al. [28] also found
that co-offenders have many different partners, and are unlikely to commit crimes
with the same individuals over time. McGloin et al. [27] showed that there is
some stability in co-offending relationships over time for frequent offenders, but
in general, delinquents do not tend to reuse co-offenders. However, the findings
of these works may not be representative, since they were obtained on very small
datasets: 205 individuals in [28], and 5600 individuals in [27].
COPLINK [24] was one of the first large-scale research projects in crime
data mining and produced notable work on criminal network analysis.
Xu et al. [42] employed the idea of a ’concept space’ in order to establish links
between individuals by comparing the activities of multiple offenders. The more
criminal events two individuals were jointly involved in, the more likely they were
assumed to know each other. This method allows for the translation of event and
narrative data into an undirected but weighted co-offending network. The goal
was to identify central members and communities within the network, as well as
interactions between communities. Their main contributions are the application of
cluster analysis to detect subgroups within the network and the ability
to detect overall network structures, which can then be used by criminal
investigators to further their investigations.
Xu et al. [43] presented CrimeNet Explorer, a framework for criminal net-
work knowledge discovery incorporating hierarchical clustering, SNA methods,
and multidimensional scaling. The authors further expanded the research in [42]
and designed a full-fledged system capable of incorporating outside data, such as
phone records and report narratives, in order to establish stronger ties between
individual offenders. Their results were compared to the domain knowledge
offered by the Tucson Police Department, from whose jurisdiction the data came.
Finally, Xu points out that the use of crime network analysis is highly impacted
by laws, regulations and privacy issues over data collection, confidentiality and
reporting.
Smith [40] presented a slight twist on crime network analysis, for the pur-
poses of criminal intelligence analysis, where the network is enhanced by extra
information. For example, vertices are not limited to offenders, but could be
police officers, reports, or anything that can be represented as an entity. Links
are associated with labels which denote the type of the relationship between the
two entities, such as ‘mentions’ or ‘reported by’. In this sense, the resulting network
is more representative of a database schema than of a social network, which has the
advantage of being more expressive.
In [29], Kaza et al. explored the use of criminal activity networks to ana-
lyze information from law enforcement and other sources to provide value for
transportation and border security. The criminal activity network is defined as a
network of interconnected criminals, vehicles, and locations based on law
enforcement records. The authors concluded that including vehicular data in the criminal
activity network yields clear advantages, since vehicles provide new investigative
leads that can be used to detect individuals and vehicles that might threaten
the security of the border and transportation infrastructure.
3 Crime Data Model
This section introduces a unified formal model of crime data as a semantic frame-
work for defining in an unambiguous way the meaning of co-offending networks
and their constituent entities at an abstract level. Specifically, the formal model
aims at bridging the conceptual gap between data level, mining level and
interpretation level, and also facilitates separating the description of the data from
the details of data mining and analysis. By reducing the unified model to more
specific views, the co-offending network model is then obtained as one such view.
3.1 Unified Crime Data Model
Crime data can be modeled as a finite attributed tripartite hypergraph $\mathcal{H}$ with
$\mathcal{V}$, $\mathcal{E}$ representing the vertices and the edges of $\mathcal{H}$. The vertex set $\mathcal{V}$ is
partitioned into three pairwise disjoint sets, $A = \{a_1, a_2, \ldots, a_q\}$, $I = \{i_1, i_2, \ldots, i_r\}$
and $R = \{r_1, r_2, \ldots, r_s\}$, reflecting actors such as offenders, victims, witnesses,
suspects and bystanders; events referring to crime incidents of a certain type;
and resources used in a crime, like mobile phones, vehicles or weapons.
The set $\mathcal{E}$ consists of (hyper)edges such that each $e \in \mathcal{E}$ is a subset of vertices
$\{v_1, v_2, \ldots, v_p\} \subseteq \mathcal{V}$ with $|e_I| = 1$, $|e_A| \geq 1$ and $|e_R| \geq 0$, where
$e_X = e \cap X$ for $X \in \{A, I, R\}$.³
Further, for any $e, e' \in \mathcal{E}$ with $e_I = e'_I$ it follows that $e = e'$. In other
words, every edge $e$ of $\mathcal{H}$ associates a subset of actors $\{a_{i_1}, a_{i_2}, \ldots, a_{i_j}\} \subseteq A$ and
a subset of resources $\{r_{i_1}, r_{i_2}, \ldots, r_{i_l}\} \subseteq R$ with exactly one crime event $i_k \in I$, that is,
$e = \{i_k, a_{i_1}, a_{i_2}, \ldots, a_{i_j}, r_{i_1}, r_{i_2}, \ldots, r_{i_l}\}$. See Figure 2 for an example.
[Figure 2: actors a1, a2, a3; events i1, i2, i3; resources r1, r2, r3; the highlighted hyperedge is e = {i2, a2, a3, r3}]
Fig. 2. Hyperedge in the crime data model
Finally, attributes are defined on the vertices of $\mathcal{V}$ such that for each $v \in \mathcal{V}$
there is a finite list of pairs $(\alpha_v, \beta_v)$ where $\alpha_v$ is an attribute name and $\beta_v$ is the
value of $\alpha_v$. Attributes of actors, for instance, include their name and address
information, while attributes of events include the crime type, the location where,
and the time when, this incident occurred.
3.2 Co-offending network model
Co-offending networks are composed of individuals who commit crimes together.
For analyzing and reasoning about co-offending networks, as well as other more
specific aspects of crime data that can be described in terms of entities and
their relations, the unified crime data model defined by the hypergraph $\mathcal{H}$ is
decomposed into a number of simpler graph structures as follows.

³ Every crime data record in the crime data set refers to a different crime incident.
Consider an attributed tripartite hypergraph $H = (V, E)$ where $V$ is identical
to $\mathcal{V}$ and $E = \{\{a, i, r\} \mid \exists e \in \mathcal{E} \text{ such that } \{a, i, r\} \subseteq e\}$ for $a \in A$, $i \in I$, $r \in R$.
Note that $H$ has the same attributes as $\mathcal{H}$. The hypergraph $H$ is now further
decomposed in a straightforward way into three bipartite graphs that model the
relations between actors and events (graph AI), actors and resources (graph AR)
and events and resources (graph IR).
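The decomposition into AI, AR and IR can be sketched as projections of the crime hyperedges. This Python sketch reads each bipartite graph off hyperedge co-membership, which is an assumption on our part; it coincides with projecting the triple set $E$ whenever every hyperedge contains at least one resource. An event without resources contributes no triple, as the third example hyperedge shows. Function names are hypothetical.

```python
from itertools import product


def decompose(hyperedges):
    """Project crime hyperedges onto the bipartite graphs AI, AR and IR.

    hyperedges: iterable of (event, actors, resources) tuples.
    Returns three sets of pairs: (actor, event), (actor, resource), (event, resource).
    """
    ai, ar, ir = set(), set(), set()
    for event, actors, resources in hyperedges:
        ai.update((a, event) for a in actors)
        ar.update(product(actors, resources))
        ir.update((event, r) for r in resources)
    return ai, ar, ir


def triples(hyperedges):
    """Edge set E of H: every {a, i, r} contained in some crime hyperedge."""
    return {
        (a, event, r)
        for event, actors, resources in hyperedges
        for a, r in product(actors, resources)
    }


edges = [
    ("i1", {"a1", "a2"}, {"r1"}),
    ("i2", {"a2", "a3"}, {"r3"}),
    ("i3", {"a1"}, set()),  # no resource: contributes to AI but to no triple
]
ai, ar, ir = decompose(edges)
print(len(ai), len(ar), len(ir), len(triples(edges)))  # 5 4 2 4
```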
Criminal activity graph Starting from AI, a new graph $AI_O = (V_O, E_O)$,
called the criminal activity graph, is constructed as follows. $V_O$ consists of vertices
representing either offenders or events. That is, $V_O = A_O \cup I$, with $A_O \subseteq A$,
where $A_O$ identifies the offenders in the set of actors. Every edge in $E_O$ either
links an offender to an event or it links two offenders with one another. The latter
type of edge means that two offenders have jointly committed one or more crimes
in the past. To indicate multiple co-offenses, an attribute strength is associated
with every edge $(a_i, a_j) \in E_O$, for $a_i, a_j \in A_O$, where $\mathrm{strength}((a_i, a_j)) \geq 1$.
Figure 3 illustrates a criminal activity graph with three offenders $a_1, a_2, a_3$ for
which it is known that the pairs $a_1, a_2$ and $a_1, a_3$ have jointly committed multiple crimes
(some of the related incidents are not explicitly shown here). The resource in
this example is not an integral part of the graph but derived information.
[Figure 3: offenders a1, a2, a3 (a_i ∈ A_O), events i1, i2, i3 (i_k ∈ I) and a resource; explicit links labeled strength = 2 and strength = 5, and a hidden link labeled confidence = .35]
Fig. 3. Criminal activity graph with hidden links
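A minimal sketch of deriving the criminal activity graph from AI, assuming offender-event pairs and a known offender set $A_O$ as input. Offender-offender links are added for every pair of offenders sharing an event, and strength counts the shared events. Function and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def criminal_activity_graph(ai_edges, offenders):
    """Derive AI_O from the actor-event graph AI.

    ai_edges: iterable of (actor, event) pairs.
    offenders: the set A_O of actors that are offenders.
    Returns (offender_event_edges, strength), where strength maps each
    offender pair (sorted tuple) to the number of events they share (>= 1).
    """
    offender_event = {(a, i) for a, i in ai_edges if a in offenders}
    members = {}
    for a, i in offender_event:
        members.setdefault(i, set()).add(a)
    strength = Counter()
    for group in members.values():
        # sorted() gives every unordered pair a single canonical key
        for pair in combinations(sorted(group), 2):
            strength[pair] += 1
    return offender_event, strength


ai = [("a1", "i1"), ("a2", "i1"), ("a1", "i2"), ("a2", "i2"),
      ("a1", "i3"), ("a3", "i3"), ("v1", "i3")]  # v1: a victim, not an offender
edges, strength = criminal_activity_graph(ai, {"a1", "a2", "a3"})
print(sorted(strength.items()))  # [(('a1', 'a2'), 2), (('a1', 'a3'), 1)]
```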
Co-offending network For generating the co-offending network, we start from
the criminal activity graph $AI_O$. Assuming $k$ offenders and $m$ events ($k, m > 1$),
we define a $k \times m$ matrix $M$ such that $m_{uv} = 1$ if offender $o_u$ is involved in
event $i_v$, and $m_{uv} = 0$ otherwise. Now, we define the co-offending network by means
of the $k \times k$ matrix $N = M M^T$ and therefore have

$$n_{u,v} = \sum_{x=1}^{m} m_{ux}\, m_{vx}.$$

This matrix links offenders involved in the same crime events. For any two given
offenders, the strength of a link is the number of co-offenses. The diagonal of
this matrix shows for each offender the number of related crime events.
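The matrix definition can be checked on a toy example with NumPy. This sketch (function name hypothetical) builds the $k \times m$ incidence matrix $M$ and computes $N = M M^T$; off-diagonal entries give co-offending strengths and the diagonal gives the number of events per offender.

```python
import numpy as np


def co_offending_matrix(offenders, events, involvement):
    """Return (M, N) with M the k x m incidence matrix and N = M M^T.

    involvement: iterable of (offender, event) pairs taken from AI_O.
    """
    row = {o: u for u, o in enumerate(offenders)}
    col = {e: v for v, e in enumerate(events)}
    M = np.zeros((len(offenders), len(events)), dtype=np.int64)
    for o, e in involvement:
        M[row[o], col[e]] = 1  # m_uv = 1 iff offender o_u is involved in event i_v
    N = M @ M.T                # n_uv = sum over x of m_ux * m_vx
    return M, N


M, N = co_offending_matrix(
    ["a1", "a2", "a3"], ["i1", "i2", "i3"],
    [("a1", "i1"), ("a2", "i1"), ("a1", "i2"), ("a2", "i2"), ("a1", "i3"), ("a3", "i3")],
)
print(N)
# [[3 2 1]
#  [2 2 0]
#  [1 0 1]]
```

For data of the size described here, $M$ is extremely sparse, so a sparse representation such as `scipy.sparse.csr_matrix` (which also supports `@`) would likely be preferable to a dense array.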
Hidden Links Co-offenders often try to conceal their connections as much as
they can. Also, the available data is police arrest data that contains only partial
information about offender collaborations and their social interactions. Given
these two factors, one can expect that, besides the links based on explicit facts in
the crime data, additional links can be derived by analyzing and mining the crime
data using link prediction methods. Such links are called hidden links; they
are probabilistic in nature, as they are based on information that is considered
uncertain. Thus, hidden links have an attribute confidence, the value of which is
a real number from the interval [0, 1]. A confidence value of 0 means
that no link exists.
The example illustrated in Figure 3 assumes that in all three of the crime
incidents $i_1, i_2, i_3$ a common resource, say a particular vehicle, was used by one
of the offenders $a_1, a_2, a_3$. From this information, one may derive a hidden link
$(a_2, a_3)$ with some probability as stated by the value of the attribute confidence.
Methods for link detection and link prediction are technically similar and differ
mainly in interpretation. Link prediction determines links that have a high
probability of being created in the next step of network evolution. Link detection
assumes that detected links were hidden or removed from the data. A detected
link is called a missing link in link detection methods and a forming link in link
prediction approaches. In this work, the analysis focuses on explicit links.
3.3 Crime Data Preparation
This section addresses the data preparation phase covering all activities prior to
analyzing the data for network mining. Figure 4 illustrates steps that are carried
out in the preparation phase. A brief description of these steps follows below.
Study Objectives: Any co-offending network mining study should have clearly
specified objectives. The objectives affect data selection as well as the
subsequent steps. For instance, the structure of the data and the analysis
process differ between static and dynamic co-offending network mining.
Data Collection: Data for co-offending network mining can originate from
different sources, for instance police arrest data or court data. Additional
resources may provide supplementary information like email contacts or phone
calls that help fill gaps or resolve inconsistencies.
[Figure 4: boxes for Study Objective, Data Collection, Privacy Protection, Data Understanding, Data Selection, Data Preprocessing and Data Modeling, each annotated with a one-line summary of the step as described below]
Fig. 4. Steps involved in the data preparation
Privacy Protection: Preserving the privacy of individuals is a central issue when
working with sensitive data sets. Data anonymization techniques normally
have to be applied prior to sharing this kind of data with third parties.
Data Understanding: In this step the data is explored, described and checked
for completeness (missing values) and redundancy.
Data Selection: In this step the target data that is relevant to the analysis is
selected based on the study objectives, excluding redundant or useless data.
Data Preprocessing: Data is preprocessed so that missing values or noise are
handled appropriately. For example, for the crime data studied here, about
40 percent of the offender home address fields were empty.
Data Modeling: Data is transformed into the format which is most suitable for
the mining algorithms. See Sections 3.1 and 3.2 for a detailed discussion of
this important step.
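Two of the preparation steps above, privacy protection and checking for missing values, can be sketched in a few lines of Python. Keyed hashing (HMAC-SHA256) is one common pseudonymization technique, not necessarily the one applied to the data studied here; the key, field names and example identifiers are hypothetical. Pseudonymizing identifiers alone does not guarantee anonymity, since remaining attributes such as addresses may still allow re-identification.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Map an identifier to a keyed pseudonym (HMAC-SHA256, truncated).

    The mapping is deterministic for a given key, so links between records of
    the same person survive, while linking a pseudonym back to a person
    requires the secret key.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def missing_rate(records, field):
    """Fraction of records (dicts) whose field is absent, None or blank."""
    if not records:
        return 0.0

    def is_missing(value):
        return value is None or (isinstance(value, str) and not value.strip())

    return sum(is_missing(r.get(field)) for r in records) / len(records)


key = b"hypothetical-secret-key"  # in practice held only by the data custodian
records = [
    {"offender": pseudonymize("Jane Doe 1980-01-01", key), "home_address": "12 Main St"},
    {"offender": pseudonymize("John Roe 1975-05-05", key), "home_address": ""},
]
print(missing_rate(records, "home_address"))  # 0.5
```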
4 Co-offending Network Analysis
In Section 2, we grouped the tasks of SNA into four main categories: (1) network-level
analysis, (2) group-level analysis, (3) node-level analysis and (4) network
evolution analysis. In this section we present the important analysis-oriented
concepts under these categories as well as the results of the analysis on the
co-offending networks. We applied the analysis tasks to the co-offending networks
extracted from different crime types and also to several snapshots of these
networks.⁴ $G_u(t)$ denotes the co-offending network of a specific crime type $u$ ($a$, $s$,
$p$, $d$ and $m$ represent all, serious, property, drug and moral crime types, respectively)
from year 2001 to year $t$.
⁴ In implementing the analysis tasks, we used the SNAP library, which is publicly available
at http://snap.stanford.edu/.
4.1 Network-Level Analysis
At the network level, the goal is to find properties of the network as a whole.
Global properties such as degree distribution, clustering coefficient and average
distance reflect network characteristics and can also be employed to evaluate the
similarity of different networks.
The degree of a node is the number of edges the node has. The degree
distribution, $P(k)$, gives the probability that a randomly selected node has $k$ links.
Studies have shown that most real-world networks from diverse fields, ranging
from sociology to biology to communication, follow a power-law distribution,
$P(k) \propto k^{-\lambda}$ [12], where $\lambda$ is called the exponent of the distribution. A power-law
distribution implies that nodes with few links are numerous, while very few
nodes have a very large number of links. Networks with this property are called
scale-free networks, and the scale-free property is one of the most widely documented
network properties. Other network models, like the Erdős-Rényi [13] and
the Watts-Strogatz [14] models, are known as exponential networks,
and their degree distribution conforms to a Poisson distribution. In this type of
network, the degree distribution peaks at the average degree; therefore most of
the nodes have a degree close to the average degree of the network and very
few nodes have very small or very large degrees.
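The degree distribution $P(k)$ and its cumulative form can be computed directly from an adjacency structure. This sketch assumes an undirected graph stored as a dict of neighbour sets, and we assume the complementary cumulative form $P(K \geq k)$, which is commonly plotted on log-log axes for heavy-tailed data. Function names are hypothetical.

```python
from collections import Counter


def degree_distribution(adj):
    """P(k): fraction of nodes with degree k in an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    """
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def ccdf(pk):
    """Complementary cumulative distribution P(K >= k)."""
    out, tail = {}, 0.0
    for k in sorted(pk, reverse=True):
        tail += pk[k]
        out[k] = tail
    return dict(sorted(out.items()))


# A star: one hub with four leaves.
star = {"hub": {"l1", "l2", "l3", "l4"},
        "l1": {"hub"}, "l2": {"hub"}, "l3": {"hub"}, "l4": {"hub"}}
pk = degree_distribution(star)
print(pk)  # {1: 0.8, 4: 0.2}
```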
The degree distributions of the co-offending networks are also scale-free. Figure
5 shows the cumulative degree distribution for the different types of
co-offending network. In all of these networks we can observe behavior consistent
with a power-law network: the majority of the offenders have a small
degree, and a few offenders have a significantly higher degree. To test how well the
degree distributions are modeled by a power law, we computed the best power-law
fit using the maximum likelihood method [16]. The estimated power-law exponents for
the all-crimes, serious, property, drug and moral co-offending networks
are 2.29, 1.57, 1.42, 1.53 and 2.28, respectively.
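The text does not spell out the estimator used in [16]. A standard closed-form maximum-likelihood estimate for a power law above a lower cutoff $x_{\min}$ is $\hat{\lambda} = 1 + n \left[\sum_{i=1}^{n} \ln(x_i / x_{\min})\right]^{-1}$, with $x_{\min} - 1/2$ in place of $x_{\min}$ as a common approximation for discrete data such as degrees. The Python sketch below implements that formula under these assumptions; choosing $x_{\min}$ (for example by a goodness-of-fit criterion) is not shown.

```python
import math


def powerlaw_alpha(samples, xmin, discrete=True):
    """MLE of the exponent for P(x) proportional to x^(-alpha), x >= xmin > 0.

    discrete=False: exact MLE for a continuous power law,
        alpha = 1 + n / sum(ln(x_i / xmin)).
    discrete=True: common approximation for integer data such as degrees,
        obtained by replacing xmin with xmin - 1/2.
    """
    tail = [x for x in samples if x >= xmin]
    if not tail:
        raise ValueError("no samples at or above xmin")
    scale = xmin - 0.5 if discrete else xmin
    log_sum = sum(math.log(x / scale) for x in tail)
    if log_sum <= 0:
        raise ValueError("degenerate sample: alpha is undefined")
    return 1.0 + len(tail) / log_sum


degrees = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]  # toy degree sequence
print(powerlaw_alpha(degrees, xmin=1))
```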
Each link in the co-offending network is associated with a co-offending strength.
The co-offending strength $S_{i,j}$ between two offenders $i$ and $j$ is equal to the
number of crimes in which both offenders have been involved. We then define a network
$\bar{G}(V, E, \alpha)$ where $E$ includes the links between the pairs of offenders $i$ and $j$
in $V$ whose co-offending strengths $S_{i,j}$ exceed a specified threshold $\alpha$. We thus
obtain a family of networks $\{\bar{G}(\alpha_1), \bar{G}(\alpha_2), \ldots, \bar{G}(\alpha_m)\}$. Figure 6 plots the
distribution of the number of nodes and links for the threshold networks. Again, a
power-law distribution of co-offending strength suggests that the vast majority
of dyads offended together only once or twice, but there are more than a hundred dyads
that offended together more than 10 times over five years. When two
offenders collaborate on multiple incidents, the likelihood of a strong relationship
between the two offenders is high; therefore such offenders and their behaviors
should be inspected more carefully by crime investigators.
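The family of threshold networks can be tabulated as follows. This sketch assumes strengths are stored per offender pair and, as a further assumption, counts as nodes only offenders with at least one link above the threshold. The function name is hypothetical.

```python
def threshold_networks(strength, thresholds):
    """Sizes of the threshold networks G-bar(alpha) for several thresholds.

    strength: dict mapping an offender pair (i, j) to its co-offending strength S_ij.
    Returns a list of (alpha, nodes, links), where links counts pairs with
    S_ij > alpha and nodes counts offenders incident to at least one such link.
    """
    rows = []
    for alpha in thresholds:
        kept = [pair for pair, s in strength.items() if s > alpha]
        nodes = {v for pair in kept for v in pair}
        rows.append((alpha, len(nodes), len(kept)))
    return rows


strength = {("a", "b"): 1, ("a", "c"): 3, ("c", "d"): 12, ("b", "d"): 2}
print(threshold_networks(strength, [0, 2, 10]))
# [(0, 4, 4), (2, 3, 2), (10, 2, 1)]
```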
[Figure 5 panels: (a) Serious Crimes, (b) Property Crimes, (c) Drug Crimes, (d) Moral Crimes, (e) All Crimes]
Fig. 5. Degree distribution of co-offending network for different crime types

Law enforcement officers and intelligence analysts frequently need to
determine whether there is a possible association among a specific group of offenders in
a co-offending network. We therefore need methods to determine whether two offenders are
connected and what the shortest connecting path is. Dijkstra's shortest path
algorithm [17] for weighted networks and breadth-first search (BFS)
for unweighted networks are applied to identify the shortest paths in the
networks.
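Both searches can be sketched with the Python standard library. Since Dijkstra's algorithm minimizes a sum of costs, whereas co-offending strength measures closeness, the sketch converts each strength into a cost $1/S_{i,j}$; this conversion is our assumption and is not stated in the text. Function names are hypothetical.

```python
import heapq
import math
from collections import deque


def _walk_back(parent, target):
    """Rebuild a path from the parent map produced by a search."""
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def bfs_path(adj, source, target):
    """Fewest-hop path in an unweighted graph, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return _walk_back(parent, target)
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def costs_from_strength(strength):
    """Hypothetical conversion of co-offending strength S_ij into edge cost 1/S_ij."""
    wadj = {}
    for (i, j), s in strength.items():
        wadj.setdefault(i, {})[j] = 1.0 / s
        wadj.setdefault(j, {})[i] = 1.0 / s
    return wadj


def dijkstra_path(wadj, source, target):
    """Least-cost path for non-negative costs: (cost, path), or None if unreachable."""
    dist, parent = {source: 0.0}, {source: None}
    heap, settled = [(0.0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u == target:
            return d, _walk_back(parent, target)
        for v, w in wadj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None


adj = {"a": ["b", "e"], "b": ["a", "c"], "c": ["b", "d"],
       "d": ["c", "e"], "e": ["a", "d"]}
print(bfs_path(adj, "a", "d"))  # ['a', 'e', 'd']
strength = {("a", "b"): 5, ("b", "d"): 5, ("a", "d"): 1}
print(dijkstra_path(costs_from_strength(strength), "a", "d"))  # (0.4, ['a', 'b', 'd'])
```

With this cost choice, the two-hop route through a strongly tied intermediary is preferred over a direct but weak link.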
Fig. 6. Co-offending strength distribution

The average distance of the network $G(V, E)$ is defined as the average shortest-path
distance over all connected pairs of distinct nodes. It indicates how quickly a message can
spread in a co-offending network. Let $l_{ij}$ denote the number of
links on the shortest path connecting nodes $i$ and $j$ if such a
path exists, and $l_{ij} = \infty$ otherwise. The average distance in a network is then defined as

$$\mathrm{AvgD}(G) = \frac{\sum_{\{i,j\}:\, l_{ij} \neq \infty} l_{ij}}{\left|\{\{i,j\} : l_{ij} \neq \infty\}\right|}$$

and the diameter is defined as $\mathrm{Diam}(G) = \max\{l_{ij} : l_{ij} \neq \infty\}$. The
diameter is an important property of the topology of a social network. It
represents the longest shortest path within the network and describes the compactness
and connectivity of the network. A network with a small diameter is very well
connected, whereas a network with a large diameter is sparsely connected.
For removing the effect of outliers another measure called effective diameter is
used. Effective diameter is the minimum number of hops for reaching at least 90
percent of all connected pairs of nodes in the network [19]. Table 1shows the av-
erage path length, diameter and effective diameter for the five studied networks.
The average path lengths and diameters for some of them are remarkably short.
For the network Ga(2006) average distance, diameter and effective diameter are
12.2, 36 and 16.87, respectively.
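As a sketch, the three distance statistics can be computed by running BFS from every node. This is exact but costs $O(|V||E|)$ time; for networks of the size in Table 1, the approximate ANF approach cited above [19] is more practical. The names below are hypothetical. The effective diameter here is the plain integer version of the 90-percent definition. Fractional values such as 16.87 suggest that an interpolated variant was used for Table 1.

```python
import math
from collections import deque

def hop_distances(adj, source):
    """BFS hop counts l_ij from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_stats(adj, coverage=0.9):
    """Return (AvgD, Diam, effective diameter) over connected pairs only.

    Pairs with l_ij = infinity never appear in the BFS results, so they
    are excluded automatically. The effective diameter is the smallest
    integer d such that at least `coverage` of connected pairs have
    l_ij <= d (no interpolation). Node IDs must be mutually comparable.
    """
    lengths = []
    for s in adj:
        for t, d in hop_distances(adj, s).items():
            if s < t:  # count each unordered pair {i, j} exactly once
                lengths.append(d)
    if not lengths:
        return None  # no connected pairs at all
    lengths.sort()
    avg = sum(lengths) / len(lengths)
    diam = lengths[-1]
    # round() guards against float error, e.g. coverage * n landing just above an integer
    needed = max(1, math.ceil(round(coverage * len(lengths), 9)))
    eff = lengths[needed - 1]
    return avg, diam, eff
```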
Metric                   All Crimes   Serious   Property    Drugs    Moral
# Co-offenders               157274     31132      44321    54286    35266
Avg. degree                       4      1.85       1.95     2.15      4.8
Exponent (λ)                   2.29      1.57       1.42     1.53     2.28
Avg. distance                  12.2      1.69       8.45    22.17     3.41
Diameter                         36        13         24       56       19
Effective diameter            16.87       4.1      14.36    36.14     5.68
Clustering coefficient         0.39      0.28       0.33     0.39     0.49
Largest comp. (%)                18        10         32       23       21
Table 1. Statistical properties of the studied networks
In many social networks, a friend of an actor's friend is likely also to be that actor's friend. In other words, actors tend to form complete triangles of relationships. This property is called network clustering or transitivity. The clustering coefficient of a node in the co-offending network indicates how closely the node's collaborators collaborate with each other; it represents the probability that two of its collaborators have been involved in a crime together. The local clustering coefficient, which measures the probability that two neighbors of a node are also neighbors of each other, is given by:

$$C_v = \frac{2N_v}{k_v(k_v-1)}$$

where $k_v$ is the number of neighbors of $v$, $k_v(k_v-1)/2$ is the maximum number of links that can exist between the neighbors of $v$, and $N_v$ is the number of links that actually exist among the neighbors of $v$. The average clustering coefficient per degree is shown in Figure 10(b). The clustering coefficient of the network is computed by averaging $C_v$ over all nodes in the network [14]:

$$C = \frac{1}{|V|}\sum_{v \in V} C_v$$
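The short Python sketch below implements both formulas for an undirected co-offending network stored as a dict of neighbour sets; the function names are hypothetical. It uses the undirected form $C_v = 2N_v/(k_v(k_v-1))$, in which $k_v(k_v-1)/2$ counts the possible neighbour pairs. As an assumed convention, it sets $C_v = 0$ for nodes with fewer than two neighbours.

```python
from itertools import combinations

def local_clustering(adj, v):
    """C_v = 2 N_v / (k_v (k_v - 1)) for an undirected simple graph."""
    neighbours = set(adj[v]) - {v}
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no neighbour pairs, so C_v = 0
    # N_v: links that actually exist among the neighbours of v
    n_links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * n_links / (k * (k - 1))

def average_clustering(adj):
    """C = (1/|V|) * sum of C_v over all nodes v in V."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)
```

For example, in a triangle a-b-c with an extra offender d attached to a, $C_a = 1/3$, $C_b = C_c = 1$, $C_d = 0$, and the network average is $7/12$.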
The clustering coefficients of the co-offending networks Gs(2006), Gp(2006), Gd(2006), Gm(2006) and Ga(2006) are 0.28, 0.33, 0.39, 0.49 and 0.39, respectively. Because the clustering coefficient shows to what extent the friends of a person are also friends with each other, we conclude that offenders in the co-offending network of moral crimes, which has the highest clustering coefficient, collaborate more closely than offenders in the other types of co-offending networks.
4.2 Group-Level Analysis
Entities of a network tend to form groups and to interact more closely with each other inside the group. The defining characteristic of a group is a higher degree of connectivity among its members than with entities outside the group. In criminology, the idea that crimes are not always committed by offenders individually, but that many crimes are planned and committed by several offenders working together, is becoming increasingly important. In the last decade there have also been more and more experimental studies of criminal activities that require specific forms of collaboration and organization [15]. To detect these types of collaboration, we need to mathematically formalize concepts such as group crime, gang, organized crime and corporate crime, and then design efficient algorithms for this purpose. By inspecting relations between offenders to identify criminal groups, law enforcement organizations can track the origin and core of what may become an organized crime group or a gang. In this way a criminal group can be identified before it has fully formed, and police can follow the behavior of its members.
As a first step, we studied the distribution of components of the co-offending network. A component is a connected subset of a graph in which there are paths between all pairs of nodes [23]. If two offenders were involved in the same crime, there is a path between them. If a third offender co-offended with either of the first two offenders, a path can be built connecting the first offender with the third, and so on. If a path between two offenders can be established, the two offenders are said to belong to the same component of the network. The notion of a component has particular significance for the study of the spread of epidemics on a network, and co-offending networks give rise to important situations for epidemic phenomena. Knowing the component structure of co-offending networks can help in curbing crime epidemics in society, such as drug use epidemics. Let $|c|$ represent the size of component $c$. We then define three types of components: large components with $|c| \geq 1000$, medium components with $100 \leq |c| < 1000$ and small components with $2 \leq |c| < 100$. In the network Ga(2006), 25%, 1% and 74% of all offenders are connected to each other through large, medium and small components, respectively.
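The component census can be sketched in Python with a breadth-first traversal. The function names are hypothetical, and this sketch assigns a component of exactly 100 offenders to the medium class. Because the three classes start at size 2, components of size 1 (isolated offenders) are reported separately, which is also an assumption.

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph as sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    return components

def size_class(size):
    """Large: |c| >= 1000; medium: 100 <= |c| < 1000; small: 2 <= |c| < 100."""
    if size >= 1000:
        return "large"
    if size >= 100:
        return "medium"
    if size >= 2:
        return "small"
    return "singleton"

def offender_share_by_class(adj):
    """Fraction of offenders (nodes) that fall into each component class."""
    totals = {}
    for comp in connected_components(adj):
        cls = size_class(len(comp))
        totals[cls] = totals.get(cls, 0) + len(comp)
    n = sum(totals.values())
    return {cls: count / n for cls, count in totals.items()}
```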
Fig. 7. Size distribution of components and communities (log-log plot of Count vs. Size; series: Component, Community)
In the second step, we studied the community structure in the co-offending network. We applied the Girvan-Newman algorithm [18] to detect communities in the network Ga(2006). The key idea behind this algorithm is that edges connecting highly clustered communities have a higher edge betweenness. Communities are therefore detected by progressively removing the edges with the highest betweenness from the network. After every removal, the betweenness of the edges is recalculated, and the process is repeated until the social network is divided into a specified number of subnetworks, the communities. Figure 7 shows the size distribution of the detected communities and of the components. The largest extracted community has about 4 thousand members, which is relatively small compared to the largest component with more than 28 thousand nodes. Even so, this group size is too large to be meaningful from a criminological point of view. There is a need for novel community extraction methods that specifically address the requirements of co-offending networks.
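Below is a from-scratch Python sketch of the Girvan-Newman procedure described above. It assumes an unweighted, undirected network stored as a dict of neighbour sets. Edge betweenness is computed with a Brandes-style BFS accumulation, and the function names are hypothetical. In practice, NetworkX offers a ready-made `girvan_newman` in `networkx.algorithms.community`. Recomputing betweenness after every removal is what makes the method expensive on networks with tens of thousands of nodes.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph (Brandes-style)."""
    eb = {}
    for s in adj:
        # Forward BFS: shortest-path counts (sigma) and predecessor lists
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Backward pass: accumulate pair dependencies onto edges
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    # every unordered pair was counted once from each endpoint
    return {edge: value / 2 for edge, value in eb.items()}

def components(g):
    """Connected components of an undirected graph as a list of sets."""
    seen, comps = set(), []
    for s in g:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(g[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj, n_communities):
    """Cut highest-betweenness edges until n_communities components exist."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # never mutate the input
    comps = components(g)
    while len(comps) < n_communities:
        eb = edge_betweenness(g)
        if not eb:
            break  # no edges left; cannot split further
        u, v = max(eb, key=eb.get)  # the frozenset unpacks into its two endpoints
        g[u].discard(v)
        g[v].discard(u)
        comps = components(g)
    return comps
```

On two triangles joined by a single bridge, the bridge carries all 9 cross-triangle shortest paths, so it is the first edge removed and the two triangles emerge as communities.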
4.3 Node-Level Analysis
In addition to partitioning social networks into groups, we can categorize actors based on the set of relations they have in the network. Such actors take similar positions within an organization, a community or the whole social network [23]. Relationships are established and broken very often in co-offending networks; consequently, the positions, roles and power of offenders in a co-offending network change as well. At the node level of analysis, centrality is the most comprehensively studied concept. Centrality reveals how important, influential or powerful a node is, which may reflect the roles of actors in a network. The centrality of offenders can be determined using different measures, such as:
Degree Centrality: The centrality of a node is the number of links incident upon the node.
Betweenness Centrality: The centrality of a node is based on the number of shortest paths between pairs of other nodes in the network that pass through the node.
Closeness Centrality: Nodes are more central if they can reach other nodes easily, which is measured by averaging the length of the shortest paths from a node to all other nodes; a smaller average means a more central node.
Eigenvector Centrality: Nodes that are linked to many other nodes with high centrality are themselves more central.
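The sketch below computes three of these measures in plain Python. It also includes a helper that returns a Top-k ranking like the one used in Table 2. All names are hypothetical. Closeness is assumed to be the inverse of the average distance to reachable nodes. Eigenvector centrality is found by power iteration on $A + I$, which has the same leading eigenvector as the adjacency matrix $A$ but avoids oscillation on bipartite graphs. Betweenness is omitted for brevity; it can be computed with Brandes' algorithm.

```python
import math
from collections import deque

def degree_centrality(adj):
    """Number of links incident upon each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Inverse of the average shortest-path length to all reachable nodes."""
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        # shorter average distance -> larger score; isolated nodes score 0
        scores[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores

def eigenvector_centrality(adj, iterations=1000, tol=1e-10):
    """Power iteration on (A + I), which shares A's leading eigenvector."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        # the x[v] term is the identity shift that prevents oscillation
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

def top_k(scores, k=5):
    """Rank nodes by score, highest first (ties broken by node ID)."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]
```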
Fig. 8. Visualization of the second largest component
As an example, we present the centrality analysis of the second largest component. Table 2 lists the ranks of the key nodes A to L (see Figure 8) according to each centrality measure. The numbers within the table indicate the ordering of the Top 5 offenders identified by each measure. For example, offender E was identified as the second most important offender by eigenvector centrality, but only the 4th most important by betweenness centrality.
Centrality Measure Offenders
A B C D E F G H J K L
Degree 1 3 4 5 2
Betweenness 1 3 5 4 2
Closeness 2 4 1 3 5
Eigenvector 1 5 3 2 4
Table 2. Centrality measures on the second largest component
Although the different centrality measures tended to identify different individuals in varying order, all but one measure agreed that offender A was the most important. This strong result was somewhat surprising given that, in total, 11 offenders were identified in the Top 5 by the different measures. Offender A does seem to be an important offender in the network: this offender has the largest number of edges, is involved in 3 cliques of at least size 5, and would fragment the network into 4 pieces if removed. Without this information, a police force could capture multiple offenders without realizing that targeting offender A alone would have a large impact on the network by removing by far its most important offender. It is also interesting that the above 4 measures identified quite a few offenders along the shortest path between the two most distant nodes. This path runs through offenders K, C, A, F, G, H and J.
4.4 Network Evolution Analysis
Like other social networks, a co-offending network is not static but keeps changing over time: offenders may leave or join the network, and their positions may change as they gain or lose power. Links between offenders may form or disappear. Offender groups may appear, split, merge or disappear. The network structure may change from decentralized to centralized, or from flat to hierarchical, or vice versa. Analyzing all these changes is important but also complicated. Detecting the evolution patterns of a co-offending network can nevertheless provide important information to law enforcement organizations.
We study how the co-offending network evolves over time based on multiple snapshots of the network. For this purpose we generated five snapshots of the co-offending network for the years 2001 to 2006. Each snapshot contains the co-offending network extracted from the events that happened from 2001 up to that time. For example, Ga(2004) is the co-offending network of all crimes from 2001 to 2004. Below, we examine the evolution of the co-offending network across these five snapshots for various network structural properties.
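The cumulative snapshots can be sketched as follows. The sketch assumes events are available as hypothetical `(year, offender_ids)` tuples, and the function name is also hypothetical. For clarity, each snapshot is rebuilt from scratch. Only events with at least two offenders contribute nodes, since lone offenders form no co-offending links.

```python
from itertools import combinations

def cumulative_snapshots(events, start_year, end_years):
    """Build Ga(y) for each y in end_years from events dated start_year..y.

    Each snapshot is an adjacency dict {offender: set_of_co_offenders}.
    """
    snapshots = {}
    for end in end_years:
        adj = {}
        for year, offenders in events:
            people = set(offenders)
            if not (start_year <= year <= end) or len(people) < 2:
                continue  # outside the window, or no co-offending link
            for v in people:
                adj.setdefault(v, set())
            for i, j in combinations(people, 2):
                adj[i].add(j)
                adj[j].add(i)
        snapshots[end] = adj
    return snapshots
```

For example, `cumulative_snapshots(events, 2001, range(2002, 2007))` would yield the five snapshots Ga(2002) to Ga(2006), each of which can then be passed to the component, distance and clustering computations.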
Figure 9(a) shows the evolution of the size and number of components over time. The most interesting observation is that, after one year, there is no large component in the network Ga(2002), but from then on the large components grow in a nearly linear trend. On the other hand, in all networks few offenders are connected to medium sized components. The reason is that medium sized components merge with the large components through some of their nodes, so they do not persist as independent components. In other words, medium sized components blend into the large components very quickly and enrich them; therefore we do not observe them in the network for long. A similar phenomenon exists in other real-world social networks, where a large component tends to form, with the remainder being singletons and smaller components [21]. The number of nodes that belong to small components is almost constant in all five years, because some small components are always being connected to medium or large components while new small components simultaneously appear in the network.
In Figure 9(b) we plot the evolution of the average distance, diameter and effective diameter of the co-offending network between 2001 and 2006. All three measures increase in the first 3 years and then decrease in the last 2 years. This finding may be surprising given the increasing size of the co-offending network, as network models generally suggest that average distance and diameter should increase with network size [22]. Other studies report similar results [30].
Figure 10(a) shows the evolution of the clustering coefficient. There are three observations. First, clustering is stationary across all five years. Second, as expected, clustering is higher than the expected clustering of a random network with the same number of vertices and edges. Finally, our result is the opposite of empirical findings for some other social networks [22], where clustering was found to decrease over time.
5 Network Visualization and Interpretation
Visualization can facilitate social network analysis. It allows investigators to discover patterns of interaction among offenders, including detecting criminal subgroups and identifying central offenders and their roles. Visualization can also provide investigators with new insights into network structures while helping them communicate with others [26]. Beyond just creating images, the process of visualizing a social network can generate learning situations. However, effective visualization should generally be accompanied by a comprehensive and detailed interpretation. This is even more necessary in multidisciplinary projects such as co-offending network mining. Well-done visualization and interpretation can fill the gap that exists between SNA experts and law enforcement officers. This is the reason for including a component called "visualization and interpretation" in the knowledge discovery phase of the proposed framework.

Fig. 9. (a) Clustering coefficient vs time; (b) Average distance vs time

To apply our visualization tasks and interpret them, we selected the second largest component of the co-offending network, see Figure 8. Before looking at the properties of the offenders within the network, a high-level description of the entire network is in order. The selected network contained 138 vertices and 266 edges and was created from 189 distinct criminal events and all offenders associated with those events. The network does not contain any isolates; they would have been included in another 'network'. Since the network is a single connected component, every vertex can reach every other vertex, so its reachability is 1.
5.1 Crime type
The network was limited to 4 different crime types: serious crimes, property crimes, moral crimes and drug crimes. The following paragraphs analyze the network by focusing on only one specific crime type at a time. Note that, for ease of analysis, the positions of all vertices are fixed; the links, however, change depending on the crime type being shown. Isolates in any of the networks simply imply that the offender did not commit the type of crime being analyzed.
Fig. 10. (a) Clustering coefficient vs time; (b) Average clustering coefficient per degree
Serious crimes: Of the original 138 vertices and 266 edges, limiting the analysis to co-arresters involved in a serious crime fragmented the original network into many very small pieces, see Figure 11(a). Instead of a single cohesive network, there are now 11 sub-networks, varying in size from 2 to 8 offenders. In total 39 vertices are active, with 36 active edges, although only 34 edges are visible since 2 of the edges represent repeat co-arrests for serious crimes. For an edge to be active, both vertices involved in the edge must be charged with a serious crime during the same event. The average number of edges per vertex is 0.872, meaning that, overall, offenders tend to commit serious crimes once or twice, each time with a different offender.
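The counting rules in this paragraph can be sketched in Python. The sketch assumes each event is a hypothetical dict mapping offender IDs to the set of charge types laid in that event, and the function name is hypothetical. Repeat co-arrests are kept as separate active edges but collapse into one visible edge. The edges-per-vertex ratio uses visible edges, which reproduces the reported figure: 34 / 39 ≈ 0.872.

```python
from itertools import combinations

def crime_type_subnetwork(events, crime_type):
    """Active vertex/edge counts when the network is restricted to one crime type.

    A pair forms an active edge only if both offenders were charged with
    `crime_type` in the same event.
    """
    active_edges = []  # one entry per event, so repeat co-arrests count twice
    for charges in events:
        charged = sorted(o for o, types in charges.items() if crime_type in types)
        active_edges.extend(combinations(charged, 2))
    visible_edges = set(active_edges)  # repeat co-arrests collapse to one line
    vertices = {v for edge in visible_edges for v in edge}
    return {
        "vertices": len(vertices),
        "active_edges": len(active_edges),
        "visible_edges": len(visible_edges),
        "edges_per_vertex": len(visible_edges) / len(vertices) if vertices else 0.0,
    }
```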
As expected, the majority of the resulting sub-networks were small. This could be because people do not commit these types of crimes as frequently as other offenses, or because people who do commit these types of crimes are sent to jail for longer periods of time and hence lose the opportunity to commit further offenses with others.
Property Crimes: Since property crimes are easier to commit than crimes against other people, intuition suggests that this type of crime would be more common, leading to more extensive and probably more connected sub-networks than for serious crimes. This was indeed the case; see Figure 11(b) for an illustration.

This restriction also created 11 smaller sub-networks, as for serious crimes. However, the largest property-crime co-arrester network had 27 offenders. 74 of the 138 original vertices (roughly half) remained active, as did 123 of the original 266 edges (roughly half). Of the 123 edges, only 106 were visible, implying that there were quite a few repeat co-arresters in the sub-network. Several sections of the graph also involve cliques of 6 (M) or 4 (N, O and P) vertices. Compared to the network produced by serious crimes, this network is much more connected.
The difference in networks for serious and property crimes is interesting, in
that the network for property crimes is much more extensive, implying that
property offenders co-offend much more frequently and serious crimes tend to
involve much smaller networks. Information on people offending individually was
not available; hence if a person committed a crime alone, they would not be part
of this analysis.
Morals: Restricting the network to moral crimes, such as prostitution, arson, child porn, gaming and breach, resulted in a sub-network with only 104 edges and 65 vertices active (Figure 11(d)). There are 2 cliques of 6 (points Q and R) and 3 cliques of 5 (points S, T and U). Offender A is clearly very highly connected, with 16 different edges, and is involved in 3 sizable cliques. The fact that this person alone belongs to 3 totally distinct cliques illustrates how well connected this person is within the network. If the police wanted to disrupt the network, this person would be a good target: prolific and well embedded in the network.
It is interesting to see that an edge (V) between two individuals exists in both the Moral and Property sub-networks, implying that Offender A is not only prolific but also does not restrict themselves to a single type of crime.
Figure 11(c) illustrates the sub-network created when the crime type is restricted to drug crimes (such as trafficking, possession and import/export). The resulting 'network' is not large: it is just a single event involving 3 offenders. Note that this small proportion of drug-related crimes is not representative of the other networks that were created (there is a single network of 100 offenders that shows only drug offenses).
5.2 Spatial Mapping of Co-Offenders
To detect patterns between co-arresters, the home locations of offenders are visualized in Figure 12(a). Only 3 very tiny clusters were seen in this visualization: one in Prince George, one on Vancouver Island and one in the Greater Vancouver Regional District (GVRD). The clusters were separated by huge distances (since there is nothing between them but mountains or water), but differences within the clusters were lost due to the small scale. Thus the most important cluster, the one located in the GVRD, was chosen, and the analysis was restricted to it. Further, offenders with no reported location, or with a location that could not be geocoded, had to be eliminated from this analysis. The resulting set of people can be seen in Figure 12(a), which shows the co-arresting relationships for offenders in the GVRD. Multiple offenders with the same reported home location would overlap in this image and hence would not be visible. In total, 81 vertices of the network remained, with a total of 162 edges.

Fig. 11. Visualization of the second largest component for the crime types: (a) Serious Crimes, (b) Property Crimes, (c) Drugs Crime, (d) Moral Crimes
This analysis raised the question of whether offenders tend to be co-arrested with others from outside their own city. For each event, the home location of each co-arrester involved in the event was compared to those of all the other co-arresters in the event. Of the 155 events, 38 had all the offenders involved living in the same city, and 14 had all offenders living within 2 cities. This implies that in only about 25% of the events (38 of 155) were all co-arresters from the same city.
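The per-event comparison can be sketched as follows. The inputs `events` and `home_city` and the function name are hypothetical. As an assumption, events in which any offender lacks a geocoded home are skipped. With the counts reported above, 38 / 155 ≈ 0.245, which is the "about 25%" figure.

```python
def city_mix(events, home_city):
    """Count events whose co-arresters all share one city, or span exactly two.

    `events` is a list of offender-ID lists and `home_city` maps
    offender -> city. Returns (same_city, two_cities, events_considered).
    """
    same_city = two_cities = considered = 0
    for offenders in events:
        if any(o not in home_city for o in offenders):
            continue  # skip events with an offender lacking a geocoded home
        considered += 1
        cities = {home_city[o] for o in offenders}
        if len(cities) == 1:
            same_city += 1
        elif len(cities) == 2:
            two_cities += 1
    return same_city, two_cities, considered
```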
5.3 Distances between home locations
For co-arresting relationships, the distance between the home locations (at the time of the crime) of the two offenders was calculated and used to construct the network, see Figure 12(b). Missing edges indicate that at least one of the two vertices involved in that edge did not have a properly geocodable location. The thickness of an edge indicates the distance between the two offenders.

Upon visual inspection, co-arresters do not appear to live far away from each other. A detailed analysis of the raw distances, Table 3, confirms this visual result. Taking all the distance measures, the average distance an offender travels to get to their co-arrester's residence is 10.7 km. That is a considerable distance, but the standard deviation, at 31.2 km, is very large compared to it, which indicates that extreme values dominate the calculation. Thus the median, at 2.8 km, is a more reasonable number to look at. 2.8 km is not a distance that will deter most people from driving it, and it would most likely put the co-arresters within the same city.

Fig. 12. (a) Relationships for offenders in the GVRD; (b) Network showing distance between home locations of offenders (thickness indicates magnitude of distance. Missing links indicate missing information)
To remove outliers, the top and bottom 10% of the data were removed. In this case the maximum distance fell to 16.8 km, indicating that 10% of the people are co-arrested with someone living more than 16.8 km away. Similarly, the minimum rose from 0 km to 0.3 km, indicating that 10% of the people live within 0.3 km of their co-arrester, for example in the same building. The average value fell significantly, from 10.7 km to 4.3 km. All the results indicate that co-arresters tend to live within a few minutes' drive of each other.
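The chapter does not say how the distances were computed. The sketch below assumes great-circle (haversine) distances between geocoded home locations and then applies the 10% trimming used for Table 3. The function names are hypothetical, and the population standard deviation is an assumed choice. Road distances, which matter for the driving-time interpretation, would generally be longer than these straight-line values.

```python
import math
import statistics

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def trimmed_summary(values, trim=0.10):
    """Min/max/mean/median/stdev after dropping the top and bottom `trim` share."""
    data = sorted(values)
    cut = int(len(data) * trim)  # number of values dropped from each end
    kept = data[cut:len(data) - cut] if cut else data
    return {
        "min": kept[0],
        "max": kept[-1],
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "stdev": statistics.pstdev(kept),  # assumed: population std. deviation
    }
```

The same `trimmed_summary` helper applies unchanged to the age differences in Table 3, and `trim=0` gives the "All Data" columns.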
           Distance Difference                  Age Difference
           All Data   Top/bottom 10% removed   All Data   Top/bottom 10% removed
Min        0 km       0.3 km                   0 y        0 y
Max        251 km     16.8 km                  35 y       9 y
Average    10.7 km    4.3 km                   3.25 y     1.7 y
Median     2.8 km     2.8 km                   1 y        1 y
StdDev     31.2 km    4 km                     6 y        1.6 y
Table 3. Distance and age difference between co-offenders
Why this is so is not clear. It is possible that convenience dictates how far apart co-offenders live, and people do not wish to offend with others who live far away since they do not wish to drive that far. It is also possible that offenders establish co-offending relationships more readily with others who live in the same neighborhood, simply because the chance of meeting someone at random is much larger if both people live in the same neighborhood than if they live far apart. Either way, spatial distance does seem to be a barrier to co-offending.
5.4 Offenders Age Differences
The setup of this sub-network is similar to that of the previous one. The difference in age between co-arresters was calculated for all pairs of people who were co-arrested together. Figure 13(a) shows the results. The thickness of an edge shows the size of the age difference (the thickest line is 35 y). All people had birth dates recorded, hence there are no missing edges.

On visual inspection, the network has a lot of very thin edges. In fact, the median age difference was 1 y (Table 3). Thus at least 50% of the people commit crimes with people who are, for all intents and purposes, the same age as them. Even when the top and bottom 10% of the data were removed, the average age difference between co-arresters was 1.7 y, and the maximum difference was only 9 y.
How do these people meet? The closeness of their ages suggests that they may meet in school, where they are surrounded by other people of similar ages. This is not true for workplaces, where co-workers could be of any age. If co-offending relationships are established in school, then perhaps more focus could be placed on school activities, so that these would-be offenders are occupied with other things and do not turn their attention to mischief. Perhaps if these co-offending relationships are not established early, they will not be established later.
The biggest age difference was 35 y. This prompted the question: are co-arresters with a large age difference family members? The names of the co-arresters in these large age-difference relationships were compared, and the answer, surprisingly, is no. After this discovery, all co-arresters were compared for family ties (based on encrypted last names), and 4 pairs of co-arresters were found to be probable family members with very close ages. One pair of co-arresters had the same family name but were 17 years apart in age. Thus family co-arresting does not seem to be a big problem, but, again, due to the limited comparisons based on encrypted family names, this approach has its problems. For example, a wife who takes her husband's name would not be tied back to her parents, and two unrelated offenders with the last name 'Smith' would be linked even though it is a very common name.
Fig. 13. (a) Network showing distance between home locations of offenders (thick-
ness indicates magnitude of distance. Missing links indicate missing information); (b)
Network showing gender of offenders (blue = female)
5.5 Offenders’ Gender
Finally, the network was analyzed by focusing on the gender of the offenders. Analysis of this property of the network, Figure 13(b), reveals that most of the offenders (89.8%) are male, which is consistent with the literature. However, the distribution of females in the network does not appear to be random, given the small cluster of female offenders connected to offender W. The links between these offenders are visible in the 'Property Crime' sub-network; thus the cluster (not clique) of 8 offenders was involved in property crimes. The link between the female and male offender I, however, is shown in the 'Serious Crime' sub-network, implying that W is a relatively important offender, involved in at least 2 offenses, including one of greater severity than those of the offenders connected to W.
6 Concluding Remarks
Research in co-offending network mining often lacks access to large real-world crime data sets. One reason for this limitation is the highly sensitive nature of such data and the related privacy issues, which demand strict security protocols as well as data storage and processing facilities that meet exceptionally high security standards. An interesting open question is to what extent advanced anonymization techniques can help solve this problem by making secure data more widely available without compromising privacy.
In our study we have extracted co-offending networks for several of the most important types of crime, including serious, property, drug and moral crimes. The analysis of the co-offending network revealed several interesting insights. In particular, the rankings of offenders with respect to various centrality measures agreed well, which allowed a robust discovery of the most important offenders to be targeted by the police. Surprisingly, the average distance and the diameter of the co-offending network have shrunk in the last few years, indicating a densification of the social network. These results are in line with known studies that show similar phenomena in other types of social networks [30].
The proposed formal model of crime data and co-offending networks provides
a well-defined semantic framework for describing in an unambiguous way the
meaning of co-offending networks and their constituent entities at an abstract
level. Specifically, the formal model aims at bridging the conceptual gap between
data level, mining level and interpretation level, and also facilitates separating
the description of the data from the details of data mining and analysis.
Our analysis also pointed out directions that require future research. While a
state-of-the-art method for community detection produced more meaningful re-
sults than a simple baseline method, the communities detected were too large to
be meaningful from a criminological point of view. There is a need for novel com-
munity detection methods addressing the special requirements of co-offending
networks. Finally, the role of visualization as an enabling factor for analytical
reasoning as part of the knowledge discovery process is crucial in any practical
use of the proposed framework by law enforcement and intelligence agencies.
We intend to closely collaborate on innovative visualization techniques with the
recently founded Vancouver Institute for Visual Analytics (VIVA)5, a joint ini-
tiative by the University of British Columbia and Simon Fraser University.
Acknowledgements
We are thankful to RCMP "E" Division and BC Ministry for Public Safety and Solicitor General for making this research possible by providing Simon Fraser University with crime data from their Police Information Retrieval System. We would also like to thank the anonymous reviewer(s) for their constructive criticism and helpful comments on an earlier version of our manuscript for this chapter.
References
1. P. L. Brantingham, Crime Pattern Theory. In B. Fisher and S. Lab (eds.) En-
cyclopedia of Victimology and Crime Prevention. Beverly Hills: Sage Publishing,
2010.
2. N. Memon, J. D. Farley, D. L. Hicks and T. Rosenorn (eds.). Mathematical Meth-
ods in Counterterrorism. Springer, 2009.
3. P. L. Brantingham, U. Glässer, P. Jackson and M. Vajihollahi. Modeling Criminal
Activity in Urban Landscapes. In N. Memon et al. (eds.), Mathematical Methods
in Counterterrorism, Springer, 2009.
4. L. Liu and J. Eck (eds.). Artificial Crime Analysis Systems: Using Computer Sim-
ulations and Geographic Information Systems. IGI Global, 2008.
5. P. L. Brantingham, U. Glässer, P. Jackson, B. Kinney and M. Vajihollahi. Mastermind: Computational Modeling and Simulation of Spatiotemporal Aspects of
Crime in Urban Environments. In L. Liu, J. Eck (eds.), Artificial Crime Analysis
Systems: Using Computer Simulations and Geographic Information Systems, IGI
Global, 2008.
6. D. Kim Rossmo. Geographic Profiling. New York: CRC Press, 2000.
7. M. B. Short, P. J. Brantingham, A. L. Bertozzi and G. E. Tita. Dissipation and
Displacement of Hotspots in Reaction-Diffusion Models of Crime. PNAS. 107:3961-
3965, 2010.
8. A. Abbasi, and H. Chen. Applying authorship analysis to extremist-group Web
forum messages. IEEE Intelligent Systems 20(5): 67-75, 2005.
9. R. Adderley and P. Musgrove. Modus operandi modelling of group offending: a
data-mining case study. International Journal of Police Science and Management.
5(4): 265-276, 2003.
10. A. Malm, G. Bichler, and S. Van de Walle. Comparing the ties that bind criminal
networks: Is blood thicker than water? Security Journal 23: 52-74, 2010.
11. U. Brandes and T. Erlebach. Network Analysis: Methodological Foundations,
chapter "Fundamentals". Berlin: Springer-Verlag, 2005.
12. A. L. Barabasi and R. Albert. Emergence of scaling in random networks. Science
286, 1999.
13. P. Erdos and A. Renyi. On random graphs. Publicationes Mathematicae 6, 1959.
⁵ See also www.sfu.ca/~viva/.
28 Lecture Notes in Social Networks: Co-offending Network Mining
14. D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks.
Nature 393, 1998.
15. G. Bruinsma and W. Bernasco. Criminal groups and transnational illegal markets.
Crime, Law and Social Change, Vol. 41 No. 1, 2004.
16. A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in
empirical data. http://arxiv.org/abs/0706.1062v1, 2007.
17. E. Dijkstra. A note on two problems in connection with graphs. Numerische
Mathematik 1: 269-271, 1959.
18. M. Girvan, and M. E. J. Newman. Community structure in social and biological
networks. Proc. Natl. Acad. Sci. USA 99, 7821-7826, 2002.
19. C. Palmer, P. Gibbons, and C. Faloutsos. ANF: A fast and scalable tool for data
mining in massive graphs. SIGKDD, 2002.
20. A. J. Reiss. Co-offending and criminal careers. Crime and Justice: A Review of
Research, 1988.
21. R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social
networks. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference
on Knowledge discovery and data mining, New York, NY, USA, 2006.
22. A. L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, and T. Vicsek.
Evolution of the social network of scientific collaborations. Physica A 311: 590-614, 2002.
23. S. Wasserman and K. Faust. Social network analysis: methods and applications.
Cambridge University Press, 1994.
24. R. V. Hauck, H. Atabakhsh, P. Ongvasith, H. Gupta, H. Chen. Using Coplink to
analyze criminal-justice data. IEEE Computer, Vol. 35, No. 3: 30-37, 2002.
25. S. Kaza and H. Chen. Effect of inventor status on intraorganizational innovation
evolution. Hawaii International Conference on System Sciences (HICSS-42), Big
Island, HI, 2009.
26. L. C. Freeman. Visualizing Social Networks. Journal of Social Structure: vol. 1,
number 1, 2000.
27. J. M. McGloin, C. J. Sullivan, A. R. Piquero, and S. Bacon. Investigating the
stability of co-offending and co-offenders among a sample of youthful offenders.
Criminology 46 (1), 2008.
28. A. J. Reiss, and D. P. Farrington, Advancing knowledge about co-offending: Results
from a prospective longitudinal survey of London males. Journal of Criminal Law
and Criminology 82 (2), 1991.
29. S. Kaza, J. Xu, B. Marshall, and H. Chen. Topological Analysis of Criminal
Activity Networks: Enhancing Transportation Security. IEEE Transactions on Intelligent
Transportation Systems, Volume 10, No. 1, 2009.
30. J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graph evolution: Densification and
shrinking diameters. ACM TKDD, 1(1):2, 2007.
31. J. Han, and M. Kamber. Data Mining: Concepts and Techniques. Morgan
Kaufmann, Second edition, 2006.
32. J. Chen, O. R. Zaïane, and R. Goebel. Detecting Communities in Social
Networks Using Max-Min Modularity. Proceedings of the SIAM International
Conference on Data Mining, SDM 2009, Sparks, Nevada, USA, 2009.
33. D. Kempe, J. M. Kleinberg, and E. Tardos. Influential Nodes in a Diffusion Model
for Social Networks. Proceedings of Automata, Languages and Programming, 32nd
International Colloquium, ICALP 2005, Lisbon, Portugal, 2005.
34. D. Liben-Nowell, and J. M. Kleinberg. The Link Prediction Problem for Social
Networks. Proceedings of the Twelfth Annual ACM International Conference on
Information and Knowledge Management, CIKM 2003, November 2003.
35. D. Chakrabarti, and C. Faloutsos. Graph mining: Laws, generators, and algorithms.
ACM Computing Surveys, (1): 2006.
36. A. Papachristos. Murder by structure: Dominance relations and the social structure
of gang homicide. American Journal of Sociology, 115, 74-128, 2009.
37. H. Chen, K.J. Lynch. Automatic construction of networks of concepts
characterizing document databases. IEEE Trans. Syst. Man Cybernet. 22, 885-902, 1992.
38. N. Borisov. Privacy-Preserving Friends Troubleshooting Network. Proc. of the
ISOC Symposium on Network and Distributed System Security (SNDSS), 2005.
39. X. Liu, J. Bollen, M. L. Nelson, and H. V. de Sompel. Co-authorship networks in
the digital library research community. Information Processing and Management:
an International Journal, Vol 41 pp. 1462-1480, 2005.
40. M.N. Smith, P.J.H. King. Incrementally Visualising Criminal Networks. Sixth
International Conference on Information Visualisation (IV'02), pp. 76, 2002.
41. T.W. Valente. Social networks and health: Models, Methods and Applications.
Oxford University Press, 2010.
42. J.J. Xu, and H. Chen. Untangling Criminal Networks: A Case Study. ISI 2003 pp.
232-248, 2003.
43. J.J. Xu, H. Chen. CrimeNet Explorer: A Framework for Criminal Network
Knowledge Discovery. ACM Transactions on Information Systems, Vol 23 No 2.
pp. 201-226, 2005.