Digital Object Identifier 10.1109/ACCESS.2023.DOI
Scalable Semi-supervised Graph Learning
Techniques for Anti Money Laundering
Rezaul Karim1,2, Felix Hermsen2, Sisay Chala2,1, Paola de Perthuis3,4, and Avikarsha Mandal2,1
1Information Systems and Databases, RWTH Aachen University, Germany
2Department of Data Science and Artificial Intelligence, Fraunhofer FIT, Germany
3École Normale Supérieure (ENS) of Paris, France
4Cosmian, Paris, France
E-mail: rezaul.karim@rwth-aachen.de
ABSTRACT Money laundering is the process where criminals move large sums of illicit money to
hidden locations and disguise them as legal funds using financial services. The United Nations (UN)
estimates that 2 to 5% of global GDP, approximately 0.8 to 2.0 trillion US dollars, is laundered globally
every year. Therefore, accurate identification of such globally alarming activities is crucial to enforce
anti-money laundering (AML) measures. To date, numerous techniques have been proposed to
detect money laundering from a graph of money transfers between bank accounts by analyzing the
structural and behavioural dynamics of its dense subgraphs, without taking into account
that money laundering involves high-volume flows of funds through chains of bank accounts. Some
other approaches model the transactions in the form of multipartite graphs to detect the complete
flow of money from source to destination. However, most of the existing methods result in lower
detection accuracy or higher computational cost, making them less reliable and practical for real
financial systems. As a consequence, the current AML approaches can prevent and detect only a
fraction of money laundering activities. In this paper, we propose an efficient
approach to AML by employing semi-supervised graph learning techniques on a large-scale financial
transactional graph in both pipeline and end-to-end settings to identify nodes that may be involved
in potential money laundering. We evaluated our approach on four datasets: AMLSim, Elliptic,
IBM AML, and SynthAML with a view to its scalability and practicality for real financial systems.
Further, we provide global (e.g., what factors contribute more in money laundering scenarios) and
local (e.g., how money gets laundered between nodes) explanations to improve the interpretability
of the AML model. Experimental results suggest that our approach is scalable as well as effective
at spotting money laundering in both real and synthetic transaction graphs.
INDEX TERMS Anti-money laundering, Machine learning on graphs, Graph embeddings.
I. INTRODUCTION
Money laundering is a globally challenging economic concern. The UN Vienna 1988 Convention describes
it as “the conversion or transfer of property, knowing that such property is derived from any offence(s), to
conceal or disguise the illicit origin of the property or of assisting any person who is involved in such offence(s)
to evade the legal consequences of his actions”. Being a global issue, money laundering results in approximately
$0.8 to $2.0 trillion being laundered every year, which equates to 2 to 5% of the world’s GDP [1], [2].
Criminals hide the sources of illegal money with real estate purchases or by inflating legitimate invoices [3].
As shown in fig. 1, money laundering involves three main stages: placement, layering, and integration.
Placement is introducing illicit funds into financial systems. Layering is conducting complex transactions
to obscure the origins of the funds. Integration is withdrawing the proceeds from destination accounts and
utilizing them for lawful purposes. Money laundering can be modelled by two major topologies: in the first
topology, there is one source account, one destination account, and many middle accounts that form a
vertical chain. First, the source splits the money among the middle accounts, which then send the money
to the destination account. In the second topology, there is one source account, one destination account,
and a horizontal sequence of middle accounts. Each middle account transfers the
VOLUME XXXX, 2020 1
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3383784
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Karim and Mandal et al. Semi-supervised Graph Learning Techniques for Anti Money Laundering
whole amount of money to the next account in the sequence until it reaches the destination account.
Based on such topologies, a money transaction graph can be created by representing a single account
as a vertex and a transaction between two accounts as an edge. Financial institutions are responsible
for forensic analysis and complying with regulatory standards such as know your customer (KYC),
transaction monitoring, suspending suspicious accounts, and submitting suspicious activity reports (SARs)
to law enforcement agencies1. Preventing criminals from moving such illicit funds through financial
systems is known as anti-money laundering (AML). Since failure to comply with AML regulations may
incur severe penalties, financial organizations must identify illicit activities reliably. For this, rigorous
analysis is carried out using sophisticated techniques to determine whether a SAR needs to be filed or
whether the account in question should be suspended.
However, the current AML approaches face several limitations and challenges. The first challenge
is the lack of access to real financial transaction data, which is highly restricted due to both proprietary
and privacy reasons [2]. AML efforts suffer from a lack of real data (or at least, realistic synthetic data
that mimics AML scenarios), labels, and annotations of important attributes of real data [2]. This hinders
the development and evaluation of AML methods, as they cannot be tested on realistic scenarios and data.
The second challenge is the scarcity and imbalance of labelled data, e.g., normal transactions and
accounts vastly outnumber the illicit ones. As a concrete example of noisy evidence, two parties sharing
the same IP address on a transaction graph does not necessarily mean they are physically connected,
since their IP addresses might be associated with a mobile network. Models trained on such highly
skewed labelled data exhibit limited adaptability and produce unreliable results. The third challenge is
that the ground truth of money laundering is hard to obtain and verify. This leaves most real-life data
noisy and unlabeled or sparsely labelled. Nevertheless, the majority of existing methods detect money
laundering in a supervised manner. This reduces the reliability and adaptability of supervised learning
methods, as they may suffer from overfitting or underfitting problems.
A fourth challenge is the complexity and scalability of the graph model, which can contain billions
of nodes and edges2. This requires efficient and scalable methods that can handle large-scale graphs and
extract meaningful and discriminative features from them. The fifth challenge is that criminals often mask
the true nature of their transactions using complicated
1For example, the global framework for anti-money laundering (AML) is regulated by the Financial Action Task Force.
2e.g., mappings of billions of edges between millions of entities.
account layering or multi-hop transactions, leaving the identification of money laundering a complex
problem. Our empirical study outlines that most existing AML approaches either yield lower detection
accuracy or do not scale to large graphs, making them less reliable and impractical for real-life financial
systems. Further, most of the existing methods are either too simple or too complex, resulting in lower
detection accuracy or higher computational cost. As a consequence, the current AML approaches can
prevent and detect only a tiny fraction of money laundering activities, leaving a huge gap between the
actual and the reported cases.
On a more positive note, datasets such as Elliptic, IBM AML, AMLSim [5], and AMLworld [2] have
been produced that build multi-agent virtual worlds, where some of the agents are criminals with illicit
income to launder. These are a great addition to tackle data scarcity issues and to advance AML research.
Our empirical investigation suggests that one possible solution to deal with noisy data could be using
the attention mechanism in the graph convolutional kernel and controlling the message passing based
on the type of connection between two parties and their (node) profiles. For labelled data scarcity,
graph structural information can significantly boost the performance of an AML model without much
dependence on labels. On a similar note, Schlichtkrull et al. [6] show that node features are important,
as domain knowledge of a node in a KG can be formulated as node features for the graph model, and
rich node features are likely to boost model performance. Our study confirms this observation.
Inspired by the successes of recent graph-based approaches, the potential of topological node features,
and the drawbacks of existing approaches, we employ semi-supervised learning techniques on large
transaction graphs to detect money laundering. We hypothesize that, similar to network analysis that
involves predictions over edges, nodes in a transactional graph displaying characteristics distinct from
regular nodes can be classified as potential money launderers. Our semi-supervised learning approach
involves using topological node features and annotations from alerts to embed graph nodes into a
lower-dimensional vector space. The embeddings, along with other features, are then used to train
tree-based ensemble classifiers such as random forest (RF), extreme gradient boosted trees (XGBoost),
and light gradient boosted machine (LightGBM) that predict the suspiciousness of a target node in
potential money laundering activities based on its direct or indirect connections to nodes that are known
to be suspicious. The overall contributions of this paper can be summarized as follows:
• We address a globally challenging economic concern, money laundering, with a view to its criticality and
FIGURE 1: Main stages in money laundering activities (source: recreated based on [3])
FIGURE 2: Two common topologies of money laundering networks (recreated based on literature [4])
importance towards deploying scalable and robust AML models into real financial systems.
• We employ state-of-the-art (SOTA) semi-supervised learning techniques on transaction graphs, where
both spatial and temporal information, together with topological graph node features, are modelled by
combining the power of graph-based representation learning and tree-based ensemble models. Overall,
our approach is, to the best of our knowledge, a “first-of-its-kind” approach in which both pipeline and
end-to-end approaches are considered.
• Unlike existing approaches that evaluated their studies on a limited number of datasets, our approach
evaluates AML capability on several real and synthetic datasets, including AMLSim, Elliptic, IBM AML,
and SynthAML, with a view to scalability and efficiency.
• We provide comprehensive evaluations of our approach, both quantitatively and qualitatively.
• To improve interpretability, we provide global (e.g., what factors contribute more in money laundering
scenarios) and local (e.g., how money gets laundered between nodes) explanations.
• We provide guidelines on how an AML model can be deployed and integrated into real financial
systems such as banks. Besides, we provide several outlooks for network security and financial crime
analysts on employing semi-supervised graph learning on large-scale transaction graphs for effective
identification of potential money laundering cases.
• We are in the process of making available Python notebooks and code that will help researchers
reproduce the results interactively or extend the implementation by changing the network architectures
or customising their datasets.
The rest of the paper is structured as follows: Section II critically reviews some related works.
Section III describes our proposed approach in detail. Section IV reports experiment results, including
a comparative analysis with baseline models. Section V summarizes this research with potential
limitations and points to some possible outlooks before concluding the paper.
II. RELATED WORK
Numerous approaches have been proposed to accurately identify money-laundering activities [7]. One of
the earliest and most common families of methods comprises rule engines and decision tree-based models
that predominantly relied on rule-based classification, as shown in fig. 3. Another early approach to AML
employs transaction monitoring systems that use predominantly rules-based thresholding protocols, tuned
to the volume and velocity of transactions, with tiered escalation procedures. Rules are a set of logical
expressions designed by human domain experts to target a particular fraud problem. Rajput et al. [8]
developed an ontology-based expert system to detect suspicious transactions. Although rules are efficient
for simple fraud detection, they are inefficient and do not scale in complicated or unknown fraud cases –
especially for large-scale graphs involving many nodes. Moreover, rule-based algorithms are easy for
fraudsters to evade and weak against adversarial attacks [7].
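As an illustration of such human-designed rules, a minimal rule-based alerting check might look like the sketch below. The thresholds and rule choices (large single transfer, high transaction velocity) are hypothetical examples, not values from any regulatory framework or from this paper.

```python
from collections import defaultdict

# Minimal sketch of rule-based SAR alerting: flag accounts by simple
# human-designed rules. Thresholds below are illustrative only.
def rule_based_alerts(transactions, amount_threshold=10_000, burst_count=5):
    flagged = set()
    counts = defaultdict(int)
    for src, dst, amount in transactions:
        counts[src] += 1
        if amount >= amount_threshold:   # rule 1: large single transfer
            flagged.add(src)
        if counts[src] >= burst_count:   # rule 2: high transaction velocity
            flagged.add(src)
    return flagged

txs = [("a", "b", 12_000), ("c", "d", 500)] + [("e", "f", 100)] * 5
print(sorted(rule_based_alerts(txs)))  # → ['a', 'e']
```

Such rules are transparent and cheap to evaluate, but, as noted above, they neither scale to complex fraud patterns nor resist adversaries who learn the thresholds.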
FIGURE 3: General workflow of rule-based suspicious activity reports (SARs) alerting
Other approaches work on a graph of money transfers between accounts using a variety of methods,
ranging from simple logistic regression (LR), support vector machines (SVM), RF, and multilayer
perceptron (MLP) to more sophisticated approaches based on GNNs [9]. Several approaches consider
the structural and behavioural dynamics of dense subgraph detection [4], [7], [10]. Michalak et al. [10]
used fuzzy matching to capture subgraphs that are more likely to contain suspicious accounts involved
in fraudulent activities. The approach proposed by Soltani et al. [4] finds structurally similar pairs of
transactions with common attributes and behaviours that potentially involve money laundering.
Money laundering often involves high-volume flows of funds through chains of bank accounts between
entities [7]. Many existing approaches, however, attempt to detect money laundering from a transaction
graph through the structural and behavioural dynamics of dense subgraph detection, without taking
such chained, high-volume flows into consideration.
Some other approaches tried to assess whether a capital flow is involved in money laundering activities
using radial basis functions (RBF) [7]. However, methods that do not perform flow tracking may yield
lower detection accuracy and cannot provide theoretical guarantees, because the flow across multiple
nodes is important for accuracy and robustness against camouflage in money laundering activities [11].
Thus, some recent approaches model the transactions in the form of multipartite graphs to detect the
complete flow of money from source to destination in an unsupervised manner. FlowScope [7] is a
recent flow-based approach that attempts to detect money laundering behaviour by identifying chains
of transactions w.r.t. flows.
Graphs offer powerful representations for financial transaction data, as graph-based data representations
resemble the connectivity of the underlying data objects. Unlike approaches that consider structural and
behavioural dynamics of dense subgraphs, graph-based approaches such as graph neural networks (GNNs) [9]
benefit from their representation learning capabilities (e.g., graph embedding (GE) techniques) over
additional graph features. These models explicitly take into account both the numerical attributes of
graph nodes and the edges connecting them. This makes GNNs an effective means to extract complex
patterns of interactions between nodes, and incorporating graph features into traditional ML models can
yield an additional accuracy lift [6]. Subsequently, graph analytics techniques have emerged as an
increasingly effective means for AML.
Graph convolutional networks (GCN) [9] are applied for financial crime detection in transaction
networks, including money laundering detection [5], [12], phishing detection on the Ethereum
blockchain [13], detection of fraudulent transactions [14], and detection of patterns associated with
financial crimes [15]. In particular, in a recent study, Altman et al. [2] outlined that scenarios with a
lower illicit ratio and longer laundering patterns pose greater challenges for true detection of money
laundering. They showed that GNN models can recognize laundering patterns in extremely imbalanced
multi-graph datasets. Another GNN architecture, called graph substructure network (GSN) [16], can
take advantage of pre-calculated subgraph pattern counts to improve the expressivity of GNNs. Further,
since large transactional graphs contain billions of nodes and edges, approaches like GraphSAGE [17]
and FastGCN [18] exhibit both efficiency and scalability when it comes to graph representation
learning [18]. Since transaction graphs are often large, scalability is crucial; recent approaches such
as FastGCN achieved high accuracy on large benchmark datasets while outperforming GCN and
GraphSAGE by up to two orders of magnitude [5].
In low-labelled data scenarios, unsupervised techniques can learn low-dimensional representations of
nodes by leveraging graph structures and features. Semantic Web (SW) technologies address data variety
and offer a unifying data model by which transaction data can be mapped into a graph structure called a
knowledge graph (KG) [19]. A KG can be defined as G = {E, R, T}, where G is a labelled and directed
multigraph, and E, R, T are sets of entities, relations, and triples, respectively. A triple in G can be
formalized as (h, r, t) ∈ T, where h ∈ E is the head node, t ∈ E is the tail node, and r ∈ R is the edge
connecting h and t (i.e., relation r holds between h and t) [20]. Nodes in a KG represent entities and
edges represent binary relations between entities. The effectiveness of a GE model depends on its
capability to learn useful representations of the nodes that play a significant role in downstream
learning tasks such as link prediction.
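The triple-based KG definition above can be illustrated with a toy example. The account and relation names below are invented for illustration only.

```python
# Toy knowledge graph G = {E, R, T}: a set T of (head, relation, tail)
# triples, from which the entity set E and relation set R are derived.
# Account/party names are invented examples.
T = {
    ("acct_1", "transfers_to", "acct_2"),
    ("acct_2", "transfers_to", "acct_3"),
    ("acct_1", "owned_by", "party_x"),
}
E = {h for h, _, _ in T} | {t for _, _, t in T}   # entities (heads and tails)
R = {r for _, r, _ in T}                          # relations (edge labels)

print(len(E), len(R), len(T))  # → 4 2 3
```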
A GE technique aims to embed entities and relations in a KG into a low-dimensional dense feature
space while preserving its properties [21]. GE models involve three steps: entity and relation
representation, scoring function definition, and learning entity and relation representation [21].
Translation embedding methods, e.g., TransD and TransE [22], create embeddings by representing
relations as translations from a head entity to a tail entity [23]. Embeddings are optimized w.r.t. the
proximity measure h + r ≈ t to preserve the relationships in the graph. The resulting embeddings of
entities and relations provide denser representations of the domain, making them suitable for a variety
of downstream tasks. However, most translation embedding methods have limited capacity in modelling
complex relations [22]. To address these shortcomings, GraphSAGE [17] was proposed, which learns
embeddings of unlabeled nodes by utilizing the graph structure and node features. GraphSAGE can
extract embeddings of unseen nodes without requiring retraining. Unlike Node2Vec, which learns a
lookup table of node embeddings, GraphSAGE learns a function that generates embeddings by sampling
and aggregating attributes from each node’s local neighbourhood and combining those with the node’s
attributes [17].
Transaction graph data often have complex temporal dependency, where historical transactions have an
impact on current transactions, and transactions also exhibit complex spatial correlation [24]. However,
the majority of GE models take into consideration only spatial information, thereby ignoring temporal
information, even though each transaction has an associated timestamp. Spatio-temporal variants of
GNNs [25] are employed in several applications, from predictive learning in urban computing [26] to
money laundering fraud detection [24]. EvolveGCN [27] was proposed to extract node embeddings by
integrating both spatial and temporal information. EvolveGCN uses a recurrent neural network (RNN)
to evolve the parameters of GCN along the temporal axis, providing flexibility for modelling temporal
data without relying on node embeddings.
FIGURE 4: General workflows of using GNNs for anomaly detection in financial transactional graphs
Attention-based architectures also show potential when dealing with arbitrarily structured graphs.
Following a self-attention strategy, hidden representations of each node in the graph are computed by
attending to its neighbours. One particular approach is the dynamic graph transformer (DGT) [28]. It
consists of two modules: a transformer module that captures cross-domain knowledge using an attention
mechanism, and a pooling module that generates informative node embeddings using the final attention
layer. Further, attention architectures have several interesting properties: the attention mechanism is
efficient as it is parallelizable across node-neighbour pairs, it can be applied to graph nodes having
different degrees by specifying arbitrary weights to the neighbours, and the underlying model can be
applied to inductive learning problems [2]. Further, attention-based approaches generalize to unseen
graphs, making them suitable for node classification [2].
III. METHODS
We employ semi-supervised graph learning techniques on transaction graphs to identify nodes involved
in potential money laundering, in both pipeline and end-to-end settings. For the former, an embedding
model is first trained to generate node embeddings that are used, along with additional node features and
local graph features, to train binary classifiers. For the latter, the node classification is performed in an
end-to-end setting, without having to train a separate classifier.
A. PROBLEM FORMULATION
We employ semi-supervised learning that uses annotations from alert data and embeds graph nodes into
a lower-dimensional vector space, which is used to train a binary classifier to predict the suspiciousness
of a node w.r.t. its direct or indirect connections to nodes that are known to be suspicious. To apply
semi-supervised learning, we randomly remove a certain percentage of nodes from the graph along with
all edges connected to them. Next, we train a GE model on the remaining subgraph. During inference,
we generate embeddings for the removed nodes using the trained GE model, which are used to predict
the labels of the held-out nodes once they are re-inserted back into the network.
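The node-holdout step can be sketched as follows. The 20% holdout ratio and the toy chain graph are illustrative choices, not the paper's experimental settings.

```python
import random

# Sketch of the node-holdout step: randomly remove a fraction of nodes
# (with all incident edges) before training the embedding model on the
# remaining subgraph. The ratio is illustrative.
def hold_out_nodes(nodes, edges, ratio=0.2, seed=42):
    rng = random.Random(seed)
    held_out = set(rng.sample(sorted(nodes), int(len(nodes) * ratio)))
    remaining_nodes = nodes - held_out
    # drop every edge touching a held-out node
    remaining_edges = [(u, v) for u, v in edges
                       if u not in held_out and v not in held_out]
    return remaining_nodes, remaining_edges, held_out

nodes = {f"v{i}" for i in range(10)}
edges = [(f"v{i}", f"v{i+1}") for i in range(9)]
rest, kept_edges, held = hold_out_nodes(nodes, edges)
print(len(held), len(rest))  # → 2 8
```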
Let G = (V, E) be a graph where V represents accounts and E represents transfers. We split V into
three sets: X and Y, which contain the outer accounts with net transfers into and out of the bank,
respectively, and W, which contains the inner accounts of the bank. For any v_i, v_j ∈ V where
(i, j) ∈ E, e_ij denotes the amount of money transferred from account v_i to v_j. Given this setup, we
can represent a directed KG as a set of triplet facts (h, r, t) ∈ F such that G = (V, E, F) denotes a link
r ∈ R from the head h ∈ V to the tail t ∈ V. Let Γ be the embedding model that maps each node v_i
of the graph to a vector v_i ∈ R^d, where d is the dimension of the embeddings and N is the number
of nodes. The embedding model Γ captures the information of the graph and is used to generate a set
of vectors V for all nodes. Note that, depending on the embedding method Γ and the embedding
dimension, different embedding vectors can be generated for the entities. The task is then to train a
classifier f on V to predict whether a node is suspicious, where the prediction ŷ_i for the embedding
vector v_i of the i-th node is defined as follows:

ŷ_i = f(v_i) = 1 if flagged, e.g., SAR or illicit; 0 otherwise. (1)
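Equation (1) amounts to a binary classifier over embedding vectors. The sketch below uses a toy nearest-centroid rule as f on synthetic 2-D "embeddings" purely for illustration; the paper trains tree ensembles (RF, XGBoost, LightGBM) on real node embeddings instead.

```python
import numpy as np

# Sketch of eq. (1): a classifier f maps a node embedding v_i to
# ŷ_i ∈ {0, 1}. Here f is a toy nearest-centroid rule on synthetic
# data; the actual approach uses tree-based ensembles.
rng = np.random.default_rng(0)
licit = rng.normal(loc=0.0, scale=0.3, size=(50, 2))    # label 0
illicit = rng.normal(loc=2.0, scale=0.3, size=(50, 2))  # label 1 (flagged/SAR)

c0, c1 = licit.mean(axis=0), illicit.mean(axis=0)

def f(v):
    """Predict 1 (suspicious) if v lies closer to the illicit centroid."""
    return int(np.linalg.norm(v - c1) < np.linalg.norm(v - c0))

print(f(np.array([1.9, 2.1])), f(np.array([0.1, -0.2])))  # → 1 0
```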
B. GENERATING DIRECTED GRAPHS
We generate a directed transactional graph from transactions, alerts (e.g., transactions flagged as
SARs/illicit), and party datasets, followed by preprocessing and encoding categorical features. Then, we
generate nodes and edges that are used to form a directed graph. Finally, we annotate the nodes with
alert datasets or additional features (i.e., labelled SARs indicating whether a node was involved in any
of the known money laundering schemes or illicit activities).
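The graph-generation step can be sketched as below. The record field names ("src", "dst", "amount") and the SAR flag encoding are illustrative assumptions, not the datasets' actual schemas.

```python
# Sketch of building a directed transaction graph from transaction
# records and annotating nodes with SAR flags from an alert dataset.
# Field names and values are illustrative.
transactions = [
    {"src": "acct_1", "dst": "acct_2", "amount": 900.0},
    {"src": "acct_2", "dst": "acct_3", "amount": 850.0},
]
alerts = {"acct_2"}  # accounts flagged in the alert dataset

nodes, edges = {}, []
for tx in transactions:
    for acct in (tx["src"], tx["dst"]):
        # node attribute: 1 if the account appears in an alert (SAR), else 0
        nodes.setdefault(acct, {"sar": int(acct in alerts)})
    # directed edge src -> dst carrying the transferred amount
    edges.append((tx["src"], tx["dst"], {"amount": tx["amount"]}))

print(sorted(a for a, attr in nodes.items() if attr["sar"]))  # → ['acct_2']
```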
C. GRAPH EMBEDDINGS
Since ML classifiers require fixed-length vectors as input, we employ different unsupervised graph
representation learning techniques such as Node2Vec, Attri2Vec, GraphSAGE, and DGT models to
generate node embeddings. They represent the neighbourhood of a node and its relations to the
neighbouring nodes. Using Node2Vec, a corpus of text C is generated by performing uniform random
walks starting from each entity in the graph [29]. Then, the corpus C of edge-labelled random walks is
used as the input for learning embeddings of each node using the skip-gram (SG)-based Word2vec [30]
model. Given a sequence of facts (w_1, w_2, ..., w_n) ∈ C, SG aims to maximize the average log
probability L_p according to the context within a fixed-size window [30]:

L_p = (1/N) Σ_{n=1}^{N} Σ_{−c≤j≤c, j≠0} log p(w_{n+j} | w_n), (2)
where c is the context window size. Negative sampling is used to define p(w_{n+j} | w_n) by replacing
log p(w_O | w_I) with a function that discriminates the target w_O and by drawing k words from a noise
distribution P_n(w) as follows [30]:

log σ(v′_{w_O}ᵀ v_{w_I}) + Σ_{i=1}^{k} E_{w_i ∼ P_n(w)} [log σ(−v′_{w_i}ᵀ v_{w_I})]. (3)

The embedding of a concept c occurring in corpus C is the vector v_c in eq. (3) derived by maximizing eq. (2).
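Generating the corpus C of random walks described above can be sketched as follows. The walk length and count are illustrative, and Node2Vec's second-order (p, q) walk biases are omitted for brevity: these are plain uniform walks.

```python
import random

# Sketch of corpus generation for Node2Vec-style embeddings: uniform
# random walks from every node, each walk a "sentence" of node IDs.
# Walk length/count are illustrative; second-order biases are omitted.
def build_corpus(adj, walk_length=5, walks_per_node=2, seed=7):
    rng = random.Random(seed)
    corpus = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_length - 1):
                neighbours = adj[node]
                if not neighbours:      # dead end: stop this walk early
                    break
                node = rng.choice(neighbours)
                walk.append(node)
            corpus.append(walk)
    return corpus

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
C = build_corpus(adj)
print(len(C), C[0][0])  # → 6 a
```

Each walk in C is then treated as a sentence of node-ID tokens and fed to a skip-gram model exactly as in eqs. (2)–(3).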
Both Word2vec and Node2Vec follow a two-step representation learning technique. Step 1 involves the
use of second-order random walks to generate sentences from a graph, where a sentence is a list of
node IDs. The corpus (the set of all sentences) is then used to learn an embedding vector for each node
in step 2. Each node ID is considered a unique token in a dictionary whose size equals the number of
nodes N in G. Attri2Vec [31] is trained to learn node representations with a non-linear mapping on
node content attributes. To capture structural similarity in learned node representations, Attri2Vec
employs DeepWalk to make nodes sharing similar random walk context nodes represented closely in
the subspace. For each (target, context) node pair (v_i, v_j) from random walks, Attri2Vec learns the
representation v_i for the target node v_i by using it to predict the existence of the context node v_j
using a three-layer network. The representation of a node v_i in the hidden layer is obtained by
multiplying its raw feature vector in the input layer with the input-to-hidden weight matrix W_in [31].
For a large set of “positive” (target, context) node pairs from random walks and an equally large set
of “negative” node pairs randomly selected from G according to a certain distribution, GraphSAGE
learns a binary classifier that predicts whether arbitrary node pairs are likely to co-occur in a random
walk on the graph [17].
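The positive/negative pair construction that drives this unsupervised objective can be sketched as below. Window size (adjacent walk positions only) and the uniform negative distribution are simplifying assumptions.

```python
import random

# Sketch of (target, context) pair sampling behind GraphSAGE's
# unsupervised objective: positives co-occur in a random walk,
# negatives are drawn at random from the node set. Sizes illustrative.
def sample_pairs(walks, node_list, n_negative_per_positive=1, seed=3):
    rng = random.Random(seed)
    pairs = []
    for walk in walks:
        for i in range(len(walk) - 1):
            pairs.append((walk[i], walk[i + 1], 1))            # positive
            for _ in range(n_negative_per_positive):
                pairs.append((walk[i], rng.choice(node_list), 0))  # negative
    return pairs

walks = [["a", "b", "c"]]
pairs = sample_pairs(walks, node_list=["a", "b", "c", "d"])
positives = [p for p in pairs if p[2] == 1]
print(len(pairs), positives)  # → 4 [('a', 'b', 1), ('b', 'c', 1)]
```

The binary labels in the third position are what the pair classifier is trained on; the node embeddings fall out of that training.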
The GE models considered so far take into consideration only spatial/structural information, thereby
ignoring temporal information. We therefore also learn from a dynamic graph, with the hypothesis that
AML could benefit from it, since DGT can capture both spatial and temporal information
simultaneously [28].
Let nodes v_1^t and v_2^t be involved in a transfer at time t, where their common connections had multiple transactions in previous timestamps. This temporal relation can be modelled as u_1^{t-1} − u_2^{t-1} and u_1^{t-2} − u_2^{t-2}. To extract spatial-temporal knowledge, node encodings are aggregated within a substructure node set into node embeddings. Attention is applied to exchange information across nodes. An attention layer is represented as [28]:
H^{(l)} = \mathrm{att}\big(H^{(l-1)}\big) = \mathrm{softmax}\Big(\frac{Q^{(l)} K^{(l)\top}}{\sqrt{d}}\Big) V^{(l)}, \quad (4)

where H^{(l)} and H^{(l−1)} are the output embeddings of the l-th and (l−1)-th layers, respectively; d is the dimension of the node embedding and att signifies the self-attention operation; Q^{(l)}, K^{(l)}, V^{(l)} ∈ R^{(τ(k+2))×d} are the query, key, and value matrices for feature transformation and information exchange, represented as [28]:
Q^{(l)} = H^{(l-1)} W_Q^{(l)}, \quad K^{(l)} = H^{(l-1)} W_K^{(l)}, \quad V^{(l)} = H^{(l-1)} W_V^{(l)}, \quad (5)

where W_Q^{(l)}, W_K^{(l)}, W_V^{(l)} ∈ R^{d×d} are the learnable parameter matrices of the l-th attention layer. In an attention layer, Q^{(l)} and K^{(l)} calculate the contributions of different nodes’ embeddings, while V^{(l)} projects the input into a new feature space; these are combined as in eq. (4) to obtain the output embedding of each node by aggregating the information of all nodes adaptively [28].
The input to the transformer, H^{(0)}, represents the encoding matrix of the target edge, X_{tgt}^{e_t}, by setting d = d_{enc}. The output of the final attention layer, H^{(L)}, is extracted as the output node embedding matrix \tilde{Z}, where each row represents a node embedding vector.
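Equations (4) and (5) can be sketched in a few lines of NumPy. This is a minimal single-head illustration with random placeholder weights, not the DGT implementation itself:

```python
import numpy as np

def attention_layer(H, Wq, Wk, Wv):
    """One self-attention layer as in eqs. (4)-(5):
    H_out = softmax(Q K^T / sqrt(d)) V, with Q = H Wq, K = H Wk, V = H Wv."""
    d = H.shape[1]
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(d)
    # Row-wise softmax: each node's output aggregates all nodes adaptively.
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)
    return A @ V

rng = np.random.default_rng(0)
n, d = 6, 4                      # 6 node encodings of dimension d
H0 = rng.normal(size=(n, d))     # stand-in for the edge encoding matrix H^(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H1 = attention_layer(H0, Wq, Wk, Wv)
```

Stacking L such layers and reading off the final output gives the node embedding matrix described above.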
D. COMPUTING GLOBAL TOPOLOGICAL FEATURES
Global topological features in a directed graph, such as shortest paths, centrality, communities, in-flow/out-flow, and triangle counts, may provide useful signals. Communities are especially informative: suspicious parties within the same community might be more likely to be globally suspicious too. Such features may help ML models perform better in financial crime detection.
VOLUME 11, 2023 7
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3383784
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Karim and Mandal et al.: Semi-supervised Graph Learning Techniques for Anti Money Laundering
FIGURE 5: Workflow of pipeline methods for identifying money laundering nodes
FIGURE 6: Workflow of end-to-end approach based on GCN for identifying money laundering nodes
We employ Dijkstra’s shortest-path algorithm to calculate how close a party of interest is to a party that has exhibited suspicious behaviour in the past. For a directed graph G, the shortest path from a node u to another node v is computed as [32]:

d(u) = \min_{v \in N^{+}(u)} \big(d(v) + w(u, v)\big), \quad (6)

where d(u) is the shortest distance from the source node to node u, N^{+}(u) is the set of out-neighbors of node u, and w(u, v) is the weight of the edge from node u to node v. We employ PageRank as the centrality algorithm, which measures the influence of a specific graph node on other nodes. We hypothesize that the PageRank of individual nodes can be a useful feature for the classifiers, especially when combined with information about whether the influential parties were known to be suspicious for known or past money laundering.
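The shortest-path feature above can be sketched with a textbook Dijkstra over a weighted directed graph; the toy transfer graph and function name are illustrative only:

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from `source` in a weighted directed graph.
    graph: node -> list of (neighbour, weight) edges."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale priority-queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Toy transfer graph; in our setting the distance to a known suspicious
# party would be read off this dictionary and used as a classifier feature.
g = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0), ("D", 5.0)], "C": [("D", 1.0)]}
print(dijkstra(g, "A"))  # {'A': 0.0, 'B': 1.0, 'C': 2.0, 'D': 3.0}
```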
For a directed graph G, the PageRank of a node is calculated as the probability of a random surfer landing on that node after following a series of links [33]:

PR(u) = \frac{1-d}{N} + d \sum_{v \in N^{-}(u)} \frac{PR(v)}{k_v^{+}}, \quad (7)

where PR(u) is the PageRank of node u, d is the damping factor, N is the number of nodes in the graph, N^{-}(u) is the set of in-neighbors of u, and k_v^{+} is the out-degree of node v. We employ weakly connected components (WCCs) for community detection. WCCs are used to determine groups of nodes sharing common characteristics or heavily interacting with each other within G. The set of WCCs for G is computed as [32]:
WCC(G) = \{C \subseteq V(G) \mid \forall u, v \in C : u \rightsquigarrow v \lor v \rightsquigarrow u\}, \quad (8)

where WCC(G) is the set of WCCs, V(G) is the set of vertices of G, and u \rightsquigarrow v signifies the existence of a
8VOLUME 11, 2023
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3383784
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Karim and Mandal et al. Semi-supervised Graph Learning Techniques for Anti Money Laundering
directed path from u to v. The underlying properties of a community, such as its size, the number of transactions within the community, how many parties within the community have had an SAR filed on them, and which parties are the most influential within the community, are calculated as additional features for the classifiers.
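As a minimal sketch (function name and toy edges are our own), WCCs can be found by a BFS over the undirected version of the graph, after which per-component features such as community size can be derived:

```python
from collections import deque

def weakly_connected_components(edges, nodes):
    """WCCs of a directed graph: connected components of its undirected version."""
    und = {n: set() for n in nodes}
    for u, v in edges:
        und[u].add(v)
        und[v].add(u)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, q = set(), deque([n])
        seen.add(n)
        while q:
            u = q.popleft()
            comp.add(u)
            for v in und[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        comps.append(comp)
    return comps

comps = weakly_connected_components([(1, 2), (3, 2), (4, 5)], nodes=[1, 2, 3, 4, 5, 6])
# Community-level features, e.g. component sizes (node 6 is its own component):
sizes = sorted(len(c) for c in comps)
```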
In a directed graph G, a triangle is formed by a directed cycle of length three, where each vertex in the cycle has a reciprocal edge with another vertex in the cycle. Triangles can help detect anomalies or outliers in G; e.g., if a node has a much higher or lower triangle count than its neighbours, it might indicate that it is behaving abnormally or differently from the rest of G [34]:

T(G) = \frac{1}{3} \sum_{u \in V(G)} T(u), \quad (9)
where T(G) is the total number of triangles in G, V(G) is the set of vertices, and T(u) is the number of triangles that pass through vertex u. The term T(u) in the above formula is calculated as follows [34]:

T(u) = \sum_{v \in N^{+}(u)} d^{\leftrightarrow}(v), \quad (10)

where N^{+}(u) is the set of out-neighbors of u and d^{\leftrightarrow}(v) is the number of reciprocal edges incident to v.
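One common convention for counting directed triangles, sketched below, enumerates directed 3-cycles directly (each cycle is seen once per starting edge, hence the division by 3). Note this illustrates the "directed cycle of length three" definition in the prose; the reciprocal-edge variant of eq. (10) from [34] differs in detail:

```python
def directed_triangles(edges):
    """Count directed triangles, i.e. 3-cycles u->v->w->u.
    Each cycle is enumerated three times (once per starting edge)."""
    eset = set(edges)
    out = {}
    for u, v in edges:
        out.setdefault(u, set()).add(v)
    total = 0
    for u, v in edges:
        for w in out.get(v, ()):
            if (w, u) in eset:  # closes the cycle u -> v -> w -> u
                total += 1
    return total // 3

# Toy graph with one 3-cycle (1 -> 2 -> 3 -> 1) plus a dangling edge.
n_tri = directed_triangles([(1, 2), (2, 3), (3, 1), (3, 4)])
```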
E. TRAINING OF CLASSIFIERS
We train RF, LightGBM, and XGBoost ensemble models on the learned embeddings. Internal nodes in a decision tree (DT) represent boolean conditions on feature values and leaf nodes represent predicted labels. From a set of embedding vectors V, a DT iteratively splits X^{*} into multiple subsets w.r.t. threshold values of features at each node until each subset contains instances from one class only. Each branch in a DT represents a possible outcome, where the interaction between prediction \hat{y}_i^{*} and feature x_i^{*} is \hat{y}_i^{*} = f(x_i^{*}) = \sum_{j=1}^{N} c_j I\{x_i^{*} \in R_j\}, where R_j is the subset of the data representing the combination of rules at each node and I\{\cdot\} is an indicator function [35].
In the case of tree ensemble models, the prediction function f(x^{*}) is the sum of the individual feature contributions plus the average contribution of the initial node over the dataset, with K possible class labels that change along the prediction path w.r.t. the objective function (e.g., Gini impurity or entropy) causing the split [35]:

f(x) = c_{\mathrm{full}} + \sum_{k=1}^{M} \sigma(x, k), \quad (11)

where c_{\mathrm{full}} is the average over the entire training set X (initial node) and M is the total number of features.
For graph-based node classification, which is technically predicting the label of a node u at time t, we follow the usual training procedures for EvolveGCN [27] and FastGCN [18], followed by the standard GCN-like approach: the activation function of the last graph convolution layer is set to sigmoid so that h_t^u is a probability vector over the two probable classes (i.e., licit and illicit).
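A generic graph convolution of the kind described above can be sketched as follows. This is a plain GCN layer with symmetric normalization and random placeholder weights, assumed here for illustration rather than the exact EvolveGCN/FastGCN implementation:

```python
import numpy as np

def gcn_layer(A, H, W, activation):
    """One graph convolution: activation(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return activation(A_norm @ H @ W)

relu = lambda x: np.maximum(x, 0.0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node chain
H = rng.normal(size=(3, 4))                                   # node features
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))
H1 = gcn_layer(A, H, W1, relu)
probs = gcn_layer(A, H1, W2, sigmoid)  # per-node scores over {licit, illicit}
```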
IV. EXPERIMENTS
In this section, we report our evaluation results.
A. DATASETS
We evaluated our approach on four datasets: AMLSim, Elliptic, IBM AML, and SynthAML [36]. IBM AML is a synthetic financial transaction dataset generated using an agent-based generator and then calibrated to match real transactions as closely as possible. This dataset has six versions that are divided into two groups of three: i) group HI has a relatively higher illicit ratio, and ii) group LI has a relatively lower illicit ratio. Both HI and LI internally have three sets of data: small, medium, and large. All these datasets are independent; e.g., the small datasets are not a subset of the medium ones. For our study, only the HI-medium version is used with a focus on node classification (given that it has a higher illicit ratio), leaving the LI version for future study. SynthAML is also a synthetic dataset for benchmarking statistical and machine learning (ML) methods for AML. It employs the Synthetic Data Vault (SDV) to tune a probabilistic model with real data from the Danish bank Spar Nord with approx. 440,000 clients.
SynthAML contains 20,000 AML alerts and over 16 million transactions in two tables. The alert dataset has an alert ID, the date the alert was raised, and the outcome of the alert^1. Each transaction has four features: i) a transaction timestamp, ii) the transaction entry (credit vs. debit), iii) the transaction type (card, cash, international, or wire), and iv) the transaction size (measured in log Danish kroner and standardized to have zero mean and unit variance). The AMLSim dataset, which is generated with a multi-agent simulation platform, is tailored for an AML problem. Each agent behaves as a bank account transferring money to other agent accounts, in which a few agents conduct nefarious activity modelled on real-world patterns. We generate a dynamic directed transaction graph containing semi-realistic suspicious activities, based on the following information and graph generation process:
•Accounts: whose transactions are monitored.
•Alerts: transactions that are frequently or periodically monitored and triggered alerts (illicit or SARs) according to AML guidelines.
•Transactions: list of all transactions (both normal and SARs) including sender and receiver accounts.
Each node represents an account that has an account number, account type, owner name, and date/time created. Nodes are designated with cash in, cash out, debit, payment, transfer, or deposit activities and
1 i.e., whether the alert was reported to the authorities or dismissed.
FIGURE 7: Additional features based on graph algorithms. (a) Node centrality: larger node size signifies a larger score; a central party having more transaction inflow (in red) and outflow (in green) might exhibit suspicious behaviour. (b) Shortest path (in red) determines how close a party is to a known suspicious party. (c) Communities within a graph, where suspicious parties within the same community might be more likely to be globally suspicious too.
are of either organization or individual types. Each edge has a transaction ID, amount, and timestamp. The data is sparsely labelled with flagged transactions (e.g., transactions that violate volume and velocity rules) and SARs (e.g., transactions that are confirmed as suspicious). The Elliptic dataset^1 is a graph network of Bitcoin transactions with handcrafted features constructed using publicly available information. This anonymized dataset is a transaction graph collected from the Bitcoin blockchain. The dataset maps Bitcoin transactions to real entities in two categories: licit and illicit [37]:
•Licit: licit transactions contain usual exchanges, wallet providers, miners, licit services, etc.
•Illicit: illicit transactions contain scams, malware, terrorists, ransomware, Ponzi schemes, etc.
The graph contains 203,769 nodes (transactions) and 234,355 directed edges (payment flows), of which 2% are
1 https://www.kaggle.com/datasets/ellipticco/elliptic-data-set
illicit and 21% are licit. The remaining 77% of samples are labelled as unknown transactions. Each node has 166 associated features, where the first 94 features represent local information (i.e., time-step, number of inputs/outputs, transaction fee, output volume, and aggregated figures, e.g., average BTC received and spent by inputs and outputs, and the average number of incoming and outgoing transactions). The remaining 72 features represent aggregated features that are obtained using one-hop backwards/forwards transaction information from the centre node (i.e., min/max, standard deviation, and correlation coefficients of neighbour transactions w.r.t. number of inputs/outputs, transaction fee, etc.). A time step is associated with each node, representing an estimated time when the transaction was confirmed. The 49 time steps are evenly spaced with an interval of two weeks.
Using the SynthAML dataset, we generate a directed transaction graph similar to the AMLSim dataset. Then, similar to Jensen et al. [36], we focus on node classification based on synthetic data only, where we classify alerts based on their outcomes. For the training set, alerts raised between January 1, 2020 and December 31, 2020 were used. For the test set, we used alerts raised between January 1, 2021 and December 31, 2021. However, unlike [36], we do not rely on statistical graph features such as min, mean, median, max, standard deviation, counts, and sum per transaction type and entry for all transactions associated with each alert, which would yield as many as 7 × 2 × 4 = 56 features per alert. As for the IBM AML dataset, we split the transaction indices after ordering them w.r.t. their timestamps, similar to [2], where the data split is defined by two timestamps: t1 and t2. The train set includes transactions before time t1, the validation set includes transactions between times t1 and t2, and the test set includes transactions after t2. These yield three dynamic graphs at times t1, t2, and t3 = tmax.
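The timestamp-based split above can be sketched as follows; the record layout and function name are illustrative assumptions:

```python
def temporal_split(transactions, t1, t2):
    """Split transactions into train (< t1), validation ([t1, t2)), and
    test (>= t2) after ordering them by timestamp."""
    txs = sorted(transactions, key=lambda tx: tx["ts"])
    train = [tx for tx in txs if tx["ts"] < t1]
    val = [tx for tx in txs if t1 <= tx["ts"] < t2]
    test = [tx for tx in txs if tx["ts"] >= t2]
    return train, val, test

# Toy transactions with integer timestamps.
txs = [{"ts": t, "amount": 10 * t} for t in (5, 1, 9, 3, 7)]
train, val, test = temporal_split(txs, t1=4, t2=8)
# train holds ts 1 and 3, val holds ts 5 and 7, test holds ts 9.
```

A graph snapshot per split boundary then yields the three dynamic graphs mentioned above.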
B. EXPERIMENT SETTINGS
First, each dataset is subset chronologically for training, validation, and testing (e.g., 60%, 20%, and 20% splits or temporal splits w.r.t. associated timestamps). As for the pipeline methods, Node2Vec, Attri2Vec, GraphSAGE, and DGT models are trained to generate node embeddings that are used to train the classifiers^1. As for the end-to-end approach, we use SkipGCN [9], EvolveGCN [27], and FastGCN [18]. SkipGCN and FastGCN are trained in batches to reduce training costs through neighbourhood sampling, while EvolveGCN is trained to capture the dynamism by evolving the GCN parameters, using the AdaGrad optimizer with varying learning rates and batch sizes. Schlichtkrull et al. [6] show that if there are more than three convolutional layers, node features will be over-smoothed, i.e., all the nodes in the graph look similar. Inspired by this, we configured GCN, FastGCN, and EvolveGCN to have three convolutional layers. Further, since classes are imbalanced in both datasets, we trained SkipGCN, FastGCN, and EvolveGCN using a weighted cross-entropy loss to give higher importance to the minority class (i.e., illicit/SARs).
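The class-weighting idea can be sketched as a weighted binary cross-entropy; the weights below are placeholder values, not the ones used in our experiments:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, w_pos=10.0, w_neg=1.0, eps=1e-12):
    """Binary cross-entropy with a larger weight on the minority (illicit) class."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    loss = -(w_pos * labels * np.log(probs)
             + w_neg * (1.0 - labels) * np.log(1.0 - probs))
    return loss.mean()

probs = np.array([0.9, 0.2, 0.7])   # predicted P(illicit)
labels = np.array([1.0, 0.0, 1.0])  # ground truth
base = weighted_cross_entropy(probs, labels, w_pos=1.0)   # unweighted BCE
weighted = weighted_cross_entropy(probs, labels, w_pos=10.0)
```

With w_pos > w_neg, errors on illicit nodes dominate the loss, pushing the model toward the minority class.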
We used the open-source StellarGraph library^2 for computing node embeddings, which is based on BiasedRandomWalk and on Word2Vec from the Gensim library. Attri2Vec was trained using a skip-gram model with a window size of 5, a graph walk depth of 5, and 500 walks per entity. The DGT model was trained^3 by varying the number of layers L between [1, 5]. While generating graph-based features, the damping factor for PageRank was set between [0.75, 0.85]. For the sake of semi-supervised learning, we randomly removed 10-20% of the nodes from each experiment and trained the GE models on the reduced sub-graph. We generated embeddings for
1 GitHub: https://github.com/rezacsedu/graph-based-aml
2 https://github.com/stellargraph/stellargraph
3 https://github.com/yuetan031/TADDY_pytorch
the removed nodes using the trained GE model during inference and used these embeddings to predict the labels of the originally held-out nodes after re-inserting them into the network. Since our semi-supervised learning techniques use both labelled and unlabelled data, a model can suffer from overfitting, especially when the labelled data is scarce or noisy, or when the unlabelled data is not representative of the target distribution. To mitigate this, we employ two strategies. First, aside from l1, l2, and dropout regularization, we employ consistency regularization using Mean Teacher, which enforces the model to learn only smooth and robust features during node embedding. Second, we employ a self-training technique to iteratively label the unlabelled data using the model’s predictions, followed by adding the most confident predictions to the training set.
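The self-training loop can be sketched generically as below. The `fit`/`predict_proba` callables, the confidence threshold, and the toy one-dimensional "model" are illustrative assumptions; any probabilistic classifier can be plugged in:

```python
import numpy as np

def self_train(fit, predict_proba, X_lab, y_lab, X_unlab, threshold=0.95, rounds=3):
    """Self-training: fit on labelled data, pseudo-label unlabelled points whose
    predicted class probability exceeds `threshold`, add them, and repeat."""
    X_l, y_l = X_lab.copy(), y_lab.copy()
    X_u = X_unlab.copy()
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        model = fit(X_l, y_l)
        proba = predict_proba(model, X_u)   # shape (n, 2): [P(class 0), P(class 1)]
        conf = proba.max(axis=1)
        keep = conf >= threshold            # only the most confident predictions
        if not keep.any():
            break
        X_l = np.vstack([X_l, X_u[keep]])
        y_l = np.concatenate([y_l, proba[keep].argmax(axis=1)])
        X_u = X_u[~keep]
    return X_l, y_l

# Toy 1-D "model": class 1 iff the feature is positive, via a logistic squash.
fit = lambda X, y: None
predict_proba = lambda m, X: np.stack([1 / (1 + np.exp(5 * X[:, 0])),
                                       1 / (1 + np.exp(-5 * X[:, 0]))], axis=1)
X_lab = np.array([[-1.0], [1.0]]); y_lab = np.array([0, 1])
X_unlab = np.array([[-2.0], [2.0], [0.01]])
X_new, y_new = self_train(fit, predict_proba, X_lab, y_lab, X_unlab)
# The two confident points (-2.0 and 2.0) are pseudo-labelled; 0.01 is left out.
```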
Open-source implementations were used to train the RF^4, LightGBM^5, and XGBoost^6 ensemble classifiers. The best hyperparameters were selected via random search under 5-fold cross-validation. We evaluated the performance of each trained classifier w.r.t. the area under the precision-recall curve (AUPR), the Matthews correlation coefficient (MCC), and F1 scores.
C. ANALYSIS OF NODE CLASSIFICATIONS
Table 1 summarizes the results of the prediction task based on pipeline methods. Among the pipeline methods, the combination of DGT and XGBoost outperforms all other combinations across datasets. The results also reveal that each classifier performs worse when trained on embeddings generated by the Node2Vec and Attri2Vec models. Although we observed slightly different results for the IBM AML and SynthAML datasets, the overall trend is similar. When our pipeline approach was tested on the SynthAML [36] dataset, the best XGBoost classifier significantly outperformed the best RF and LightGBM classifiers, showing an F1-score of 0.726 against their score of 0.64. Besides, DGT + XGBoost slightly outperformed the GNN-based models when tested on the IBM AML HI-medium version.
On the other hand, end-to-end methods outperform every pipeline method, as marked in green in tables 2 and 3. The EvolveGCN model outperforms all pipeline methods with tree-ensemble classifiers, indicating the effectiveness of end-to-end methods compared to their pipeline counterparts. Moreover, EvolveGCN consistently outperforms both Skip-GCN and FastGCN, although the improvement is not very substantial.
When it comes to a comparative analysis between pipeline and end-to-end methods, the performance of GraphSAGE + XGBoost is comparable to that of DGT + XGBoost. However, when local and global
4 https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
5 https://lightgbm.readthedocs.io/en/stable/
6 https://XGBoost.readthedocs.io/
TABLE 1: Node classification results for pipeline methods

GE model   Classifier   AMLSim (AUPR/F1/MCC)   Elliptic (AUPR/F1/MCC)   IBM AML (AUPR/F1/MCC)   SynthAML (AUPR/F1/MCC)
Node2Vec   RF           0.751/0.760/0.651      0.806/0.817/0.622        0.637/0.647/0.529       0.661/0.643/0.547
Node2Vec   LightGBM     0.753/0.760/0.649      0.800/0.813/0.623        0.640/0.651/0.528       0.662/0.640/0.536
Node2Vec   XGBoost      0.806/0.801/0.752      0.885/0.874/0.673        0.642/0.649/0.531       0.668/0.649/0.552
Attri2Vec  RF           0.775/0.782/0.669      0.821/0.832/0.665        0.647/0.656/0.546       0.668/0.641/0.545
Attri2Vec  LightGBM     0.769/0.791/0.657      0.823/0.832/0.659        0.651/0.657/0.547       0.669/0.643/0.542
Attri2Vec  XGBoost      0.806/0.801/0.752      0.902/0.894/0.673        0.652/0.658/0.551       0.671/0.652/0.549
GraphSAGE  RF           0.802/0.804/0.675      0.891/0.882/0.778        0.667/0.668/0.567       0.683/0.692/0.563
GraphSAGE  LightGBM     0.794/0.809/0.685      0.883/0.874/0.750        0.669/0.671/0.568       0.682/0.688/0.557
GraphSAGE  XGBoost      0.815/0.816/0.701      0.912/0.905/0.782        0.669/0.671/0.569       0.694/0.687/0.576
DGT        RF           0.813/0.825/0.693      0.907/0.897/0.753        0.677/0.675/0.576       0.687/0.694/0.581
DGT        LightGBM     0.805/0.811/0.677      0.894/0.862/0.733        0.679/0.680/0.579       0.689/0.691/0.585
DGT        XGBoost      0.833/0.832/0.715      0.918/0.915/0.792        0.683/0.684/0.582       0.712/0.713/0.593
DGT        XGBoost*     0.852/0.846/0.727      0.925/0.918/0.802        0.725/0.7014/0.623      0.727/0.726/0.618
DGT        XGBoost**    0.864/0.859/0.738      0.939/0.932/0.814        0.736/0.712/0.632       0.738/0.737/0.627
topological features were added along with node embeddings, we observed noticeable improvements in the pipeline methods, which we discuss in section IV-F.
D. ANALYSIS OF NODE EMBEDDINGS
One of the factors that we explored in our experiments was the embedding dimension, i.e., the number of features used to represent each node in the graph. We tested different values of d for each GE model, ranging over 32, 64, 128, 256, and 300. We then visualized the node embeddings in a low-dimensional space using t-SNE, as shown in fig. 8. The plots reveal clear differences between the embeddings of the normal and illicit classes, meaning that the embeddings capture some information that can help identify fraudulent or SAR accounts/nodes. Moreover, we observe that the embeddings generated by the GraphSAGE and DGT models are more clustered and separated than those produced by the Node2Vec and Attri2Vec models. This indicates that GraphSAGE and DGT learn more meaningful and discriminative representations of the nodes, which can benefit a classifier in detecting fraudulent nodes.
E. EFFECTS OF TEMPORAL INFORMATION
In table 1, we present the results of applying the XGBoost model to all datasets. We compare the performance of the XGBoost model when it uses different types of node features as input. We show that the performance of the XGBoost model improves when it uses the temporal information learned by the DGT model, which is highlighted in grey in the table. This indicates that the DGT model can capture the temporal dynamics of the graph and generate more informative and discriminative features for the nodes. The XGBoost model that uses the DGT features outperforms the other pipeline methods that use static or semi-static features, such as Node2Vec, Attri2Vec, or GraphSAGE.
We also observe that the combination of the GraphSAGE and XGBoost models achieves the best performance on the Elliptic dataset, which suggests that the GraphSAGE model can leverage the node attributes and the graph structure to generate more expressive and relevant features for the nodes. We attribute the success of these dynamic models to the fact that the input features contain useful information about the nodes and their relationships, and that the transformers can learn more abstract and higher-level features that are crucial for distinguishing between normal and illicit nodes. These results also demonstrate that the representation learning capability of these models depends on the quality of the input features and that the learned representations are reflected in the classification outcomes.
F. ANALYSIS OF COMBINED FEATURES’ EFFECTS
In this section, we analyse how the node embeddings obtained from different models can be combined with other types of features to enhance node classification performance. Pareja et al. [27] showed that adding aggregated information to the original inputs, such as the node degree or the node label, can improve the F1 scores for anomaly detection tasks. Motivated by this idea, we experimented with different combinations of node embeddings, local node features, global topological features, and the original input space, and we retrained the individual classifiers on the extended feature space. We show examples for the AMLSim dataset, where we used the local node features and global topological features. We compared the results with the baseline methods that use only node embeddings or only original inputs as features. Figure 9 illustrates the effects of combining node embeddings with other features, such as local graph features and topological graph features, on all datasets: AMLSim, Elliptic, IBM AML, and SynthAML.
We can observe that adding local node features and global topological features to the node embeddings helps slightly increase the classification accuracy, making it similar to end-to-end methods that use graph neural networks. Specifically, the node classification accuracy with the DGT embeddings + local graph features + XGBoost combination improved (marked in blue in table 1 and fig. 9a) by 1 to 2% for all datasets compared to the baseline methods. The
TABLE 2: Node classification results in end-to-end settings: AMLSim and Elliptic datasets

Model      AMLSim (AUPR/F1/MCC)                           Elliptic (AUPR/F1/MCC)
Skip-GCN   0.834 (0.792) / 0.915 (0.875) / 0.881 (0.763)  0.928 (0.793) / 0.916 (0.873) / 0.854 (0.763)
FastGCN    0.841 (0.804) / 0.927 (0.890) / 0.903 (0.781)  0.933 (0.805) / 0.925 (0.881) / 0.875 (0.781)
EvolveGCN  0.869 (0.813) / 0.934 (0.902) / 0.891 (0.773)  0.941 (0.813) / 0.934 (0.891) / 0.891 (0.773)
TABLE 3: Node classification results in end-to-end settings: IBM AML and SynthAML datasets

Model      IBM AML (AUPR/F1/MCC)  SynthAML (AUPR/F1/MCC)
Skip-GCN   0.663/0.664/0.562      0.685/0.677/0.562
FastGCN    0.675/0.678/0.595      0.705/0.692/0.583
EvolveGCN  0.695/0.698/0.615      0.726/0.727/0.675
node classification accuracy further improved by 1.5% for all datasets with the combination of the most impactful features (among DGT embeddings, local node features, and topological features) + XGBoost (marked in green in table 1 and fig. 9b), making the node classification results comparable to EvolveGCN, which is the SOTA method for dynamic graphs. These findings suggest that the node embeddings learned by the dynamic models can be enriched by incorporating other features that capture the local and global properties of the nodes, and that the extended feature space can help the classifiers distinguish between normal and illicit nodes more effectively. We also observed for the IBM AML and SynthAML datasets that the encapsulation of node embeddings with the additional node features and the full feature space helped the pipeline methods outperform their end-to-end counterparts. Even though the micro averages for all pipeline approaches are above 0.93, they are not very informative for highly imbalanced datasets. Nevertheless, in financial crime forensics, the minority illicit class is of primary interest. Therefore, we report the minority-class F1 scores for all datasets we experimented with.
G. EXPLAINING MONEY LAUNDERING
The black-box nature of a GNN or GE model^1 may raise concerns about transparency and accountability when an AML model is deployed in a real financial system. The latent factors learned by a GNN or GE model are not easily interpretable. Thus, predictions made by such a complex model cannot be traced back, making it unclear how or why it arrived at a certain outcome [38], [39]. On the other hand, disentangling them can provide insights into what features were captured by the representations and were relevant for the tasks [39].
Explainable artificial intelligence (XAI) aims to make AI systems more transparent and understandable to humans by interpreting how black-box models make decisions [39]. An interpretable AML model can reveal the factors that impact its outcomes (e.g., statistically significant features) and explain the interactions among them [39], [40]. Thus, an interpretable ML model that emphasizes transparency and traceability of its logic can explain why and how it arrived at certain decisions, reducing negative consequences. In the context of our anti-money laundering scenario, local interpretability can provide reasons for a decision made for a specific party, or reference similar cases, allowing the identification of unique characteristics of a party or of multiple parties in a small group such as a community. In contrast, global interpretability shows the overall behaviour of an AML model at a high level. While local explanations focus on explaining individual predictions, global explanations explain the entire model behaviour.
1 Complex ensemble or DNN models often tend to be less interpretable and may end up as black-box methods.
A transaction graph can be viewed as the dis- 15
crete symbolic representation of knowledge about dif- 16
ferent types of transactions. Therefore, we employed 17
graph-based explanation techniques such as GNNEx- 18
plainer [41] and SHAP to compute the contributions of 19
the neighbouring node types and edges when predicting 20
the suspiciousness or involvement of a targeted node/- 21
party with money laundering. To provide local explain- 22
ability, we highlighted some nodes that are flagged with 23
“alert types” such as gather scattered,scattered 24
gather, and cycles for the AMLSim dataset in fig. 10. 25
As shown in fig. 10a, gather scattered is when 26
launderers (node in colour) collect small amounts of 27
cash from various sources2and deposit them into a 28
single account or location. This reduces the risk of 29
detection by avoiding large cash transactions that may 30
raise suspicion. As shown in fig. 10b, scatter gathered 31
is when launderers (node in colour) transfer the gath- 32
ered money into multiple accounts across countries or 33
jurisdictions (e.g., offshore accounts, shell companies, 34
or foreign investments). This increases the complexity 35
and anonymity of the money trail, making it harder 36
to follow and recover. As shown in fig. 10c, cycles is 37
when launderers (node in colour) repeat the process of 38
gathering and scattering money multiple times, using 39
multiple intermediaries involving different currencies3.40
This further obscures the origin and destination of 41
the money and creates layers of transactions that can 42
confuse or mislead investigators. The feature impact plot 43
for the XGBoost classifier is depicted in fig. 10, showing 44
which features contributed most. 45
²Launderers may gather scattered cash from human trafficking, drug sales, or tax evasion and deposit it into a bank account.
³They may cycle money through various banks or individuals using wire transfers or cryptocurrencies.
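The three alert patterns above can be approximated with simple structural heuristics over a transaction graph. The following is a minimal sketch; the dict-based adjacency representation, the fan-in/fan-out thresholds, and the hop limit are illustrative assumptions, not the AMLSim alert logic:

```python
from collections import defaultdict

def build_adjacency(transactions):
    """Build in/out adjacency sets from (src, dst, amount) transfer records."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for src, dst, _amount in transactions:
        out_edges[src].add(dst)
        in_edges[dst].add(src)
    return out_edges, in_edges

def gather_scatter_candidates(in_edges, fan_in=3):
    """Gather-scatter: one account collecting funds from many sources."""
    return {n for n, srcs in in_edges.items() if len(srcs) >= fan_in}

def scatter_gather_candidates(out_edges, fan_out=3):
    """Scatter-gather: one account dispersing funds to many destinations."""
    return {n for n, dsts in out_edges.items() if len(dsts) >= fan_out}

def in_cycle(out_edges, start, max_hops=4):
    """Cycle: funds eventually flow back to the originating account."""
    frontier = {start}
    for _ in range(max_hops):
        frontier = {m for n in frontier for m in out_edges.get(n, ())}
        if start in frontier:
            return True
    return False

txs = [("a", "x", 90), ("b", "x", 80), ("c", "x", 70),   # many sources -> x
       ("x", "p", 60), ("x", "q", 60), ("x", "r", 60),   # x -> many sinks
       ("p", "a", 50), ("a", "x", 40)]                   # closes a cycle
out_e, in_e = build_adjacency(txs)
print(gather_scatter_candidates(in_e))   # {'x'}
print(scatter_gather_candidates(out_e))  # {'x'}
print(in_cycle(out_e, "x"))              # True
```

In practice such heuristics would only generate candidate alerts; amounts, timing, and jurisdiction information would still need to be checked before a node is flagged.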
VOLUME 11, 2023 13
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3383784
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Karim and Mandal et al. Semi-supervised Graph Learning Techniques for Anti Money Laundering
(a) Node2Vec (b) Attri2Vec
(c) GraphSAGE (d) DGT
FIGURE 8: t-SNE projection of the node embeddings for the AMLSim dataset into a lower-dimensional space
As shown, node embeddings, followed by other derived and global topological features for individual nodes, such as the shortest path to a node known to be a SAR, triangle counts, the number of communities a node is part of, centrality score, PageRank (learned from centrality measures), indegree, outdegree, account type (i.e., individual/private or organizational), transaction type (e.g., cash in/out, transfer, debit, payment, deposit), and amount (e.g., amount in $), are more important than local graph features. It is to be noted that some features, such as account number, owner name, and transaction ID, are excluded from the original feature space as they do not carry meaningful information for GE or a classifier.
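Several of these topological features can be derived directly from the adjacency structure. Below is a simplified sketch of power-iteration PageRank together with an indegree feature; the toy graph, damping value, and iteration count are illustrative assumptions (and dangling nodes are not handled):

```python
def pagerank(edges, nodes, damping=0.85, iters=50):
    """Power-iteration PageRank over a directed edge list.
    Assumes every node has at least one outgoing edge (no dangling nodes)."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_deg = {v: sum(1 for s, _ in edges if s == v) for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            # Rank flowing into v from its in-neighbours, split by their out-degree.
            inflow = sum(rank[s] / out_deg[s] for s, d in edges if d == v)
            new[v] = (1 - damping) / n + damping * inflow
        rank = new
    return rank

# Toy transaction graph: account "hub" receives from everyone.
nodes = ["a", "b", "c", "hub"]
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
pr = pagerank(edges, nodes)
indegree = {v: sum(1 for _, d in edges if d == v) for v in nodes}
print(max(pr, key=pr.get), indegree["hub"])  # → hub 3
```

Each such scalar can then be appended to a node's embedding vector before classification, which is how the combined feature space above was constructed.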
H. AML MODELS IN REAL FINANCIAL SYSTEMS
Since fraudulent financial activities are becoming rampant, financial institutions should deploy accurate and robust AML models to satisfy regulators. Such AML models are expected to exhibit few false positives (i.e., alerts that turn out not to involve real money laundering) and few false negatives (i.e., real money laundering cases that go undetected). False positives are particularly critical, since inaccurate detection leads human experts to flag too many transactions as illicit, resulting in higher investigation costs (ref. fig. 3). Further, deploying models and inferring money laundering in real time on large-scale transaction graphs can be very challenging for many reasons. For example, criminals often mask the true nature of their transactions using complicated account layering or multi-hop transactions. This makes the identification of money laundering a complex problem.

(a) Embeddings + topological features + input features (b) Node embeddings + impactful features
FIGURE 9: Effects of combining node embeddings, local node features, topological graph features, and the most impactful features in the original feature space
The more trainable parameters an AML model has, the larger its size will be, making deployment infeasible for devices with limited memory and computing power, e.g., IoT devices [38]. Additionally, since real-time graph updating is a heavy operation in most cases, deploying large models even on cloud infrastructure can lead to poor response times due to network latency. This is unacceptable for many real-time applications, including our money laundering scenario [39]. One potential solution is performing real-time prediction in batches to reduce the overhead. Further, for a financial system, the simpler the AML model, the better from an operational point of view. However, a simple model might not capture all the useful signals and may fail to detect true money laundering. We would argue for selecting the best model that is efficient and lightweight, yet yields both low false positives and low false negatives. There should be a balance between a model's effectiveness for AML and its underlying deployment infrastructure.
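The batching idea mentioned above can be sketched as a micro-batcher that accumulates incoming transactions and scores them in a single model call, amortizing per-request overhead. The buffer size and the stand-in scoring function below are illustrative assumptions, not our deployed setup:

```python
class MicroBatcher:
    """Accumulates transactions and flushes them to the model in one call,
    amortizing per-request model and graph-lookup overhead."""
    def __init__(self, score_batch, max_batch=64):
        self.score_batch = score_batch   # callable: list[tx] -> list[float]
        self.max_batch = max_batch
        self.buffer, self.results = [], []

    def submit(self, tx):
        self.buffer.append(tx)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.buffer:
            self.results.extend(self.score_batch(self.buffer))
            self.buffer.clear()

# Stand-in scorer: flags unusually large transfers.
def score_batch(txs):
    return [1.0 if amount > 10_000 else 0.0 for _, _, amount in txs]

batcher = MicroBatcher(score_batch, max_batch=2)
for tx in [("a", "b", 50), ("c", "d", 25_000), ("e", "f", 70)]:
    batcher.submit(tx)
batcher.flush()  # score any remainder
print(batcher.results)  # → [0.0, 1.0, 0.0]
```

A production variant would typically flush on a timeout as well as on buffer size, so that a quiet period does not delay alerts indefinitely.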
Nevertheless, since pipeline methods involving GE and classification take considerably longer than their end-to-end counterparts, our empirical study suggests that the end-to-end approach to AML may be more efficient in terms of accuracy and computation time. Based on these considerations, integrating an end-to-end AML model into a real financial system would be more convenient. Moreover, a scalable model deployment and inferencing pipeline backed by GPU support ensures faster inferencing in a real-time setting.
V. CONCLUSION AND OUTLOOK
In this paper, we employed semi-supervised graph learning techniques on financial transaction graphs to detect potential money laundering activities involving gather-scatter, scatter-gather, and cycle patterns. We trained Node2Vec, Attri2Vec, GraphSAGE, and DGT embedding models to embed graph nodes into a lower-dimensional vector space. RF, XGBoost, and LightGBM classifiers were then trained on the embedding space to predict the suspiciousness of a node being involved in money laundering. Besides, we trained SkipGCN, FastGCN, and EvolveGCN in the end-to-end setting for the same task. Our findings on several datasets show that graph analytics can be an effective means of identifying laundering transactions. The representation learning power of GE and GNN-based models applied to graphs tends to improve node classification accuracy.
Even though combining embeddings with local and global node features significantly boosted the performance of pipeline methods, end-to-end methods consistently outperformed all pipeline methods, albeit the XGBoost model benefited from temporal information captured by the different embedding models. Therefore, an end-to-end approach to AML may not only be more efficient w.r.t. accuracy and computation time, but integrating an end-to-end AML model into a financial system would also be more convenient. On the other hand, graph analytics for AML is a relatively new research area, leaving much room for improvement. First, there are many other techniques employed by criminals to hide their illicit funds. Therefore, it is crucial to have effective AML laws and regulations, as well as vigilant and cooperative authorities, to combat this global problem.
Second, our semi-supervised learning approach operates on publicly available data that is less sensitive; hence, there is little risk of exposing sensitive information. However, similar to deep models that are vulnerable to adversarial attacks, AML detection mechanisms are susceptible to attacks too; e.g., an adversary with access to a model could manipulate transactions to trick the AML detection system into misclassifying illicit transactions as licit [42].
(a) Gather-scatter (b) Scatter-gather (c) Cycle
FIGURE 10: Different alert types of transactions in the AMLSim graph

Third, in real banking settings, money launderers may exploit the fact that banks do not disclose private transactions to one another, splitting transferred amounts across intermediary accounts. Differential privacy [43] could be a potential approach. Some attacks can be prevented by not disclosing the data¹, by using cryptographic primitives, and by making the model available to trusted authorities only. Nevertheless, research [4] has outlined the incorporation of effective differentially private graph-topology and model-sharing techniques in distributed banking settings. Further, a functional encryption scheme leveraging similarity calculations has been investigated [45] to allow financial authorities to compute on data in a federated learning setting across banks. In such a setting, the vector size would be as large as the number of existing bank accounts. In contrast, using KGE methods that are compatible with the new privacy and data-accessibility constraints could significantly reduce this size. In this regard, a variant of the fast random projection described below could be a potential way to obtain transaction/edge embeddings:
¹For example, attackers with access to the model could perform input reconstruction or membership inference [44].
1) Each node n in the subgraph and its direct neighbours receives an initial random embedding e_{n,0} from a public hash function on the node, under the constraint that no output is null.
2) For each node n in the subgraph, denoting N the set of its direct neighbours and w_{i→j} the amount sent from account i to account j (i.e., the weight on the oriented edge from i to j, for any nodes i and j, setting that weight to 0 when the nodes are not connected), n receives a new embedding e_{n,1} := Σ_{m∈N} e_{m,0} · w_{m→n} (and, for outgoing transactions, e_{n,1} := Σ_{m∈N} e_{m,0} · w_{n→m}).
3) The vectors from step 2 are then normalized; each node n in the subgraph receives the embedding e_{n,2} := e_{n,1} / ||e_{n,1}||_2, where ||·||_2 is the Euclidean norm, yielding the incoming-transaction embedding e_{n,in} (resp. e_{n,out} for outgoing transactions).

FIGURE 11: Explaining a money laundering example for the AMLSim dataset using the SHAP waterfall plot for the XGBoost classifier. The bottom starts at the expected output value; each row shows how the positive (red) or negative (blue) contribution of individual features pushes the value from the expected output over the training set to the model output. Assuming the true class label is illicit, positive values imply probabilities of > 0.4 that the party is truly involved in money laundering
Such embeddings would enable both inner-product and similarity calculations on embedding vectors, e.g., σ(n, m) = ⟨e_{n,in}, e_{m,in}⟩ × ⟨e_{n,out}, e_{m,out}⟩, comparing a node n from bank 1 with a node m from bank 2. In such a setting, nodes showing high similarity values should be labelled as suspicious accounts, which might be part of the same money-laundering network, having the bulk of their transactions from and to the same neighbours. In future, we also intend to introduce sophisticated attacks on the model, followed by exploring defence strategies.
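Under the assumptions stated above (hash-seeded initial vectors, amount-weighted neighbour aggregation, L2 normalization, and the inner-product similarity σ), the scheme could be sketched as follows; the hash construction, embedding dimension, and toy weight maps are illustrative, not a vetted cryptographic design:

```python
import hashlib
import math
import random

DIM = 16  # illustrative embedding dimension

def initial_embedding(node_id):
    """Step 1: deterministic pseudo-random embedding seeded by a public hash."""
    seed = int.from_bytes(hashlib.sha256(node_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.uniform(-1, 1) for _ in range(DIM)]
    return vec if any(vec) else [1.0] * DIM  # enforce the non-null constraint

def aggregate(node, weights, direction="in"):
    """Step 2: amount-weighted sum of neighbours' initial embeddings.
    weights maps (src, dst) -> transferred amount (0 / absent = no edge)."""
    acc = [0.0] * DIM
    for (src, dst), w in weights.items():
        if direction == "in" and dst == node:
            neigh = src
        elif direction == "out" and src == node:
            neigh = dst
        else:
            continue
        acc = [a + w * x for a, x in zip(acc, initial_embedding(neigh))]
    return acc

def normalize(vec):
    """Step 3: L2 normalization."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def similarity(n, m, w1, w2):
    """sigma(n, m) = <e_n_in, e_m_in> * <e_n_out, e_m_out>."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    e_n_in = normalize(aggregate(n, w1, "in"))
    e_n_out = normalize(aggregate(n, w1, "out"))
    e_m_in = normalize(aggregate(m, w2, "in"))
    e_m_out = normalize(aggregate(m, w2, "out"))
    return dot(e_n_in, e_m_in) * dot(e_n_out, e_m_out)

# Two banks observing the same hidden counterparties: "p" pays in, "q" is paid out.
w_bank1 = {("p", "n"): 100.0, ("n", "q"): 95.0}
w_bank2 = {("p", "m"): 200.0, ("m", "q"): 190.0}
print(round(similarity("n", "m", w_bank1, w_bank2), 4))  # → 1.0
```

Because both accounts transact with the same neighbours, their normalized in- and out-embeddings coincide and σ is maximal, which is exactly the signal the labelling rule above relies on; in the envisioned privacy-preserving setting, the inner products would be evaluated under functional encryption rather than in the clear.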
Fourth, with the rapid development of a cashless society engaged in global economic exchange, the advent of cryptocurrency has catalyzed a paradigm shift in peer-to-peer transactions and extranational financial governance. Cryptocurrencies not only pose great challenges to AML but also increase the difficulty across cryptocurrency types. Another challenge is temporal dynamics, with the emergence and disappearance of entities in the blockchain. Weber et al. [5] have shown that, at a given timestep, the market may experience sudden events such as a dark market shutdown, where no model (including EvolveGCN or DGT) would be able to capture such high volatility and consequently may not perform well.
Fifth, since real-life data is noisier and full of uncertainties, more sophisticated methods need to be developed to capture complex laundering patterns with extremely low illicit ratios [2]. This could be the case for synthetic data as well, where even efficient AML models may face challenges regarding the true detection of money laundering in the presence of lower illicit ratios and longer laundering patterns [2]. A concrete example outlined by Altman et al. [2] is the LI version of the IBM ML dataset. Therefore, we intend to focus on the LI version as well, with a focus on node classification.
Sixth, it is hard to initialise the node features.
Therefore, we would like to employ self-supervised pre-training for feature initialisation. We believe that the above-mentioned reactive and proactive measures would help improve both the representation learning capability and the adversarial robustness of an AML model. Nevertheless, we hope the approach presented in this paper makes non-trivial contributions and gives network security and financial crime analysts some insights into how to employ semi-supervised graph learning on large-scale transaction graphs for the effective identification of potential money laundering cases.
REFERENCES
[1] R. Frumerie, “Money laundering detection using tree boosting and graph learning algorithms,” 2021.
[2] E. Altman, B. Egressy, J. Blanuša, and K. Atasu, “Realistic synthetic financial transactions for anti-money laundering models,” arXiv preprint arXiv:2306.16424, 2023.
[3] “Money laundering,” https://www.unodc.org/romena/en/money-laundering.html, accessed: 2023-10-13.
[4] R. Soltani, U. T. Nguyen, Y. Yang, M. Faghani, A. Yagoub, and A. An, “A new algorithm for money laundering detection based on structural similarity,” in 2016 IEEE 7th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). IEEE, 2016, pp. 1–7.
[5] M. Weber, J. Chen, T. Suzumura, A. Pareja, and T. B. Schardl, “Scalable graph learning for anti-money laundering: A first look,” arXiv preprint arXiv:1812.00076, 2018.
[6] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling, “Modeling relational data with graph convolutional networks,” in The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15. Springer, 2018, pp. 593–607.
[7] X. Li, S. Liu, Z. Li, X. Han, C. Shi, B. Hooi, H. Huang, and X. Cheng, “FlowScope: Spotting money laundering based on graphs,” in AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 4731–4738.
[8] Q. Rajput, N. S. Khan, A. Larik, and S. Haider, “Ontology based expert-system for suspicious transactions detection,” Computer and Information Science, vol. 7, no. 1, p. 103, 2014.
[9] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv:1609.02907, 2016.
[10] K. Michalak and J. Korczak, “Graph mining approach to suspicious transaction detection,” in Federated Conference on Information Systems (FedCSIS). IEEE, 2011, pp. 69–75.
[11] B. Hooi and C. Faloutsos, “Fraudar: Bounding graph fraud in the face of camouflage,” in ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2016, pp. 895–904.
[12] M. Cardoso, P. Saleiro, and P. Bizarro, “Laundrograph: Self-supervised graph representation learning for anti-money laundering,” in Proceedings of the Third ACM International Conference on AI in Finance, 2022, pp. 130–138.
[13] H. Kanezashi, T. Suzumura, X. Liu, and T. Hirofuchi, “Ethereum fraud detection with heterogeneous graph neural networks,” arXiv preprint arXiv:2203.12363, 2022.
[14] S. X. Rao, S. Zhang, Z. Han, Z. Zhang, W. Min, Z. Chen, Y. Shan, Y. Zhao, and C. Zhang, “xFraud: explainable fraud transaction detection,” arXiv preprint arXiv:2011.12193, 2020.
[15] Z. Chen, L. Chen, S. Villar, and J. Bruna, “Can graph neural networks count substructures?” Advances in Neural Information Processing Systems, vol. 33, pp. 10383–10395, 2020.
[16] G. Bouritsas, F. Frasca, S. Zafeiriou, and M. M. Bronstein, “Improving graph neural network expressivity via subgraph isomorphism counting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 657–668, 2022.
[17] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” NeurIPS, vol. 30, 2017.
[18] J. Chen, T. Ma, and C. Xiao, “FastGCN: fast learning with graph convolutional networks via importance sampling,” arXiv:1801.10247, 2018.
[19] R. Karim, T. Islam, O. Beyan, D. Rebholz-Schuhmann, and S. Decker, “Explainable AI for Bioinformatics: Methods, Tools, and Applications,” Briefings in Bioinformatics, July 2023.
[20] A. Hogan, E. Blomqvist, M. Cochez, C. d’Amato, G. D. Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier et al., “Knowledge graphs,” ACM Computing Surveys (CSUR), vol. 54, no. 4, pp. 1–37, 2021.
[21] Y. Dai, S. Wang, N. N. Xiong, and W. Guo, “A survey on knowledge graph embedding: Approaches, applications and benchmarks,” Electronics, vol. 9, no. 5, p. 750, 2020.
[22] J. Feng, M. Huang, M. Wang, M. Zhou, Y. Hao, and X. Zhu, “Knowledge graph embedding by flexible translation,” in International Conference on Principles of Knowledge Representation and Reasoning. AAAI, 2016, pp. 557–560.
[23] R. Karim, M. Cochez, M. Uddin, O. Beyan, and S. Decker, “Drug-drug interaction prediction based on knowledge graph embeddings and convolutional-LSTM network,” in ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2019, pp. 113–123.
[24] P. Xia, Z. Ni, H. Xiao, X. Zhu, and P. Peng, “A novel spatiotemporal prediction approach based on graph convolution neural networks and LSTM for money laundering fraud,” Arabian Journal for Science and Engineering, pp. 1–17, 2021.
[25] Z. A. Sahili and M. Awad, “Spatio-temporal graph neural networks: A survey,” arXiv preprint arXiv:2301.10569, 2023.
[26] G. Jin, Y. Liang, Y. Fang, Z. Shao, J. Huang, J. Zhang, and Y. Zheng, “Spatio-temporal graph neural networks for predictive learning in urban computing: A survey,” IEEE Transactions on Knowledge and Data Engineering, 2023.
[27] A. Pareja, G. Domeniconi, J. Chen, T. Ma, T. Suzumura, H. Kanezashi, T. Kaler, T. Schardl, and C. Leiserson, “EvolveGCN: Evolving graph convolutional networks for dynamic graphs,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 5363–5370.
[28] Y. Liu, S. Pan, Y. G. Wang, and V. C. Lee, “Anomaly detection in dynamic graphs via transformer,” IEEE Transactions on Knowledge and Data Engineering, 2021.
[29] M. Cochez, P. Ristoski, S. P. Ponzetto, and H. Paulheim, “Biased graph walks for RDF graph embeddings,” in International Conference on Web Intelligence, Mining and Semantics, ser. WIMS ’17. ACM, 2017.
[30] T. Mikolov, K. Chen, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv:1301.3781, 2013.
[31] D. Zhang, J. Yin, X. Zhu, and C. Zhang, “Attributed network embedding via subspace discovery,” Data Mining and Knowledge Discovery, vol. 33, no. 6, pp. 1953–1980, 2019.
[32] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed. The MIT Press, MA, USA, 2009.
[33] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks and ISDN Systems, vol. 30, no. 1-7, pp. 107–117, 1998.
[34] S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” in Proceedings of the 20th International Conference on World Wide Web, 2011, pp. 607–614.
[35] F. Di Castro and E. Bertini, “Surrogate decision tree visualization,” in IUI Workshops, 2019.
[36] R. I. T. Jensen, J. Ferwerda, K. S. Jørgensen, E. R. Jensen, M. Borg, M. P. Krogh, J. B. Jensen, and A. Iosifidis, “A synthetic data set to benchmark anti-money laundering methods,” Scientific Data, vol. 10, no. 1, p. 661, 2023.
[37] M. Weber, G. Domeniconi, J. Chen, D. K. I. Weidele, C. Bellei, T. Robinson, and C. E. Leiserson, “Anti-money laundering in bitcoin: Experimenting with graph convolutional networks for financial forensics,” arXiv preprint arXiv:1908.02591, 2019.
[38] M. R. Karim, M. Shajalal, A. Graß, T. Döhmen, S. A. Chala, A. Boden, C. Beecks, and S. Decker, “Interpreting black-box machine learning models for high dimensional datasets,” in 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2023, pp. 1–10.
[39] M. R. Karim, T. Islam, M. Shajalal, O. Beyan, C. Lange, M. Cochez, D. Rebholz-Schuhmann, and S. Decker, “Explainable AI for bioinformatics: Methods, tools and applications,” Briefings in Bioinformatics, vol. 24, no. 5, p. bbad236, 2023.
[40] C. Molnar, Interpretable Machine Learning. Lulu.com, 2020.
[41] Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec, “GNNExplainer: Generating explanations for graph neural networks,” in Advances in NeurIPS, 2019, pp. 9240–9251.
[42] D. S. Berman, A. L. Buczak, and C. L. Corbett, “A survey of deep learning methods for cyber security,” Information, vol. 10, no. 4, p. 122, 2019.
[43] M. Abadi, I. Goodfellow, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 308–318.
[44] H. Hu, Z. Salcic, P. S. Yu, and X. Zhang, “Membership inference attacks on machine learning: A survey,” ACM Computing Surveys (CSUR), vol. 54, no. 11s, pp. 1–37, 2022.
[45] P. de Perthuis and D. Pointcheval, “Two-client inner-product functional encryption with an application to money-laundering detection,” in ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’22. ACM, 2022, pp. 725–737.
MD. REZAUL KARIM is a Staff Data Scientist at ALDI SÜD - International Data & Analytics Services and a Visiting Researcher at RWTH Aachen University, Germany. Before joining ALDI SÜD, he worked as a Senior Data Scientist at Fraunhofer FIT and a Postdoctoral Researcher at RWTH Aachen University, Germany. Previously, he worked as an ML Engineer at the Insight Centre for Data Analytics, University of Galway, Ireland. Before that, he worked as a Lead Engineer at Samsung Electronics, South Korea. He received his PhD from RWTH Aachen University, Germany; an MSc degree from Kyung Hee University, South Korea; and a BSc degree from the University of Dhaka, Bangladesh. His research interests include applied machine/deep learning, NLP, and explainable AI (XAI).

FELIX HERMSEN received the M.Sc. degree in Computer Science from RWTH Aachen University, Germany, in 2021. Since 2022, he has been working as a Researcher at the Fraunhofer Institute for Applied Information Technology FIT, Germany. At Fraunhofer FIT, he is currently in the research group Data Protection and Sovereignty. His research interests focus on privacy-preserving machine learning and next-generation marketplaces.

SISAY ADUGNA CHALA is a Postdoctoral Researcher at Fraunhofer FIT as well as RWTH Aachen University, Germany. He is currently the deputy head of the Intelligent Data Analytics Group in the Department of Data Science and AI at Fraunhofer FIT. He obtained his PhD in Computer Science from the University of Siegen, where he was a Marie Curie ITN Fellow working on unsupervised feature learning from textual data applied in data-intensive personalized vacancy recommendation. Earlier, he worked on research on Statistical Machine Translation at the German Research Center for Artificial Intelligence (DFKI). His research interests include machine translation, data mining and machine learning, information retrieval, and the application of AI in various problem domains.

PAOLA DE PERTHUIS is a PhD Researcher in Cryptography at the École Normale Supérieure (ENS) of Paris and Cosmian, France. Previously, she worked on cryptographic systems for the detection of money laundering networks, to which she intends to add simplified embedding techniques compatible with their security models.

AVIKARSHA MANDAL received the M.Sc. degree in Information and Computer Sciences from the University of Luxembourg, Luxembourg, in 2012 and the PhD degree in Applied Cryptography from the University of Mannheim, Germany, in 2020. Since 2019, he has been working as a Senior Researcher at the Fraunhofer Institute for Applied Information Technology FIT, Germany. At Fraunhofer FIT, he is currently the Head of the research group Data Protection and Sovereignty. Before joining Fraunhofer FIT, he worked as a Research Assistant in IT Security at the Offenburg University of Applied Sciences, Germany, from 2013 to 2019. His research interests focus on improving data privacy and security in data-driven applications across different domains such as cybersecurity, energy, and blockchain.