Academic Editor: Steve Worthington
Received: 12 July 2024
Revised: 2 September 2024
Accepted: 20 January 2025
Published: 10 February 2025
Citation: Zhu, S.; Ma, T.; Wu, H.; Ren, J.; He, D.; Li, Y.; Ge, R. Expanding and Interpreting Financial Statement Fraud Detection Using Supply Chain Knowledge Graphs. J. Theor. Appl. Electron. Commer. Res. 2025, 20, 26. https://doi.org/10.3390/jtaer20010026
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Expanding and Interpreting Financial Statement Fraud Detection Using Supply Chain Knowledge Graphs
Shanshan Zhu 1,†, Tengyun Ma 2,†, Haotian Wu 3, Jifan Ren 1,*, Daojing He 2,*, Yubin Li 4 and Rui Ge 5
1 School of Economics and Management, Harbin Institute of Technology, Shenzhen 518055, China; 23b357002@stu.hit.edu.cn
2 School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China; tengyunma@stu.hit.edu.cn
3 Elite Engineers School, Harbin Institute of Technology, Harbin 150001, China; 23b936025@stu.hit.edu.cn
4 Shenzhen Humanities & Social Sciences Key Research Base for Big Data Accounting and Decision-Making Research Center, Harbin Institute of Technology, Shenzhen 518055, China; liyubin@hit.edu.cn
5 Shenzhen Audencia Financial Technology Institute, Shenzhen University, Shenzhen 518055, China; gerui@szu.edu.cn
* Correspondence: renjifan@hit.edu.cn (J.R.); hedaojinghit@163.com (D.H.)
† These authors contributed equally to this work.
Abstract: The relationships within a supply chain are crucial for analyzing business transactions and can reveal significant patterns in disclosed financial data. These relationships also aid in the assessment and detection of financial fraud. Recent studies employing graph neural networks (GNNs) have demonstrated enhanced detection capabilities by integrating corporate financial features with supply chain relationships, surpassing traditional methods that rely solely on financial features. However, these studies face notable limitations: (1) they do not model enterprise associations across consecutive years, hindering the detection of long-term financial fraud, and (2) they lack efficacy in interpretive analyses of supply chain relationships to uncover patterns of fraud or risk transfer. To address these gaps, this paper introduces an interpretable and efficient Heterogeneous Graph Convolutional Network (ieHGCN) designed to analyze supply chain knowledge graphs. It also extends the model’s learning scope to multi-year financial data for detecting fraud. The experimental results indicate that our method, offering both extensibility and interpretability, significantly outperforms existing machine learning and GNN approaches in continuous multi-year fraud detection, achieving the highest AUC of 0.7498, a 3.8% improvement over the leading method. Furthermore, meta-path analysis reveals that companies sharing the same supplier exhibit correlated financial fraud risks and that fraud can propagate through the supply chain, providing novel insights into anti-fraud and risk management strategies through enhanced interpretability.
Keywords: financial statement; fraud detection; graph neural network; supply chain relationship; knowledge graph
1. Introduction
Financial fraud, as a persistent issue within capital markets, has a profound negative impact on businesses, shareholders, the market, and the overall economic environment. Among all types of fraud, financial statement fraud incurs the highest losses. According to the ACFE’s Occupational Fraud 2022: A Report to the Nations [1], the median loss from financial statement fraud globally in 2022 was USD 593,000. Such fraudulent activities inflict substantial damage on the operations of the companies involved and entail significant risks to the entire capital market. However, current measures to combat financial statement fraud remain considerably limited, as this type of fraud typically exhibits high levels of concealment. Publicly available financial data are often carefully doctored, and fraudulent practices frequently evolve and intensify over time, making it difficult to summarize specific patterns of fraud [2,3].
In recent years, artificial intelligence approaches have been widely introduced into financial fraud detection, enhancing the effectiveness of detection and reducing the time required to discover fraud [4–6]. Most of these methods are based on quantified financial data, such as financial ratios, and employ machine learning techniques to identify potential problematic features within financial statements, thereby detecting fraud [4,7,8]. These studies typically employ machine learning methods such as Logistic Regression (LR), Support Vector Machines (SVM), or Decision Trees (DT). They perform cross-comparisons using classification metrics, including the area under the curve (AUC), recall, and accuracy. These methods can be rapidly trained, without constraints on the temporal range and sample selection, and possess extensive applicability. However, these machine learning methods are limited to tabular data and struggle to extract other features of companies. Some approaches also extract the full textual data from financial statements and employ natural language processing to encode and mine the implicit fraudulent features within the text [2,5,9,10]. However, most financial statement analysis is limited to independent samples from each company, making it challenging to use associative information about the company, such as its transactional activities, to provide more effective assistance.
The transactional information of a company constitutes its supply chain relationships. In the web of supply chains in the market, companies cooperate and interact with other entities. Through the continuous flow of capital and products, they shape their operational status, which is ultimately reflected in the financial data they disclose in statements. Analyzing supply chain relationships is key to uncovering financial fraud. First, the transactional activities themselves may hide fraudulent features [11,12]. Second, inconsistencies between transactional activities and financial data can indicate the intent of a company to conceal fraud [13]. Third, aggregation of financial data from different companies can capture the global features within supply chain relationships, providing a wealth of information for fraud detection [14,15].
With the rapid development and globalization of companies, their supply chain relationships have grown increasingly complex. Although existing research can analyze relational patterns on smaller subgraphs of supply chains using statistical methods and supply chain expert knowledge [16,17], there is a lack of methods capable of analyzing large-scale supply chain relationship graphs. With the expansion of supply chain relationships, research should not be restricted to suppliers and customers directly related to a company. Understanding financial risk transmission patterns at multiple tiers can unearth more useful fraud-related features [18–20]. However, few existing methods analyze supply chain relationships beyond two tiers. Prior research has proposed using graph neural network (GNN) approaches to integrate financial features with supply chain relationships [3], achieving a basic level of multi-level supply chain analysis across the entire graph. However, this method is limited by a short temporal window, overlooking the possibility that companies commit continuous fraud and evolve their fraudulent methods over time. In addition, this approach lacks interpretability, which leaves it unable to reveal the patterns of financial fraud transmission within supply chains.
To address these issues, this paper combines upstream and downstream supply chain relationships with companies’ financial characteristics. We use multi-year historical data to conduct financial fraud detection. Specifically, we adapt an interpretable and efficient Heterogeneous Graph Convolutional Network (ieHGCN) [21] to analyze the functionality of supply chain graph networks. First, we construct supply chain relationship graphs based on the supplier–customer relationships of companies. These graphs allow for the comparative analysis of financial information between companies through their supply chain connections. Second, we use ieHGCN to aggregate financial ratio features of different companies. The method assigns attention weights to connections between nodes based on global supplier association information. This approach allows us to obtain fraud attention meta-paths with different weights. Finally, we detect and analyze fraud through graph networks for several consecutive years, explaining fraud risk correlations between different companies.
The structure of this paper is organized as follows. Section 2 introduces the most recent literature and the motivation behind this study. Section 3 describes the construction of the supply chain relationship graph and the specific manner in which ieHGCN aggregates supply chain relationship features. Section 4 presents experimental results on the financial and supply chain data of Chinese listed companies, as well as analysis of fraud detection combining supply chain relationship graphs with financial data. Section 5 presents the conclusions.
2. Related Work
The information disclosed in financial statements directly reflects a company’s financial situation. Quantitative financial indicators not only present a company’s operating conditions but also imply potential risks [22–24]. Therefore, in recent years, most studies have employed machine learning methods for the rapid detection of fraud in financial data. For example, Perols [25] used logistic regression and SVM based on data from the SEC’s AAERs to detect financial fraud in listed companies between 1998 and 2005, and various other machine learning methods have also been introduced. Additionally, some research has focused on aspects such as sample balancing, parameter optimization, and the classification of fraudulent behavior in the machine learning process [4,8].
To fully utilize other operational information of enterprises, some studies have begun to pay attention to the textual data of financial reports [26]. In particular, for the management discussion and analysis (MD&A) section, Goel and Uzuner [27] extracted sentiment semantic features from MD&A and used SVM for fraud detection. In recent years, the capability of deep learning to extract textual features has gradually strengthened. Some studies [2,10] have used recurrent neural networks (RNNs) to input complete MD&A texts, achieving notable detection results. Recent research by Wang et al. [5] has shown that modeling the full text of financial statements with attention mechanisms and Long Short-Term Memory (LSTM) can also yield valuable features for fraud detection. Bhattacharya and Mickovic [28] improved the detection of fraudulent firm-year observations by fine-tuning pre-trained language models using MD&A textual information. Additionally, some studies have introduced other supplementary information to address evolving forms of fraud and enhance financial fraud detection, such as executive conference calls [29] and non-financial corporate data [30,31] (e.g., the proportion of shares held by the largest shareholder, the age of the CEO, etc.).
While detection technologies have gradually improved, the methods by which companies embellish their financial statements and data are also constantly evolving. Consequently, some studies have focused on business transactions and supply relationships that are difficult for companies to disguise to reflect the real operational and financial status of the companies [32,33]. For instance, the study by Li et al. [13] verified that variables such as customer excess purchases and the discrepancy between supplier sales growth and customer purchase growth can improve the accuracy of financial fraud detection. This study only focused on the dyadic relationships between suppliers and customers. However, as globalization and the division of labor among companies have continued to deepen, the supply and demand relationships between companies have evidently become more complex [34], forming intricate heterogeneous networks. Supply chain management has become a central issue in different industries and there is an urgent need for improved data mining methods to aid decision making [35–37]. To investigate the impact of certain specific micro-structures in networks on financial fraud risks, Wang et al. [16] analyzed the structure of tier-2 suppliers, indicating that financial risks can be transmitted through multi-tier supply chain relationships. These studies suggest that the financial conditions of companies can be transmitted through supply chain relationships and reflected in the related neighboring nodes, or even the neighbors of those neighbors. By describing this transmission through the supply chain network and integrating the financial data of all relevant companies, it is possible to holistically learn the features within the financial data and to mine for potential fraudulent activities through interaction between company entities.
The key issue in integrating supply chain information with company financial features is how to handle the supply relationships between companies. In recent years, graph neural networks have proven to be highly effective in modeling these knowledge graph relationships [38–40]. In each layer of a graph neural network, the model aggregates the features of nodes and their connected neighbors with mechanisms such as convolution and attention. This process captures multi-layer node relationships through network stacking. Typically, these methods are designed for homogeneous graphs. However, in fraud detection, companies are often categorized as publicly listed (with accessible financial data from annual reports) and unlisted companies (of which financial data are difficult to obtain), forming a heterogeneous graph. Therefore, heterogeneous graph neural networks are better suited for addressing this task [41,42]. Some methods also focus on the expressive power of the graph, adapting to different task nodes through isomorphism transformation [43]. These graph neural networks have been applied to many fraud detection and supply chain risk tasks, such as fraud user detection [44], credit card transaction fraud detection [45], and new order forecasts of manufacturing industries [46]. Wu et al. [47] utilized heterogeneous graph networks to detect fraudulent borrowers in the supply chain. Li et al. [3] were the first to employ heterogeneous graph neural networks to model complex multi-level company supply chain relationships, demonstrating that the integration of financial data with supply and demand relationships can improve detection accuracy.
Overall, research on company financial fraud detection focuses mainly on machine learning methods. Only a few studies use supply chain information and aggregate different company financial features. Furthermore, no studies have yet constructed large-scale knowledge graphs for company supply chains over consecutive years, which hinders the scalability of GNNs to practical detection tasks. Additionally, existing GNN-based fraud detection methods also lack interpretability and cannot explicitly express the supply chain structures that have a potential impact on financial fraud. To address these issues, we leverage the advantages of GNN methods to extract and integrate financial features from large-volume and long-term historical data. We introduce the ieHGCN [21] model to improve the effectiveness of financial fraud detection and provide interpretability by analyzing meta-paths within the supply chain graph network, which offers explanations for the associative patterns between companies.
3. Methodology
We construct a workflow for detecting company financial fraud using GNNs, as illustrated in Figure 1. After filtering the raw financial data, we extract relevant features, construct a supply relationship knowledge graph, and then train the GNN algorithms on this graph and use them to make predictions. Finally, we evaluate the models’ detection results. The following sections explain this process in detail.
Figure 1. Workflow of financial fraud detection based on supply chain relationship graphs.
3.1. Supply Chain Knowledge Graph
In the task of financial fraud detection, the information typically available is the
financial data disclosed by publicly listed companies, which include the financial figures
for the fiscal year and information about customers and suppliers for that year. According
to the requirements of the China Securities Regulatory Commission, listed companies
must disclose the purchase and sales amounts for their top five suppliers and customers,
and are encouraged to disclose the names of these suppliers and customers. However,
between 2012 and 2021, only approximately 30% of Chinese publicly listed companies
disclosed specific names of their suppliers and customers. Additionally, it is difficult to
obtain financial data for unlisted companies mentioned as suppliers or customers.
In some existing studies [3], detecting fraud within a single year for listed companies was only feasible by filtering the disclosed data, resulting in a final sample size of only a few hundred companies. Training detection models on such a small sample size may introduce sample bias, leading to overfitting on small samples and difficulty in generalizing to real-world situations. To expand the scope of supply chain relationships in financial fraud detection tasks, this paper constructs a multi-year continuous supply chain relationship knowledge graph.
We define the heterogeneous knowledge graph composed of listed and unlisted companies as $G = \{V, E, \theta, \omega\}$, where $V$ represents the entity nodes of the companies, $E$ represents the edge relations between the company nodes, $\theta: V \to C$ is the mapping from the entity nodes to the classification of the listed/unlisted companies, and $\omega: E \to R$ is the mapping from the edges between the nodes to the classification of different associative relations between companies. The specific composition is as follows:
(1) Node Representation. We treat each year’s company as an independent entity, constituting the basic nodes of the graph (e.g., company A in 2019 and company A in 2021 are considered as two different nodes). We define the sets of listed and unlisted companies as $V^{L}$ and $V^{U}$, respectively. For listed companies, the disclosed financial data are converted into financial ratios and used as node features. For unlisted companies, due to the difficulty of obtaining their financial status, we apply one-hot encoding and then use embedding to map them into a vector space with the same dimensionality as that of the financial ratio data. Specifically, each unlisted company is encoded as a $|V^{U}|$-dimensional one-hot vector, where $|V^{U}|$ represents the total number of unlisted companies selected within the specified time range. The model then uses an embedding layer to compress these vectors into a $d$-dimensional semantic space, aligning with the dimensionality of the financial ratios. This allows the model to differentiate between unlisted company nodes while ensuring that the embeddings do not carry any financial significance, thereby distinguishing them from listed company nodes.
(2) Representation of Edge Relationship. We link company nodes through three types of relationships: Customer (where the source node sells to the target node), Supplier (where the source node purchases from the target node), and Identical (where the source node and the target node represent the same company but reflect financial conditions in different years). Through ‘Identical’ relationships, we can extend the supply chain network of a company across multiple consecutive fiscal years.
(3) Heterogeneous Representation of Edge and Node. The above representations are distinguished into different node and edge types. Specifically, the sets of types can be defined as $C = \{L, U\}$, representing listed/unlisted companies, and $R = \{C, S, I\}$, representing Customer, Supplier, and Identical company relationships, respectively. We can use $V^{\Theta}$, $\Theta \in C$, to denote a particular type set of companies and $E^{r}$, $r \in R$, to denote different sets of relationship types.
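The node and edge definitions in (1)–(3) can be sketched as a small data structure. The following is a minimal illustration in plain Python, not the authors' implementation: the disclosure records, company names, the `LISTED` set, and the `build_graph` helper are all hypothetical, and the direction chosen for Identical edges (earlier year to later year) is an assumption of the sketch.

```python
from collections import defaultdict

# Hypothetical disclosures: (fiscal_year, discloser, counterparty, counterparty_role).
# Company names, years, and the LISTED set are made up for illustration.
DISCLOSURES = [
    (2019, "A", "B", "customer"),  # A names B among its customers
    (2019, "A", "X", "supplier"),  # A names X among its suppliers
    (2020, "A", "B", "customer"),
    (2020, "B", "X", "supplier"),
]
LISTED = {"A", "B"}  # assumption: every other company is unlisted


def build_graph(disclosures, listed):
    """Return (nodes, edges): nodes maps (company, year) -> 'L' or 'U';
    edges maps relation 'C', 'S' or 'I' -> set of (source, target) node pairs."""
    nodes = {}
    edges = defaultdict(set)
    for year, discloser, partner, role in disclosures:
        src, dst = (discloser, year), (partner, year)
        for company, node in ((discloser, src), (partner, dst)):
            nodes[node] = "L" if company in listed else "U"
        # Customer: source sells to target; Supplier: source purchases from target.
        edges["C" if role == "customer" else "S"].add((src, dst))

    # Identical: connect each company-year node to the same company's next
    # observed year (the edge direction is an assumption of this sketch).
    years_by_company = defaultdict(list)
    for company, year in nodes:
        years_by_company[company].append(year)
    for company, years in years_by_company.items():
        years.sort()
        for earlier, later in zip(years, years[1:]):
            edges["I"].add(((company, earlier), (company, later)))
    return nodes, dict(edges)


nodes, edges = build_graph(DISCLOSURES, LISTED)
print(sorted(nodes.items()))
print({relation: sorted(pairs) for relation, pairs in edges.items()})
```

In this toy input, every company appears in both years, so each of A, B, and X receives one Identical edge that ties its 2019 node to its 2020 node.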
Based on these methods, we can build a supply chain knowledge graph as shown in Figure 2, where the flow of supply assets is indicated by uniform arrows for customer and supplier relationships. Specifically, we first extract financial ratios for publicly listed companies from raw data. We then filter companies that have disclosed information about their suppliers and customers in their financial statements. The financial ratios serve as feature vectors representing the publicly listed company nodes, and we retain only the unlisted company nodes that interact with different listed companies. Missing edges and isolated unlisted company nodes are discarded, as they are unable to provide useful information for the GNN. It should be noted that customer–supplier relationships are not symmetrical and cannot be unified into a single type of relationship. The supply chain information is derived from the five main suppliers and customers of each company. Therefore, a supply relationship may be present in the customer information of the source-node company but absent in the supplier information of the target node. In general, the supply chain relationship graph integrates both company financial ratios and supply chain information. Although there may be some missing company nodes in the graph due to incomplete disclosure of information, the three types of relationships help to eliminate isolated nodes, resulting in a graph with a high degree of connectivity. After constructing the knowledge graph, we introduce graph neural network methods to aggregate relevant information and search for fraud features within financial and supply relationships.
Figure 2. Continuous annual company supply chain knowledge graph. Arrow lines indicate the flow
of asset supply. Dotted lines indicate the same company in different years.
3.2. Interpretable Heterogeneous Graph Convolutional Network
Graph neural networks (GNNs) are commonly used deep learning methods for processing relational data. In contrast to traditional machine learning methods, graph neural networks explicitly model the aggregation of relationships between nodes based on a graph data structure, which enables the mining of node associations across a wide range and multiple levels. Previous research [3] utilized Heterogeneous Graph Transformer (HGT) [42] to model the supply chain relationship knowledge graph under heterogeneous graph data. Although that work verified the feasibility of the approach for different companies within the same fiscal year, it overlooked the construction and interpretation of a fraud detection graph over consecutive years, which is insufficient to meet the needs of practical regulation and decision making. We introduce an interpretable heterogeneous graph convolution [21] that aggregates enterprise financial features based on supply chain relationships and detects financial fraud.
Figure 3 demonstrates the structure of the ieHGCN. Based on the company supply chain knowledge graph, the GNN is designed to have multiple layers, with each containing $|C|$ blocks, where $C$ is the set of node types. Each block represents the attention of that type of node towards other nodes. Specifically, for each block $\Theta \in C$ in a given layer, the computation is as follows.
[Figure 3 panels: (a) Model architecture consisting of two company blocks; (b) Listed company block internal calculation; (c) Unlisted company block internal calculation]
Figure 3. Overall ieHGCN architecture based on the features of listed/unlisted companies. (a) Five-layer ieHGCN network. Dashed lines denote the self-relation projection and solid lines denote the relation-specific projection. (b) Listed company block in each layer. L denotes the listed company features and U denotes the unlisted company features. Both feature representations are projected into a new common “listed company” semantic space. Then, object-level aggregation and type-level aggregation are calculated and propagated to the next layer. (c) Unlisted company block in each layer.
First, nodes of different company types contain diverse semantic features. For instance, the features of listed companies are structured financial ratios, whereas the features of unlisted companies are merely embedding vectors that distinguish between different company entities without including financial features. To unify the features of the nodes of different types, the model projects the representations of neighboring nodes with different types into a common semantic space in two ways: self-relation projection and relation-specific projection.
Self-relation projection. For each layer, the input for the $\Theta$ block is $\{H^{\Theta}, H^{\Gamma}\}$, where $\Gamma \in N_{\Theta}$ denotes that $\Gamma$ is the company type of $\Theta$’s neighboring nodes. $H^{\Theta} \in \mathbb{R}^{|V^{\Theta}| \times d}$ and $H^{\Gamma} \in \mathbb{R}^{|V^{\Gamma}| \times d}$ are the feature representations of the nodes $V^{\Theta}$ and $V^{\Gamma}$ in that layer of the GNN. They are the financial ratio features and the embedding of unlisted companies in the input layer, and the output from the previous layer in the middle and last layers. Within the $\Theta$ block, the model first utilizes the self-projection weight $W^{\text{self-}\Theta} \in \mathbb{R}^{d \times d}$ to transform and project the $\Theta$ type itself:
$$Y^{\text{self-}\Theta} = H^{\Theta} W^{\text{self-}\Theta} \tag{1}$$
Relation-specific projection. The model also needs to project the neighboring nodes of other types into the same semantic space, using the weight of the type relation $W^{\Gamma \to \Theta} \in \mathbb{R}^{d \times d}$:
$$Y^{\Gamma \to \Theta} = H^{\Gamma} W^{\Gamma \to \Theta} \tag{2}$$
In the same semantic space $\mathbb{R}^{d}$, the model’s features can aggregate with each other, generating meaningful hidden associative information.
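Specifically, $H^{U} \in \mathbb{R}^{|V^{U}| \times d}$, the features of unlisted companies $U$, are projected into the common semantic space $\mathbb{R}^{d}$ of the listed company $L$ through $W^{U \to L} \in \mathbb{R}^{d \times d}$. Meanwhile, the self-projection $W^{\text{self-}L} \in \mathbb{R}^{d \times d}$ projects the features from the previous layer $H^{L} \in \mathbb{R}^{|V^{L}| \times d}$ into the shared semantic space of that layer $\mathbb{R}^{d}$.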
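As a rough numerical illustration of the input features and the two projections in Equations (1) and (2), the sketch below uses plain Python lists. All numbers, the `EMBEDDING` table, and the helper names are hypothetical; in the actual model, the embedding table and the projection weights are learned.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def one_hot(index, size):
    """1 x size row vector with a single 1 at position `index`."""
    return [[1.0 if j == index else 0.0 for j in range(size)]]


# Hypothetical sizes: |V^U| = 3 unlisted companies, one listed company, d = 2.
EMBEDDING = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # embedding table, |V^U| x d
H_L = [[0.8, 1.5]]                                # made-up financial ratios, |V^L| x d
W_SELF_L = [[1.0, 0.0], [0.0, 1.0]]               # self-projection weight, d x d
W_U_TO_L = [[2.0, 0.0], [0.0, 2.0]]               # relation-specific weight, d x d

# A one-hot vector times the embedding table selects one row of the table,
# so the embedding layer acts as a lookup of a d-dimensional vector per company.
h_u1 = matmul(one_hot(1, 3), EMBEDDING)

H_U = EMBEDDING                       # input features of all unlisted nodes
Y_self_L = matmul(H_L, W_SELF_L)      # Eq. (1) with Theta = L
Y_U_to_L = matmul(H_U, W_U_TO_L)      # Eq. (2) with Gamma = U, Theta = L

print(h_u1)
print(Y_self_L)
print(Y_U_to_L)
```

Both projected matrices have `d` columns, which is what allows listed and unlisted representations to be combined in the aggregation step.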
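Second, after the projection of node representations, object-level aggregation is performed using heterogeneous graph convolution. For the pair of nodes of the objects $V^{\Theta}$ and $V^{\Gamma}$, their adjacency matrix is represented as $A^{\Theta\Gamma} \in \mathbb{R}^{|V^{\Theta}| \times |V^{\Gamma}|}$. Following the GCN approach, the adjacency matrix is normalized to $\hat{A}^{\Theta\Gamma} = (D^{\Theta\Gamma})^{-1} A^{\Theta\Gamma}$, where $D^{\Theta\Gamma}$ is the degree matrix of $A^{\Theta\Gamma}$. The heterogeneous graph convolution is computed as follows:
$$Z^{\text{self-}\Theta} = Y^{\text{self-}\Theta} = H^{\Theta} W^{\text{self-}\Theta} \tag{3}$$
$$Z^{\Gamma \to \Theta} = \hat{A}^{\Theta\Gamma} Y^{\Gamma \to \Theta} = \hat{A}^{\Theta\Gamma} H^{\Gamma} W^{\Gamma \to \Theta} \tag{4}$$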
The convolutional representations are $\{Z^{\text{self-}\Theta}, Z^{1 \to \Theta}, \ldots, Z^{\Gamma \to \Theta}, \ldots, Z^{|N_{\Theta}| \to \Theta}\}$. Specifically, in the case of this work, which only includes listed and unlisted companies, the convolutional representations are $\{Z^{\text{self-}L}, Z^{U \to L}\}$.
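The object-level aggregation in Equations (3) and (4) amounts to averaging the projected features of each node's neighbors. A minimal sketch with made-up matrices follows; the helper names and all numbers are hypothetical, and rows without neighbors are left at zero as a convention of this sketch.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(adjacency):
    """Compute D^{-1} A: divide every row by its degree (rows of zeros stay zero)."""
    normalized = []
    for row in adjacency:
        degree = sum(row)
        normalized.append([value / degree if degree else 0.0 for value in row])
    return normalized


# Toy block for Theta = L: 2 listed nodes, 3 unlisted nodes, d = 2.
Y_self_L = [[1.0, 0.0], [0.0, 1.0]]              # already projected, Eq. (1)
Y_U_to_L = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]  # already projected, Eq. (2)
A_LU = [[1, 1, 0],   # listed node 0 is linked to unlisted nodes 0 and 1
        [0, 0, 1]]   # listed node 1 is linked to unlisted node 2

Z_self_L = Y_self_L                                # Eq. (3): no neighbor mixing
Z_U_to_L = matmul(row_normalize(A_LU), Y_U_to_L)   # Eq. (4): mean over neighbors

print(Z_self_L)
print(Z_U_to_L)
```

Listed node 0 receives the mean of its two unlisted neighbors, while listed node 1 simply copies its single neighbor.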
Subsequently, the attention of node types is calculated based on the results of node representation aggregation, and the features of different company types are aggregated. Queries and key matrices are constructed with different learned weights:
$$Q^{\Theta} = Z^{\text{self-}\Theta} W_{q}^{\Theta} \tag{5}$$
$$K^{\text{self-}\Theta} = Z^{\text{self-}\Theta} W_{k}^{\Theta} \tag{6}$$
$$K^{\Gamma \to \Theta} = Z^{\Gamma \to \Theta} W_{k}^{\Theta} \tag{7}$$
where $W_{q}^{\Theta} \in \mathbb{R}^{d \times d_a}$ and $W_{k}^{\Theta} \in \mathbb{R}^{d \times d_a}$. As the model needs to pay attention not only to the feature representations of neighboring companies of the same type but also to those of different types, it maps the node representations of different types onto multiple keys. The attention matrix can be calculated as follows:
$$att^{\text{self-}\Theta} = \sigma\left(\left[K^{\text{self-}\Theta} \,\|\, Q^{\Theta}\right] w_{att}^{\Theta}\right) \tag{8}$$
$$att^{\Gamma \to \Theta} = \sigma\left(\left[K^{\Gamma \to \Theta} \,\|\, Q^{\Theta}\right] w_{att}^{\Theta}\right) \tag{9}$$
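Here, $\|$ denotes the concatenation operation, $w_{att}^{\Theta} \in \mathbb{R}^{2d_a \times 1}$ represents the weight vector, and $\sigma$ is the activation function (exponential linear units, ELUs [48,49]). $att_{i}^{\text{self-}\Theta}$ and $att_{i}^{\Gamma \to \Theta}$ can represent the high-dimensional attention representation of node object $i$ and its neighbor $\Gamma$ in the hidden layers of the GNN. The attentions are normalized as follows:
$$\left[att^{\text{self-}\Theta} \,\|\, att^{1 \to \Theta} \,\|\, \ldots \,\|\, att^{\Gamma \to \Theta} \,\|\, \ldots \,\|\, att^{|N_{\Theta}| \to \Theta}\right] \leftarrow \operatorname{softmax}\left(\left[att^{\text{self-}\Theta} \,\|\, att^{1 \to \Theta} \,\|\, \ldots \,\|\, att^{\Gamma \to \Theta} \,\|\, \ldots \,\|\, att^{|N_{\Theta}| \to \Theta}\right]\right) \tag{10}$$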
Finally, the weighted representation of node $V^{\Theta}$ is calculated using the attention weights:
$$H_{i}^{\Theta\text{-output}} = \sigma\left(att_{i}^{\text{self-}\Theta} Z_{i}^{\text{self-}\Theta} + \sum_{\Gamma \in N_{\Theta}} att_{i}^{\Gamma \to \Theta} Z_{i}^{\Gamma \to \Theta}\right) \tag{11}$$
where $H_{i}^{\Theta\text{-output}}$ is the output representation of that layer. We sum the self-representations $Z^{\text{self-}\Theta}$ and neighboring representations $Z^{\Gamma \to \Theta}$, while using attention weights $att$ to allocate importance. This allows for the differential aggregation of information from both the central node and its neighboring nodes. The resulting node representation $H^{\Theta\text{-output}}$ captures information from both the central node and its neighbors, with the ability to emphasize important information through varying weight magnitudes. This computation represents the hidden layer representation of the $i$-th company node in $V^{\Theta}$, and the final node classification representation is the output of the last layer block. The model compares the output with the true label of whether the company is fraudulent and is trained using the cross-entropy loss.
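The type-level attention in Equations (5)–(11) can be traced on a toy block. The sketch below is an illustrative reading of those equations in plain Python, not the reference ieHGCN code: the function name `type_level_aggregate`, the example matrices, and the representation of the attention vector as a flat list of length $2d_a$ are assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def elu(x):
    """Exponential linear unit with alpha = 1."""
    return x if x > 0 else math.exp(x) - 1.0


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def type_level_aggregate(z_self, z_neighbors, w_q, w_k, w_att):
    """One block of Eqs. (5)-(11). z_neighbors holds one Z^{Gamma->Theta} matrix
    per neighbor type; w_att is the attention vector as a flat list (2 * d_a)."""
    sources = [z_self] + z_neighbors
    q = matmul(z_self, w_q)                    # Eq. (5)
    keys = [matmul(z, w_k) for z in sources]   # Eqs. (6)-(7)
    outputs, attention = [], []
    for i in range(len(z_self)):
        # Eqs. (8)-(9): one scalar score per source type for node i.
        scores = [elu(sum(v * w for v, w in zip(k[i] + q[i], w_att))) for k in keys]
        weights = softmax(scores)              # Eq. (10)
        mixed = [sum(a * z[i][j] for a, z in zip(weights, sources))
                 for j in range(len(z_self[i]))]
        outputs.append([elu(v) for v in mixed])  # Eq. (11)
        attention.append(weights)
    return outputs, attention


Z_self_L = [[1.0, 0.0], [0.0, 1.0]]
Z_U_to_L = [[0.75, 0.25], [0.0, 2.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]            # stands in for W_q and W_k (d_a = 2)

uniform_out, uniform_att = type_level_aggregate(
    Z_self_L, [Z_U_to_L], IDENTITY, IDENTITY, [0.0, 0.0, 0.0, 0.0])
skewed_out, skewed_att = type_level_aggregate(
    Z_self_L, [Z_U_to_L], IDENTITY, IDENTITY, [1.0, 0.0, 0.0, 0.0])
print(uniform_att, uniform_out)
print(skewed_att)
```

With a zero attention vector, every source type receives the same weight and the output is the plain average of the self and neighbor representations. A non-zero vector shifts the weights per node, and these per-node, per-type weights are the quantities that an interpretability analysis would inspect.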
3.3. Evaluation Metrics
In our study, we primarily assess the detection performance of the model using recall and the area under the receiver operating characteristic curve (AUC) [4]. Since the fraud detection task focuses on the identification of fraudulent entities, recall is utilized to evaluate the model’s ability to accurately recognize fraudulent activities. On the other hand, AUC is used to evaluate the overall classification performance of the model across varying classification thresholds, which is calculated by plotting the receiver operating characteristic (ROC) curve based on the true positive rate (TPR, or recall) and false positive rate (FPR) [2,3,5]. Specifically, recall and FPR are defined as follows:
$$\mathrm{TPR} = \mathrm{Recall} = \frac{TP}{TP + FN} \tag{12}$$
$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{13}$$
where TP (true positive) denotes the number of samples correctly identified as fraudulent;
FN (false negative) denotes the number of fraudulent samples incorrectly classified as
non-fraudulent; TN (true negative) denotes the number of non-fraudulent samples correctly
identified; and FP (false positive) denotes the number of non-fraudulent samples incorrectly
classified as fraudulent. Recall (sensitivity) is the proportion of actual fraud cases correctly
identified [7]. A higher recall means fewer missed fraud cases. The AUC represents the
likelihood that a randomly chosen fraud case will be ranked higher than a randomly chosen
non-fraud case by the classifier. A random guess results in an AUC of 0.50, so higher values
indicate better model performance.
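The ranking interpretation of the AUC can be made concrete with a short sketch. The function name and the toy scores below are hypothetical and not taken from the experiments; ties are counted as half a win, which is the usual convention for this estimator.

```python
def auc_by_ranking(scores, labels):
    """Fraction of (fraud, non-fraud) pairs in which the fraud case scores higher."""
    fraud = [s for s, y in zip(scores, labels) if y == 1]
    clean = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for f in fraud:
        for c in clean:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(fraud) * len(clean))


scores = [0.9, 0.8, 0.3, 0.1]  # toy model outputs, not data from the paper
labels = [1, 0, 1, 0]          # 1 = fraudulent, 0 = non-fraudulent
print(auc_by_ranking(scores, labels))  # 0.75
```

Here three of the four fraud/non-fraud pairs are ordered correctly, giving 0.75; a scorer that ranks pairs at random would average 0.50.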
To further assess the model’s overall classification capability, we also employ accuracy
and the F1 score as supplementary evaluation metrics:
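$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{14}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{15}$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{16}$$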
Accuracy is the proportion of correctly classified instances out of all instances, with
higher values indicating better overall performance [9]. The F1 score is the harmonic mean
of precision and recall, balancing the two metrics [7]. A higher F1 score indicates better
performance on imbalanced data, with fewer false positives and false negatives.
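Equations (12)–(16) translate directly into code. The sketch below uses a hypothetical function name and made-up confusion counts, and it assumes every denominator is non-zero.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Recall, FPR, accuracy, precision, and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn)                       # Eq. (12)
    fpr = fp / (fp + tn)                          # Eq. (13)
    accuracy = (tp + tn) / (tp + fp + tn + fn)    # Eq. (14)
    precision = tp / (tp + fp)                    # Eq. (15)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (16)
    return {"recall": recall, "fpr": fpr, "accuracy": accuracy,
            "precision": precision, "f1": f1}


# Toy counts for an imbalanced test set (40 fraud, 200 non-fraud); not paper data.
metrics = confusion_metrics(tp=30, fp=45, tn=155, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case recall is 0.75 while precision is only 0.40, the kind of gap that appears when fraud cases are a small share of the data.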
4. Experiments
4.1. Experiment Setup
4.1.1. Dataset Description
We drew the data for our experiments from the China Stock Market &
Accounting Research (CSMAR) database, which includes company information and fraud
labels. These data comprise organized financial data from Chinese companies
listed on the stock exchanges in Shanghai and Shenzhen, including supplier–customer
relationships from their disclosed financial statements. Since the average detection time
for company financial fraud is generally 2 years or more [50], we only considered samples
from 2021 and earlier as training data, to ensure that as many fraud samples as possible
had been discovered and to avoid introducing potential sample bias.
We used financial ratios instead of raw financial data to describe financial characteristics.
Compared to raw financial data [4], financial ratios are comparable across different
company scales [7,8,51]. To adequately consider the financial factors that affect companies
and to enhance the model’s robustness, we selected 407 financial ratio indicators as financial
features, comprehensively representing the listed enterprises’ solvency, risk level, operational
ability, financial indicator disclosure, cash flow, and other factors. Additionally, regulatory
authorities in various countries usually do not require unlisted companies to disclose
financial information; therefore, financial ratios of unlisted companies were unavailable,
and the supplier–customer relationships also relied on disclosures from listed companies.
For the determination of financial fraud, we followed the definition commonly used
in previous research [3,52], treating seven types of violations defined by Chinese regulatory
authorities as fraudulent: inflated profits, asset overstatement, false statements,
delayed disclosure, omission of significant information, fraudulent disclosure, and general
accounting irregularities. Companies accused of such behaviors were marked as fraudulent,
and other companies were marked as non-fraudulent. Regulatory authorities typically do
not deal with fraud disclosures of non-listed companies; therefore, non-listed companies
did not possess fraud labels.
Overall, we compiled company financial fraud datasets covering four different time ranges and
constructed the corresponding supply chain knowledge graphs. The specific quantities are
shown in Table 1. Previous research in supply chain fraud detection used one year of
financial features and three years of supply chain data to detect fraud in one year [3].
However, research [1,50] indicates that financial statement fraud generally goes undetected
for 18–24 months, and some well-orchestrated frauds may go undiscovered for more than
a decade (such as Enron in 2001 and Olympus in 2011), with these longer-duration frauds
causing more severe losses. Therefore, we extended the fraud detection time range to
up to 10 years to prevent the continuation of potential frauds with serious harm. We
constructed datasets that span 10 years, 5 years, 3 years, and 1 year, according to existing
research [4]. We used a range of 10 years to cover scenarios where regulators might
need to detect fraud retrospectively in real-world situations. We also examined 5-year
and 3-year operational periods. These time ranges align with periods of stable company
development and consistent regulatory conditions, minimizing the interference from factors
such as company closures. Moreover, we included a 1-year detection range as a baseline
comparison to demonstrate the fundamental advantages of our method over other 1-year
detection methods [8].
It is important to note that in the datasets for 10 years, 5 years, and 3 years, the
fraudulent samples account for 18.95%, 18.14%, and 14.75% of the listed companies,
respectively, while the fraudulent samples for 2021 alone account for 9.98%. Considering
that the exposure of fraud typically lags, this imbalance is reasonable. The imbalance rate
is also similar to that of other related studies [3,4]. Previous studies [4,8] typically used the
earlier years as the training set and the later years as the validation/testing set. In contrast, we
did not split the dataset by year, to avoid bias in samples during different market periods.
For each dataset, we randomly selected 60% of the samples as the training set, with 20%
and 20% of the samples used as the validation set and test set, respectively. We used
the validation set to monitor the model’s performance after training and to optimize the
model’s hyperparameters.
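A random 60/20/20 split of this kind can be sketched as follows. The function name, the fixed seed, and the use of integer sample identifiers are assumptions made for illustration; the text does not state whether the split was stratified by fraud label, so this sketch is not.

```python
import random


def random_split(sample_ids, train=0.6, valid=0.2, seed=0):
    """Shuffle the samples once, then cut them into train/validation/test parts."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)   # the seed is a hypothetical choice
    n_train = round(len(ids) * train)
    n_valid = round(len(ids) * valid)
    return ids[:n_train], ids[n_train:n_train + n_valid], ids[n_train + n_valid:]


train_ids, valid_ids, test_ids = random_split(range(1000))
print(len(train_ids), len(valid_ids), len(test_ids))  # 600 200 200
```

Fixing the seed keeps the three subsets identical across models, so that differences in validation or test scores are not caused by different partitions.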
Table 1. Dataset statistics of knowledge graphs with different time ranges. “Object-node” represents
the number of company objects (as nodes) and “Relation-edge” represents the number of supply
relationships between companies (as edges). “Total”, “Listed”, and “Fraud” indicate the total number
of companies, the number of listed companies, and the number of fraudulent companies, respectively.
“L-Customer-L” represents the number of customer relationships between listed companies, while
“L-Customer-U” represents the number of customer relationships between listed and non-listed
companies; similar definitions apply to other relationships.
             Object-Node                Relation-Edge
Year Range   Total    Listed   Fraud    L-Customer-L   L-Supplier-L   L-Identical-L   L-Customer-U   L-Supplier-U   U-Identical-U
2012–2021    44,611   9472     1795     3064           1993           53,872          16,882         15,973         107,665
2017–2021    22,768   4431     804      1230           1145           15,455          7529           9006           41,563
2019–2021    13,161   2745     405      845            735            6195            4245           5123           18,080
2021         4895     1102     110      348            287            1102            1498           1880           3793
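The fraud proportions quoted in the dataset description can be reproduced from the “Listed” and “Fraud” columns of Table 1. The short check below uses only those published counts; the variable names are illustrative.

```python
# (listed companies, fraudulent companies) per time range, copied from Table 1
table1 = {
    "2012-2021": (9472, 1795),
    "2017-2021": (4431, 804),
    "2019-2021": (2745, 405),
    "2021": (1102, 110),
}

# Share of listed companies labeled as fraudulent, in percent
fraud_share = {
    years: round(100 * fraud / listed, 2)
    for years, (listed, fraud) in table1.items()
}
print(fraud_share)
```

The denominators are the listed companies rather than all nodes, since only listed companies carry fraud labels.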
4.1.2. Baselines and Experimental Settings
To evaluate the efficacy of the supply chain relationship knowledge graph, we compared against
machine learning techniques that analyze financial ratio data and against additional
GNN methods that extract features from the supply chain relations.
Machine Learning Methods: We selected several methods that have been demonstrated
to be highly effective in previous studies for comparison: Adaboost, XGBoost,
Random Forest (RF), Support Vector Machine (SVM), and LightGBM [4,31,53]. SVM uses
the Gaussian kernel function. The other parameters for these methods were
determined through grid search, with optimization performed on the validation set.
GNN-Based Methods: We chose the following GNN methods: GraphSAGE, RGCN
(Relational Graph Convolutional Network), HGT (Heterogeneous Graph Transformer),
and GIN (Graph Isomorphism Network). GraphSAGE is based on homogeneous graphs,
but it can be adapted to our task by employing heterogeneous graph convolution as
the aggregation method. All GNN methods have two hidden layers, with their optimal
parameters also determined through grid search, optimized on the validation set.
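Grid search against a validation score follows a simple pattern, sketched below. The hyperparameter names, the candidate values, and the scoring function are placeholders: the actual grids are not reported, and `toy_validation_auc` merely stands in for training a model and measuring its AUC on the validation set.

```python
from itertools import product


def grid_search(param_grid, fit_and_score):
    """Try every combination in the grid and keep the best validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = fit_and_score(params)  # e.g. validation AUC after training
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_auc(params):
    """Stand-in for 'train with params, then score on the validation set'."""
    return 0.70 - abs(params["hidden_dim"] - 64) / 1000 - abs(params["dropout"] - 0.3) / 10


grid = {"hidden_dim": [32, 64, 128], "dropout": [0.1, 0.3, 0.5]}  # hypothetical grid
best_params, best_score = grid_search(grid, toy_validation_auc)
print(best_params, best_score)
```

Selecting on the validation set rather than the test set keeps the reported test metrics free of tuning bias.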
Sampling Method: Given the imbalance in the dataset, we noticed that when training
GNNs with limited graph data, using all fraudulent and non-fraudulent nodes may
lead to overfitting. Therefore, we employed undersampling [4] as the training approach for
all models, which slows down the fitting of the GNNs and helps prevent overfitting.
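One common form of undersampling keeps every fraudulent training sample and draws an equally sized random subset of the non-fraudulent ones. The sketch below assumes that 1:1 target ratio and a fixed seed; neither detail is specified in the text, and the function name is hypothetical.

```python
import random


def undersample(ids, labels, seed=0):
    """Keep all fraud samples and an equally sized random subset of the rest."""
    rng = random.Random(seed)  # hypothetical fixed seed
    fraud = [i for i, y in zip(ids, labels) if y == 1]
    clean = [i for i, y in zip(ids, labels) if y == 0]
    kept_clean = rng.sample(clean, k=min(len(clean), len(fraud)))  # assumed 1:1 ratio
    balanced = fraud + kept_clean
    rng.shuffle(balanced)
    return balanced


ids = list(range(100))
labels = [1 if i < 10 else 0 for i in ids]  # toy data with 10% fraud
balanced_ids = undersample(ids, labels)
print(len(balanced_ids))  # 20
```

Undersampling would normally be applied to the training portion only, so that validation and test metrics still reflect the original class balance.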
Optimization Technique: For GNN-based methods, we used the Adam optimizer,
with the learning rate set to vary according to the 1-cycle policy [54]. This policy dictates
that the learning rate increases to its maximum value of $1 \times 10^{-3}$ during the first half of
the cycle and then decreases to zero during the second half.
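The schedule described above can be written as a small function of the training step. The linear ramps and the starting value `start_lr` are assumptions, since only the peak of 1 × 10^-3 and the final value of zero are given. PyTorch provides `torch.optim.lr_scheduler.OneCycleLR` for this policy; whether it was used here, and with which settings, is not stated.

```python
def one_cycle_lr(step, total_steps, max_lr=1e-3, start_lr=1e-4):
    """Learning rate at a given step: ramp up for half the cycle, then down to zero."""
    half = total_steps / 2
    if step <= half:
        # First half: linear increase from start_lr (assumed) to max_lr
        return start_lr + (max_lr - start_lr) * step / half
    # Second half: linear decrease from max_lr to zero
    return max_lr * (total_steps - step) / half


schedule = [one_cycle_lr(step, total_steps=100) for step in range(101)]
print(schedule[0], schedule[50], schedule[100])
```

The peak is reached exactly at the midpoint of training, and the rate reaches zero at the final step.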
Implementation Details: We implemented the machine learning methods using
XGBoost, LightGBM, and the sklearn package, while the GNN-based methods were
implemented using PyTorch and DGL. All experiments were conducted in Python 3.11.5, with a
4090 (24 GB) GPU.
4.2. Fraud Detection Results and Comparison
We first compared the fraud detection results of different models on the 5-year dataset
(2017–2021) in Table 2. Among the machine learning methods, LightGBM exhibited the
best performance, with an AUC of 0.7026 and a recall rate of 0.7143. However, GNN-based
methods achieved superior results. Specifically, RGCN, HGT, GIN, and ieHGCN
attained AUC scores of 0.7087, 0.7198, 0.7224, and 0.7498, respectively. There was a general
improvement in classification performance across the board, indicating that considering
the hidden relationships encapsulated within the supply chain relationship knowledge
graph can improve fraud detection.
Table 2. Comparison of financial fraud detection results based on different methods.
Category                   Method      Accuracy   Precision   Recall   F1-Macro   AUC
Machine Learning Methods   Adaboost    0.6568     0.2820      0.5513   0.5684     0.6160
                           XGBoost     0.6876     0.3333      0.6859   0.6154     0.6870
                           SVM         0.7067     0.3311      0.6667   0.6217     0.6909
                           RF          0.7150     0.3380      0.6599   0.6275     0.6932
                           LightGBM    0.7221     0.3793      0.6707   0.6472     0.7026
GNN-based Methods          GraphSAGE   0.7470     0.3819      0.6340   0.6549     0.7031
                           RGCN        0.7352     0.3782      0.6667   0.6523     0.7087
                           HGT         0.7494     0.3902      0.6732   0.6637     0.7198
                           GIN         0.7720     0.4043      0.6463   0.6750     0.7224
                           ieHGCN      0.7577     0.4137      0.7372   0.6834     0.7498
Note: Bold values indicate the best performance for each metric.
Among all GNN methods, ieHGCN achieved the best results, outperforming the best
competing GNN model on each metric by 9.5%, 1.2%, and 3.8% (relative improvement) for
recall, F1-macro, and AUC, respectively. Compared to the previous state-of-the-art model HGT,
ieHGCN exhibited relative increases of 9.5%, 3.0%, and 4.2% in recall, F1-macro, and AUC,
respectively. It also showed relative improvements of 14.1%, 1.2%, and 3.8% over GIN, the
best-performing baseline in our experiments. These results indicate that the ieHGCN model has
a superior ability to extract features from the supply chain knowledge graph for fraud detection tasks.
Unlike other methods that aggregate neighbor node information across layers, ieHGCN
aggregates features of different types of neighbor nodes within various blocks in the same
layer, achieving more efficient and accurate performance. Among all methods, ieHGCN
stands out as the most effective detection model. By using supply chain relationships
between companies, it effectively integrates the financial features of publicly listed companies,
resulting in superior detection performance.
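The percentage gains reported in this subsection are relative improvements, (new − baseline) / baseline, and can be recomputed from the Table 2 entries. The helper name below is illustrative; the numbers are copied from the table.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Recall, F1-macro, and AUC rows from Table 2
iehgcn = {"recall": 0.7372, "f1_macro": 0.6834, "auc": 0.7498}
hgt = {"recall": 0.6732, "f1_macro": 0.6637, "auc": 0.7198}
gin = {"recall": 0.6463, "f1_macro": 0.6750, "auc": 0.7224}

vs_hgt = {m: round(relative_gain(iehgcn[m], hgt[m]), 1) for m in iehgcn}
vs_gin = {m: round(relative_gain(iehgcn[m], gin[m]), 1) for m in iehgcn}
print(vs_hgt)
print(vs_gin)
```

For example, the AUC gain over GIN is (0.7498 − 0.7224) / 0.7224 ≈ 3.8%, which corresponds to an absolute difference of 0.0274.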
4.3. Fraud Detection in Different Time Ranges
The knowledge graph pattern we constructed enables the model to utilize continuous
multi-year company financial features and supply chain relationships. To explore the
impact of different time ranges on fraud detection, we constructed supply chain relationship
knowledge graphs for four periods, 2021, 2019–2021, 2017–2021, and 2012–2021, to detect
financial fraud in specific ranges, with the results illustrated in Figure 4. We used the
best-performing machine learning model in our experiments, LightGBM, as a baseline to
compare the effects of GNN-based models.
Figure 4. Fraud detection results for different time ranges: (a) 2021; (b) 2019–2021; (c) 2017–2021; (d) 2012–2021.
As shown in Figure 4, GNN-based methods that consider supply chain relationships
outperform machine learning methods that only consider financial features across all time
ranges. Some methods based on homogeneous graphs show only minor improvements,
while heterogeneous graph methods demonstrate relatively greater improvements. The
ieHGCN model, which applies attention when aggregating company relationship information,
exhibited the best performance on all datasets. This suggests that through the transformation
of semantic space, the heterogeneous features of both unlisted and listed companies can be
fully integrated, thereby aiding the model in extracting potential fraud-related features.
It is noteworthy that all models achieved their best detection effectiveness on the
2017–2021 dataset, followed by 2019–2021 and 2021, with the poorest performance observed for
2012–2021. This indicates that 3–5 years is a critical time range for uncovering company
financial fraud. Supply chain relationships over shorter periods might not fully reflect
the intensity of inter-company connections, and financial features could also have been
disguised, lacking the continuity of actual business operations. Conversely, financial features
and supply relationships spanning ten years may contain noise due to changes in regulatory
policies. Within 3–5 years, companies are likely to have a more stable policy environment
and business plans, making fraudulent financial features and anomalies more pronounced.
5. Discussion on Fraud-Associated Supply Chains
Building upon the unique attributes of GNN methods, we conducted a further study of
relationships within the supply chain. The activities of companies within the supply chain
network contain useful transaction information [17,55,56]. Therefore, studying
company-related financial fraud risks based on supply chain relationships can make explicit the
potential patterns of fraudulent associations that may exist within the supply chain. In
this section, we first investigate the impact of supply relationships at different tiers on
financial fraud; subsequently, we use the interpretability of the ieHGCN model to evaluate
the meta-paths between listed and related unlisted companies to analyze supply chain
structures that may transmit financial fraud risks.
Table 3. Fraud detection performance of ieHGCN models with different tiers.
Tiers   Accuracy   Precision   Recall   F1-Macro   AUC
1       0.6983     0.3592      0.8013   0.6404     0.7381
2       0.7577     0.4137      0.7372   0.6834     0.7498
3       0.7209     0.3713      0.7308   0.6500     0.7247
4       0.7625     0.4091      0.6346   0.6710     0.7131
Note: Bold values indicate the best performance for each metric.
The ieHGCN model aggregates multi-tier supply chain relationships by stacking
convolutional layers. In the first layer, the model learns the financial features between
companies and their tier-1 suppliers and customers. In the second layer, it aggregates
the tier-1 and tier-2 relationships of the supply chains. Finally, the model outputs
a fraud detection score through a classification layer at the end. The effects of
the ieHGCN method with different numbers of tiers are presented in Table 3. The model considering
tier-2 supply chain relationships achieved the highest AUC of 0.7498, suggesting that the
attention aggregation between the model’s layers can extract the associated information of
tier-1 and tier-2 supply chain companies, thereby aiding detection. Additionally, previous
research [3] had difficulty extracting effective information at the tier-1 supply chain level.
However, in reality, immediate suppliers and customers have a considerable impact on
the company’s operational performance and financial condition [56]. The method we
introduced addresses this problem well, achieving considerable results (an AUC of 0.7381)
even when only integrating tier-1 supply chain relationships. This indicates that the semantic
space projection and category information aggregation operations in the modules can learn
the financial features of neighboring nodes within a single GNN layer.
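The link between the number of stacked layers and the supply chain tier can be illustrated with a plain breadth-first expansion: after k rounds of neighbor aggregation, a node’s representation can depend on nodes up to k hops away. The toy graph and its node names below are hypothetical and do not reproduce any graph from the experiments.

```python
def nodes_within_hops(adjacency, start, hops):
    """Nodes reachable from `start` in at most `hops` steps (excluding `start`)."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # One round of expansion corresponds to one message-passing layer
        frontier = {n for node in frontier for n in adjacency.get(node, ())} - seen
        seen |= frontier
    return seen - {start}


# Toy undirected supply graph: focal company, two tier-1 and three tier-2 partners
adjacency = {
    "focal": ["s1", "c1"],
    "s1": ["focal", "s2", "c2"],
    "c1": ["focal", "c3"],
    "s2": ["s1"],
    "c2": ["s1"],
    "c3": ["c1"],
}
print(sorted(nodes_within_hops(adjacency, "focal", 1)))  # ['c1', 's1']
print(sorted(nodes_within_hops(adjacency, "focal", 2)))  # ['c1', 'c2', 'c3', 's1', 's2']
```

A one-layer model therefore sees only direct suppliers and customers, while a two-layer model also reaches companies that share a supplier or customer with the focal company.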
For the best-performing model, which integrates information from the tier-2 supply
chain, we analyzed the company association paths in the supply chain knowledge graph
based on the GNN’s meta-paths. First, we calculated the average attention distribution for
each relationship within each layer of the model, representing how nodes distribute attention
to different types of neighboring nodes through various relationships. As shown in
Table 4, Self-relation indicates the semantic space self-projection, while Identical, Customer,
and Supply, respectively, represent the attention weights of nodes in the corresponding
relationships. We primarily focus on the impact of these three relationships along the path.
Table 4. Attention distribution coefficients for different relations in each layer of the model.
Layers   Company    Self-Relation   Identical   Customer   Supply
1–2      Listed     0.3237          0.2039      0.2039     0.2685
1–2      Unlisted   0.2499          0.2482      0.2478     0.2541
2–3      Listed     0.2680          0.3095      0.2194     0.2031
2–3      Unlisted   0.2466          0.2393      0.2460     0.2681
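Since the attention weights are softmax-normalized (Equation (10)), the four coefficients in each row of Table 4 should form a distribution. The check below sums the published values row by row; the dictionary layout is illustrative.

```python
# Rows of Table 4: (layers, company type) -> attention per relation
table4 = {
    ("1-2", "Listed"): {"self": 0.3237, "identical": 0.2039, "customer": 0.2039, "supply": 0.2685},
    ("1-2", "Unlisted"): {"self": 0.2499, "identical": 0.2482, "customer": 0.2478, "supply": 0.2541},
    ("2-3", "Listed"): {"self": 0.2680, "identical": 0.3095, "customer": 0.2194, "supply": 0.2031},
    ("2-3", "Unlisted"): {"self": 0.2466, "identical": 0.2393, "customer": 0.2460, "supply": 0.2681},
}

row_sums = {row: round(sum(weights.values()), 4) for row, weights in table4.items()}
print(row_sums)
```

Every row sums to 1.0000, so each entry can be read as the share of attention a node type assigns to that relation in the given layer.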
Based on the attention weights, we can calculate the importance of the meta-paths
within the graph. Figure 5 displays the meta-paths of relevant nodes within the tier-2
relationship scope for the focal company A. It is important to note that we disregard cases
where tier-2 relationships involve unlisted companies, because unlisted companies lack
financial features, making it impossible to assess their relationship with company A.
Figure 5. Company fraud association paths obtained from attention coefficients.
As shown in Figure 5, within the tier-1 relationships of company A (the 1-hop
relationships in the figure), the suppliers, namely companies B and H, are the nodes that
receive the most attention. Of these, company B is a listed company with explicit financial
features, and hence it is the node that deserves the most attention. This suggests that
in supplier–customer relationships, suppliers are more crucial in influencing a company’s
financial condition. The financial status of suppliers can reveal potential financial fraud
activities in their downstream companies.
Based on the above results, a company’s tier-2 supply chain relationships are more
capable of uncovering inconsistencies in financial data between associated companies. In
Figure 5, the tier-2 companies of company A have different attention weights. Among
them, node C, which shares the same supplier B with A, forms the meta-path with the
highest attention among paths between different companies: A-B-C (0.2685 × 0.2194 = 0.0589).
This structure suggests that the financial fraud judgment of A is correlated with the financial
features of C. Similarly, the meta-path A-H-I (0.2541 × 0.2194 = 0.0557) also shows
a high attention weight. This indicates that companies sharing the same supplier have
related financial fraud risks, which may be due to potential fraud in the supplier, leading to
inconsistencies between financial features and the pattern of activities in the supply chain.
Tier-2 relationships with continuous supply also constitute meta-paths with high attention,
such as A-B-D (0.2685 × 0.2031 = 0.0545) and L-K-A (0.2478 × 0.2194 = 0.0544),
indicating that there is a certain fraud-relatedness between a company and its multi-tier
suppliers. Financial statement fraud in a company could propagate through the supply
chain, causing systematic fraud. These patterns are also consistent with some existing
supply chain research [16].
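The path scores above are products of two Table 4 coefficients, one from layers 1–2 and one from layers 2–3. The sketch below reproduces them; the mapping from each path to a (company type, relation) pair per layer is inferred by matching the factors quoted in the text against Table 4, and the data layout and function name are hypothetical.

```python
# Customer and Supply coefficients from Table 4 that appear in the path scores
ATTENTION = {
    "1-2": {"Listed": {"Customer": 0.2039, "Supply": 0.2685},
            "Unlisted": {"Customer": 0.2478, "Supply": 0.2541}},
    "2-3": {"Listed": {"Customer": 0.2194, "Supply": 0.2031}},
}


def path_score(first, second):
    """Product of the layer 1-2 and layer 2-3 attention coefficients."""
    (type1, relation1), (type2, relation2) = first, second
    return ATTENTION["1-2"][type1][relation1] * ATTENTION["2-3"][type2][relation2]


# Lookup keys inferred from the factors quoted in the text (an assumption)
paths = {
    "A-B-C": (("Listed", "Supply"), ("Listed", "Customer")),
    "A-H-I": (("Unlisted", "Supply"), ("Listed", "Customer")),
    "A-B-D": (("Listed", "Supply"), ("Listed", "Supply")),
    "L-K-A": (("Unlisted", "Customer"), ("Listed", "Customer")),
}
scores = {name: round(path_score(*hops), 4) for name, hops in paths.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Sorting by the product places the two fork structures (A-B-C and A-H-I) ahead of the two chain structures, matching the order in which the paths are discussed.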
Additionally, the historical financial features of a company are also a primary concern
in the model’s attention, as represented by the meta-path A-A1-A2. This is intuitive
since the operations of the same company are continuous. Although financial features
change over time, they should conform to a similar distribution.
Overall, the detection model we constructed can interpret the fraud associations based
on the supply chain relationships between companies. Experiments indicate that the model
performs best when employing tier-2 supply chain relationships. Even when using only tier-1
supply chain relationships, the proposed method can provide effective fraud features
and outperforms other existing methods. Moreover, the interpretive studies show that in
company supply chain relationships, the fork structure (e.g., A-B-C), the chain
structure (e.g., A-B-D), and the historical association of the same company (e.g.,
A-A1-A2) are among the most important structures in the supply chain knowledge
graph and require focused analysis in fraud detection.
6. Conclusions
In this paper, we have introduced an interpretable GNN method, ieHGCN, to detect
patterns in an expanded company supply chain knowledge graph. Specifically, we first
constructed a company supply chain knowledge graph that can expand on the
supplier–customer relationships in continuous operational years. Next, we aggregated different
companies’ financial features through the supply chain relationships of listed and unlisted
companies. Finally, we allocated node weights using the attention mechanism and detected
fraud in listed companies. Our experimental results have shown that, compared to machine
learning methods using only financial data and other supply chain-based GNN methods,
ieHGCN achieves the best detection results, with a 3.8% relative improvement in AUC over
the best competing method. Moreover, our proposed method can rationally explain which
supplier structures in the supply chain carry a high risk of financial fraud. It provides
analytical suggestions for further supplier management and financial fraud regulation.
There are also some limitations and potential future research directions. The inclusion
of data from 2020 and 2021, years significantly impacted by the COVID-19 pandemic, may
have skewed supply chain dynamics and delayed fraud detection. Future work could
address this by excluding COVID-19-period data to ensure more accurate fraud detection
analysis. Another avenue for future research is the development of a dynamic knowledge
graph that updates in real time based on changes in sales and purchase amounts within
supplier–customer relationships. This approach could improve the effectiveness of supply
chain relationships and enable real-time fraud detection. Additionally, considering the
impact of imbalanced graph node samples on detection results, exploring GNN algorithms
tailored for imbalanced datasets might further improve detection performance. Furthermore,
the changes in the financial conditions of businesses during continuous operations
may also imply fraudulent features. Therefore, the time series features of financial data can
be further investigated.
Author Contributions: Conceptualization, S.Z. and T.M.; methodology, T.M.; software, T.M.;
validation, T.M., S.Z. and H.W.; formal analysis, T.M.; investigation, S.Z.; resources, T.M.; data curation,
H.W.; writing—original draft preparation, T.M.; writing—review and editing, J.R., Y.L. and R.G.;
visualization, T.M.; supervision, D.H.; project administration, D.H.; funding acquisition, D.H. All
authors have read and agreed to the published version of the manuscript.
Funding: This research is supported by the National Key R&D Program of China (Grant No.:
2021YFB2700900), the National Natural Science Foundation of China (Grant No.: 62376074), the
Shenzhen Science and Technology Program (Grants No.: SGDX20230116091244004,
KCXST20221021111404010, JSGGKQTD20221101115655027, KJZD20230923114405011,
JSGG20220831103400002, RKX20231110090859012, KJZD20231023095959002), the Fundamental
Research Funds for the Central Universities (Grant No.: HIT.OCEF.2024047), Harbin Institute of
Technology (Shenzhen) Joint Basic Education Cultivation Project “Application Project of Intelligent
Assistive Teaching System for Secondary School Biology Curriculum Based on Multimodal Large
Language Model” and Guangdong Research Grants for Philosophy and Social Science (Grant No.: GD21CYJ04).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data are available on request from the authors.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Association of Certified Fraud Examiners (ACFE). Occupational Fraud 2022: A Report to the Nations. ACFE: Austin, TX, USA, 2022.
2. Craja, P.; Kim, A.; Lessmann, S. Deep learning for detecting financial statement fraud. Decis. Support Syst. 2020, 139, 113421.
3. Li, J.; Chang, Y.; Wang, Y.; Zhu, X. Tracking down financial statement fraud by analyzing the supplier-customer relationship network. Comput. Ind. Eng. 2023, 178, 109118.
4. Bao, Y.; Ke, B.; Li, B.; Yu, Y.J.; Zhang, J. Detecting accounting fraud in publicly traded US firms using a machine learning approach. J. Account. Res. 2020, 58, 199–235.
5. Wang, G.; Ma, J.; Chen, G. Attentive statement fraud detection: Distinguishing multimodal financial data with fine-grained attention. Decis. Support Syst. 2023, 167, 113913.
6. Shahana, T.; Lavanya, V.; Bhat, A.R. State of the art in financial statement fraud detection: A systematic review. Technol. Forecast. Soc. Change 2023, 192, 122527.
7. Dechow, P.M.; Ge, W.; Larson, C.R.; Sloan, R.G. Predicting material accounting misstatements. Contemp. Account. Res. 2011, 28, 17–82.
8. Khan, A.T.; Cao, X.; Li, S.; Katsikis, V.N.; Brajevic, I.; Stanimirovic, P.S. Fraud detection in publicly traded US firms using Beetle Antennae Search: A machine learning approach. Expert Syst. Appl. 2022, 191, 116148.
9. Cecchini, M.; Aytug, H.; Koehler, G.J.; Pathak, P. Making words work: Using financial text as a predictor of financial events. Decis. Support Syst. 2010, 50, 164–175.
10. Wu, X.; Du, S. An analysis on financial statement fraud detection for Chinese listed companies using deep learning. IEEE Access 2022, 10, 22516–22532.
11. DuHadway, S.; Mena, C.; Ellram, L.M. Let the buyer beware: how network structure can enable (and prevent) supply chain fraud. Int. J. Oper. Prod. Manag. 2022, 42, 125–150.
12. Manuela, P.; Cristina, B.; Molina-Morales, F.X. I need you, but do I love you? Strong ties and innovation in supplier–customer relations. Eur. Manag. J. 2021, 39, 790–801.
13. Li, C.; Li, N.; Zhang, F. Using economic links between firms to detect accounting fraud. Account. Rev. 2023, 98, 399–421.
14. Patterson, J.L.; Goodwin, K.N.; McGarry, J.L. Understanding and Mitigating Supply Chain Fraud. J. Mark. Dev. Compet. 2018, 12.
15. DuHadway, S.; Talluri, S.; Ho, W.; Buckhoff, T. Light in dark places: the hidden world of supply chain fraud. IEEE Trans. Eng. Manag. 2020, 69, 874–887.
16. Wang, Y.; Li, J.; Wu, D.; Anupindi, R. When ignorance is not bliss: An empirical analysis of subtier supply network structure on firm risk. Manag. Sci. 2021, 67, 2029–2048.
17. Lu, G.; Shang, G. Impact of supply base structural complexity on financial performance: Roles of visible and not-so-visible characteristics. J. Oper. Manag. 2017, 53, 23–44.
18. Villena, V.H.; Gioia, D.A. On the riskiness of lower-tier suppliers: Managing sustainability in supply networks. J. Oper. Manag. 2018, 64, 65–87.
19.
Taghizadeh, E.; Venkatachalam, S.; Chinnam, R.B. Impact of deep-tier visibility on effective resilience assessment of supply
networks. Int. J. Prod. Econ. 2021,241, 108254. [CrossRef]. [CrossRef]
20.
Ang, E.; Iancu, D.A.; Swinney, R. Disruption risk and optimal sourcing in multitier supply networks. Manag. Sci. 2017,
63, 2397–2419. [CrossRef]. [CrossRef]
21.
Yang, Y.; Guan, Z.; Li, J.; Zhao, W.; Cui, J.; Wang, Q. Interpretable and efficient heterogeneous graph convolutional network. IEEE
Trans. Knowl. Data Eng. 2021,35, 1637–1650. [CrossRef]. [CrossRef]
22. Beaver, W.H. Financial ratios as predictors of failure. J. Account. Res. 1966,4, 71–111. [CrossRef]. [CrossRef]
23. Beneish, M.D. The detection of earnings manipulation. Financ. Anal. J. 1999,55, 24–36. [CrossRef]. [CrossRef]
24.
Green, B.P.; Choi, J.H. Assessing the risk of management fraud through neural network technology. Auditing 1997,16, 14–28.
[CrossRef].
25. Perols, J. Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Audit. J. Pract. Theory
2011,30, 19–50. [CrossRef]. [CrossRef]
26.
Dong, W.; Liao, S.; Zhang, Z. Leveraging financial social media data for corporate fraud detection. J. Manag. Inf. Syst. 2018,
35, 461–487. [CrossRef]. [CrossRef]
27.
Goel, S.; Uzuner, O. Do sentiments matter in fraud detection? Estimating semantic orientation of annual reports. Intell. Syst.
Account. Financ. Manag. 2016,23, 215–239. [CrossRef]. [CrossRef]
28.
Bhattacharya, I.; Mickovic, A. Accounting fraud detection using contextual language learning. Int. J. Account. Inf. Syst. 2024,
53, 100682. [CrossRef]. [CrossRef]
29.
Throckmorton, C.S.; Mayew, W.J.; Venkatachalam, M.; Collins, L.M. Financial fraud detection using vocal, linguistic and financial
cues. Decis. Support Syst. 2015,74, 78–87. [CrossRef]. [CrossRef]
30.
Kim, Y.J.; Baik, B.; Cho, S. Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert
Syst. Appl. 2016,62, 32–43. [CrossRef]. [CrossRef]
31.
Xu, X.; Xiong, F.; An, Z. Using machine learning to predict corporate fraud: Evidence based on the GONE framework. J. Bus.
Ethics 2023,186, 137–158. [CrossRef]. [CrossRef]
32.
Iranmanesh, M.; Maroufkhani, P.; Asadi, S.; Ghobakhloo, M.; Dwivedi, Y.K.; Tseng, M.L. Effects of supply chain transparency,
alignment, adaptability, and agility on blockchain adoption in supply chain among SMEs. Comput. Ind. Eng. 2023,176, 108931.
[CrossRef]. [CrossRef]
33.
Yin, C.; Cheng, X.; Yang, Y.; Palmon, D. Do corporate frauds distort suppliers’ investment decisions? J. Bus. Ethics 2021,
172, 115–132. [CrossRef]. [CrossRef]
34.
Azzi, R.; Chamoun, R.K.; Sokhn, M. The power of a blockchain-based supply chain. Comput. Ind. Eng. 2019,135, 582–592.
[CrossRef]. [CrossRef]
35.
Kara, M.E.; Fırat, S.Ü.O.; Ghadge, A. A data mining-based framework for supply chain risk management. Comput. Ind. Eng. 2020,
139, 105570. [CrossRef]. [CrossRef]
36.
Risso, L.A.; Ganga, G.M.D.; Godinho Filho, M.; de Santa-Eulalia, L.A.; Chikhi, T.; Mosconi, E. Present and future perspectives
of blockchain in supply chain management: A review of reviews and research agenda. Comput. Ind. Eng. 2023,179, 109195.
[CrossRef]. [CrossRef]
37.
Karamchandani, A.; Srivastava, S.K.; Srivastava, A. A lower approximation based integrated decision analysis framework for a
blockchain-based supply chain. Comput. Ind. Eng. 2023,177, 109092. [CrossRef]. [CrossRef]
38. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [CrossRef]
39. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [CrossRef]
40. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2018, arXiv:1710.10903. [CrossRef]
41. Wang, X.; Ji, H.; Shi, C.; Wang, B.; Ye, Y.; Cui, P.; Yu, P.S. Heterogeneous graph attention network. In Proceedings of The World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2022–2032. [CrossRef]
42. Hu, Z.; Dong, Y.; Wang, K.; Sun, Y. Heterogeneous graph transformer. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2704–2710. [CrossRef]
43. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826. [CrossRef]
44. Chen, J.; Chen, Q.; Jiang, F.; Guo, X.; Sha, K.; Wang, Y. SCN_GNN: A GNN-based fraud detection algorithm combining strong node and graph topology information. Expert Syst. Appl. 2024, 237, 121643. [CrossRef]
45. Van Belle, R.; Baesens, B.; De Weerdt, J. CATCHM: A novel network-based credit card fraud detection method using node representation learning. Decis. Support Syst. 2023, 164, 113866. [CrossRef]
46. Lee, C.Y.; Yang, S.H. Graph Spatio-Temporal networks for manufacturing sales forecast and prevention policies in pandemic era. Comput. Ind. Eng. 2023, 182, 109413. [CrossRef] [PubMed]
47. Wu, B.; Chao, K.M.; Li, Y. Heterogeneous graph neural networks for fraud detection and explanation in supply chain finance. Inf. Syst. 2024, 121, 102335. [CrossRef]
48. Barron, J.T. Continuously differentiable exponential linear units. arXiv 2017, arXiv:1704.07483. [CrossRef]
49. Hassanin, M.; Anwar, S.; Radwan, I.; Khan, F.S.; Mian, A. Visual attention methods in deep learning: An in-depth survey. Inf. Fusion 2024, 108, 102417. [CrossRef]
50. Dyck, A.; Morse, A.; Zingales, L. Who blows the whistle on corporate fraud? J. Financ. 2010, 65, 2213–2253. [CrossRef]
51. Cecchini, M.; Aytug, H.; Koehler, G.J.; Pathak, P. Detecting management fraud in public companies. Manag. Sci. 2010, 56, 1146–1160. [CrossRef]
52. Xiong, J.; Ouyang, C.; Tong, J.Y.; Zhang, F.F. Fraud commitment in a smaller world: Evidence from a natural experiment. J. Corp. Financ. 2021, 70, 102090. [CrossRef]
53. Papík, M.; Papíková, L. Detecting accounting fraud in companies reporting under US GAAP through data mining. Int. J. Account. Inf. Syst. 2022, 45, 100559. [CrossRef]
54. Smith, L.N.; Topin, N. Super-convergence: Very fast training of neural networks using large learning rates. In Proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, SPIE, Baltimore, MD, USA, 15–17 April 2019; Volume 11006, pp. 369–386. [CrossRef]
55. Jiang, R.; Kang, Y.; Liu, Y.; Liang, Z.; Duan, Y.; Sun, Y.; Liu, J. A trust transitivity model of small and medium-sized manufacturing enterprises under blockchain-based supply chain finance. Int. J. Prod. Econ. 2022, 247, 108469. [CrossRef]
56. Gu, Q.; Jitpaipoon, T.; Yang, J. The impact of information integration on financial performance: A knowledge-based view. Int. J. Prod. Econ. 2017, 191, 221–232. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.