Citation: Tran, K.L.; Le, H.A.; Nguyen, T.H.; Nguyen, D.T. Explainable Machine Learning for Financial Distress Prediction: Evidence from Vietnam. Data 2022, 7, 160. https://doi.org/10.3390/data7110160
Academic Editor: Francisco Guijarro
Received: 13 October 2022; Accepted: 7 November 2022; Published: 14 November 2022
Explainable Machine Learning for Financial Distress Prediction:
Evidence from Vietnam
Kim Long Tran 1, Hoang Anh Le 2,*, Thanh Hien Nguyen 3 and Duc Trung Nguyen 1
1 Faculty of Banking, Ho Chi Minh University of Banking, No. 36 Ton That Dam Street, Nguyen Thai Binh Ward, District 1, Ho Chi Minh City 700000, Vietnam
2 Institute for Research Science and Banking Technology, Ho Chi Minh University of Banking, No. 36 Ton That Dam Street, Nguyen Thai Binh Ward, District 1, Ho Chi Minh City 700000, Vietnam
3 Department of Economic Mathematics, Ho Chi Minh University of Banking, No. 36 Ton That Dam Street, Nguyen Thai Binh Ward, District 1, Ho Chi Minh City 700000, Vietnam
* Correspondence: anhlh_vnc@buh.edu.vn
Abstract:
The past decade has witnessed the rapid development of machine learning applied in
economics and finance. Recent evidence suggests that machine learning models have produced
superior results to traditional statistical models and have become the driving force for dramatic
improvement in the financial industry. However, a much-debated question is whether the prediction
results from black box machine learning models can be interpreted. In this study, we compared
the predictive power of machine learning algorithms and applied SHAP values to interpret the
prediction results on the dataset of listed companies in Vietnam from 2010 to 2021. The results
showed that the extreme gradient boosting and random forest models outperformed other models.
In addition, based on Shapley values, we also found that long-term debts to equity, enterprise value
to revenues, account payable to equity, and diluted EPS had greatly influenced the outputs. In terms
of practical contributions, the study provides credit rating companies with a new method for predicting
the possibility of default of bond issuers in the market. The study also provides an early warning tool
for policymakers about the risks of public companies in order to develop measures to protect retail
investors against the risk of bond default.
Keywords: explainable AI; financial distress; machine learning
1. Introduction
Financial distress refers to the situation in which a company fails to meet its debt obligations to creditors at maturity. Prolonged and severe financial distress can eventually lead to bankruptcy. Traditionally, the assessment of a company's financial distress was based mainly on the subjective judgment of experts. However, this expert-based approach has many drawbacks: the results are inconsistent, cannot be validated, and depend heavily on expert competence. Therefore, other approaches have been developed to improve consistency and accuracy. These classification techniques can be
categorized into statistical methods and machine learning methods. Statistical methods
include univariate analysis [1], multiple discriminant analysis [2], logistic regression [3],
and Cox survival model [4]. Statistical models are simple in structure, highly explanatory,
and take less time to train. However, statistical models require many strict assumptions that rarely hold in real life, including linear relationships, homogeneity of variances, and independence. Violation of these assumptions can reduce the predictive power
of statistical methods. Then, the development of machine learning algorithms marked
a breakthrough in the science of prediction. The application of machine learning models,
such as support vector machines [5], decision trees [6], and artificial neural networks [7], has enhanced the predictive power of traditional models. Recently, ensemble models such as
random forest [8], adaptive boosting [9], and extreme gradient boosting [10] have become
significant drivers of developments in the economy and financial sectors, especially in risk
management. Although they provide better forecasting results, machine learning methods also have drawbacks, as these models are complex and difficult to interpret. Meanwhile, explainability and interpretability are essential both for internal users, such as managers and programmers, and for external stakeholders, such as creditors, shareholders, credit rating agencies, and regulators.
Recently, studies have been conducted to enhance the explainability of the machine
learning models, but they mainly focused on P2P loans and the SME loans market. In this
study, we aim to apply machine learning and enhance the explainability of forecasting
results on the data of listed companies in Vietnam. Our study makes two new contributions to the area of risk forecasting. First, to the best of our knowledge, this is a pioneering study in applying machine learning to predict financial distress for companies in Vietnam. Second, we also identified important features that explain the forecast results
using the SHAP values. Based on the results, we also gained more valuable information to
improve the risk assessment process of debt issuers.
The rest of this study is organized as follows. Section 2 reviews the literature on financial distress prediction, introduces explanatory techniques, and highlights the contribution of previous research. Section 3 presents the methodologies and techniques used in this research. Section 4 shows the results of the prediction and interpretation. Section 5 presents the conclusions and limitations of this study.
2. Literature Review
2.1. Literature Review on Financial Distress Prediction
Default risk prediction models based on statistical techniques were built and devel-
oped in the late 1960s. Beaver [1] applied regression models to determine 30 financial ratios that significantly impact the corporate default risk. Later, Altman [2] improved Beaver's
work by developing a multiple discriminant analysis method to predict bankruptcy. He
built a Z-score model that employed a discriminant function to classify the observation.
However, the discriminant model also has some disadvantages, such as (i) assuming a linear
relationship between the independent variable and the dependent variable; (ii) the results
are difficult to interpret and cannot quantify the level of risk between different groups. In
1980, Ohlson pioneered the application of logistic regression models to predict the probability of default of corporations. The advantage of this model is that the output is the borrower's
probability of default, but the accuracy of the model is not always high [11].
Because the credit analysis process is similar to pattern recognition problems, machine
learning algorithms have been employed to classify borrowers' creditworthiness [11]. Having less restrictive constraints than Altman's and Ohlson's models, support vector machines (SVM) have been developed to solve the classification problem [12]. Chen et al. [13] used
the SVM model to predict the bankruptcy risk of German firms. The study proved that
the SVM model produced better results than the traditional logit model. Moreover, the
authors found that the SVM model can better exploit the nonlinear relationships between
coefficients and default risk than traditional models such as discriminant or logit models.
Shin et al. [14] also applied SVM to predict bankruptcy for 2320 medium-sized enterprises
at the Korea Credit Guarantee Fund from 1996 to 1999. The results showed that SVM
brought better predictive results than other models, including artificial neural network
(ANN) models.
Zhao et al. [15] conducted a study to build a credit scoring system based on ANN on the
German credit data dataset. The results showed that ANN could predict credit scores more
accurately than traditional models, with an efficiency of 87%. Geng et al. [16] used machine
learning models to predict financial distress for firms listed on the Shanghai and Shenzhen
stock exchanges from 2001 to 2008. They found that the ANN model produced better results
than decision trees, SVM, and ensemble models. Barboza et al. [17] used SVM, bagging, boosting, and random forest methods to predict the bankruptcy of 10,000 companies in the North American market from 1985 to 2013 and compared them with traditional statistical
models. The results showed that bagging, boosting, and random forest outperformed the others. Specifically, machine learning models had an average accuracy about 10% higher
than traditional models. In more detail, the random forest model had an accuracy of up
to 87%, while the traditional model had an accuracy of 50% to 69%. Chakraborty and
Joseph [18] constructed a predictive model of financial distress based on balance sheet items. They found that the random forest model showed results more than 10 percent better, as measured by AUC-ROC. Similarly, Fuster et al. [19] studied mortgage defaults in the US
and found that a random forest model had more accurate predictive results than the logistic
model. Based on recent research by Dubyna et al. [20], the use of technologies in the
provision of financial services is extremely important, influencing the transformation of
the financial behavior of customers and the business models of financial institutions. In
addition, research by Zhavoronok et al. [21] shows that innovation processes affect the financial behavior patterns of households in innovative economies in various ways and forms.
Previous studies have proved that machine learning models yield better results than
traditional statistical models. However, the results are not consistent and depend on the
data set used in the study. In addition, machine learning models also have drawbacks, such as (i) they do not work well on unbalanced data because they tend to classify many observations into the larger classes; (ii) model accuracy increases with a more extensive training dataset, but the available data are often insufficient to reach a given accuracy level; and (iii) selecting the hidden layers is problematic, resulting in a trade-off between computation time and prediction accuracy [15].
2.2. Literature Review on Explanation
Interpretability is the ability to explain or present in terms that can be understood
by humans [22]. Miller [23] defined explainability as the degree to which people can
understand the causes of a decision. Thus, an interpretable system is a system that provides
knowledge that helps people understand how it works and can interpret the results of
a specific forecast.
Since 2019, studies have focused on the explanatory power of deep learning models
in predicting default. Bracke et al. [24] used the gradient tree boosting model to predict
default on mortgage loans in the UK. They introduced a new method named quantitative
input influence (QII) to evaluate the contribution of the input variables to the target variable
by calculating the Shapley values. The authors showed that this method could provide
a detailed explanation of the degree of impact of the variables on different customer groups.
Later, some studies used the Shapley values to measure the contribution of variables
in the model to the target variable. Babaei et al. [25] applied machine learning to predict default for small and medium enterprises. The authors eliminated variables with low explanatory Shapley values. The results showed that defaults and expected returns of these
companies are better forecasted with a smaller amount of input variables from the financial
statements. Bussmann et al. [26] applied XGB machine learning, correlation networks, and
Shapley computation on a sample of 15,000 SMEs in Southeast Europe. The results showed
that forecasting efficiency could be improved by understanding the factors that affect credit
risk in the model.
Additionally, some studies used Shapley Additive exPlanations (SHAP) and Local
Interpretable Model-Agnostic Explanations (LIME) methods to compare the explanatory
power of the variables in the model [27,28]. Ariza-Garzón et al. [29] compared the predictive
power of machine learning algorithms (decision trees, random forests, XGBoost) with
logistic regression models on personal loans from Lending Club company. Then, they
evaluated the contribution of variables in the model through SHAP and LIME methods. The
results showed that when applying SHAP to machine learning methods, the explanatory
power of these models was improved, even reflecting the nonlinear relationships better
than the traditional logistic regression model. A similar study was conducted by Hadji
Misheva et al. [30], and the authors also found the same results. They concluded that
explanatory results were stable and consistent with the logical explanations in finance. This
study applied the SHAP method to interpret machine learning models’ results.
3. Methodology
3.1. Data and Data Processing
In this study, we used data extracted from the financial statements of Vietnamese
companies listed on the Ho Chi Minh Stock Exchange, Hanoi Stock Exchange, and UPCOM.
Data were collected from 2010 to 2021.
Financially distressed companies are identified based on criteria such as negative equity, EBITDA to interest being less than one for two consecutive years, and operating income being negative for three consecutive years. In addition, we also consult the external auditor's conclusions in the financial statements and filter out companies suspected of not being able to continue operating. Finally, the selected distressed companies meet the above criteria and have sufficient financial data during the observation period to conduct the research. Companies are labeled one if they are in the financial distress group and zero otherwise.
Based on the study of Chakraborty and Joseph [18] and Standard & Poor's evaluation
criteria, we used 25 financial ratios as input features for predictive machine learning models
(Appendix A). These ratios reflect essential aspects of companies, such as liquidity, financial
risk, business risk, and the market factor, that are expected to affect the debt repayment
capacity of companies.
The data were preprocessed for missing values and outliers. We also excluded fi-
nancial, insurance, accounting, and banking companies because of differences in financial
statements. The data had 3277 observations, of which 436 companies were in financial
distress (13.3%), and 2841 companies were in the group of non-financial distress (86.7%).
Because the data were unbalanced, we used the SMOTE technique to handle the class imbalance problem. Finally, the data were divided into training and validation sets, with 70% and 30%, respectively.
We implemented the models using Python 3.5 and other Python packages oriented to data analysis, including NumPy 1.19.3, Pandas 1.5.1, Scikit-learn 1.1.3, and Seaborn 0.12.1 [31–34]. For interpreting the results, we used the SHAP package to compute Shapley values and visualize the results [35].
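As a rough illustration of the preprocessing steps described above, the sketch below shows how the 70/30 split and the SMOTE resampling could be implemented. The file and column names (e.g., vietnam_listed_firms.csv, distress) are hypothetical, SMOTE is taken here from the imbalanced-learn package (not explicitly cited in the paper), and applying SMOTE only to the training partition is a common practice assumed here rather than a detail stated by the authors.

```python
# A minimal sketch of the data preparation, assuming hypothetical file/column names.
import pandas as pd
from imblearn.over_sampling import SMOTE          # imbalanced-learn (assumed implementation)
from sklearn.model_selection import train_test_split

df = pd.read_csv("vietnam_listed_firms.csv")       # 25 financial ratios X1..X25 + "distress" label
X = df.drop(columns=["distress"])
y = df["distress"]                                  # 1 = financial distress, 0 = otherwise

# 70% training / 30% validation split, stratified on the distress label
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Oversample the minority (distressed) class with SMOTE on the training data only
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print(y_train_res.value_counts())
```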
3.2. Machine Learning Methods to Predict Financial Distress
In this research, we employed statistical methods and machine learning to predict the
distress of businesses, including logistic regression, support vector machine, decision tree,
random forest, artificial neural network, and extreme gradient boosting. The details of the
methods are presented as follows.
3.2.1. Logistic Regression
Logistic regression is a popular statistical technique for forecasting problems where
the dependent variable is binary, specifically, the financial distress status in this study. The
output of the model is the probability of financial distress $P_n$, corresponding to the input features X. This probability is calculated as Equation (1).

$$P_n(y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)}} \quad (1)$$
Logistic regression is often used as a benchmark in research to compare with other
forecasting methods. The advantage of logistic regression is that the results are easy to
interpret and understand for most users. In other words, this is one of the models with
high explanatory power, so it is often used in practice at financial institutions.
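For concreteness, a minimal sketch of the logistic regression benchmark is shown below, reusing the variables from the preprocessing sketch and the random_state reported in Table 1; the max_iter setting is an assumption added only so the solver converges.

```python
# A minimal sketch of the logistic regression benchmark (Table 1: random_state = 42).
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

logit = LogisticRegression(random_state=42, max_iter=1000)   # max_iter is an assumption
logit.fit(X_train_res, y_train_res)

val_prob = logit.predict_proba(X_val)[:, 1]   # predicted probability of financial distress
print("AUC:", roc_auc_score(y_val, val_prob))
```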
3.2.2. Support Vector Machine
Support vector machines (SVMs) are based on the idea of defining hyperplanes that
decompose observations into high-dimensional feature spaces. Linear SVMs models focus
on maximizing the margin between positive and negative hyperplanes. The classification
process will take place according to Equation (2).
$$y_i = \begin{cases} +1 & \text{if } b + \alpha^T x \geq +1 \\ -1 & \text{if } b + \alpha^T x \leq -1 \end{cases} \quad (2)$$
where b is the bias.
For nonlinear cases, a kernel is used to project features into a high-dimensional space.
For example, a traditional kernel function, the Gaussian radial basis function, has the following form, Equation (3).

$$K(x, x_i) = \exp\left(-\gamma \|x - x_i\|^2\right) \quad (3)$$
The strength of SVM is that it avoids overfitting with small samples and is less sensitive
to unbalanced distributions.
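A minimal sketch of the SVM classifier with the RBF kernel and the hyper-parameters reported in Table 1 might look as follows; the feature scaling step is an added assumption (SVMs are sensitive to feature scale) and is not described in the paper.

```python
# A minimal sketch of the RBF-kernel SVM (Table 1 hyper-parameters); scaling is assumed.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", probability=True, class_weight="balanced", random_state=42),
)
svm.fit(X_train_res, y_train_res)
print("Validation accuracy:", svm.score(X_val, y_val))
```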
3.2.3. Decision Tree
Decision tree algorithms extract information from data to derive decision rules in the
form of a tree structure. More specifically, the decision tree algorithm determines the best
allocation to optimize each split with maximum purity based on a measure, such as the
Gini Index or Entropy Index. The root of a decision tree, called the root node, corresponds to the most discriminative attribute, and the leaf nodes represent the classes.
The decision tree model has the advantage of being intuitive and interpretable.
However, the drawback is that this model is more prone to overfitting during the feature
domain division or the branching process.
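A minimal sketch of the decision tree classifier with the settings reported in Table 1, again reusing the variables from the earlier sketches:

```python
# A minimal sketch of the decision tree classifier (Table 1: gini criterion, max_depth = 14).
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(criterion="gini", max_depth=14, random_state=42)
tree.fit(X_train_res, y_train_res)
print("Validation accuracy:", tree.score(X_val, y_val))
```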
3.2.4. Random Forest
Breiman [8] developed the random forest technique based on the decision tree model.
In this method, many decision trees are constructed using subsets of randomly selected
features. The sample and feature subsets are randomly selected to ensure the diversity of
the classifiers. Then, the random forest is built for several subsets that generate the same
number of classification trees. The preferred class is defined by a majority of votes; thus,
the results are more precise and, most importantly, avoid model overfitting [8].
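A minimal sketch of the random forest classifier with the hyper-parameters reported in Table 1:

```python
# A minimal sketch of the random forest classifier (Table 1: 100 trees, max_depth = 14).
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, max_depth=14, random_state=42)
rf.fit(X_train_res, y_train_res)
print("Validation accuracy:", rf.score(X_val, y_val))
```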
3.2.5. Extreme Gradient Boosting (XGB)
Gradient boosting is a machine-learning technique used in regression and classification
tasks. It gives a prediction model in the form of an ensemble of weak prediction models,
typically decision trees. Extreme gradient boosting parallelizes the construction of each tree and incorporates a complexity penalty in the loss function to control overfitting and achieve better performance. The optimization function to minimize is given in Equation (4).
$$L^{t} = \sum_{k=1}^{n} l\!\left(y_k,\ \hat{y}_k^{\,t-1} + \varphi_t(x_k)\right) + \Omega(\varphi_t) \quad (4)$$

where $l(\cdot)$ is a loss function and $\Omega(\varphi_t)$ is a regularization term that penalizes the complexity of the model. The goal is to find the $\varphi_t$ that minimizes the function $L^{t}$.
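A minimal sketch of the extreme gradient boosting classifier with the Table 1 hyper-parameters, using the xgboost package cited as reference [10]:

```python
# A minimal sketch of the XGBoost classifier (Table 1: gbtree booster, 100 estimators, max_depth = 1).
from xgboost import XGBClassifier

xgb = XGBClassifier(booster="gbtree", n_estimators=100, max_depth=1, random_state=42)
xgb.fit(X_train_res, y_train_res)
print("Validation accuracy:", xgb.score(X_val, y_val))
```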
3.2.6. Artificial Neural Network
An artificial neural network (also known as a neural network) is a machine learning
algorithm designed based on the idea of how an organism’s brain works. The algo-
rithm solves complex problems by mimicking the brain’s structure and the connections
between neurons.
An artificial neural network consists of connections between many layers of artificial
neurons. Each layer is divided into an input layer, an output layer, and a hidden layer.
These artificial neurons simulate the role of human neurons through mathematical models.
Each artificial neuron receives input signals $x_1, x_2, \dots, x_j$, consisting of the numbers 0 and 1, and then computes the weighted sum of the signals it receives according to their weights $w_1, w_2, \dots, w_j$. A signal is only transmitted to the next artificial neuron when the weighted sum of the received signals exceeds a certain threshold. An artificial neuron can be represented as Equation (5).

$$y_i = \text{output} = \begin{cases} 0 & \text{if } \sum_j w_j x_j \leq \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases} \quad (5)$$
Based on historical data, neural network optimization is conducted by determining
weights and thresholds for activation.
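A minimal sketch of the artificial neural network, interpreting the Table 1 settings (two hidden layers, ReLU activation, Adam optimizer, max_iter = 200) as a scikit-learn MLPClassifier; the number of units per hidden layer is not reported in the paper, so the sizes below are assumptions, as is the scaling step.

```python
# A minimal sketch of the ANN classifier; hidden-layer sizes and scaling are assumptions.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16),   # two hidden layers, sizes assumed
                  activation="relu", solver="adam", max_iter=200, random_state=42),
)
ann.fit(X_train_res, y_train_res)
print("Validation accuracy:", ann.score(X_val, y_val))
```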
3.3. Explainability Methods
SHapley Additive exPlanations (SHAP) is applied to meet the interpretation requirements.
This algorithm aims to build a linear model explaining the feature importance for a given
prediction by computing Shapley sampling values. The SHAP values are calculated based
on cooperative game theory in order to explain the prediction through the marginal contri-
bution of each feature. The SHAP model can be represented as a linear combination of the
binary variables in the following Equation (6).
$$g(z') = \Phi_0 + \sum_{i=1}^{M} \Phi_i z'_i \quad (6)$$

where $g$ is an explanatory model, $z' \in \{0,1\}^M$ is the coalition vector, M is the maximum number of features, and the ith feature either contributes ($z'_i = 1$) or does not ($z'_i = 0$). $\Phi_i$ is the SHAP value of the ith feature, representing the contribution of the ith feature, and can be calculated according to Equation (7).
$$\Phi_i(f, x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[ f_x(S \cup \{i\}) - f_x(S) \right] \quad (7)$$

where N is the set of all features, |S| is the number of features in the feature subset S excluding the ith feature, and $f_x(S)$ is the result of the machine learning model f trained on the feature subset S.
SHAP is an interpretation technique that works very well on structured data with
a limited number of features. SHAP can be interpreted at the global level and on a specific
data point. At the global level, feature importance is determined by the average absolute
values per feature. In this research, TreeSHAP was employed to compute SHAP values and
explain the output of the decision tree and XGBoost models. We chose TreeSHAP because
it is a fast and exact method to estimate SHAP values for tree models and ensembles
of trees [35]. Moreover, although tree-based methods such as XGBoost and random forest have their own permutation feature importance values, SHAP values differ significantly from such measures: permutation feature importance is based on the decrease in model performance, whereas SHAP is based on the magnitude of feature attributions.
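A minimal sketch of the TreeSHAP step using the shap package [35], applied here to the fitted XGBoost model from the earlier sketches; the column name "X4" assumes the features are labeled with the symbols from Appendix A. For a scikit-learn random forest classifier the returned SHAP array is indexed per class, so the plotting calls would need to select the positive class.

```python
# A minimal sketch of computing and visualizing TreeSHAP values for the XGBoost model.
import shap

explainer = shap.TreeExplainer(xgb)            # fast, exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X_val)

# Global importance: mean absolute SHAP value per feature
shap.summary_plot(shap_values, X_val, plot_type="bar")

# Dependence plot for a single feature, e.g., long-term debts to equity (X4)
shap.dependence_plot("X4", shap_values, X_val)
```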
3.4. Evaluation of Model Performance
To evaluate the model’s performance, we use the following performance metrics.
Accuracy—the proportion of correct classifications in the evaluation data:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (8)$$

Precision—the proportion of true positives among the predicted positives:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (9)$$

Sensitivity (Recall)—the proportion of positives correctly predicted:

$$\text{Recall} = \frac{TP}{TP + FN} \quad (10)$$

F1 score—the harmonic mean of precision and recall:

$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (11)$$
Area under the receiver operating curve (AUC)—the ROC curve plots the true positive rate against the false positive rate and measures the model's classification ability subject to varying decision boundary thresholds. The area under the curve (AUC) aggregates the performance measures given by the ROC curve. AUC also provides a criterion for evaluating and comparing models: AUC has to be greater than 0.5 for the model to be acceptable, and the closer it is to 1, the stronger the model's predictive power.
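A minimal sketch of computing the metrics in Equations (8)–(11) and the AUC with scikit-learn, evaluated on the validation set for one of the fitted models from the sketches above:

```python
# A minimal sketch of the performance evaluation for the random forest model.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_pred = rf.predict(X_val)
y_prob = rf.predict_proba(X_val)[:, 1]

print("Accuracy :", accuracy_score(y_val, y_pred))
print("Precision:", precision_score(y_val, y_pred))
print("Recall   :", recall_score(y_val, y_pred))
print("F1 score :", f1_score(y_val, y_pred))
print("AUC      :", roc_auc_score(y_val, y_prob))
```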
4. Results and Discussions
4.1. Prediction Results
Table 1 presents the hyper-parameter settings and the evaluation of the models on the performance metrics. According to Abellán and Castellano [36], the accuracy measure may be misleading because it does not consider that false positives are more important than false negatives. Therefore, precision and recall, which are more sensitive to the imbalanced dataset, are better measures of model performance. This research also uses the balanced F-score (F1 score), the harmonic mean of precision and recall.
XGB and random forest also have higher recall and F1 scores than other models,
indicating that both models are good at predicting positive values. In contrast, logistic
regression, ANN, and SVM have relatively low sensitivity values, indicating that these
models have higher Type I errors. Interestingly, SVM has the highest precision (0.9427), showing a strong ability to correctly identify the positive cases among its predicted positives.
Based on AUC values, it can be seen that random forest has the highest AUC value
(0.9788), followed by extreme gradient boosting (0.9702), showing that these two models
have better classification ability than other models. These results are similar to the results
of Barboza et al. [17]; Chakraborty and Joseph [18]; Fuster et al. [19].
Figure 1 shows that the ROC curves of random forest and XGB are closer to the top left corner, indicating better performance than the other models. It is noted that the ROC does not
depend on the class distribution, so it helps evaluate classifiers predicting rare events such
as default risk or financial distress risk.
Table 1. The performance results of classifiers.
| # | Algorithm | Hyper-Parameters | AUC | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|---|
| 1 | Extreme Gradient Boosting | booster = "gbtree", n_estimator = 100, max_depth = 1, random_state = 42 | 0.9702 | 0.9566 | 0.8726 | 0.8354 | 0.8536 |
| 2 | Random Forest | max_depth = 14, n_estimators = 100, random_state = 42 | 0.9788 | 0.9529 | 0.8535 | 0.8272 | 0.8401 |
| 3 | Logistic Regression | random_state = 42 | 0.9303 | 0.8623 | 0.8854 | 0.5148 | 0.6511 |
| 4 | Artificial Neural Network | n_hidden = 2, max_iter = 200, activations = relu, Optimizer = adam | 0.9034 | 0.9168 | 0.8025 | 0.6811 | 0.7368 |
| 5 | Decision Trees | Criterion = "gini", max_depth = 14, random_state = 42 | 0.8848 | 0.9251 | 0.828 | 0.7065 | 0.7625 |
| 6 | Support Vector Machine | Kernel = "rbf", probability = True, class_weight = "balanced", random_state = 42 | 0.7889 | 0.8789 | 0.9427 | 0.4022 | 0.5815 |
Source: author’s calculation.
Figure 1. ROC of classifiers. Source: author's calculation.
4.2. Interpretation Results
We calculated SHAP values on two models with the best predictive results, random
forest and XGBoost. We calculated the average Shapley values across all observations to
obtain an “overall” or “global” explanation. This technique was used in the research of
Kim and Shin [5] and Bussmann et al. [26].
Figure 2 shows that four of the five important features are the same between the
two models. They are long-term debts to equity (X4), account payable to equity (X10), en-
terprise value to revenues (X22), and diluted EPS (X25). Thus, the important features
determined based on Shapley values are relatively stable between XGB and random
forest models.
Figure 2. The feature importance of XGB and random forest. Source: author’s calculation.
Figure 3a illustrates the influence of long-term debts to equity (X4) on the prediction
results. X4 reflects the leverage risk of the company in the long term. If the leverage is high,
the company is under tremendous pressure to repay debts and is prone to liquidity risk
when the economy is in recession. Figure 3a shows the positive relationship between the
X4 and the SHAP values. When X4 increases, the SHAP value increases, indicating
that the probability of financial distress also increases. This phenomenon is in line with the
knowledge of financial experts.
Figure 3. The SHAP dependence plots of single features: (a) X4, (b) X10, (c) X22, (d) X25. Source: author's calculation.
Figure 3b displays the dependence plot for account payable to equity (X10). X10
reflects the default risk in the short term. If X10 is high, the company is under pressure
to pay short-term debt obligations, which may lead to liquidity risk. When X10 is under
2.5, the SHAP values increase, implying that the likelihood of financial distress increases.
However, when the X10 is over 2.5, the SHAP values tend to be stable. This is an interesting
phenomenon from observable data, different from experts’ expectations.
Figure 3c shows the influence of enterprise value to revenues (X22) on the prediction results. This ratio measures how much it would cost to purchase a company relative to its revenues; a higher EV/R may indicate that the company is overvalued. When the value of X22 is
close to 0, the variation of the SHAP value is high. As X22 increases, the SHAP value also
gradually increases. However, after X22 is greater than 0.6, the SHAP values tend to be flat.
Thus, an increase in X22 influences the SHAP values when X22 is low, but this effect diminishes when X22 is high.
Diluted EPS considers what would happen if dilutive securities were exercised.
Figure 3d exhibits the dependence plot for diluted EPS (X25). It can be seen in Figure 3d that
there exists a negative relationship between X25 and SHAP values. When X25 increases,
SHAP values tend to decrease, reducing the probability of financial distress. However,
the SHAP values fluctuate when X25 is less than 0 or greater than 4000.
5. Conclusions
In this study, we employed machine learning models to predict financial distress in
listed companies in Vietnam from 2010 to 2021. The results showed that XGB and random
forest were two models with higher recall, F1 scores and AUC than other models. In
addition, we also used SHAP values to analyze the impacts of each feature on the forecast
results. Features such as long-term debts to equity (X4), account payable to equity (X10),
enterprise value to revenues (X22), and diluted EPS (X25) showed a significant impact on
forecast results and were generally in accordance with the knowledge from experts.
Based on this study, managers, policymakers, and credit rating agencies are equipped with tools to understand and interpret results from complex machine learning models. This
research has shed light on using XAI to make decisions in economics and finance.
The study also has some limitations, such as the small sample size, especially the low proportion of financially distressed companies. We hope that future studies can expand the sample size
by researching countries with similar characteristics. Moreover, the research sample can be
expanded to other fields, such as consumer lending or P2P lending.
In addition, the features used in this study are financial indicators, which are based
on the assumption that information about companies is reflected in the financial position.
However, this assumption is unrealistic in Vietnam, whose financial market is inefficient.
We hope that the following studies can add more behavioral features, such as ownership
structure, number of independent BOD members, industry, and diversity of business lines.
Author Contributions:
Conceived the idea, wrote the Introduction, D.T.N.; wrote literature review,
H.A.L.; wrote methodology, results and discussions, conclusion, K.L.T.; revised the manuscript,
T.H.N. All authors have read and agreed to the published version of the manuscript.
Funding:
The study was supported by The Youth Incubator for Science and Technology Program,
managed by the Youth Development Science and Technology Center—Ho Chi Minh Communist
Youth Union and Department of Science and Technology of Ho Chi Minh City, the contract number is
“14/2021/ HÐ-KHCNT-VU”.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement:
The data for this study can be found on our GitHub page: https://
github.com/anhle32/Explainable-Machine-Learning-.git (accessed on 12 October 2022).
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A
Table A1. Summary of Variables.
| Symbol | Input Features | Category |
|---|---|---|
| X1 | Cash Ratio | Liquidity risk |
| X2 | Quick Ratio | Liquidity risk |
| X3 | Current Ratio | Liquidity risk |
| X4 | Long-term Debts to Equity | Financial risk |
| X5 | Long-term Debts to Total Assets | Financial risk |
| X6 | Total Liabilities to Equity | Financial risk |
| X7 | Total Liabilities to Total Assets | Financial risk |
| X8 | Short-term Debt to Equity | Financial risk |
| X9 | Short-term Debt to Total Assets | Financial risk |
| X10 | Account Payable to Equity | Business risk |
| X11 | Account Payable to Total Assets | Business risk |
| X12 | Total Assets to Total Liabilities | Business risk |
| X13 | EBITDA to Short-term Debt and Interest | Business risk |
| X14 | Price to Earnings | Market factor |
| X15 | Diluted Price to Earnings | Market factor |
| X16 | Price to Book Value | Market factor |
| X17 | Price to Sales | Market factor |
| X18 | Price to Tangible Book Value | Market factor |
| X19 | Market Capital | Market factor |
| X20 | Price to Cashflow | Market factor |
| X21 | Enterprise Value | Valuation |
| X22 | Enterprise Value to Revenues | Valuation |
| X23 | Enterprise Value to EBITDA | Valuation |
| X24 | Enterprise Value to EBIT | Valuation |
| X25 | Diluted EPS | Valuation |
References
1. Beaver, W.H. Financial Ratios as Predictors of Failure. J. Account. Res. 1966, 71–111. [CrossRef]
2. Altman, E.I. Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. J. Financ. 1968, 23, 589–609. [CrossRef]
3. Ohlson, J.A. Financial Ratios and the Probabilistic Prediction of Bankruptcy. J. Account. Res. 1980, 18, 109–131. [CrossRef]
4. Cox, D.R. Regression Models and Life-Tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–202. [CrossRef]
5. Kim, D.; Shin, S. The Economic Explainability of Machine Learning and Standard Econometric Models-an Application to the US Mortgage Default Risk. Int. J. Strateg. Prop. Manag. 2021, 25, 396–412. [CrossRef]
6. Olson, D.L.; Delen, D.; Meng, Y. Comparative Analysis of Data Mining Methods for Bankruptcy Prediction. Decis. Support Syst. 2012, 52, 464–473. [CrossRef]
7. Chen, H.-J.; Huang, S.Y.; Lin, C.-S. Alternative Diagnosis of Corporate Bankruptcy: A Neuro Fuzzy Approach. Expert Syst. Appl. 2009, 36, 7710–7720. [CrossRef]
8. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
9. Freund, Y.; Schapire, R.; Abe, N. A Short Introduction to Boosting. J. Jpn. Soc. Artif. Intell. 1999, 14, 1612.
10. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K. Xgboost: Extreme Gradient Boosting. R Package Version 0.4-2 2015, 1, 1–4.
11. Kruppa, J.; Schwarz, A.; Arminger, G.; Ziegler, A. Consumer Credit Risk: Individual Probability Estimates Using Machine Learning. Expert Syst. Appl. 2013, 40, 5125–5131. [CrossRef]
12. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999; ISBN 0-387-98780-0.
13. Chen, S.; Härdle, W.K.; Moro, R.A. Modeling Default Risk with Support Vector Machines. Quant. Financ. 2011, 11, 135–154. [CrossRef]
14. Shin, K.-S.; Lee, T.S.; Kim, H. An Application of Support Vector Machines in Bankruptcy Prediction Model. Expert Syst. Appl. 2005, 28, 127–135. [CrossRef]
15. Zhao, Z.; Xu, S.; Kang, B.H.; Kabir, M.M.J.; Liu, Y.; Wasinger, R. Investigation and Improvement of Multi-Layer Perceptron Neural Networks for Credit Scoring. Expert Syst. Appl. 2015, 42, 3508–3516. [CrossRef]
16. Geng, R.; Bose, I.; Chen, X. Prediction of Financial Distress: An Empirical Study of Listed Chinese Companies Using Data Mining. Eur. J. Oper. Res. 2015, 241, 236–247. [CrossRef]
17. Barboza, F.; Kimura, H.; Altman, E. Machine Learning Models and Bankruptcy Prediction. Expert Syst. Appl. 2017, 83, 405–417. [CrossRef]
18. Chakraborty, C.; Joseph, A. Machine Learning at Central Banks; SSRN: Amsterdam, The Netherlands, 2017.
19. Fuster, A.; Goldsmith-Pinkham, P.; Ramadorai, T.; Walther, A. Predictably Unequal? The Effects of Machine Learning on Credit Markets. J. Financ. 2022, 77, 5–47. [CrossRef]
20. Dubyna, M.; Popelo, O.; Kholiavko, N.; Zhavoronok, A.; Fedyshyn, M.; Yakushko, I. Mapping the Literature on Financial Behavior: A Bibliometric Analysis Using the VOSviewer Program. WSEAS Trans. Bus. Econ. 2022, 19, 231–246. [CrossRef]
21. Zhavoronok, A.; Popelo, O.; Shchur, R.; Ostrovska, N.; Kordzaia, N. The Role of Digital Technologies in the Transformation of Regional Models of Households' Financial Behavior in the Conditions of the National Innovative Economy Development. Ingénierie Des Systèmes D'Inf. 2022, 27, 613–620. [CrossRef]
22. Doshi-Velez, F.; Kim, B. Towards a Rigorous Science of Interpretable Machine Learning. arXiv 2017, arXiv:1702.08608.
23. Miller, T. Explanation in Artificial Intelligence: Insights from the Social Sciences. Artif. Intell. 2019, 267, 1–38. [CrossRef]
24. Bracke, P.; Datta, A.; Jung, C.; Sen, S. Machine Learning Explainability in Finance: An Application to Default Risk Analysis; SSRN: Amsterdam, The Netherlands, 2019.
25. Babaei, G.; Giudici, P.; Raffinetti, E. Explainable Fintech Lending; SSRN: Amsterdam, The Netherlands, 2021.
26. Bussmann, N.; Giudici, P.; Marinelli, D.; Papenbrock, J. Explainable Machine Learning in Credit Risk Management. Comput. Econ. 2021, 57, 203–216. [CrossRef]
27. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4 December 2017; pp. 4768–4777.
28. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why Should I Trust You?" Explaining the Predictions of Any Classifier. arXiv 2016, arXiv:1602.04938.
29. Ariza-Garzón, M.J.; Arroyo, J.; Caparrini, A.; Segovia-Vargas, M.-J. Explainability of a Machine Learning Granting Scoring Model in Peer-to-Peer Lending. IEEE Access 2020, 8, 64873–64890. [CrossRef]
30. Hadji Misheva, B.; Hirsa, A.; Osterrieder, J.; Kulkarni, O.; Fung Lin, S. Explainable AI in Credit Risk Management. Credit. Risk Manag. 2021. [CrossRef]
31. Harris, C.R.; Millman, K.J.; Van Der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J. Array Programming with NumPy. Nature 2020, 585, 357–362. [CrossRef]
32. McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; pp. 51–56.
33. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
34. Waskom, M.; Botvinnik, O.; O'Kane, D.; Hobson, P.; Lukauskas, S.; Gemperline, D.C.; Augspurger, T.; Halchenko, Y.; Cole, J.B.; Warmenhoven, J. Mwaskom/Seaborn: V0.8.1 (September 2017). Zenodo 2017. [CrossRef]
35. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67. [CrossRef]
36. Abellán, J.; Castellano, J.G. A Comparative Study on Base Classifiers in Ensemble Methods for Credit Scoring. Expert Syst. Appl. 2017, 73, 1–10. [CrossRef]
... Cho and Shin (2023); Grath et al. (2018) generate contrastive explanations to explain required changes for certain predictions. Bussmann et al. (2021) visualizes similar outcomes that describe the risk of a company default with SHAP, while most SHAP-based methods (Babaei et al. 2022;Benhamou et al. 2021;Bracke et al. 2019;Bueff et al. 2022;Bussmann et al. 2020;Carta et al. 2022;Demajo et al. 2020;Dikmen and Burns 2022;Fior et al. 2022;Fritz-Morgenthal et al. 2022;Gramegna and Giudici 2020;Islam et al. 2019;Kumar et al. 2022;Lachuer and Jabeur 2022;Park and Yang 2022;Müller et al. 2022;Rizinski et al. 2022;Tran et al. 2022;Vivek et al. 2022;Wand et al. 2022;Weng et al. 2022;Yasodhara et al. 2021) improve accessibility for technical audiences by discovering important features. Fior et al. (2022) improve usability by constructing interactive graphical tools upon SHAP, which likewise promotes accessibility. ...
... [right] An example of instance-level, E(f(X)) represents the model's base prediction if no features were considered, and f(x) represents the final prediction after summing the contributing features (ϕ i )(Rizinski et al. 2022) Gramegna and Giudici (2020) identify relevant features leading to consumers' decision on purchasing insurance and further clusters them into least to most likely groups with Shapley values.Bussmann et al. (2020) similarly implement SHAP to explain XGBoost's classification of credit risk, while comparing it against an interpretable logistic regression model. Other studies include discovering the relationship between corporate social responsibility and financial performance(Lachuer and Jabeur 2022), customer satisfaction(Rallis et al. 2022), GDP growth rates (Park and Yang 2022), stock trading(Benhamou et al. 2021;Kumar et al. 2022), financial distress(Tran et al. 2022), market volatility forecast(Weng et al. 2022) and credit evaluation(Rizinski et al. 2022;Bueff et al. 2022;Fritz-Morgenthal et al. 2022).Wand et al. (2022) perform K-means clustering on historical S&P 500 stock information to identify dominant sector correlations that describe the state of the market. This work applies Layer-wise Relevance Propagation (LRP)(Bach et al. 2015), after transforming the clustering classifier into a neural network since LRP is designed to work specifically with neural network architectures.Carta et al. (2022) prune unimportant technical indicators using different configurations of a permutation importance technique, before implementing decision tree techniques for stock market forecasting. ...
Article
Full-text available
The success of artificial intelligence (AI), and deep learning models in particular, has led to their widespread adoption across various industries due to their ability to process huge amounts of data and learn complex patterns. However, due to their lack of explainability, there are significant concerns regarding their use in critical sectors, such as finance and healthcare, where decision-making transparency is of paramount importance. In this paper, we provide a comparative survey of methods that aim to improve the explainability of deep learning models within the context of finance. We categorize the collection of explainable AI methods according to their corresponding characteristics, and we review the concerns and challenges of adopting explainable AI methods, together with future directions we deemed appropriate and important.
... Random Forest [7] is particularly suitable due to its robustness in handling complex, nonlinear relationships; overfitting multicollinearity; effectively managing categorical data; and significantly enhancing predictive accuracy over traditional methods such as logistic regression or discriminant analysis. Comparative analysis of machine learning models for bankruptcy prediction indicated that Random Forests provided superior predictive accuracy compared to traditional models like logistic regression and discriminant models and even artificial neural networks and support vector machines [42,43]. The Random Forest model is specified as follows: ...
Article
Full-text available
This paper investigates the effectiveness of machine learning algorithms in enhancing the accuracy and reliability of predicting financial distress. The dataset includes Altman Z-Scores and Corporate Governance Compliance (CGC) indicators calculated for manufacturing firms listed on the Bucharest Stock Exchange (BSE) from 2016 to 2022. Leveraging Signaling Theory, the study analyzes financial and governance data for 60 non-financial firms, comprising 420 firm-year observations. Financial distress is classified into three categories: no distress, moderate distress, and severe distress. The study employs a Random Forest classification model, leveraging artificial intelligence techniques to identify critical predictive variables and evaluate their combined effectiveness in signaling financial distress. The findings reveal that machine learning algorithms significantly improve the predictive accuracy and reliability of financial distress classifications, effectively distinguishing between different distress levels by integrating financial ratios and corporate governance variables. These results emphasize the advantages of involving artificial intelligence and advanced analytics in financial distress prediction models, enhancing transparency and strengthening investor confidence. The research contributes to the literature on digital transformation in financial analysis and corporate governance, offering practical implications for investors, managers, creditors, and policymakers in emerging market environments.
... For small-cap firms, total assets growth rate emerged as the most influential predictor, whereas large-cap companies exhibited greater dependence on longterm debt-to-asset ratios and liquidity exceeding total assets ratios. Tran et al. [41] utilized SHAP to isolate long-term debt ratios as key predictors of Vietnamese financial distress, while Kumar et al. [42] applied SHAP to interpret deep Q-network (DQN) allocation logic in Sensex/Dow Jones trading strategies. Our hybrid architecture integrates SHAP analysis to illuminate prediction logic, directly addressing the opacity critique prevalent in existing literature. ...
Article
Full-text available
The inherent uncertainty and information asymmetry in financial markets create significant challenges for accurate price forecasting. Although investor sentiment analysis has gained traction in recent research, the temporal dimension of sentiment dynamics remains underexplored. This study develops a novel framework that enhances stock price prediction by integrating time-partitioned investor sentiment, while improving model interpretability via Shapley additive explanations (SHAP) analysis. Employing the ERNIE (enhanced representation through knowledge integration) 3.0 model for sentiment extraction from China’s Eastmoney Guba stock forum, we quantitatively distinguish intraday and post-market investor sentiment then integrate these temporal components with technical indicators through neural network architecture. Our results indicate that temporal sentiment partitioning effectively reduces uncertainty. Empirical evidence demonstrates that our long short-term memory (LSTM) model integrating intraday and post-market sentiment indicators achieves better prediction accuracy, and SHAP analysis reveals the importance of intraday and post-market investor sentiment to stock price prediction models. Implementing quantitative trading strategies based on these insights generates significantly more annualized returns for representative stocks with controlled risk, outperforming sentiment-agnostic and non-temporal sentiment models. This research provides methodological innovations for processing temporal unstructured data in finance, while the SHAP framework offers regulators and investors actionable insights into sentiment-driven market dynamics.
... Trong khi đó, nghiên cứu của Ozili (2020) [20] đối với các ngân hàng tại Nigeria chỉ ra rằng hiệu quả vận hành đóng vai trò duy trì sự ổn định tài chính, nhấn mạnh vai trò của việc tối ưu hóa tài sản trong quản lý rủi ro tài chính. Tại Việt Nam, nghiên cứu của Tran K. L và cộng sự (2022) [21] dựa trên dữ liệu doanh nghiệp niêm yết cũng khẳng định mối quan hệ chặt chẽ giữa hiệu quả khai thác nguồn lực và kết quả sinh lời, cho thấy rằng quản trị tài sản hiệu quả có thể góp phần nâng cao hiệu suất tài chính và giảm thiểu rủi ro tài chính. ...
Article
Bài viết bổ sung thêm kết quả thực nghiệm về mối quan hệ giữa hiệu quả hoạt động, dòng tiền kinh doanh, đầu tư tài sản cố định (TSCĐ) đến nguy cơ kiệt quệ tài chính của 117 doanh nghiệp xây dựng niêm yết tại Việt Nam giai đoạn 2011-2023. Phương pháp nghiên cứu được sử dụng là FGLS để khắc phục khuyết tật về phương sai thay đổi và tự tương quan, kết quả cho thấy hiệu quả hoạt động (ROA) cao hơn làm giảm khả năng kiệt quệ tài chính trong mô hình X-Score, nhưng lại làm tăng khả năng kiệt quệ tài chính trong hai mô hình còn lại. Ngoài ra, dòng tiền từ hoạt động kinh doanh, vòng quay tổng tài sản, Tỷ lệ đòn bẩy tài chính, tỷ trọng TSCĐ trong tổng tài sản và quy mô doanh nghiệp cũng có ảnh hưởng đáng kể đến kiệt quệ tài chính, song mức độ và hướng ảnh hưởng thay đổi tùy theo mô hình. Điều này phản ánh đặc thù của ngành xây dựng, khi các doanh nghiệp có thể ưu tiên lợi nhuận ngắn hạn bằng cách bỏ qua các yếu tố bền vững, chấp nhận các dự án rủi ro và huy động nguồn vốn vay đáng kể hoặc hy sinh chất lượng công trình. Nghiên cứu nhấn mạnh tầm quan trọng trong việc lựa chọn mô hình phù hợp để đánh giá kiệt quệ tài chính.
... (2) Overall importance analysis By averaging the absolute values of the contribution values at all samples for a given feature, the overall significance value, denoted as j  , can be obtained: (20) where N o represents the amount of all samples pertaining to a specific feature. ...
... The most important explanatory variables for predicting default were the volatility of the utilized credit balance, remaining credit as a percentage of total credit, and the duration of the customer relationship. Tran et al. [25] applied machine learning algorithms to predict the financial distress of listed companies in Vietnam from 2010 to 2021 and utilized SHAP values to interpret the results. Extreme gradient boosting and random forest models outperformed other models regarding recall, F1 scores, and AUC. ...
Article
Full-text available
Accurate prediction of future earnings is crucial for stakeholders. However, existing machine learning models often operate as “black boxes,” offering high accuracy but minimal interpretability. Prior approaches focus on correlational patterns without establishing genuine causal relationships or providing straightforward rule-based explanations. This lack of transparency and causal insight limits the actionable value of current financial prediction models. We propose an anchor-based explainable and causal AI framework for earnings prediction. It integrates an optimized XGBoost classifier (with RENN undersampling to address class imbalance) for high-performance prediction, the Anchor XAI method to generate human-readable “if-then” rules explaining model decisions, and the DoWhy causal inference tool to validate genuine cause-and-effect factors in the financial data. The optimized XGBoost+RENN model achieved ~93.3% overall accuracy, with precision, recall, and F1-scores around 93–94%, outperforming other classifiers. Key features such as Inventory/Total Assets, %Δ Net Profit Margin, and Cash Dividends/Cash Flows emerged as the most influential factors. Coordinated adjustments in these variables yielded significantly better predictive outcomes than isolated changes. Furthermore, DoWhy-based analysis confirms that improvements in these factors causally drive earnings growth, as verified by robustness checks like placebo tests. The proposed framework effectively bridges the gap between predictive accuracy and interpretability. It provides financial decision-makers with reliable earnings predicting and transparent, actionable insights for strategic planning and management, making the predictive model trustworthy and informative.
Chapter
Predicting profitability in the banking sector is essential for effective financial management, risk mitigation, and sustainable growth. In the context of Vietnamese commercial banks, profitability prediction poses significant challenges due to complex relationships among financial metrics and the volatile economic environment. Traditional econometric models often fall short in capturing these complexities, resulting in suboptimal predictions. Although advanced machine learning techniques like XGBoost have shown superior predictive accuracy, they lack the necessary transparency and interpretability, making them challenging to deploy in practice due to regulatory concerns surrounding “black box” models. This study addresses these limitations by proposing an explainable machine learning framework that integrates XGBoost with SHAP (SHapley Additive exPlanations) to predict profitability. The framework uses key financial metrics, including Return on Equity (ROE), Return on Assets (ROA), Total Assets, and Non-performing Loans.
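The XGBoost-plus-SHAP pairing described in the chapter can be illustrated with a short sketch. The data below are synthetic and the feature names are taken only from the metrics listed in the abstract; none of this reproduces the chapter's actual model or results.

```python
import numpy as np
import shap
from xgboost import XGBRegressor

# Placeholder bank-level inputs and a synthetic profitability target.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 4))
y = 0.5 * X[:, 0] - 0.3 * X[:, 3] + rng.normal(scale=0.1, size=400)
feature_names = ["ROE", "ROA", "TotalAssets", "NPL_ratio"]

model = XGBRegressor(n_estimators=200, max_depth=3).fit(X, y)

# TreeExplainer yields one contribution per sample and feature.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(dict(zip(feature_names, np.abs(shap_values).mean(axis=0))))
```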
Article
Full-text available
Households play a key role in the development of a country's financial system and the national economy in general. Understanding the behavior of these economic entities in the market for financial services makes it possible to predict the development of that market and to understand the mechanisms behind dissipative processes in the interaction of households and financial institutions, which can generate crisis phenomena in the development of such a system and restrain the innovative development of the national economy. This determines the importance and relevance of further research in this direction. The article considers the impact of modern digital technologies on the development of financial services, in particular the financial behavior of households, under the formation and active development of the innovative economy. Significant attention is paid to specifying the methodology for determining the impact of the digitalization index and the index of transformation of the financial behavior model of households. It is established that the outlined models are specific and differ across regions within any country. That is why the above method of calculation was applied to the example of Ukraine. As a result, information was obtained on the digitalization index and the transformation index of the financial behavior model of households in twenty-four regions of Ukraine. Based on econometric analysis, algebraic equations were determined for the dependence of the transformation index of the household financial behavior model on digital technology development in each of the outlined regions.
Article
Full-text available
The objective of the article is to study the current state of research on financial behaviour. The article is conceptual and based on bibliometric analysis. The analysis draws on the data retrieval functionalities of the Scopus and Web of Science platforms. The VOSviewer program is used for network visualization of keywords in scientific publications. Findings: The number of publications that directly study the nature and features of the formation of financial behavior of various economic agents is small but constantly growing. An important role in this process is played by the digitalization of financial services, which has a significant impact on the transformation of both the financial behavior models of economic agents and the model of financial service provision to customers by financial institutions.
Article
Full-text available
Innovations in statistical technology in functions including credit-screening have raised concerns about distributional impacts across categories such as race. Theoretically, distributional effects of better statistical technology can come from greater flexibility to uncover structural relationships or from triangulation of otherwise excluded characteristics. Using data on U.S. mortgages, we predict default using traditional and machine learning models. We find that Black and Hispanic borrowers are disproportionately less likely to gain from the introduction of machine learning. In a simple equilibrium credit market model, machine learning increases disparity in rates between and within groups, with these changes attributable primarily to greater flexibility.
Article
Full-text available
This study aims to bridge the gap between two perspectives on explainability, that of machine learning and engineering and that of economics and standard econometrics, by applying three marginal measurements. The existing real estate literature has primarily used econometric models to analyze the factors that affect the default risk of mortgage loans. However, in this study, we estimate a default risk model using a machine learning-based approach with the help of a U.S. securitized mortgage loan database. Moreover, we compare the economic explainability of the models by calculating the marginal effect and marginal importance of individual risk factors using both econometric and machine learning approaches. Machine learning-based models are quite effective in terms of predictive power; however, the general perception is that they do not efficiently explain the causal relationships within them. This study utilizes the concepts of marginal effects and marginal importance to compare the explanatory power of individual input variables in various models. This can simultaneously help improve the explainability of machine learning techniques and enhance the performance of standard econometric methods.
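The marginal-effect comparison described above can be approximated with a finite-difference perturbation, which is computable in the same way for an econometric model and a machine learning model. The sketch below uses synthetic data and generic classifiers as stand-ins for the study's actual specifications.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic loan features and a default label (placeholders only).
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

def marginal_effect(model, X, j, eps=1e-3):
    """Average change in predicted default probability for a small
    increase in feature j, holding the other features fixed."""
    X_up = X.copy()
    X_up[:, j] += eps
    return (model.predict_proba(X_up)[:, 1]
            - model.predict_proba(X)[:, 1]).mean() / eps

for model in (LogisticRegression().fit(X, y),
              GradientBoostingClassifier().fit(X, y)):
    print(type(model).__name__,
          [round(marginal_effect(model, X, j), 3) for j in range(3)])
```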
Article
Full-text available
The paper proposes an explainable Artificial Intelligence model that can be used in credit risk management and, in particular, in measuring the risks that arise when credit is borrowed through peer-to-peer lending platforms. The model applies correlation networks to Shapley values so that Artificial Intelligence predictions are grouped according to the similarity in the underlying explanations. The empirical analysis of 15,000 small and medium companies asking for credit reveals that both risky and non-risky borrowers can be grouped according to a set of similar financial characteristics, which can be employed to explain their credit score and, therefore, to predict their future behaviour.
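A rough sketch of the grouping idea (not the paper's implementation) is given below: borrowers are clustered on the correlation between their per-borrower Shapley-value vectors. The SHAP matrix, the clustering method, and the number of clusters are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Placeholder SHAP matrix: one explanation vector per borrower.
rng = np.random.default_rng(4)
shap_matrix = rng.normal(size=(100, 8))

# Borrower-by-borrower similarity of explanations, turned into a distance.
corr = np.corrcoef(shap_matrix)
dist = np.clip(1.0 - corr, 0.0, 2.0)
np.fill_diagonal(dist, 0.0)

# Hierarchical clustering on the explanation-similarity network.
Z = linkage(squareform(dist, checks=False), method="average")
groups = fcluster(Z, t=5, criterion="maxclust")
print(np.bincount(groups)[1:])   # size of each explanation cluster
```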
Article
Full-text available
Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves [1] and in the first imaging of a black hole [2]. Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis.
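A small illustration of the array-programming style the paper describes, using a finance-flavoured placeholder dataset: vectorized operations and broadcasting replace explicit Python loops.

```python
import numpy as np

# 252 trading days of placeholder daily returns for 3 assets.
returns = np.random.default_rng(5).normal(0.0005, 0.02, size=(252, 3))

cumulative = np.cumprod(1.0 + returns, axis=0) - 1.0   # element-wise, no loops
annual_vol = returns.std(axis=0) * np.sqrt(252)        # broadcasting a scalar

print(cumulative[-1])   # cumulative return per asset
print(annual_vol)       # annualized volatility per asset
```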
Article
Full-text available
Peer-to-peer (P2P) lending demands effective and explainable credit risk models. Typical machine learning algorithms offer high prediction performance, but most of them lack explanatory power. However, this deficiency can be solved with the help of the explainability tools proposed in the last few years, such as the SHAP values. In this work, we assess the well-known logistic regression model and several machine learning algorithms for granting scoring in P2P lending. The comparison reveals that the machine learning alternative is superior in terms of not only classification performance but also explainability. More precisely, the SHAP values reveal that machine learning algorithms can reflect dispersion, nonlinearity and structural breaks in the relationships between each feature and the target variable. Our results demonstrate that it is possible for machine learning credit scoring models to be both accurate and transparent. Such models provide the trust that the industry, regulators and end-users demand in P2P lending and may lead to a wider adoption of machine learning in this and other risk assessment applications where explainability is required.
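The nonlinearity point made above can be illustrated with a minimal sketch: on a deliberately U-shaped synthetic relationship, the per-sample SHAP contributions of a boosted model differ between the centre and the tails of the feature, which a single logistic regression coefficient cannot express. Data, model settings, and thresholds are all placeholder assumptions.

```python
import numpy as np
import shap
from xgboost import XGBClassifier

# One feature with a U-shaped (nonlinear) relationship to default.
rng = np.random.default_rng(6)
x = rng.uniform(-3, 3, size=(1500, 1))
y = (np.abs(x[:, 0]) + rng.normal(scale=0.3, size=1500) > 1.5).astype(int)

model = XGBClassifier(n_estimators=200, max_depth=3).fit(x, y)
sv = shap.TreeExplainer(model).shap_values(x)   # contributions in log-odds space

# Contributions flip sign between the centre and the tails of the feature,
# exposing the nonlinearity that a linear coefficient would average away.
print("centre:", sv[np.abs(x[:, 0]) < 0.5].mean())
print("tails: ", sv[np.abs(x[:, 0]) > 2.0].mean())
```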