Figure - available from: Data
ROC of classifiers. Source: author’s calculation.


Source publication
Article
Full-text available
The past decade has witnessed the rapid development of machine learning applied in economics and finance. Recent evidence suggests that machine learning models have produced superior results to traditional statistical models and have become the driving force for dramatic improvement in the financial industry. However, a much-debated question is whe...

Citations

... Random Forest [7] is particularly suitable due to its robustness in handling complex, nonlinear relationships, overfitting, and multicollinearity; its effective management of categorical data; and its significantly higher predictive accuracy than traditional methods such as logistic regression or discriminant analysis. Comparative analysis of machine learning models for bankruptcy prediction indicated that Random Forests provided superior predictive accuracy compared to traditional models such as logistic regression and discriminant models, and even to artificial neural networks and support vector machines [42,43]. The Random Forest model is specified as follows: ...
Article
Full-text available
This paper investigates the effectiveness of machine learning algorithms in enhancing the accuracy and reliability of predicting financial distress. The dataset includes Altman Z-Scores and Corporate Governance Compliance (CGC) indicators calculated for manufacturing firms listed on the Bucharest Stock Exchange (BSE) from 2016 to 2022. Leveraging Signaling Theory, the study analyzes financial and governance data for 60 non-financial firms, comprising 420 firm-year observations. Financial distress is classified into three categories: no distress, moderate distress, and severe distress. The study employs a Random Forest classification model, leveraging artificial intelligence techniques to identify critical predictive variables and evaluate their combined effectiveness in signaling financial distress. The findings reveal that machine learning algorithms significantly improve the predictive accuracy and reliability of financial distress classifications, effectively distinguishing between different distress levels by integrating financial ratios and corporate governance variables. These results emphasize the advantages of involving artificial intelligence and advanced analytics in financial distress prediction models, enhancing transparency and strengthening investor confidence. The research contributes to the literature on digital transformation in financial analysis and corporate governance, offering practical implications for investors, managers, creditors, and policymakers in emerging market environments.
... For small-cap firms, total assets growth rate emerged as the most influential predictor, whereas large-cap companies exhibited greater dependence on long-term debt-to-asset ratios and liquidity exceeding total assets ratios. Tran et al. [41] utilized SHAP to isolate long-term debt ratios as key predictors of Vietnamese financial distress, while Kumar et al. [42] applied SHAP to interpret deep Q-network (DQN) allocation logic in Sensex/Dow Jones trading strategies. Our hybrid architecture integrates SHAP analysis to illuminate prediction logic, directly addressing the opacity critique prevalent in existing literature. ...
Article
Full-text available
The inherent uncertainty and information asymmetry in financial markets create significant challenges for accurate price forecasting. Although investor sentiment analysis has gained traction in recent research, the temporal dimension of sentiment dynamics remains underexplored. This study develops a novel framework that enhances stock price prediction by integrating time-partitioned investor sentiment, while improving model interpretability via Shapley additive explanations (SHAP) analysis. Employing the ERNIE (enhanced representation through knowledge integration) 3.0 model for sentiment extraction from China’s Eastmoney Guba stock forum, we quantitatively distinguish intraday and post-market investor sentiment then integrate these temporal components with technical indicators through neural network architecture. Our results indicate that temporal sentiment partitioning effectively reduces uncertainty. Empirical evidence demonstrates that our long short-term memory (LSTM) model integrating intraday and post-market sentiment indicators achieves better prediction accuracy, and SHAP analysis reveals the importance of intraday and post-market investor sentiment to stock price prediction models. Implementing quantitative trading strategies based on these insights generates significantly more annualized returns for representative stocks with controlled risk, outperforming sentiment-agnostic and non-temporal sentiment models. This research provides methodological innovations for processing temporal unstructured data in finance, while the SHAP framework offers regulators and investors actionable insights into sentiment-driven market dynamics.
... Meanwhile, the study by Ozili (2020) [20] on banks in Nigeria shows that operational efficiency plays a role in maintaining financial stability, underscoring the role of asset optimization in financial risk management. In Vietnam, the study by Tran K. L. et al. (2022) [21], based on data from listed firms, also confirms a close relationship between the efficiency of resource utilization and profitability, indicating that effective asset management can help improve financial performance and reduce financial risk. ...
Article
This article adds empirical evidence on the relationship between operating performance, operating cash flow, and fixed-asset investment and the risk of financial distress for 117 listed construction firms in Vietnam over the period 2011-2023. The research method used is FGLS, which corrects for heteroskedasticity and autocorrelation. The results show that higher operating performance (ROA) reduces the likelihood of financial distress in the X-Score model but increases it in the other two models. In addition, operating cash flow, total asset turnover, the financial leverage ratio, the share of fixed assets in total assets, and firm size also have significant effects on financial distress, although the magnitude and direction of these effects vary by model. This reflects the particular characteristics of the construction industry, where firms may prioritize short-term profit by neglecting sustainability factors, accepting risky projects and raising substantial debt, or sacrificing construction quality. The study emphasizes the importance of choosing an appropriate model for assessing financial distress.
... Cho and Shin (2023); Grath et al. (2018) generate contrastive explanations to explain required changes for certain predictions. Bussmann et al. (2021) visualize similar outcomes that describe the risk of a company default with SHAP, while most SHAP-based methods (Babaei et al. 2022; Benhamou et al. 2021; Bracke et al. 2019; Bueff et al. 2022; Bussmann et al. 2020; Carta et al. 2022; Demajo et al. 2020; Dikmen and Burns 2022; Fior et al. 2022; Fritz-Morgenthal et al. 2022; Gramegna and Giudici 2020; Islam et al. 2019; Kumar et al. 2022; Lachuer and Jabeur 2022; Park and Yang 2022; Müller et al. 2022; Rizinski et al. 2022; Tran et al. 2022; Vivek et al. 2022; Wand et al. 2022; Weng et al. 2022; Yasodhara et al. 2021) improve accessibility for technical audiences by discovering important features. Fior et al. (2022) improve usability by constructing interactive graphical tools upon SHAP, which likewise promotes accessibility. ...
... [right] An example of instance-level, E(f(X)) represents the model's base prediction if no features were considered, and f(x) represents the final prediction after summing the contributing features (ϕ_i) (Rizinski et al. 2022). Gramegna and Giudici (2020) identify relevant features leading to consumers' decision on purchasing insurance and further cluster them into least to most likely groups with Shapley values. Bussmann et al. (2020) similarly implement SHAP to explain XGBoost's classification of credit risk, while comparing it against an interpretable logistic regression model. Other studies include discovering the relationship between corporate social responsibility and financial performance (Lachuer and Jabeur 2022), customer satisfaction (Rallis et al. 2022), GDP growth rates (Park and Yang 2022), stock trading (Benhamou et al. 2021; Kumar et al. 2022), financial distress (Tran et al. 2022), market volatility forecast (Weng et al. 2022) and credit evaluation (Rizinski et al. 2022; Bueff et al. 2022; Fritz-Morgenthal et al. 2022). Wand et al. (2022) perform K-means clustering on historical S&P 500 stock information to identify dominant sector correlations that describe the state of the market. This work applies Layer-wise Relevance Propagation (LRP) (Bach et al. 2015), after transforming the clustering classifier into a neural network since LRP is designed to work specifically with neural network architectures. Carta et al. (2022) prune unimportant technical indicators using different configurations of a permutation importance technique, before implementing decision tree techniques for stock market forecasting. ...
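The additive relationship described in the excerpt above, where a base prediction E(f(X)) plus the per-feature contributions ϕ_i recovers f(x), can be checked on a toy case. The following is a minimal sketch under stated assumptions, not the method of any cited paper: it computes exact Shapley values by enumerating every feature coalition instead of calling the SHAP library, and the `model` function, the `background` rows, and the instance `x` are all hypothetical.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy scoring function with an interaction term (assumption).
    return 0.5 * x[0] + 0.3 * x[1] * x[2] - 0.2 * x[2]


def coalition_value(f, x, background, subset):
    # Average prediction when features in `subset` take their values from x
    # and the remaining features take their values from each background row.
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += f(mixed)
    return total / len(background)


def shapley_values(f, x, background):
    # Exact Shapley values by enumerating all coalitions (exponential cost,
    # so only practical for a handful of features).
    n = len(x)
    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi_i = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                with_i = coalition_value(f, x, background, s | {i})
                without_i = coalition_value(f, x, background, s)
                phi_i += weight * (with_i - without_i)
        contributions.append(phi_i)
    return contributions


# Hypothetical background data and instance to explain (assumptions).
background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0]]
x = [1.0, 2.0, 3.0]

base_value = coalition_value(model, x, background, set())  # plays the role of E(f(X))
phi = shapley_values(model, x, background)
reconstructed = base_value + sum(phi)  # should match model(x)
```

Here `reconstructed` agrees with `model(x)` up to floating-point error, which is the additivity property the excerpt describes. Practical SHAP implementations approximate these values, because full enumeration grows exponentially with the number of features.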
Article
Full-text available
The success of artificial intelligence (AI), and deep learning models in particular, has led to their widespread adoption across various industries due to their ability to process huge amounts of data and learn complex patterns. However, due to their lack of explainability, there are significant concerns regarding their use in critical sectors, such as finance and healthcare, where decision-making transparency is of paramount importance. In this paper, we provide a comparative survey of methods that aim to improve the explainability of deep learning models within the context of finance. We categorize the collection of explainable AI methods according to their corresponding characteristics, and we review the concerns and challenges of adopting explainable AI methods, together with future directions we deemed appropriate and important.
... This lack of explainability raises ethical concerns, as decision makers may struggle to trust AI-generated financial risk assessments, especially when regulatory oversight is required (Wasserbacher and Spindler 2022). One potential solution to these concerns is the development of hybrid models that combine machine learning with traditional statistical techniques, allowing for enhanced predictive accuracy while maintaining the interpretability required for decision making (Tran et al. 2022; Zhang et al. 2022). ...
... A promising avenue for future exploration lies in the development of hybrid models that combine traditional statistical methods (e.g., logistic regression and discriminant analysis) with advanced machine learning techniques (e.g., gradient boosting and deep learning). Such models should leverage explainable AI (XAI) techniques to maintain transparency and enhance interpretability and trustworthiness for stakeholders (Tran et al. 2022; Zhang et al. 2022). ...
Article
Full-text available
This systematic review analyzes and compares the predictive power of traditional financial ratios and cash flow-based ratios in estimating performance. Although traditional ratios such as return on assets and debt to equity have been applied extensively, cash flow ratios are increasingly valued for their dynamic insights into both liquidity and financial health. Using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines, this review systematically analyzes 21 studies spread across various industries and regions. The results reveal that cash flow ratios usually outperform the traditional metrics in forecasting financial performance, especially when machine learning models are used. Among the predictors identified in the logistic regression and gradient boosting models, the key indicators are return on investment, the current ratio, and the debt-to-asset ratio. The bottom line of the findings is that a combination of cash flow and traditional ratios gives a better understanding of a company’s financial stability. These results may serve as a starting point for investors, regulators, and entrepreneurs and may further facilitate informed decisions with a reduced chance of miscalculations, enhancing proactive financial planning. In addition, future prediction models should integrate non-financial factors such as governance quality and market conditions to enhance financial health assessments. Additionally, longitudinal studies examining the evolution of financial ratios over time, along with hybrid statistical and machine learning approaches, can improve forecasting accuracy. By integrating cutting-edge analytical tools with the strength of financial metrics, this study offers actionable insights that allow stakeholders to understand financial performance in a more nuanced way.
... On model interpretability, Zhang et al. (2022) were among the few to propose an explainable AI approach that uses Shapley additive explanations and partial dependence plots to improve the transparency of financial distress prediction models. Tran et al. (2022) similarly employed SHAP values to interpret Vietnamese financial distress prediction, reflecting an ongoing trend in the research field toward more interpretable models while weighing how much accuracy should be sacrificed. The paper extends the efforts stated above by optimizing stacked models to deliver reliable predictions and suitable insights for decision-makers. ...
Article
Full-text available
This study evaluates the effectiveness of meta-models in predicting financial distress in the Turkish textile industry. Using economic data from 2013 to 2023, the research applies a meta-model that integrates Lasso, Ridge, Random Forest, Gradient Boosting Machines (GBM), and Support Vector Machines (SVM) as base models, with XGBoost serving as the meta learner. The results show that the meta-model outperforms a standalone XGBoost classifier, especially in minimizing false negatives, which is critical for the early detection of financial distress. The meta-model achieved superior recall and F1 scores, offering a more reliable tool for predicting financial instability in volatile sectors like textiles. However, the study also acknowledges limitations such as model selection bias, the complexity of hyperparameter tuning, and reduced interpretability due to the ensemble nature of the approach. The findings highlight the potential of meta-modeling for industry-specific financial risk prediction while suggesting future improvements in model transparency and generalizability.
... In [7], the research assesses the performance of various machine learning models in predicting financial distress among listed companies in Vietnam. The study reveals that the extreme gradient boosting model achieved the highest accuracy of 95.66%, followed by the artificial neural network (ANN) with 91.68% accuracy. ...
... Accuracy measures the overall correctness of the model by calculating the proportion of true results (both true positives and true negatives) among the total number of cases examined. As shown in Figure 11, XGBoost [7] has an accuracy of 95.66%, indicating that it correctly classifies most instances. The proposed method uses an RF model and demonstrates a high accuracy of 95.34%, reflecting its robustness and reliability. ...
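The accuracy definition in the excerpt above reduces to a single ratio over the confusion-matrix counts. The following is a minimal sketch; the function name `accuracy` and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts for a binary distress classifier (assumption):
# 40 true positives, 50 true negatives, 6 false positives, 4 false negatives.
example_accuracy = accuracy(tp=40, tn=50, fp=6, fn=4)  # 90 correct out of 100
```

Accuracy can look high on imbalanced data even when few distressed firms are detected, which may be why several excerpts on this page also report recall, F1 scores, and AUC.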
... The most important explanatory variables for predicting default were the volatility of the utilized credit balance, remaining credit as a percentage of total credit, and the duration of the customer relationship. Tran et al. [25] applied machine learning algorithms to predict the financial distress of listed companies in Vietnam from 2010 to 2021 and utilized SHAP values to interpret the results. Extreme gradient boosting and random forest models outperformed other models regarding recall, F1 scores, and AUC. ...
Article
Full-text available
Accurate prediction of future earnings is crucial for stakeholders. However, existing machine learning models often operate as “black boxes,” offering high accuracy but minimal interpretability. Prior approaches focus on correlational patterns without establishing genuine causal relationships or providing straightforward rule-based explanations. This lack of transparency and causal insight limits the actionable value of current financial prediction models. We propose an anchor-based explainable and causal AI framework for earnings prediction. It integrates an optimized XGBoost classifier (with RENN undersampling to address class imbalance) for high-performance prediction, the Anchor XAI method to generate human-readable “if-then” rules explaining model decisions, and the DoWhy causal inference tool to validate genuine cause-and-effect factors in the financial data. The optimized XGBoost+RENN model achieved ~93.3% overall accuracy, with precision, recall, and F1-scores around 93–94%, outperforming other classifiers. Key features such as Inventory/Total Assets, %Δ Net Profit Margin, and Cash Dividends/Cash Flows emerged as the most influential factors. Coordinated adjustments in these variables yielded significantly better predictive outcomes than isolated changes. Furthermore, DoWhy-based analysis confirms that improvements in these factors causally drive earnings growth, as verified by robustness checks like placebo tests. The proposed framework effectively bridges the gap between predictive accuracy and interpretability. It provides financial decision-makers with reliable earnings predictions and transparent, actionable insights for strategic planning and management, making the predictive model trustworthy and informative.
... The dependent variables are internal factors but are carefully considered financial and non-financial factors. Among the few studies that mention factors affecting the financial distress of firms in Vietnam, some studies focus too much on non-financial factors (Ninh et al., 2018; Truong, 2022), while others focus mainly on non-financial factors (Tran et al., 2022). This study will overcome the limitations of previous research by including in the model both groups of independent factors: financial factors and non-financial factors. ...
Article
Full-text available
Understanding the conditions leading to business failure and predicting them earlier is the best way for companies to overcome and minimize their harm, improve their performance, and avoid financial distress and bankruptcy. This paper aims to measure the level and trends of factors affecting financial distress in Vietnam – an emerging Southeast Asian economy, along with the managerial implications drawn from the research results. Research data were collected from 606 firms listed on the Vietnam Stock Exchange from 2018 to 2022. The Altman Z-score is used to determine the financial distress of these firms. The factors researched and tested in this study are all internal factors divided into two groups with distinct features. Non-financial factors belong to management characteristics; financial factors are typical indicators of a firm’s financial statements. The study uses OLS, FEM, and REM models to analyze the influence of financial factors (Total liability to Total assets, Sales growth, Firm size, and Firm age) and non-financial factors (Board size, CEO duality, Institutional ownership level, Independent member, and Foreign CEOs) on financial distress and GLS regression to overcome the model’s shortcomings. The results show that the factors in the research model significantly impact financial distress, of which six factors (Board size, CEO duality, Institutional ownership level, Foreign CEOs, Sales growth, and Firm age) are negatively correlated. Three other factors (Independent members, Total liability to Total assets, and Firm size) are positively correlated with financial distress. Acknowledgment: The author thanks everyone who helped make this study possible.
... The XAI model demonstrated superior performance compared to the Random Forest classifier, achieving training and testing accuracies of 99% and 96.7%, respectively. Tran et al. [25] utilized machine learning methods to predict the financial instability of publicly listed firms in Vietnam from 2010 to 2021. They employed SHAP (Shapley Additive Explanations) values to understand the results of the model. ...
Article
Full-text available
This research aims to enhance financial fraud detection by integrating SHAP-Instance Weighting and Anchor Explainable AI with XGBoost, addressing challenges of class imbalance and model interpretability. The study extends SHAP values beyond feature importance to instance weighting, assigning higher weights to more influential instances. This focuses model learning on critical samples. It combines this with Anchor Explainable AI to generate interpretable if-then rules explaining model decisions. The approach is applied to a dataset of financial statements from the listed companies on the Stock Exchange of Thailand. The method significantly improves fraud detection performance, achieving perfect recall for fraudulent instances and substantial gains in accuracy while maintaining high precision. It effectively differentiates between non-fraudulent, fraudulent, and grey area cases. The generated rules provide transparent insights into model decisions, offering nuanced guidance for risk management and compliance. This research introduces instance weighting based on SHAP values as a novel concept in financial fraud detection. By simultaneously addressing class imbalance and interpretability, the integrated approach outperforms traditional methods and sets a new standard in the field. It provides a robust, explainable solution that reduces false positives and increases trust in fraud detection models. Doi: 10.28991/ESJ-2024-08-06-016