Article

Explaining prediction models and individual predictions with feature contributions

Authors:
Erik Štrumbelj, Igor Kononenko

Abstract

We present a sensitivity analysis-based method for explaining prediction models that can be applied to any type of classification or regression model. Its advantage over existing general methods is that all subsets of input features are perturbed, so interactions and redundancies between features are taken into account. Furthermore, when explaining an additive model, the method is equivalent to commonly used additive model-specific methods. We illustrate the method's usefulness with examples from artificial and real-world data sets and an empirical analysis of running times. Results from a controlled experiment with 122 participants suggest that the method's explanations improved the participants' understanding of the model.


... In this section, we introduce the necessary preliminaries and notions required for the full understanding of our framework. We first introduce the SHAP framework [21], based on the computation of Shapley values for explaining individual classification outcomes [33]. Then, we introduce the de-normalized Goodman-Kruskal's τ association measure that we use for computing the strength of the association of groups of features to groups of data instances. ...
... The goal of the SHAP (SHapley Additive exPlanations) framework [21] is to assign a local importance value (hereafter called the SHAP value) to each feature of F in the prediction z = f(d) for an instance d. According to [21,33], and when adapted to a classification task, the SHAP value φ_v(f_h, d) measures the contribution of the feature v to the computation of the probability f_h(d). ...
... The way SHAP values φ_v(f_h, d) are specified follows a seminal result in cooperative game theory [31], which proposes to assign a fair reward to players by measuring their individual contributions (known as Shapley values) to a grand coalition F they participate in. If the players in coalition F are considered to be features, then, according to [21,33], the SHAP values can be obtained as ...
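The truncated formula is the standard Shapley value expression (a standard statement of the result, not a quotation from [21,33]); in the snippet's notation, with F the full feature set and f_h(d_S) the model output when only the features in S are known and the rest are marginalized out, it reads:

\phi_v(f_h, d) \;=\; \sum_{S \subseteq F \setminus \{v\}} \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}\, \Bigl[ f_h\bigl(d_{S \cup \{v\}}\bigr) - f_h\bigl(d_S\bigr) \Bigr]

Each feature thus receives the average of its marginal contributions over all subsets it can join, which is what lets the attribution account for interactions and redundancies between features.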
Conference Paper
Full-text available
Transparency is a non-functional requirement of machine learning that promotes interpretable or easily explainable outcomes. Unfortunately, interpretable classification models, such as linear, rule-based, and decision tree models, are superseded by more accurate but complex learning paradigms, such as deep neural networks and ensemble methods. For tabular data classification, more specifically, models based on gradient-boosted tree ensembles, such as XGBoost, are still competitive compared to deep learning ones, so they are often preferred to the latter. However, they share the same interpretability issues, due to the complexity of the learnt model and, consequently, of the predictions. While the problem of computing local explanations is largely addressed, the problem of extracting global explanations is scarcely investigated. Existing solutions consist of computing some feature importance score, or extracting approximate surrogate trees from the learnt forest, or even using a black-box explainability method. However, those methods either have poor fidelity or their comprehensibility is questionable. In this paper, we propose to fill this gap by leveraging the strong theoretical basis of the SHAP framework in the context of co-clustering and feature selection. As a result, we are able to extract shallow decision trees that explain XGBoost with competitive fidelity and higher comprehensibility compared to two recent state-of-the-art competitors.
... Finally, we employ the random forest of Breiman (2001), which combines the model averaging feature of the random subspace methods with the nonlinear modeling inherent in regression trees. A downside of the random forest is that its nonlinearity complicates the interpretation of the role of the different predictors in the forecasts, and we work with Shapley values as discussed by Štrumbelj and Kononenko (2014) and Lundberg and Lee (2017) in our analysis of the importance of the predictors in the random forest. ...
... The random forest, in contrast, is a highly nonlinear model, which makes interpreting the role of different predictors considerably more complex. We use the concept of Shapley values (Shapley 1953) to interpret the role of predictors, a concept developed further by Štrumbelj and Kononenko (2014) and Lundberg and Lee (2017). ...
... The contribution of predictor i, φ_i, is the weighted average of the loss differences, where N_M is the number of possible combinations of predictors in trees excluding predictor i. With many predictors, this procedure is computationally burdensome and we therefore use the approximation proposed by Štrumbelj and Kononenko (2014), which uses randomly sampled subsets of predictors to compute the Shapley value for each predictor. ...
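As an illustration of this sampling-based approximation, here is a minimal Python sketch of a permutation-sampling estimator (the function name and the use of a background dataset to marginalize absent features are our choices, not the authors' code):

import numpy as np

def shapley_mc(f, x, X_bg, j, n_samples=1000, seed=0):
    """Monte Carlo estimate of the Shapley value of feature j for the
    prediction f(x). Absent features are filled in from a randomly drawn
    background instance, i.e., one permutation and one data sample per
    iteration (in the spirit of Strumbelj & Kononenko, 2014)."""
    rng = np.random.default_rng(seed)
    d = len(x)
    total = 0.0
    for _ in range(n_samples):
        perm = rng.permutation(d)               # random feature ordering
        w = X_bg[rng.integers(len(X_bg))]       # random background instance
        pos = int(np.where(perm == j)[0][0])
        preceding = set(perm[:pos].tolist())    # features "already present"
        x_with = np.array([x[k] if (k in preceding or k == j) else w[k]
                           for k in range(d)])
        x_without = np.array([x[k] if k in preceding else w[k]
                              for k in range(d)])
        total += f(x_with) - f(x_without)       # marginal contribution of j
    return total / n_samples

Each iteration costs two model evaluations, so the estimate scales linearly with the sampling budget rather than exponentially with the number of features.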
Article
Full-text available
This paper compares the ability of several econometric and machine learning methods to nowcast GDP in (pseudo) real-time. The analysis takes the example of Dutch GDP over the period 1992Q1–2018Q4 using a broad data set of monthly indicators. It discusses the forecast accuracy but also analyzes the use of information from the large data set of macroeconomic and financial predictors. We find that, on average, the random forest provides the most accurate forecasts and nowcasts, whilst the dynamic factor model provides the most accurate backcasts.
... Current applications based on SHAP values that need to select the most influential features (e.g., [10]-[12]) use the following workflow. Firstly, they run an algorithm to approximate the SHAP values (e.g., [2], [8], [9], [13]) of each feature, and secondly, they select the features having the highest SHAP values in a post-processing step. The novelty of the method proposed in this paper is that it starts by quickly computing a rough approximation of the SHAP values of all features, and then iteratively improves these approximations while discarding on the fly the features whose SHAP values converge towards values that are too low. ...
... In the first family, among the approaches based on sampling permutations (e.g., [9]), the one in [13] is of particular interest because it incorporates the approximation of f_S(x) in the sampling process. Indeed, computing f_S(x) is a problem in itself, and most model types cannot output it directly. ...
... Indeed, computing f_S(x) is a problem in itself, and most model types cannot output it directly. The algorithm presented in [13] solved this problem by using a joint sampling of permutations and values in the domain of x to estimate f_S(x). ...
Article
Full-text available
With the ever-increasing influence of machine learning models, it has become necessary to explain their predictions. The SHAP framework provides a solution to this problem by assigning a score to each feature of a model such that it reflects the feature contribution to the prediction. Although SHAP is widely used, it is hampered by its computational cost when preserving model-agnosticism. This paper proposes a model-agnostic algorithm, TopShap, to efficiently approximate the SHAP values of the top-k most important features. TopShap uses confidence interval bounds of the approximate SHAP values to determine on the fly which features can no longer be part of the top-k and then removes them from the computation, thus saving computational resources. This cost reduction makes TopShap better suited than competing model-agnostic methods for top-k SHAP value computation. The evaluation of TopShap shows that it performs efficient pruning of the feature search space, in turn leading to a substantial reduction in the execution time when compared to the existing most efficient agnostic approach, Kernel SHAP. The experiments presented in this work cover a wide range of numbers of features and instances, using the following public datasets: Concrete, Wine quality, Appliances energy, PBMC gene expression, Mercedes, CT locations, and a synthetic regression. Various models were used to demonstrate model-agnosticism: Regression Forest, Multi-Layer Perceptron, RBF-kernel Support Vector Regression, and Stacked Generalization.
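To make the interval-based pruning idea concrete, the following toy sketch (our illustration, not the TopShap implementation) keeps a feature only while the upper bound of its confidence interval can still reach the k-th largest lower bound among the remaining features:

import numpy as np

def prune_for_topk(phi_samples, k, z=1.96):
    """Toy interval-based pruning in the spirit of TopShap: keep only
    features whose |SHAP| confidence interval still overlaps the top-k.
    phi_samples maps feature name -> 1-D array of i.i.d. Monte Carlo
    estimates of that feature's SHAP value."""
    lo, hi = {}, {}
    for j, s in phi_samples.items():
        m, half = s.mean(), z * s.std(ddof=1) / np.sqrt(len(s))
        lo[j] = max(0.0, abs(m) - half)   # conservative lower bound on |phi_j|
        hi[j] = abs(m) + half             # upper bound on |phi_j|
    # the k-th largest lower bound sets the pruning threshold
    thresh = sorted(lo.values(), reverse=True)[k - 1]
    return [j for j in phi_samples if hi[j] >= thresh]

Features pruned this way need no further sampling, which is where the computational savings described in the abstract come from.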
... Interpretable ML models are now essential for understanding the importance of input variables and their alignment with the physical processes being modeled. SHAP (Shapley Additive Explanations), based on game theory (Štrumbelj and Kononenko 2014), is widely used for interpreting ML models. It helps evaluate how each variable influences model output, allowing for the assessment of input variable relevance to the underlying physical processes (El Bilali et al. 2024). ...
... To interpret the trained XGBoost model and assess the impact of input variables on peak flow, we applied SHAP and LIME algorithms. SHAP (Lundberg and Lee 2017; Štrumbelj and Kononenko 2014) provides both global and local interpretation based on game theory, allowing analysis of both positive and negative contributions of variables. LIME, a model-agnostic method, creates data points by adding noise to samples, then fits an interpretable model to these points with weights based on proximity to the original observation (Ribeiro et al. 2016). ...
Article
Full-text available
Predicting peak flow from dam break is crucial in hydraulic engineering. However, data availability is a great challenge for developing reliable models. In this study, we attempt to develop a new framework to predict peak flow from breached dams using synthetic and real data. Thus, the Monte Carlo method was used to generate synthetic samples of the breach parameters for running HEC-RAS-2D to simulate peak flow. Then, the XGBoost, Shapley Additive Explanations, and Local Interpretable Model-agnostic Explanations algorithms were applied to analyze and interpret the influence of the input variables on the dam breach process. The results revealed that the NSE of the XGBoost model ranged from 0.98 to -0.21. The surface area of the breach and the height of water at failure were identified as the main factors, followed by the weir coefficient and the formation time of the breach. The volume of water at failure was ranked as the first factor, followed by the breach width, when the surface area is not considered. Furthermore, the original data, of 111 real dam-break events with known Hw and Vw, were merged with the synthetic data to assess XGBoost, which showed high accuracy with NSE of about 0.99 and 0.75 during the training and validation phases, respectively. Using both the real and synthetic data significantly improved the accuracy of the XGBoost model, with an increase in NSE of 9% during validation when using Vw as an input feature. Overall, this study presents a novel and robust approach for predicting peak flow with limited data, offering valuable insights for effective dam safety management and flood risk mitigation.
... Feature-importance-based methods provide a magnitude and direction for each feature based on its contribution to a model prediction. Several feature-importance-based methods have been proposed over the last decade [13,14,20,21]. In this paper, the use of our data set and evaluation methodology is demonstrated in the following methods: LIME. ...
... A subset is considered influential if flipping all of the features in the subset and feeding the logic operator with the flipped subset results in changing the output (lines 14-17). An influential node is included in the final list I if it is an input node i_x, or in the queue IQueue if it is an inner node z_y (lines 19-22). If an influencing subset was found in a group of a certain size, the search is performed (lines 12-13). ...
Article
Full-text available
The widespread use of machine and deep learning algorithms for anomaly detection has created a critical need for robust explanations that can identify the features contributing to anomalies. However, effective evaluation methodologies for anomaly explanations are currently lacking, especially those that compare the explanations against the true underlying causes, or ground truth. This paper aims to address this gap by introducing a rigorous, ground-truth-based framework for evaluating anomaly explanation methods, which enables the assessment of explanation correctness and robustness—key factors for actionable insights in anomaly detection. To achieve this, we present an innovative benchmark dataset of digital circuit truth tables with model-based anomalies, accompanied by local ground truth explanations. These explanations were generated using a novel algorithm designed to accurately identify influential features within each anomaly. Additionally, we propose an evaluation methodology based on correctness and robustness metrics, specifically tailored to quantify the reliability of anomaly explanations. This dataset and evaluation framework are publicly available to facilitate further research and standardize evaluation practices. Our experiments demonstrate the utility of this dataset and methodology by evaluating common model-agnostic explanation methods in an anomaly detection context. The results highlight the importance of ground-truth-based evaluation for reliable and interpretable anomaly explanations, advancing both theory and practical applications in explainable AI. This work establishes a foundation for rigorous, evidence-based assessments of anomaly explanations, fostering greater transparency and trust in AI-driven anomaly detection systems.
... Reliable and easily understood explanations are key to gaining human trust and enabling effective ML usage [12]-[15]. In critical health situations, institutions tend to prefer explainable models over complex "black box" models, even if the latter are slightly more accurate [16]. In medical applications like analyzing COVID-19 data, interpretability is as crucial as traditional performance metrics like accuracy. ...
Article
Full-text available
In this study, we introduce a novel approach that integrates interpretability techniques from both traditional machine learning (ML) and deep neural networks (DNN) to quantify feature importance using global and local interpretation methods. Our method bridges the gap between interpretable ML models and powerful deep learning (DL) architectures, providing comprehensive insights into the key drivers behind model predictions, especially in detecting outliers within medical data. We applied this method to analyze COVID-19 pandemic data from 2020, yielding intriguing insights. We used a dataset consisting of individuals who were tested for COVID-19 during the early stages of the pandemic in 2020. The dataset included self-reported symptoms and test results from a wide demographic, and our goal was to identify the most important symptoms that could help predict COVID-19 infection accurately. By applying interpretability techniques to both machine learning and deep learning models, we aimed to improve understanding of symptomatology and enhance early detection of COVID-19 cases. Notably, even though less than 1% of our cohort reported having a sore throat, this symptom emerged as a significant indicator of active COVID-19 infection, appearing 7 out of 9 times in the top four most important features across all methodologies. This suggests its potential as an early symptom marker. Studies have shown that individuals reporting sore throat may have a compromised immune system, where antibody generation is not functioning correctly. This aligns with our data, which indicates that 5% of patients with sore throats required hospitalization. Our analysis also revealed a concerning trend of diminished immune response post-COVID infection, increasing the likelihood of severe cases requiring hospitalization. This finding underscores the importance of monitoring patients post-recovery for potential complications and tailoring medical interventions accordingly. Our study also raises critical questions about the efficacy of COVID-19 vaccines in individuals presenting with sore throat as a symptom. The results suggest that booster shots might be necessary for this population to ensure adequate immunity, given the observed immune response patterns. The proposed method not only enhances our understanding of COVID-19 symptomatology but also demonstrates its broader utility in medical outlier detection. This research contributes valuable insights to ongoing efforts in creating interpretable models for COVID-19 management and vaccine optimization strategies. By leveraging feature importance and interpretability, these models empower physicians, healthcare workers, and researchers to understand complex relationships within medical data, facilitating more informed decision-making for patient care and public health initiatives.
... The prediction results of the RF model for the overall carbon emissions in Jiangsu Province can be explained using SHAP [62]. The left panel of Figure 4a shows the importance ranking plot of the feature variables, which represents the average SHAP value of all variables. ...
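Such a ranking is commonly computed as the mean absolute SHAP value per feature. A minimal sketch with the shap library follows; the toy data, feature names, and model are our stand-ins, not the study's:

import numpy as np
import pandas as pd
import shap                                     # pip install shap
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for an emissions model: any fitted tree ensemble works.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)),
                 columns=["electricity", "population", "energy_intensity"])
y = 2 * X["electricity"] + X["population"] + rng.normal(scale=0.1, size=200)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

shap_values = shap.TreeExplainer(model).shap_values(X)   # shape (n, n_features)
importance = np.abs(shap_values).mean(axis=0)            # mean |SHAP| per feature
for name, imp in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")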
Article
Full-text available
This study accounted for and analyzed the carbon emissions of 13 cities in Jiangsu Province from 1999 to 2021. We compared the simulation effects of four models—STIRPAT, random forest, extreme gradient boosting, and support vector regression—on carbon emissions and performed model optimization. The random forest model demonstrated the best simulation performance. Using this model, we predicted the carbon emission paths for the 13 cities in Jiangsu Province under various scenarios from 2022 to 2040. The results show that Xuzhou has already achieved its peak carbon target. Under the high-speed development scenario, half of the cities can achieve their peak carbon target, while the remaining cities face significant challenges in reaching their peak carbon target. To further understand the factors influencing carbon emissions, we used the machine learning interpretation method SHAP and the feature importance ranking method. Our analysis indicates that electricity consumption, population size, and energy intensity have a greater influence on overall carbon emissions, with electricity consumption being the most influential variable, although the importance of the factors varies considerably across different regions. Results suggest the need to tailor carbon reduction measures to the differences between cities and develop more accurate forecasting models.
... Moreover, these attribute values enhance our understanding of how the learning model arrives at its final decision results. SHAP was developed based on a game-theoretic concept called the Shapley value, which studies the impact of each feature and its contribution to the learning prediction model [42]. The general concept behind it is to calculate the impact of each player in a team by comparing the performance of the team with and without this player. ...
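A two-player toy example (our numbers) makes the with/without idea concrete. Suppose a team earns v(∅) = 0, v({A}) = 20, v({B}) = 10, and v({A, B}) = 40. Averaging each player's marginal contribution over the two possible join orders gives

φ_A = ½(20 − 0) + ½(40 − 10) = 25, φ_B = ½(10 − 0) + ½(40 − 20) = 15,

and the two payouts sum to v({A, B}) = 40, the efficiency property that additive explanations inherit.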
Article
Full-text available
The Android operating system has become increasingly popular, not only on mobile phones but also in various other platforms such as Internet-of-Things devices, tablet computers, and wearable devices. Due to its open-source nature and significant market share, Android poses an attractive target for malicious actors. One of the notable security challenges associated with this operating system is riskware. Riskware refers to applications that may pose a security threat due to their vulnerability and potential for misuse. Although riskware constitutes a considerable portion of Android’s ecosystem malware, it has not been studied as extensively as other types of malware such as ransomware and trojans. In this study, we employ machine learning techniques to analyze the behavior of different riskware families and identify similarities in their actions. Furthermore, our research identifies specific behaviors that can be used to distinguish these riskware families. To achieve these insights, we utilize various tools such as k-Means clustering, principal component analysis, extreme gradient boost classifiers, and Shapley additive explanation. Our findings can contribute significantly to the detection, identification, and forensic analysis of Android riskware.
... The solution to Eq. (7) is computationally very demanding, as it requires summing the marginal contributions over more than 2^n - 1 combinations (for n predictors). Work by, for example, Štrumbelj and Kononenko (2014) provides ways to approximate Eq. (7) with a Monte Carlo method. ...
... SHAP (SHapley Additive exPlanations) analysis, introduced by Lundberg and Lee, was employed in this study to improve the interpretability of ML models by providing a more accurate picture of the influence of each input feature on the output [77]. SHAP analysis uses game theory and local explanations to quantify the contribution of each input feature [78]. The following Equation (10) is used to calculate the Shapley coefficient φ_i, which is the basis for this analysis: ...
Article
Full-text available
Traffic crashes contribute significantly to non-recurrent congestion, thereby increasing delays, congestion pollution, and other challenges. It is important to have tools that enable accurate prediction of incident duration to reduce delays. It is also necessary to understand factors that affect the duration of traffic crashes. This study developed three machine learning models, namely extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and a light gradient-boosting machine (LightGBM), to predict crash-related incident clearance time (ICT) on Louisiana rural interstates and utilized Shapley additive explanations (SHAP) analysis to determine the influence of factors impacting it. Four ICT levels were defined based on 30 min intervals: short (0–30), medium (31–60), intermediate (61–90), and long (greater than 90). The results suggest that XGBoost outperforms CatBoost and LightGBM in the collective model's predictive performance. It was found that different features significantly affect different ICT levels. The results indicate that crashes involving injuries, fatalities, heavy trucks, head-on collisions, roadway departure, and older drivers are the significant factors that influence ICT. The results of this study may be used to develop and implement strategies that lead to reduced incident duration and related challenges with long clearance times, providing actionable insights for traffic managers, transportation planners, and incident response agencies to enhance decision-making and mitigate the associated increases in congestion and secondary crashes.
... For the purposes of the present work, we implemented SHAP (SHapley Additive exPlanations), an algorithm based on Shapley values, a concept from cooperative game theory. Shapley values were introduced in 1953 by Lloyd Shapley (Shapley, 1953) and have been used in the past to compute explanations of model predictions (Lipovetsky & Conklin, 2001; Štrumbelj & Kononenko, 2014). SHAP is able to explain the output of any Machine Learning model, quantifying the contribution of each feature to the prediction made by the model, thus providing insights into how the model arrived at a particular prediction. ...
Article
Full-text available
This study explores the capability of Convolutional Neural Networks (CNNs), a particular class of Deep Learning algorithms specifically crafted for computer vision tasks, to classify images of isolated fossil shark teeth gathered from online datasets as well as from the authors' experience on Peruvian Miocene and Italian Pliocene fossil assemblages. The shark tooth images that are included in the final, composite dataset (which consists of more than one thousand images) are representative of both extinct and extant genera, namely, Carcharhinus, Carcharias, Carcharocles, Chlamydoselachus, Cosmopolitodus, Galeocerdo, Hemipristis, Notorynchus, Prionace and Squatina. We compared the classification performances of two CNNs, namely: SharkNet-X, a specifically tailored neural network that was developed and trained from scratch; and VGG16, which was trained using the transfer learning paradigm. Furthermore, in order to understand and explain the behaviour of the two CNNs, while providing a palaeontologist's perspective on the results, we firstly elaborated a visualisation of the features extracted from the images using the last dense layer of each CNN, which was achieved through the application of the t-distributed Stochastic Neighbor Embedding (t-SNE) clustering technique. Then, we introduced the explainability method SHAP (SHapley Additive exPlanations), which is a game theoretic approach to explain the output of any Machine Learning model. The results show that VGG16 outperforms SharkNet-X in most scenarios, especially when trained with data augmentation techniques, achieving high accuracy (93%-97%) in tooth classification. In addition, the SHAP heatmaps revealed that the CNNs relied heavily on tooth margins and inner regions for identification, offering insights into the automated classification process. Overall, this study demonstrates that Deep Learning techniques can effectively assist in identifying isolated fossil shark teeth, paving the way for developing automated tools for fossil recognition and classification.
... SHAP is categorized as an additive feature attribution method, representing the model's output as a linear sum of its input variables. Each feature's contribution is encapsulated by the Shapley value, a concept originating from coalitional game theory (Štrumbelj and Kononenko 2014). The explanation model can be defined as follows: ...
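In the standard SHAP formulation, the truncated definition is the additive explanation model

g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i, \qquad z' \in \{0, 1\}^M,

where z'_i indicates whether feature i is present, M is the number of features, φ_0 is the base value (the expected model output), and φ_i is the Shapley value of feature i.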
Article
Post-earthquake reconnaissance stands as a vital endeavor, encompassing a systematic survey and data collection to gain insights into the aftermath of seismic events. Grappling with a multitude of multidimensional data poses inherent challenges in the proper interpretation and extraction of hidden information. This paper unfolds in two parts. Initially, we delineate the reconnaissance mission following the February 06 Türkiye earthquake sequence, elucidating the challenges, complexities, and nature of the amassed data. Subsequently, a paradigm shift toward automated machine learning (AutoML) is embraced to ascertain the optimal model, categorize observed damages, and unveil underlying patterns within the collected data. The most accurate machine learning model is then coupled with Shapley Additive Explanations (SHAP) to explicate observations, steering away from a black-box model. SHAP values discern and prioritize factors significantly contributing to various damage levels. The results reveal a test set accuracy of 0.75 and 0.95 for multi- and binary-class problems, respectively, employing both raw and composite features. Building-related features exert more control over light damage, while earthquake-related factors dominate severe damage. In critical damage scenarios, duration- and velocity-dependent intensity measures assume significance. Furthermore, findings indicate that if column and wall indices are below 0.07 and 0.08, respectively, they positively contribute to the likelihood of critical structural damage.
... Complementing Sobol's analysis, we also perform a Shapley sensitivity analysis, as proposed by Štrumbelj and Kononenko (2014). Based on game theory, this method calculates the contribution of each input parameter using its Shapley value. ...
Article
Full-text available
Antarctica's Lambert Glacier drains about one-sixth of the ice from the East Antarctic Ice Sheet and is considered stable due to the strong buttressing provided by the Amery Ice Shelf. While previous projections of the sea-level contribution from this sector of the ice sheet have predicted significant mass loss only with near-complete removal of the ice shelf, the ocean warming necessary for this was deemed unlikely. Recent climate projections through 2300 indicate that sufficient ocean warming is a distinct possibility after 2100. This work explores the impact of parametric uncertainty on projections of the response of the Lambert–Amery system (hereafter “the Amery sector”) to abrupt ocean warming through Bayesian calibration of a perturbed-parameter ice-sheet model ensemble. We address the computational cost of uncertainty quantification for ice-sheet model projections via statistical emulation, which employs surrogate models for fast and inexpensive parameter space exploration while retaining critical features of the high-fidelity simulations. To this end, we build Gaussian process (GP) emulators from simulations of the Amery sector at a medium resolution (4–20 km mesh) using the Model for Prediction Across Scales (MPAS)-Albany Land Ice (MALI) model. We consider six input parameters that control basal friction, ice stiffness, calving, and ice-shelf basal melting. From these, we generate 200 perturbed input parameter initializations using space filling Sobol sampling. For our end-to-end probabilistic modeling workflow, we first train emulators on the simulation ensemble and then calibrate the input parameters using observations of the mass balance, grounding line movement, and calving front movement with priors assigned via expert knowledge. Next, we use MALI to project a subset of simulations to 2300 using ocean and atmosphere forcings from a climate model for both low- and high-greenhouse-gas-emission scenarios. From these simulation outputs, we build multivariate emulators by combining GP regression with principal component dimension reduction to emulate multivariate sea-level contribution time series data from the MALI simulations. We then use these emulators to propagate uncertainty from model input parameters to predictions of glacier mass loss through 2300, demonstrating that the calibrated posterior distributions have both greater mass loss and reduced variance compared to the uncalibrated prior distributions. Parametric uncertainty is large enough through about 2130 that the two projections under different emission scenarios are indistinguishable from one another. However, after rapid ocean warming in the first half of the 22nd century, the projections become statistically distinct within decades. Overall, this study demonstrates an efficient Bayesian calibration and uncertainty propagation workflow for ice-sheet model projections and identifies the potential for large sea-level rise contributions from the Amery sector of the Antarctic Ice Sheet after 2100 under high-greenhouse-gas-emission scenarios.
... Recently there have been efforts in the machine learning community to make this process of extracting information from algorithms easier, often called algorithm explainability (e.g. feature importance [63], partial dependence plots [64], Shapley values [65], etc.). The goals are three-fold. ...
Article
Full-text available
Supervised machine learning (ML) offers an exciting suite of algorithms that could benefit research in sport science. In principle, supervised ML approaches were designed for pure prediction, as opposed to explanation, leading to a rise in powerful, but opaque, algorithms. Recently, two subdomains of ML–explainable ML, which allows us to "peek into the black box," and interpretable ML, which encourages using algorithms that are inherently interpretable–have grown in popularity. The increased transparency of these powerful ML algorithms may provide considerable support for the hypothetico-deductive framework, in which hypotheses are generated from prior beliefs and theory, and are assessed against data collected specifically to test that hypothesis. However, this paper shows why ML algorithms are fundamentally different from statistical methods, even when using explainable or interpretable approaches. Translating potential insights from supervised ML algorithms, while in many cases seemingly straightforward, can have unanticipated challenges. While supervised ML cannot be used to replace statistical methods, we propose ways in which the sport sciences community can take advantage of supervised ML in the hypothetico-deductive framework. In this manuscript we argue that supervised machine learning can and should augment our exploratory investigations in sport science, but that leveraging potential insights from supervised ML algorithms should be undertaken with caution. We justify our position through a careful examination of supervised machine learning, and provide a useful analogy to help elucidate our findings. Three case studies are provided to demonstrate how supervised machine learning can be integrated into exploratory analysis. Supervised machine learning should be integrated into the scientific workflow with requisite caution. The approaches described in this paper provide ways to safely leverage the strengths of machine learning—like the flexibility ML algorithms can provide for fitting complex patterns—while avoiding potential pitfalls—at best, like wasted effort and money, and at worst, like misguided clinical recommendations—that may arise when trying to integrate findings from ML algorithms into domain knowledge.
Key Points:
- Some supervised machine learning algorithms and statistical models are used to solve the same problem, y = f(x) + ε, but differ fundamentally in motivation and approach.
- The hypothetico-deductive framework—in which hypotheses are generated from prior beliefs and theory, and are assessed against data collected specifically to test that hypothesis—is one of the core frameworks comprising the scientific method.
- In the hypothetico-deductive framework, supervised machine learning can be used in an exploratory capacity. However, it cannot replace the use of statistical methods, even as explainable and interpretable machine learning methods become increasingly popular.
- Improper use of supervised machine learning in the hypothetico-deductive framework is tantamount to p-value hacking in statistical methods.
... The validity of these classification features (Table II) was analyzed using the SHapley Additive exPlanations (SHAP) method. SHAP is a unified framework based on coalitional game theory, which has been widely used to evaluate the outputs of a prediction model by assigning a score of importance [82], [83]. Fig. 12 shows the SHAP values of different classification features used for S. alterniflora extraction with XGBoost. ...
Article
Full-text available
Accurate mapping of Spartina alterniflora (S. alterniflora) invasion is crucial for controlling its spread and reducing severe ecological problems. Satellite images have been extensively employed for S. alterniflora invasion monitoring; however, there are still several issues that need to be addressed. The spectral similarities between S. alterniflora and surrounding ground objects make it challenging for traditional classifiers to achieve satisfactory extraction accuracy. Since the phenological information and red-edge spectral differences have been considered as informative features for identifying S. alterniflora, current studies mainly used them separately as classification features and seldom considered the differences of red-edge information at different phenological periods. Therefore, we proposed a pixel-based phenological and red-edge feature composite method (PpRef-CM) for S. alterniflora extraction considering both phenological information and red-edge bands derived from Sentinel-2 time series based on the existing pixel-based phenological feature composite method (Ppf-CM). The proposed PpRef-CM and machine learning algorithms were employed for S. alterniflora extraction in two typical mangrove forests along coastal China. Results indicated that red-edge information at different phenological periods is essential for detecting S. alterniflora. S. alterniflora extraction achieved the highest accuracy of 96.57% by using the eXtreme gradient boost (XGBoost) algorithm when compared to other machine learning algorithms. The PpRef-CM gave 2.72% and 2.61% higher extraction accuracies of S. alterniflora than the Ppf-CM in the two study sites, respectively. These findings provide insights for selecting suitable classification features for S. alterniflora extraction studies and support the effective control and management of S. alterniflora.
... The calculation of SHAP values can be approximated using the following equations (Štrumbelj and Kononenko, 2014): ...
... Even if an AI model can successfully solve a given task, it is often very hard to determine on which grounds the model made its decision. While methods like random forests can intrinsically assess how important a given feature is for decision-making, other ML types require a separate analysis that estimates the impact of individual features (Carrieri et al., 2021;Štrumbelj & Kononenko, 2014). While there are many approaches to tackling the issues of explainable AI, they cannot guarantee a robust explanation of the full model (van der Velden et al., 2022). ...
Article
Full-text available
Artificial intelligence (AI) has the potential to transform clinical practice and healthcare. Following impressive advancements in fields such as computer vision and medical imaging, AI is poised to drive changes in microbiome‐based healthcare while facing challenges specific to the field. This review describes the state‐of‐the‐art use of AI in microbiome‐related healthcare. It points out limitations across topics such as data handling, AI modelling and safeguarding patient privacy. Furthermore, we indicate how these current shortcomings could be overcome in the future and discuss the influence and opportunities of increasingly complex data on microbiome‐based healthcare.
... SHAP (SHapley Additive exPlanations) is an explainable machine learning method inspired by the Shapley value that can provide explanations for the output of machine learning models [70]. The Shapley value originates from coalition game theory and was introduced by Shapley in 1953 [71]. Its core idea is to compute the marginal contribution of each feature to the model's output. ...
Article
Full-text available
Population mobility between cities significantly affects traffic congestion, disease spread, and societal well-being. As globalization and urbanization accelerate, understanding the dynamics of population mobility becomes increasingly important. Traditional population migration models reveal the factors influencing migration, while machine learning methods provide effective tools for creating data-driven models to handle the nonlinear relationships between origin and destination characteristics and migration. To deepen the understanding of population mobility issues, this study presents GraviGBM, an expandable population mobility simulation model that combines the gravity model with machine learning, significantly enhancing simulation accuracy. By employing SHAP (SHapley Additive exPlanations), we interpret the modeling results and explore the relationship between urban characteristics and population migration. Additionally, this study includes a case analysis of COVID-19, extending the model’s application during public health emergencies and evaluating the contribution of model variables in this context. The results show that GraviGBM performs exceptionally well in simulating inter-city population migration, with an RMSE of 4.28, far lower than the RMSE of the gravity model (45.32). This research indicates that distance emerged as the primary factor affecting mobility before the pandemic, with economic factors and population also playing significant roles. During the pandemic, distance remained dominant, but short distances gained in significance. Pandemic-related indicators became prominent, while economics, population density, and transportation substantially lost their influence. A city-to-city flow analysis shows that when population sizes are comparable, economic factors prevail, but when economic profiles match, living conditions dictate migration. During the pandemic, residents from hard-hit areas moved to more distant cities, seeking normalcy. This research offers a comprehensive perspective on population mobility, yielding valuable insights for future urban planning, pandemic response, and decision-making processes.
... The SHAP (SHapley Additive exPlanations) model first appeared in 2017, when Lundberg and Lee proposed the SHAP value as a broadly applicable method for interpreting a wide range of models, especially hard-to-understand black-box models (Liu et al., 2023a, b). SHAP is based on game theory and local explanations (Štrumbelj & Kononenko, 2014) and is one of the classic ex-post explanatory frameworks. Its core is the Shapley value concept from cooperative game theory, which provides Shapley values for each feature's contribution. ...
Article
Full-text available
Drought is one of the leading natural disasters in the world. It severely restricts the normal functioning of societies and economies. The purpose of this paper is to predict the drought vulnerability index in Henan Province so that timely measures can be taken to mitigate the impact of drought and to ensure the continued stability of people’s lives and social development. First, the drought vulnerability index of Henan Province for 2010–2022 was calculated based on relevant literature and the regional situation of Henan Province. In addition, the main factors of sensitivity and resilience are selected, the prediction indicator system is established, and the grey correlation analysis model is applied to select the top 10 main factors of drought vulnerability. Second, the Random Forest model was trained and simulated, and the drought vulnerability index of Henan Province was predicted for 2023–2025. Finally, the SHapley Additive exPlanations model was used to explain the specific effects of the influences of the Random Forest model. The interpretable Random Forest model not only accurately predicts the drought vulnerability index in Henan Province but also alleviates concerns about the “black box” problem of indirect interpretation of the Random Forest model. The results of the study show that the Average Percentage Error of the drought vulnerability index in Henan Province from 2010 to 2022 simulated by the Random Forest model is as low as 3.0457%, and the simulation accuracy is as high as 96.9643%. Drought vulnerability prediction for Henan Province was also made for the period 2023–2025, showing an increasing trend. Relevant sectors should enhance awareness and early warning of drought, improve resilience, and take effective measures to reduce the socio-economic impact of drought.
... The algorithms to approximately compute SHAP values are either stochastic estimators [6], [8]-[12] or model-based approximators [6], [13]-[18]. Stochastic estimators, such as KernelSHAP [6], randomly subsample the feature subsets (T, v_T(x_T)) and approximately solve a weighted least-squares problem to estimate I_SV(i). ...
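For intuition about that weighted least-squares view: when the number of features M is small enough to enumerate every coalition, the KernelSHAP problem can be solved directly. The sketch below is our illustration, not a library implementation; the function name and the large-weight trick for the two constraint rows are our choices:

import numpy as np
from itertools import combinations
from math import comb

def kernel_shap_exact(v, M, big=1e6):
    """Solve the KernelSHAP weighted least squares by enumerating all
    coalitions (tractable only for small M). v(S) returns the value of
    coalition S, given as a tuple of feature indices."""
    rows, targets, weights = [], [], []
    for size in range(M + 1):
        if size in (0, M):
            w = big  # near-enforce phi_0 = v(()) and the efficiency constraint
        else:
            w = (M - 1) / (comb(M, size) * size * (M - size))  # Shapley kernel
        for S in combinations(range(M), size):
            z = np.zeros(M + 1)
            z[0] = 1.0               # intercept column -> phi_0
            for j in S:
                z[1 + j] = 1.0       # indicator: feature j is in S
            rows.append(z)
            targets.append(v(S))
            weights.append(w)
    Z, y, W = np.array(rows), np.array(targets), np.diag(weights)
    beta = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
    return beta[0], beta[1:]         # phi_0 and the per-feature SHAP values

# Toy game: two features contributing 1.0 and 2.0, plus a 0.5 synergy.
v = lambda S: 1.0 * (0 in S) + 2.0 * (1 in S) + 0.5 * (0 in S and 1 in S)
print(kernel_shap_exact(v, 2))       # phi_0 ~ 0.0, SHAP values ~ [1.25, 2.25]

Stochastic estimators avoid the enumeration by subsampling the coalition rows according to the kernel weights, which is what makes the approach feasible for large M.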
Preprint
Full-text available
With the rapid growth of black-box models in machine learning, Shapley values have emerged as a popular method for model explanations due to their theoretical guarantees. Shapley values locally explain a model to an input query using additive features. Yet, in genomics, extracting biological knowledge from black-box models hinges on explaining nonlinear feature interactions globally to hundreds to thousands of input query sequences. Herein, we develop SHAP zero, an algorithm that estimates all-order Shapley feature interactions with a near-zero cost per queried sequence after paying a one-time fee for model sketching. SHAP zero achieves this by establishing a surprisingly underexplored connection between the Shapley interactions and the Fourier transform of the model. Explaining two genomic models, one trained to predict guide RNA binding and the other to predict DNA repair outcomes, we demonstrate that SHAP zero achieves orders of magnitude reduction in amortized computational cost compared to state-of-the-art algorithms. SHAP zero reveals all microhomologous motifs that are predictive of DNA repair outcome, a finding previously inaccessible due to the combinatorial space of possible high-order feature interactions.
Chapter
While machine learning is one subfield of AI, this chapter discusses several methods from another subfield, semantic modelling. It supports the learning models and is an enabling technique for explainability. Synonyms, taxonomies and ontologies are introduced. They digitize the true meaning of something as knowledge models, allowing us to store this meaning and make it accessible for computer programs like our machine learning. Physics-informed approaches benefit from these knowledge models, as they can now automatically request transformations or topology changes based on the meaning of their variables. We show a sensitivity analysis based on perturbation theory that systematically scans the input of a learning model and provides analytical estimates for its output. The chapter concludes with a discussion about the different audiences and stakeholders of machine learning and AI. These audiences require different forms of explainability, different key performance indicators and different solution descriptions.
Article
The exponential growth of academic papers necessitates sophisticated classification systems to effectively manage and navigate vast information repositories. Despite the proliferation of such systems, traditional approaches often rely on embeddings that do not allow for easy interpretation of classification decisions, creating a gap in transparency and understanding. To address these challenges, we propose an innovative explainable paper classification system that combines Latent Semantic Analysis (LSA) for topic modeling with explainable artificial intelligence (XAI) techniques. Our objective is to identify which topics significantly influence the classification outcomes, incorporating Shapley additive explanations (SHAP) as a key XAI technique. Our system extracts topic assignments and word assignments from paper abstracts using LSA topic modeling. Topic assignments are then employed as embeddings in a multilayer perceptron (MLP) classification model, with the word assignments further utilized alongside SHAP for interpreting the classification results at the corpus, document, and word levels, enhancing interpretability and providing a clear rationale for each classification decision. We applied our model to a dataset from the Web of Science, specifically focusing on the field of nanomaterials. Our model demonstrates superior classification performance compared to several baseline models. Ultimately, our proposed model offers a significant advancement in both the performance and explainability of the system, validated by case studies that illustrate its effectiveness in real-world applications.
Article
Full-text available
This Letter proposes a new signature for confining dark sectors at the LHC. Under the assumption of a QCD-like hidden sector, hadronic jets containing stable dark bound states originating from hidden strong dynamics, known as semi-visible jets, could manifest in proton-proton collisions. In the proposed simplified model, a heavy Z′ mediator coupling to SM quarks allows the resonant production of dark quarks, subsequently hadronizing in stable and unstable dark bound states. The unstable dark bound states can then decay back to SM quarks via the same Z′ portal or to photons via a lighter pseudo-scalar portal (such as an axion-like particle). This mechanism creates a new signature where semi-visible jets are enriched in non-isolated photons. We show that these exotic jets evade the phase space probed by current LHC searches exploiting jets or photons, due to the expected high neutral electromagnetic fraction of the jets and the non-isolation of the photon candidates, respectively. In the proposed analysis strategy to tackle such a signature, we exploit jets as final-state objects to represent the underlying QCD-like hidden sector. We show that, by removing any selection on the neutral electromagnetic fraction from the jet identification criteria, higher signal efficiency can be reached. To enhance the signal-to-background discrimination, we train a deep neural network as a jet tagger that exploits differences in the substructure of signal and background jets. We estimate that with the available triggers for Run 2 and this new strategy, a high-mass search can claim a 5σ discovery (exclusion) of the Z′ boson with a mass up to 5 TeV (5 TeV) with the full Run 2 data of the LHC when the fraction of unstable dark hadrons decaying to photon pairs is around 30%, and with a coupling of the Z′ to SM quarks of 0.25.
Article
Full-text available
This paper demonstrates a design and evaluation approach for delivering real world efficacy of an explainable artificial intelligence (XAI) model. The first of its kind, it leverages three distinct but complementary frameworks to support a user-centric and context-sensitive, post-hoc explanation for fraud detection. Using the principles of scenario-based design, it amalgamates two independent real-world sources to establish a realistic card fraud prediction scenario. The SAGE (Settings, Audience, Goals and Ethics) framework is then used to identify key context-sensitive criteria for model selection and refinement. The application of SAGE reveals gaps in the current XAI model design and provides opportunities for further model development. The paper then employs a functionally-grounded evaluation method to assess its effectiveness. The resulting explanation represents real-world requirements more accurately than established models.
Article
Tree models have made impressive progress during the past years, while an important problem is to understand how these models predict, in particular for critical applications such as finance and medicine. For this issue, most previous works measured the importance of individual features. In this work, we consider the interpretation of feature groups, which is more effective for capturing intrinsic structures and correlations of multiple features. We propose the Baseline Group Shapley value (BGShapvalue for short) to calculate the importance of a feature group for tree models. We further develop a polynomial algorithm, BGShapTree, to deal with the sum of exponential terms in the BGShapvalue. The basic idea is to decompose the BGShapvalue into leaves’ weights and exploit the relationships between features and leaves. Based on this idea, we can greedily search for salient feature groups with large BGShapvalues. Extensive experiments have validated the effectiveness of our approach, in comparison with state-of-the-art methods on the interpretation of tree models.
Article
Background
Gallbladder cancer is often associated with poor prognosis, especially when patients experience early recurrence after surgery. Machine learning may improve prediction accuracy by analysing complex non-linear relationships. The aim of this study was to develop and evaluate a machine learning model to predict early recurrence risk after resection of gallbladder cancer.
Methods
In this cross-sectional study, patients who underwent resection of gallbladder cancer with curative intent between 2001 and 2022 were identified using an international database. Patients were assigned randomly to a development and an evaluation cohort. Four machine learning models were trained to predict early recurrence (within 12 months) and compared using the area under the receiver operating characteristic curve (AUC).
Results
Among 374 patients, 56 (15.0%) experienced early recurrence; most patients had T1 (51, 13.6%) or T2 (180, 48.1%) disease, and a subset had lymph node metastasis (120, 32.1%). In multivariable Cox analysis, resection margins (HR 2.34, 95% c.i. 1.55 to 3.80; P < 0.001), and greater AJCC T (HR 2.14, 1.41 to 3.25; P < 0.001) and N (HR 1.59, 1.05 to 2.42; P = 0.029) categories were independent predictors of early recurrence. The random forest model demonstrated the highest discrimination in the evaluation cohort (AUC 76.4, 95% c.i. 66.3 to 86.5), compared with XGBoost (AUC 74.4, 53.4 to 85.3), support vector machine (AUC 67.2, 54.4 to 80.0), and logistic regression (AUC 73.1, 60.6 to 85.7), as well as good accuracy after bootstrapping validation (AUC 75.3, 75.0 to 75.6). Patients classified as being at high versus low risk of early recurrence had much worse overall survival (36.1% versus 63.8%, respectively; P < 0.001). An easy-to-use calculator was made available (https://catalano-giovanni.shinyapps.io/GallbladderER).
Conclusion
Machine learning-based prediction of early recurrence after resection of gallbladder cancer may help stratify patients, as well as help inform postoperative adjuvant therapy and surveillance strategies.
Article
In the context of unexpected disasters, comprehending individual decision-making strategy is essential for governmental policy formulation and evacuation planning. The main challenges stem from the absence of large-scale mobility data and the complex influences of multiple factors on personal strategies. To address this, this study focuses on the 2011 Tohoku earthquake in Japan as a case, which struck at 14:46 on March 11, a typical working day. Decision-making strategies regarding how people return home from their workplaces have emerged as a primary concern. Therefore, we construct a large human mobility database for the Greater Tokyo area, and develop an empirical prediction for decision-making strategies following earthquakes with the consideration of multiple factors. Besides users' location-based information, we extract the grid-based stay areas from historical stay records (normal days before disasters) and discover the functions of these areas combined with points of interest located within them. Experimental results indicate the effectiveness of our proposed framework in strategy prediction. Besides, we conduct an explainable feature importance analysis of key factors, which also provides insights for understanding human decisions during disasters.
Article
The relationship between solar potential on building façades and urban morphology at the urban scale remains unclear, and the design of façade-integrated photovoltaics (FIPV) lacks evidence. This study investigates high-rise, high-density commercial districts in Hong Kong (HK), using the Random Forest algorithm combined with the SHapley Additive exPlanations method to assess the importance of urban morphology for solar irradiance on building façades. The results indicate that plot ratio, building floor, building density, and perimeter shape factor are the four key parameters influencing solar irradiance, with the importance rate and contribution value of the four parameters reaching 43.9% and 48.7%, respectively. Based on these parameters and actual urban blocks in HK, typical urban typologies were constructed. Four scenarios were generated with plot ratio as the control parameter. The positions, amounts, and transparency of PV glass on the south and east façades were optimized to minimize the payback period and maximize power generation, using the Non-dominated Sorting Genetic Algorithm II (NSGA-II). The south façade of Scenario 1 (when the heights of surrounding buildings are lower than that of the targeted building) obtained the optimal payback period (8.44 years) and power generation (55961 kWh), with 77 opaque PV panels and 49 semi-transparent ones.
Article
Full-text available
We introduce NitroNet, a deep learning model for the prediction of tropospheric NO2 profiles from satellite column measurements. NitroNet is a neural network trained on synthetic NO2 profiles from the regional chemistry and transport model WRF-Chem, which was operated on a European domain for the month of May 2019. This WRF-Chem simulation was constrained by in situ and satellite measurements, which were used to optimize important simulation parameters (e.g. the boundary layer scheme). The NitroNet model receives NO2 vertical column densities (VCDs) from the TROPOspheric Monitoring Instrument (TROPOMI) and ancillary variables (meteorology, emissions, etc.) as input, from which it reproduces NO2 concentration profiles. Training of the neural network is conducted on a filtered dataset, meaning that NO2 profiles showing strong disagreement (>20 %) with colocated TROPOMI column measurements are discarded. We present a first evaluation of NitroNet over a variety of geographical and temporal domains (Europe, the US West Coast, India, and China) and different seasons. For this purpose, we validate the NO2 profiles predicted by NitroNet against satellite, in situ, and MAX-DOAS (Multi-Axis Differential Optical Absorption Spectroscopy) measurements. The training data were previously validated against the same datasets. During summertime, NitroNet shows small biases and strong correlations with all three datasets: a bias of +6.7% and R=0.95 for TROPOMI NO2 VCDs, a bias of -10.5% and R=0.75 for AirBase surface concentrations, and a bias of -34.3% to +99.6% with R=0.83–0.99 for MAX-DOAS measurements. In comparison to TROPOMI satellite data, NitroNet even shows significantly lower errors and stronger correlation than a direct comparison with WRF-Chem numerical results. During wintertime considerable low biases arise because the summertime/late-spring training data are not fully representative of all atmospheric wintertime characteristics (e.g. longer NO2 lifetimes). Nonetheless, the wintertime performance of NitroNet is surprisingly good and comparable to that of classic regional chemistry and transport models. NitroNet can demonstrably be used outside the geographic and temporal domain of the training data with only slight performance reductions. What makes NitroNet unique when compared to similar existing deep learning models is the inclusion of synthetic model data, which offers important benefits: due to the lack of NO2 profile measurements, models trained on empirical datasets are limited to the prediction of surface concentrations learned from in situ measurements. NitroNet, however, can predict full tropospheric NO2 profiles. Furthermore, in situ measurements of NO2 are known to suffer from biases, often larger than +20 %, due to cross-sensitivities to photooxidants, which other models trained on empirical data inevitably reproduce.
Article
Full-text available
As the adoption of explainable AI (XAI) continues to expand, the urgency of addressing its privacy implications intensifies. Despite a growing corpus of research in AI privacy and explainability, little attention has been paid to privacy-preserving model explanations. This article presents the first thorough survey of privacy attacks on model explanations and their countermeasures. Our contribution comprises a thorough analysis of research papers, with a connected taxonomy that facilitates the categorization of privacy attacks and countermeasures based on the targeted explanations. This work also includes an initial investigation into the causes of privacy leaks. Finally, we discuss unresolved issues and prospective research directions uncovered in our analysis. This survey aims to be a valuable resource for the research community and offers clear insights for those new to this domain. To support ongoing research, we have established an online resource repository, which will be continuously updated with new and relevant findings.
Preprint
Full-text available
In response to the formidable challenge of China's substantial carbon emissions, this study introduces a comprehensive research paradigm that integrates "modeling + SHAP analysis + scenario forecasting" from a machine learning perspective. Utilizing carbon emission data spanning 1997 to 2021, we constructed a machine learning model, conducted an in-depth analysis of the key factors influencing carbon emissions, and, based on current national policies, made predictions for carbon emissions. First, factors affecting carbon emissions were selected in accordance with the principle of data availability. Second, by calculating Spearman correlation coefficients, nine explanatory variables, including the share of coal in total energy consumption and the urbanization rate, were found to have correlation coefficients of 0.6 or higher and to be significantly correlated with China's carbon emissions. Subsequently, the contribution of each explanatory variable in the optimal model was quantified using the SHAP method, revealing that energy intensity and urbanization rate are the key factors affecting China's carbon emissions, exerting negative and positive impacts, respectively. Finally, through policy scenario simulation, the trend of China's carbon emissions from 2022 to 2030 was predicted. The study indicates that China's carbon emissions will plateau from 2022 to 2028, peaking in 2028, with an estimated emission volume of approximately 9,720 million tons in 2030.
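The correlation screening step can be sketched in a few lines. The following toy example (invented values, not the study's data) keeps only the variables whose Spearman coefficient with emissions reaches the 0.6 threshold mentioned above.

```python
# A minimal sketch of the |rho| >= 0.6 screening step; column names and
# values are illustrative, not the study's variables.
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    "emissions":    [3.2, 4.1, 5.0, 6.2, 7.9, 9.1],
    "coal_share":   [0.76, 0.74, 0.71, 0.69, 0.62, 0.58],
    "urbanization": [0.31, 0.34, 0.39, 0.45, 0.55, 0.63],
})

selected = []
for col in df.columns.drop("emissions"):
    rho, p = spearmanr(df["emissions"], df[col])
    if abs(rho) >= 0.6:
        selected.append((col, round(rho, 3), round(p, 4)))
print(selected)  # variables passing the screen, with rho and p-value
```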
Article
Importance The results of prediction models that stratify patients with sepsis and risk of resistant gram-negative bacilli (GNB) infections inform treatment guidelines. However, these models do not extrapolate well across hospitals. Objective To assess whether patient case mix and local prevalence rates of resistance contributed to the variable performance of a general risk stratification GNB sepsis model for community-onset and hospital-onset sepsis across hospitals. Design, Setting, and Participants This was a retrospective cohort study conducted between January 2016 and October 2021. Adult patients with sepsis at 10 acute-care hospitals in rural and urban areas across Missouri and Illinois were included. Inclusion criteria were blood cultures indicating sepsis, having received 4 days of antibiotic treatment, and having organ dysfunction (vasopressor use, mechanical ventilation, increased creatinine or bilirubin levels, and thrombocytopenia). Analyses were completed in April 2024. Exposure The model included demographic characteristics, comorbidities, vital signs, laboratory values, procedures, and medications administered. Main Outcomes and Measures Culture results were stratified into ceftriaxone-susceptible GNB (SS), ceftriaxone-resistant but cefepime-susceptible GNB (RS), and ceftriaxone- and cefepime-resistant GNB (RR). Negative cultures and other pathogens were labeled SS. Deep learning models were developed separately for community-onset (patient presented with sepsis) and hospital-onset (sepsis developed ≥48 hours after admission) sepsis. The models were tested across hospitals and patient subgroups and assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Results A total of 39 893 patients with 85 238 sepsis episodes (43 207 [50.7%] community onset; 42 031 [48.3%] hospital onset) were included. Median (IQR) age was 65 (54-74) years, 21 241 patients (53.2%) were male, and 18 830 (47.2%) had a previous episode of sepsis. RS contributed to 3.9% (1667 episodes) and 5.7% (2389 episodes) of community-onset and hospital-onset sepsis episodes, respectively, and RR contributed to 1.8% (796 episodes) and 3.9% (1626 episodes), respectively. Previous infections and exposure to antibiotics were associated with the risk of resistant GNB. For example, in community-onset sepsis, 375 RR episodes (47.1%), 420 RS episodes (25.2%), and 3483 of 40 744 (8.5%) SS episodes were among patients with resistance to antimicrobial drugs (P < .001). The AUROC and AUPRC results varied across hospitals and patient subgroups for both community-onset and hospital-onset sepsis. AUPRC values correlated with the prevalence rates of resistant GNB (R = 0.79; P = .001). Conclusions and Relevance In this cohort study of 39 893 patients with sepsis, variable model performance was associated with prevalence rates of antimicrobial resistance rather than patient case mix. This variability suggests caution is needed when using generalized models for predicting resistant GNB etiologies in sepsis.
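For readers unfamiliar with the two headline metrics, this is how AUROC and AUPRC are typically computed with scikit-learn; labels and scores below are placeholders, not the study's data.

```python
# A minimal sketch of the two evaluation metrics on toy labels/scores.
from sklearn.metrics import roc_auc_score, average_precision_score

y_true  = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]                  # 1 = resistant GNB episode
y_score = [0.1, 0.2, 0.8, 0.3, 0.6, 0.9, 0.2, 0.4, 0.7, 0.1]

auroc = roc_auc_score(y_true, y_score)
auprc = average_precision_score(y_true, y_score)           # a common AUPRC estimator
print(f"AUROC={auroc:.3f}  AUPRC={auprc:.3f}")
# Unlike AUROC, AUPRC depends on class prevalence, which is consistent with
# the study's finding that AUPRC tracks local resistance rates.
```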
Preprint
In many industrial applications, it is common that the graph embeddings generated from training GNNs are used in an ensemble model, where the embeddings are combined with other tabular features (e.g., original node or edge features) in a downstream ML task. Tabular features may even arise naturally if, for example, one builds a graph in which some node or edge features are stored in tabular format. Here we address the problem of explaining the output of such ensemble models, whose input features consist of learned neural graph embeddings combined with additional tabular features. We propose MBExplainer, a model-agnostic explanation approach for downstream models with augmented graph embeddings. MBExplainer returns a human-legible triple as an explanation for an instance prediction of the whole pipeline, consisting of three components: the subgraph with the highest importance, the topmost important nodal features, and the topmost important augmented downstream features. A game-theoretic formulation takes the contributions of each component and their interactions into account by assigning three Shapley values corresponding to their own specific games. Finding the explanation requires an efficient search through the local search spaces corresponding to each component. MBExplainer applies a novel multilevel search algorithm that enables simultaneous pruning of the local search spaces in a computationally tractable way. In particular, three interleaved Monte Carlo Tree Searches are used to iteratively prune the local search spaces. MBExplainer also includes a global search algorithm that uses contextual bandits to efficiently allocate the pruning budget among the local search spaces. We show the effectiveness of MBExplainer with a set of comprehensive numerical examples on multiple public graph datasets for both node and graph classification tasks.
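Because MBExplainer assigns Shapley values over just three components, the game-theoretic step can be illustrated exactly. The sketch below enumerates all orderings of a hypothetical three-player game; the value function v is invented for illustration and is not the paper's pipeline score.

```python
# Exact Shapley values for a three-player game by enumerating all 3! orderings.
from itertools import permutations

players = ["subgraph", "nodal", "augmented"]
v = {frozenset(): 0.0,
     frozenset({"subgraph"}): 0.4, frozenset({"nodal"}): 0.2,
     frozenset({"augmented"}): 0.1,
     frozenset({"subgraph", "nodal"}): 0.7,
     frozenset({"subgraph", "augmented"}): 0.6,
     frozenset({"nodal", "augmented"}): 0.35,
     frozenset(players): 1.0}

shapley = dict.fromkeys(players, 0.0)
perms = list(permutations(players))
for order in perms:
    seen = set()
    for p in order:
        # marginal contribution of p given the players already "arrived"
        shapley[p] += (v[frozenset(seen | {p})] - v[frozenset(seen)]) / len(perms)
        seen.add(p)
print(shapley)  # the three values sum to v(grand coalition) = 1.0
```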
Article
Purpose Limited studies in the mobile payment segment have attempted to understand the factors that deter customers from using financial apps or mobile payment services (MPSs). This study aims to identify such barriers from online customer reviews and to examine how these barriers affect customers' negative emotions (anger, fear, sadness), customer ratings, and recommendation intentions. Design/methodology/approach This study, divided into three phases, adopted a text-mining-based mixed-method approach on 14,043 reviews posted on the Google Play Store or App Store pages of financial apps used in India. Findings Phase 1 identified barriers such as "bad user experience", "UPI failure", "trust issues", and "transaction delays" from the reviews. Phase 2 found that "bad user experience" and "UPI failure" trigger both "anger" and "sadness", while "transaction delays" and "money lost in transaction" stimulate "fear". From the IRT stance, Phase 3 found that barriers such as "transaction error" and "UPI failure" (usage), "bad user experience" (image), and "trust issues" (tradition) have a significant negative impact on both customer ratings and recommendation intention. Originality/value The study contributes to the existing literature on MPSs by identifying barriers from user-generated content. Additionally, it examines the impact of these barriers on customers' negative emotions and recommendation intention.
Article
Elevated conductivity (i.e., specific conductance or SC) causes osmotic stress in freshwater aquatic organisms and may increase the toxicity of some contaminants. Indices of benthic macroinvertebrate integrity have declined in urban areas across the Chesapeake Bay watershed (CBW), and more information is needed about whether these declines may be due to elevated conductivity. A predictive SC model for the CBW was developed using monitoring data from the National Water Quality Portal. Predictor variables representing SC sources were compiled for nontidal reaches across the CBW. Random forest modeling was conducted to predict SC in four time periods (1999–2001, 2004–2006, 2009–2011, and 2014–2016), and the predictions were compared to a national data set of background SC to quantify departures from background conditions. Carbonate geology, impervious cover, forest cover, and snow depth were the most important variables for predicting SC. Observations and modeled results showed that snow depth amplified the effect of impervious cover on SC. Elevated SC was predicted in two-thirds of reaches in the CBW, and these elevated conditions persisted over time in many areas. These results can be used in stressor identification assessments to prioritize future monitoring and to determine where management activities could be implemented to reduce salinization.
Article
In recent years, we have made several improvements to the geometric parameterization method, the surrogate model method, and the sampling method, with the goal of making the traditional surrogate-model-based optimization method applicable to aerodynamic optimization over hundreds of parameters at reasonable computational cost. However, increasing the number of control parameters raises two additional issues. First, the impeller geometry becomes too complex to guarantee the required mechanical performance. Second, the optimization mechanism becomes difficult to understand with so many parameters. To address the first issue, this paper builds a multidisciplinary optimization platform to optimize a large-flow-coefficient mixed-flow impeller under 140 control parameters, achieving a significant improvement in both aerodynamic and mechanical performance. To address the second, a machine learning interpretation tool, SHapley Additive exPlanations (SHAP), is introduced. Using this methodology, the contribution of all 140 parameter values in the final optimal impeller to each aspect of the performance improvement is presented, providing the first in-depth understanding of the intricate mechanisms involved in multidisciplinary optimization with hundreds of control parameters.
Conference Paper
Full-text available
This paper reviews methods for evaluating and analyzing the understandability of classification models in the context of data mining. The motivation for this study is that the majority of previous work on the evaluation and optimization of classification models has focused on assessing or increasing model accuracy, so user-oriented properties such as comprehensibility and understandability have been largely overlooked. We conduct a quantitative survey to examine the concept of understandability from the user's point of view. The survey results are analyzed using the analytic hierarchy process (AHP) to rank models according to their understandability. The results indicate that decision tree models are perceived as more understandable than rule-based models. Using the survey results on the understandability of a number of models in conjunction with quantitative measurements of the models' complexity, we establish a negative correlation between the complexity and understandability of the classification models, at least for one of the two studied data sets.
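The AHP ranking step works by extracting the principal eigenvector of a pairwise comparison matrix. Below is a minimal sketch with an invented judgment matrix, not the survey's actual data.

```python
# A minimal sketch of AHP priority weights from a 3x3 pairwise comparison
# matrix of model understandability judgments (values are made up).
import numpy as np

# A[i, j] = how much more understandable model i is judged than model j
A = np.array([[1.0, 3.0, 5.0],   # decision tree
              [1/3, 1.0, 2.0],   # rule-based model
              [1/5, 1/2, 1.0]])  # black-box model

eigvals, eigvecs = np.linalg.eig(A)
principal = eigvecs[:, np.argmax(eigvals.real)].real
weights = np.abs(principal) / np.abs(principal).sum()   # priority vector
print(dict(zip(["tree", "rules", "black-box"], weights.round(3))))
```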
Article
Full-text available
The process of automatically extracting novel, useful, and ultimately comprehensible information from large databases, known as data mining, has become of great importance due to the ever-increasing amounts of data collected by large organizations. In particular, emphasis is devoted to heuristic search methods able to discover patterns that are hard or impossible to detect using standard query mechanisms and classical statistical techniques. In this paper an evolutionary system capable of extracting explicit classification rules is presented. Special attention is devoted to finding easily interpretable rules that may be used to make crucial decisions. A comparison with the findings achieved by other methods on a real problem, breast cancer diagnosis, is performed.
Article
Full-text available
In this paper, we describe the first practical application of two methods which bridge the gap between the non-expert user and machine learning models. The first is a method for explaining classifiers' predictions, which provides the user with additional information about the decision-making process of a classifier. The second is a reliability estimation methodology for regression predictions, which helps users decide to what extent to trust a particular prediction. Both methods are successfully applied to a novel breast cancer recurrence prediction data set and the results are evaluated by expert oncologists. Keywords: data mining, machine learning, breast cancer, classification explanation, prediction reliability
Article
Full-text available
Appropriate guidelines for controls in B2C (business-to-consumer) applications (hereafter B2C controls) should be provided such that these guidelines accomplish efficiency of controls in the context of specific system environments, given that many resources and skills are required for the implementation of such controls. This study uses a two-step process for the assessment of B2C controls: efficiency analysis and recommendation of controls. First, using a data envelopment analysis (DEA) model, the study analyzes the efficiency of B2C controls installed by three groups of organizations: financial firms, retail firms, and information service providers. The B2C controls comprise controls for system continuity, access controls, and communication controls. The DEA model uses B2C controls as input and three implementation variables of B2C applications (volume, sophistication, and information content) as output. Second, decision trees are used to determine efficient firms and generate rules for recommending levels of controls. The results of the DEA model indicate that retail firms and information service providers implement B2C controls more efficiently than financial firms do. Controls for system continuity are implemented more efficiently than access controls. In financial firms, controls for system continuity, communication controls, and access controls, in descending order, are efficiently adopted in B2C applications. Every company can determine its relative level of reduction in each component of controls in order to make the control system efficient. The firms that efficiently implement B2C controls are determined using a decision tree model, which is further used to recommend levels of controls and suggest rules for control recommendation. This suggests the possibility of using decision trees for controls assessment in B2C applications.
Conference Paper
Full-text available
Machine-learned classifiers are important components of many data mining and knowledge discovery systems. In several application domains, an explanation of the classifier's reasoning is critical for the classifier's acceptance by the end-user. We describe a framework, ExplainD, for explaining decisions made by classifiers that use additive evidence. ExplainD applies to many widely used classifiers, including linear discriminants and many additive models. We demonstrate our ExplainD framework using implementations of naïve Bayes, linear support vector machine, and logistic regression classifiers on example applications. ExplainD uses a simple graphical explanation of the classification process to provide visualizations of the classifier decisions, visualization of the evidence for those decisions, the capability to speculate on the effect of changes to the data, and the capability, wherever possible, to drill down and audit the source of the evidence. We demonstrate the effectiveness of ExplainD in the context of a deployed web-based system (Proteome Analyst) and using a downloadable Python-based implementation.
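The core idea of additive evidence is easy to sketch for a linear model: each feature contributes its weight times its value, and the contributions sum to the decision score. Below is a minimal illustration with synthetic data; it demonstrates the principle, not the ExplainD implementation.

```python
# Per-feature additive evidence for a logistic regression classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.randn(200, 3)
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)

x = X[0]
evidence = clf.coef_[0] * x                 # each feature's additive evidence
score = clf.intercept_[0] + evidence.sum()  # pieces sum to the log-odds score
for i, e in enumerate(evidence):
    print(f"feature {i}: evidence {e:+.3f}")
print(f"log-odds = intercept {clf.intercept_[0]:+.3f} + evidence sum = {score:+.3f}")
```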
Conference Paper
Full-text available
Besides good predictive performance, the naive Bayesian classifier can also offer valuable insight into the structure of the training data and the effects of the attributes on the class probabilities. This structure may be effectively revealed through visualization of the classifier. We propose a new way to visualize the naive Bayesian model in the form of a nomogram. The advantages of the proposed method are simplicity of presentation, clear display of the effects of individual attribute values, and visualization of confidence intervals. Nomograms are intuitive and, when used for decision support, can provide a visual explanation of predicted probabilities. Finally, a naive Bayesian nomogram can be printed out and used for probability prediction without a computer or calculator.
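The quantity such a nomogram plots for each attribute value is a log likelihood ratio. Here is a minimal worked sketch with made-up probability tables, showing how per-value "points" combine with the prior log-odds into a predicted probability.

```python
# Naive Bayes nomogram points: log P(value|class=1) / P(value|class=0)
# for one binary attribute; probability tables below are hypothetical.
import math

p_val_given_pos = {"yes": 0.8, "no": 0.2}
p_val_given_neg = {"yes": 0.3, "no": 0.7}
prior_log_odds = math.log(0.4 / 0.6)   # assumes P(class=1) = 0.4

for v in ("yes", "no"):
    points = math.log(p_val_given_pos[v] / p_val_given_neg[v])
    print(f"value={v!r}: nomogram points {points:+.3f}")

# Prediction for an instance with value "yes": sum points with the prior.
log_odds = prior_log_odds + math.log(p_val_given_pos["yes"] / p_val_given_neg["yes"])
prob = 1 / (1 + math.exp(-log_odds))
print(f"P(class=1 | yes) = {prob:.3f}")   # 0.640 for these tables
```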
Article
Full-text available
More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, has evolved substantially, and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Article
Full-text available
We propose a simple yet potentially very effective way of visualizing trained support vector machines. Nomograms are an established model visualization technique that can graphically encode the complete model on a single page. The dimensionality of the visualization does not depend on the number of attributes, but merely on the properties of the kernel. To represent the effect of each predictive feature on the log odds ratio scale, as required for nomograms, we employ logistic regression to convert the distance from the separating hyperplane into a probability. Case studies on selected data sets show that, for a technique thought to be a black box, nomograms can clearly expose the internal structure of SVMs. By providing an easy-to-interpret visualization, the method lets analysts gain insight into and study the effects of predictive factors.
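A minimal sketch of this distance-to-probability conversion follows, on synthetic data; it uses Platt-style scaling as an illustration of the idea rather than the authors' exact procedure.

```python
# Convert SVM decision-function distances into probabilities by fitting a
# logistic regression on them (Platt-style scaling); data is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X = np.random.randn(300, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

svm = SVC(kernel="rbf").fit(X, y)
d = svm.decision_function(X).reshape(-1, 1)   # signed distance from hyperplane

calib = LogisticRegression().fit(d, y)        # distance -> probability
p = calib.predict_proba(d)[:, 1]
print(p[:5])  # now on the log-odds scale a nomogram can display
```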
Article
Full-text available
On account of the enormous amounts of rules that can be produced by data mining algorithms, knowledge post-processing is a difficult stage in an association rule discovery process. In order to find relevant knowledge for decision making, the user (a decision maker specialized in the data studied) needs to rummage through the rules. To assist him/her in this task, we here propose the rule-focusing methodology, an interactive methodology for the visual post-processing of association rules. It allows the user to explore large sets of rules freely by focusing his/her attention on limited subsets. This new approach relies on rule interestingness measures, on a visual representation, and on interactive navigation among the rules. We have implemented the rule-focusing methodology in a prototype system called ARVis. It exploits the user's focus to guide the generation of the rules by means of a specific constraint-based rule-mining algorithm.
Conference Paper
Full-text available
This paper presents a method to interpret the output of a classification (or regression) model. The interpretation is based on two concepts: the importance of each variable and the importance of each variable's value. Unlike most state-of-the-art interpretation methods, our approach allows the interpretation of the model output for every instance. Understanding the score given by a model for one instance can, for example, lead to an immediate decision in a customer relationship management (CRM) system. Moreover, the proposed method does not depend on a particular model and is therefore usable with any model or software used to produce the scores.
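One simple way to realize per-instance value importance in this model-agnostic spirit is to substitute a feature's value with a neutral baseline and record the change in the model's score. The sketch below uses the training mean as the baseline; the model and data are placeholders, and this is an illustration of the general idea rather than the paper's exact algorithm.

```python
# Per-instance value importance via baseline substitution (model-agnostic).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

X = np.random.randn(400, 4)
y = (X[:, 0] + X[:, 2] > 0).astype(int)
model = GradientBoostingClassifier().fit(X, y)

x = X[0].copy()
base_score = model.predict_proba(x.reshape(1, -1))[0, 1]
means = X.mean(axis=0)
for i in range(X.shape[1]):
    x_pert = x.copy()
    x_pert[i] = means[i]   # neutralize one feature value
    delta = base_score - model.predict_proba(x_pert.reshape(1, -1))[0, 1]
    print(f"feature {i}: value importance {delta:+.3f}")
```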
Article
Full-text available
We present a method for explaining predictions for individual instances. The presented approach is general and can be used with all classification models that output probabilities. It is based on the decomposition of a model's predictions on individual contributions of each attribute. Our method works for the so-called black box models such as support vector machines, neural networks, and nearest neighbor algorithms, as well as for ensemble methods such as boosting and random forests. We demonstrate that the generated explanations closely follow the learned models and present a visualization technique that shows the utility of our approach and enables the comparison of different prediction methods.
Article
Context-aware intelligent systems employ implicit inputs and make decisions based on complex rules and machine learning models that are rarely clear to users. Such lack of system intelligibility can lead to loss of user trust, satisfaction, and acceptance of these systems. However, automatically providing explanations about a system's decision process can help mitigate this problem. In this paper we present results from a controlled study with over 200 participants in which the effectiveness of different types of explanations was examined. Participants were shown examples of a system's operation along with various automatically generated explanations, and then tested on their understanding of the system. We show, for example, that explanations describing why the system behaved a certain way resulted in better understanding and stronger feelings of trust. Explanations describing why the system did not behave a certain way resulted in lower understanding yet adequate performance. We discuss implications for the use of our findings in real-world context-aware applications.
Article
Many quantitative problems in science, engineering, and economics are nowadays solved via statistical sampling on a computer. Such Monte Carlo methods can be used in three different ways: (1) to generate random objects and processes in order to observe their behavior, (2) to estimate numerical quantities by repeated sampling, and (3) to solve complicated optimization problems through randomized algorithms.
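A minimal example of use (2), estimating a numerical quantity by repeated sampling:

```python
# Estimate pi from the fraction of uniform points inside the unit quarter-circle.
import random

def estimate_pi(n: int) -> float:
    inside = sum(1 for _ in range(n)
                 if random.random() ** 2 + random.random() ** 2 <= 1.0)
    return 4.0 * inside / n

print(estimate_pi(1_000_000))  # converges to pi at rate O(1/sqrt(n))
```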
Article
Corporate credit rating analysis has attracted considerable research interest in the literature. Recent studies have shown that Artificial Intelligence (AI) methods achieve better performance than traditional statistical methods. This article introduces a relatively new machine learning technique, support vector machines (SVM), to the problem in an attempt to provide a model with better explanatory power. We used a backpropagation neural network (BNN) as a benchmark and obtained prediction accuracy of around 80% for both the BNN and SVM methods on the United States and Taiwan markets, although only a slight improvement from SVM was observed. Another direction of the research is to improve the interpretability of AI-based models. We applied recent research results in neural network model interpretation and obtained the relative importance of the input financial variables from the neural network models. Based on these results, we conducted a comparative analysis of the determining factors in the United States and Taiwan markets.
Article
Credit card fraud is a serious and growing problem. While predictive models for credit card fraud detection are in active use in practice, reported studies on the use of data mining approaches for credit card fraud detection are relatively few, possibly due to the lack of available data for research. This paper evaluates two advanced data mining approaches, support vector machines and random forests, together with the well-known logistic regression, as part of an attempt to better detect (and thus control and prosecute) credit card fraud. The study is based on real-life data of transactions from an international credit card operation.
Article
In this paper we develop a polynomial method based on sampling theory that can be used to estimate the Shapley value (or any semivalue) for cooperative games. Besides analyzing the complexity problem, we examine some desirable statistical properties of the proposed approach and provide some computational results.
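The estimator rests on the fact that a player's Shapley value is its expected marginal contribution under a uniformly random ordering, so averaging over sampled permutations gives an unbiased estimate. Below is a minimal sketch on an invented game; it illustrates the core sampling idea only, not the paper's full treatment (which also covers general semivalues and statistical properties).

```python
# Permutation-sampling estimate of Shapley values for a toy cooperative game.
import random

players = list(range(4))

def v(coalition: frozenset) -> float:
    # toy game: value grows with the square of the coalition size
    return len(coalition) ** 2

def shapley_estimate(i: int, samples: int = 10_000) -> float:
    total = 0.0
    for _ in range(samples):
        order = players[:]
        random.shuffle(order)
        before = frozenset(order[:order.index(i)])   # players preceding i
        total += v(before | {i}) - v(before)         # i's marginal contribution
    return total / samples

print([round(shapley_estimate(i), 2) for i in players])  # ~4.0 each by symmetry
```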
Conference Paper
We propose a method for explaining regression models and their predictions for individual instances. The method successfully reveals how individual features influence the model and can be used with any type of regression model in a uniform way. We used different types of models and data sets to demonstrate that the method is a useful tool for explaining, comparing, and identifying errors in regression models.
Article
Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two different approaches to machine learning in medical applications are compared: the system for inductive learning of decision trees Assistant, and the naive Bayesian classifier. Both methodologies were tested in four medical diagnostic problems: localization of primary tumor, prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of automatically acquired diagnostic knowledge from stored data records is compared, and the interpretation of the knowledge and the explanation ability of the classification process of each system is discussed. Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In addition, two extensions to naive Bayesian classifier are briefly described: dealing with continuous attributes, and discovering the dependencies among attributes.
Article
We present a general method for explaining individual predictions of classification models. The method is based on fundamental concepts from coalitional game theory and predictions are explained with contributions of individual feature values. We overcome the method's initial exponential time complexity with a sampling-based approximation. In the experimental part of the paper we use the developed method on models generated by several well-known machine learning algorithms on both synthetic and real-world data sets. The results demonstrate that the method is efficient and that the explanations are intuitive and useful.
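A hedged sketch of this sampling-based approximation follows: for each sampled permutation and random reference instance, the features preceding the explained feature take their values from the explained instance, the rest come from the reference, and the change in output when the explained feature switches to the instance's value is one draw of its marginal contribution. Model and data below are placeholders.

```python
# Sampling approximation of per-feature contributions for one instance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)

def contribution(i, x, samples=200):
    total = 0.0
    for _ in range(samples):
        order = rng.permutation(x.size)
        pos = int(np.where(order == i)[0][0])
        z = X[rng.integers(len(X))]              # random reference instance
        with_i = z.copy()
        without_i = z.copy()
        for j in order[:pos + 1]:
            with_i[j] = x[j]                     # i and its predecessors from x
        for j in order[:pos]:
            without_i[j] = x[j]                  # predecessors only; i keeps z's value
        total += (model.predict_proba(with_i.reshape(1, -1))[0, 1]
                  - model.predict_proba(without_i.reshape(1, -1))[0, 1])
    return total / samples

x = X[0]
print([round(contribution(i, x), 3) for i in range(4)])
```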
Article
While there is growing professional interest in the application of Benford's law and "digit analysis" to financial fraud detection, there has been relatively little academic research demonstrating its efficacy as a decision support tool in the context of an analytical review procedure pertaining to a financial audit. We conduct a numerical study using a genetically optimized artificial neural network. Building on earlier work of a similar nature, we assess the benefits of Benford's law as a useful classifier for segregating naturally occurring (i.e., non-concocted) numbers from those that are made up. Alongside the frequency of the first and second significant digits and their mean and standard deviation, a posited set of "non-digit" input variables, categorized as "information theoretic", "distance-based", and "goodness-of-fit" measures, helps to minimize the critical classification errors that can lead to an audit failure. We derive the optimal network structure for every instance of a 3×3 Manipulation–Involvement matrix, which depicts the different combinations of the level of sophistication in data manipulation by the perpetrators of a financial fraud and the extent of collusive involvement.
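The digit-analysis inputs can be sketched compactly: compare observed first-digit frequencies with Benford's expectation P(d) = log10(1 + 1/d) via a goodness-of-fit statistic. The numbers below are toy values, not audit data.

```python
# First-digit frequencies vs. Benford's law, via a chi-square statistic.
import math
from collections import Counter

def first_digit(x: float) -> int:
    s = f"{abs(x):.10e}"          # scientific notation exposes the leading digit
    return int(s[0])

data = [152.3, 13.8, 1.97, 284.0, 31.5, 1.02, 45.6, 118.0, 9.3, 21.7]
counts = Counter(first_digit(x) for x in data)
n = len(data)

chi2 = 0.0
for d in range(1, 10):
    expected = n * math.log10(1 + 1 / d)    # Benford expectation for digit d
    observed = counts.get(d, 0)
    chi2 += (observed - expected) ** 2 / expected
print(f"chi-square = {chi2:.2f}")  # large values flag possible manipulation
```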
Article
The application of a diagnostic or prognostic Multiple Logistic Function (MLF) in medical practice may, depending on the complexity of the model, require considerable arithmetic. Various methods for the elimination of such arithmetic, e.g. sets of tables or nomograms, have been proposed. An alternative method of eliminating the necessary arithmetic is described here. It is based on the principle of the familiar slide-rule. As an example, the design of a slide-rule for the evaluation of a diagnostic model for acute myocardial infarction is described. The slide-rule method allows for the evaluation of logistic models with complex linear combinations in the exponent. Adequate devices can be produced at low cost.
Article
Few published studies have combined clinical prognostic factors into risk profiles that can be used to predict the likelihood of recurrence or metastatic progression in patients following treatment of prostate cancer. We developed a nomogram that allows prediction of disease recurrence through use of preoperative clinical factors for patients with clinically localized prostate cancer who are candidates for treatment with a radical prostatectomy. By use of Cox proportional hazards regression analysis, we modeled the clinical data and disease follow-up for 983 men with clinically localized prostate cancer whom we intended to treat with a radical prostatectomy. Clinical data included pretreatment serum prostate-specific antigen levels, biopsy Gleason scores, and clinical stage. Treatment failure was recorded when there was clinical evidence of disease recurrence, a rising serum prostate-specific antigen level (two measurements of 0.4 ng/mL or greater and rising), or initiation of adjuvant therapy. Validation was performed on a separate sample of 168 men, also from our institution. Treatment failure (i.e., cancer recurrence) was noted in 196 of the 983 men, and the patients without failure had a median follow-up of 30 months (range, 1-146 months). The 5-year probability of freedom from failure for the cohort was 73% (95% confidence interval = 69%-76%). The predictions from the nomogram appeared accurate and discriminating, with a validation sample area under the receiver operating characteristic curve (i.e., comparison of the predicted probability with the actual outcome) of 0.79. A nomogram has been developed that can be used to predict the 5-year probability of treatment failure among men with clinically localized prostate cancer treated with radical prostatectomy.
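As a minimal sketch of the computation such a nomogram encodes on paper, the following code (assuming the Python lifelines package; column names and values are hypothetical, not the study's data) fits a Cox proportional hazards model to preoperative factors and reads off a predicted recurrence-free probability.

```python
# Cox proportional hazards fit and survival prediction with lifelines.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "psa":      [4.5, 10.2, 9.0, 22.0, 8.1, 3.2, 15.5, 5.9],  # hypothetical ng/mL
    "gleason":  [6, 7, 6, 8, 7, 6, 8, 7],                     # biopsy Gleason score
    "months":   [60, 24, 55, 10, 36, 72, 15, 48],             # follow-up time
    "recurred": [0, 1, 0, 1, 1, 0, 1, 0],                     # event indicator
})

cph = CoxPHFitter().fit(df, duration_col="months", event_col="recurred")
cph.print_summary()

# Predicted probability of freedom from recurrence at 60 months for a new patient.
new = pd.DataFrame({"psa": [7.0], "gleason": [7]})
print(cph.predict_survival_function(new, times=[60]))
```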