Article

Explaining prediction models and individual predictions with feature contributions

Authors: Erik Štrumbelj, Igor Kononenko

Abstract

We present a sensitivity analysis-based method for explaining prediction models that can be applied to any type of classification or regression model. Its advantage over existing general methods is that all subsets of input features are perturbed, so interactions and redundancies between features are taken into account. Furthermore, when explaining an additive model, the method is equivalent to commonly used additive model-specific methods. We illustrate the method's usefulness with examples from artificial and real-world data sets and an empirical analysis of running times. Results from a controlled experiment with 122 participants suggest that the method's explanations improved the participants' understanding of the model.
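As a rough illustration of the approach described in the abstract, the sketch below approximates per-feature contributions by Monte Carlo sampling of feature orderings: a feature's contribution is the change in the prediction when its value is revealed on top of the features already revealed, with hidden features drawn from a reference data set. This is an assumption-laden paraphrase, not the authors' code; the function names and data handling are illustrative.

```python
# Minimal sketch of a sampling-based feature-contribution estimator in the
# spirit of the paper (illustrative only; names and details are assumptions).
import numpy as np

def feature_contributions(predict, x, X_ref, n_samples=1000, seed=0):
    """Approximate the contribution of each feature of instance `x` to
    `predict(x)` by averaging marginal contributions over random feature
    orderings, using rows of `X_ref` to fill in 'hidden' feature values."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    phi = np.zeros(n_features)
    for _ in range(n_samples):
        order = rng.permutation(n_features)            # random feature ordering
        z = X_ref[rng.integers(len(X_ref))].copy()     # random reference instance
        prev = predict(z.reshape(1, -1))[0]
        for j in order:                                # reveal features one by one
            z[j] = x[j]
            curr = predict(z.reshape(1, -1))[0]
            phi[j] += curr - prev                      # marginal contribution of j
            prev = curr
    # On average, the contributions sum to f(x) minus the mean reference prediction.
    return phi / n_samples
```

With enough samples these estimates converge to the Shapley-value contributions that underlie this family of methods.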

... Feature attribution is one of the most widely used approaches in machine learning (ML) explainability, being implemented with a variety of different methods [64,56,57]. Moreover, the use of Shapley values [60] for feature attribution ranks among the most popular solutions [64,65,48,17,47], offering a widely accepted theoretical justification for how to assign importance to features in machine learning (ML) model predictions. Despite the success of using Shapley values for explainability, it is also the case that their exact computation is in general intractable [8,21,22], with tractability results for some families of Boolean circuits [8]. ...
... Shapley values have been extensively used for explaining the predictions of ML models, e.g. [64,65,20,48,15,52,62,69], among a vast number of recent examples. The complexity of computing Shapley values (as proposed in SHAP [48]) has been studied in recent years [8,21,7,22]. ...
... It should be underscored that Shapley values for explainability are not expected to give misleading information. Indeed, it is widely accepted that Shapley values measure the actual influence of a feature [64,65,48,8,21]. Concretely, [64] reads: "...if a feature has no influence on the prediction it is assigned a contribution of 0." But [64] also reads "According to the 2nd axiom, if two features values have an identical influence on the prediction they are assigned contributions of equal size. ...
Preprint
Full-text available
Recent work demonstrated the existence of Boolean functions for which Shapley values provide misleading information about the relative importance of features in rule-based explanations. Such misleading information was broadly categorized into a number of possible issues. Each of those issues relates with features being relevant or irrelevant for a prediction, and all are significant regarding the inadequacy of Shapley values for rule-based explainability. This earlier work devised a brute-force approach to identify Boolean functions, defined on small numbers of features, and also associated instances, which displayed such inadequacy-revealing issues, and so served as evidence to the inadequacy of Shapley values for rule-based explainability. However, an outstanding question is how frequently such inadequacy-revealing issues can occur for Boolean functions with arbitrary large numbers of features. It is plain that a brute-force approach would be unlikely to provide insights on how to tackle this question. This paper answers the above question by proving that, for any number of features, there exist Boolean functions that exhibit one or more inadequacy-revealing issues, thereby contributing decisive arguments against the use of Shapley values as the theoretical underpinning of feature-attribution methods in explainability.
... We gained more insight into model behaviour and how network structure impacted epidemic outcomes on individual networks, including by calculating Shapley values [38]. Shapley values use a game theoretic approach to play off variables in the model with each other based on their contribution to the prediction [39]. ...
... Shapley values use a game theoretic approach to play off variables in the model with each other based on their contribution to the prediction [39]. Calculating Shapley values involves solving a system of linear equations to assign each feature a unique weight based on its contribution to the predicted output and its interaction with other features [38]. For example, negative Shapley values indicate that the observed value 'contributed to the prediction' by reducing the proportion infected or time to peak in an outbreak for a particular network. ...
Article
Full-text available
Predicting what factors promote or protect populations from infectious disease is a fundamental epidemiological challenge. Social networks, where nodes represent hosts and edges represent direct or indirect contacts between them, are important in quantifying these aspects of infectious disease dynamics. However, how network structure and epidemic parameters interact in empirical networks to promote or protect animal populations from infectious disease remains a challenge. Here we draw on advances in spectral graph theory and machine learning to build predictive models of pathogen spread on a large collection of empirical networks from across the animal kingdom. We show that the spectral features of an animal network are powerful predictors of pathogen spread for a variety of hosts and pathogens and can be a valuable proxy for the vulnerability of animal networks to pathogen spread. We validate our findings using interpretable machine learning techniques and provide a flexible web application for animal health practitioners to assess the vulnerability of a particular network to pathogen spread.
... These measures are computed by exploiting the predictive ability of a supervised ML model. Among these techniques, the most common are the permutation-based measures (Breiman 2001; Hooker et al. 2021), the Shapley value-based measures (Shapley 1952; Štrumbelj and Kononenko 2014; Ribeiro et al. 2016; Casalicchio et al. 2019) and the graphical tool-based measures (Greenwell et al. 2018; Borgonovo et al. 2023 (Unpublished paper)). ...
... The importance measure is defined by considering all possible subsets of features and taking into account all possible combinations. In this work, we consider Shapley-based feature importance SbFI (Shapley 1952; Štrumbelj and Kononenko 2014; Ribeiro et al. 2016) and Shapley feature importance SFIMP (Casalicchio et al. 2019). Both measures are based on the notion of Shapley values, which have gained popularity due to their attractive fairness properties, as described in (Lundberg and Lee 2017). ...
Article
Full-text available
Discriminating the role of input variables in a hydrological system or in a multivariate hydrological study is particularly useful. Nowadays, emerging tools, called feature importance measures, are increasingly being applied in hydrological applications. In this study, we propose a virtual experiment to fully understand the functionality and, most importantly, the usefulness of these measures. Thirteen importance measures related to four general classes of methods are quantitatively evaluated to reproduce a benchmark importance ranking. This benchmark ranking is designed using a linear combination of ten random variables. Synthetic time series with varying distribution, cross-correlation, autocorrelation and random noise are simulated to mimic hydrological scenarios. The obtained results clearly suggest that a subgroup of three feature importance measures (Shapley-based feature importance, derivative-based measure, and permutation feature importance) generally provide reliable rankings and outperform the remaining importance measures, making them preferable in hydrological applications.
... SHAP uses the SHAP value to measure the impact of each feature on the output of a complex model. The SHAP value is defined as the weighted average of the marginal contributions [21]. It can be used to explain any type of predictive model for classification or regression [21]. ...
... The SHAP value is defined as the weighted average of the marginal contributions [21]. It can be used to explain any type of predictive model for classification or regression [21]. Figure 2 shows a summary plot of the SHAP values of our proposed hepatitis diagnostic model. ...
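For reference, the "weighted average of the marginal contributions" mentioned in the excerpt is conventionally written as the Shapley value of feature j, where N is the set of features and v(S) denotes the model payoff attributed to a feature subset S:

\[
\phi_j \;=\; \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,\bigl(|N|-|S|-1\bigr)!}{|N|!}\,\bigl(v(S \cup \{j\}) - v(S)\bigr).
\]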
Article
Full-text available
Background Hepatitis C is a prevalent disease that poses a high risk to the human liver. Early diagnosis of hepatitis C is crucial for treatment and prognosis. Therefore, developing an effective medical decision system is essential. In recent years, many computational methods have been proposed to identify hepatitis C patients. Although existing hepatitis prediction models have achieved good results in terms of accuracy, most of them are black-box models and cannot gain the trust of doctors and patients in clinical practice. As a result, this study aims to use various Machine Learning (ML) models to predict whether a patient has hepatitis C, while also using explainable models to elucidate the prediction process of the ML models, thus making the prediction process more transparent. Result We conducted a study on the prediction of hepatitis C based on serological testing and provided comprehensive explanations for the prediction process. Throughout the experiment, we modeled the benchmark dataset, and evaluated model performance using fivefold cross-validation and independent testing experiments. After evaluating three types of black-box machine learning models, Random Forest (RF), Support Vector Machine (SVM), and AdaBoost, we adopted Bayesian-optimized RF as the classification algorithm. In terms of model interpretation, in addition to using common SHapley Additive exPlanations (SHAP) to provide global explanations for the model, we also utilized the Local Interpretable Model-Agnostic Explanations with stability (LIME_stabilitly) to provide local explanations for the model. Conclusion Both the fivefold cross-validation and independent testing show that our proposed method significantly outperforms the state-of-the-art method. IHCP maintains excellent model interpretability while obtaining excellent predictive performance. This helps uncover potential predictive patterns of the model and enables clinicians to better understand the model's decision-making process.
... Definition 1 (probabilistic anomaly attribution). Given a black-box regression model \(y = f(x)\) and observed test sample(s), compute the distribution of the score for each input variable indicative of the extent to which that variable is responsible for the sample being anomalous. ...

The excerpt also includes the following method-comparison table:

| Method | model-agnostic | training-data-free | baseline-input-free | deviation-sensitive | built-in UQ | reference point |
|---|---|---|---|---|---|---|
| LIME [33] | yes | yes | yes | no | yes/no | infinitesimal vicinity |
| SV [41,42] | yes | no | yes | no | no | globally distributional |
| IG [37,44] | yes | yes | no | no | no | arbitrary |
| EIG [6] | yes | no | yes | no | no | globally distributional |
| Z-score [5] | yes | no | yes | no | no | global mean of predictors |
| LC [20] | yes | yes | yes | yes | no | maximum likelihood point |
| GPA | yes | yes | yes | yes | yes | maximum a posteriori point |
... The z-score is a standard univariate outlier detection metric in the unsupervised setting, and is defined as \(z_i \triangleq (x_i - \mu_i)/\sigma_i\) for the \(i\)-th variable, where \(\mu_i, \sigma_i\) are the mean and the standard deviation of \(x_i\), respectively. In SV, we used the same sampling scheme as that proposed in [42] with the number of configurations limited to 100. In IG and EIG, we used the trapezoidal rule with 100 equally-spaced intervals to perform the integration. ...
Preprint
Full-text available
We address the task of probabilistic anomaly attribution in the black-box regression setting, where the goal is to compute the probability distribution of the attribution score of each input variable, given an observed anomaly. The training dataset is assumed to be unavailable. This task differs from the standard XAI (explainable AI) scenario, since we wish to explain the anomalous deviation from a black-box prediction rather than the black-box model itself. We begin by showing that mainstream model-agnostic explanation methods, such as the Shapley values, are not suitable for this task because of their ``deviation-agnostic property.'' We then propose a novel framework for probabilistic anomaly attribution that allows us to not only compute attribution scores as the predictive mean but also quantify the uncertainty of those scores. This is done by considering a generative process for perturbations that counter-factually bring the observed anomalous observation back to normalcy. We introduce a variational Bayes algorithm for deriving the distributions of per variable attribution scores. To the best of our knowledge, this is the first probabilistic anomaly attribution framework that is free from being deviation-agnostic.
... Random Forest uses the bagging method, and the fundamental concept behind the Bagging method is to create several individual predictors, each of which makes independent predictions. The final prediction of the integration model is obtained by averaging or majority voting of the predictions made by these individual evaluators [38,39]. In this paper, the construction and evaluation of the model are realized by the use of the Random Forest algorithm. ...
... Shapley Additive Explanation (SHAP) is an additive explanation model that draws inspiration from cooperative game theory and can provide detailed interpretations of the output of any machine learning model. In SHAP, all input features of a machine learning model are seen as "contributors" to the final prediction and are assessed based on their individual impact on the output [38,39]. The SHAP value is the value that is assigned to each feature in the instance, quantifying the contribution of the feature to the predicted outcome. ...
Article
Full-text available
In the context of globalization in the mining industry, assessing the production feasibility of mining projects by smart technology is crucial for the improvement of mining development efficiency. However, evaluating the feasibility of such projects faces significant challenges due to incomplete data and complex variables. In recent years, the development of big data technology has offered new possibilities for rapidly evaluating mining projects. This study conducts an intelligent evaluation of gold mines based on global mineral resources data to estimate whether a gold mine project can be put into production. A technical workflow is constructed, including data filling, evaluation model construction, and production feasibility evaluation. Based on the workflow, the missing data is filled in by the Miceforest imputation algorithm first. The evaluation model is established based on the Random Forest model to quantitatively predict the feasibility of the mining project being put into production, and important features of the model are extracted using Shapley Additive exPlanation (SHAP). This workflow may enhance the efficiency and accuracy of quantitative production feasibility evaluation for mining projects, with an accuracy rate increased from 93.80% to 95.99%. Results suggest that the features of estimated mine life and gold ore grade have the most significant impact on production feasibility.
... • We evaluate the proposed model on two sentiment classification datasets with BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019), and the experimental results demonstrate the faithfulness of our explanation model. Sampling-based methods (Štrumbelj and Kononenko, 2014) have been proposed to efficiently approximate the Shapley value. However, these methods do not explain how feature interactions contribute to model predictions, which fails to capture the model's ability to learn from high-order feature interactions. ...
... we adopt Monte Carlo sampling (Štrumbelj and Kononenko, 2014) to approximate \(\phi_{T_2}(T_1)\) as ...
... where g is the explanation model, \(z'\) is the simplified feature vector, M is the number of simplified input features, and \(\phi_j \in \mathbb{R}\) is the feature attribution for a feature j [20,21]. ...
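The additive explanation model this excerpt describes is, in the usual SHAP notation,

\[
g(z') \;=\; \phi_0 + \sum_{j=1}^{M} \phi_j z'_j ,
\]

so the prediction is decomposed into a base value \(\phi_0\) plus one attribution per simplified feature.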
Article
Full-text available
The health situation caused by the SARS-Cov2 coronavirus, posed major challenges for the scientific community. Advances in artificial intelligence are a very useful resource, but it is important to determine which symptoms presented by positive cases of infection are the best predictors. A machine learning approach was used with data from 5,434 people, with eleven symptoms: breathing problems, dry cough, sore throat, running nose, history of asthma, chronic lung, headache, heart disease, hypertension, diabetes, and fever. Based on public data from Kaggle with WHO standardized symptoms. A model was developed to detect COVID-19 positive cases using a simple machine learning model. The results of 4 loss functions and by SHAP values, were compared. The best loss function was Binary Cross Entropy, with a single hidden layer configuration with 10 neurons, achieving an F1 score of 0.98 and the model was rated with an area under the curve of 0.99 aucROC.
... It calculates a specific feature's marginal contribution to the model under different feature combinations and derives the average marginal contribution from these differences. In addition to expressing the impact of individual features on AROT, it can also express the impact of feature groups and the synergistic effects that exist between features [32][33]. ...
Article
Full-text available
Wake re-categorization (RECAT) has been implemented to improve runway capacity, and consequently, aircraft arrival runway occupancy time has become a crucial factor influencing runway capacity. Accurate prediction of the runway occupancy time can assist controllers in determining aircraft separation, thereby enhancing the operational efficiency of the runway. In this study, the GA–PSO algorithm is utilized to optimize the Back Propagation neural network prediction model using Quick access recorder data from various domestic airports, achieving high-precision prediction. Additionally, the SHapley Additive explanation model is applied to quantify the effect of each characteristic parameter on the arrival runway occupancy time, resulting in the prediction of aircraft arrival runway occupancy time. This model can provide a foundation for improving runway operation efficiency and technical support for the design of airport runway and taxiway structure.
... The model interpretability gap was addressed using the Shapley values technique. 78, 79 Shapley values are a model-agnostic technique based on game theory that fairly distributes the effect of the explanatory variables on the prediction. 24 For a single prediction, it returns the average expected marginal contribution of each explanatory variable in relation to the average prediction of the reference dataset. ...
... These methods include partial dependence plots [12][13], which visualize the estimated value and impact of each feature; individual conditional expectation [14]; permutation feature importance [15], which evaluates the importance of a feature by randomly reordering or removing certain features; and leave-one-feature-out (LOFO) importance [16][17]. In [18], a feature importance extraction method based on sensitivity analysis was presented. In this method, feature importance is extracted by perturbing all subsets of the input features, so that interactions and redundancies among the features are reflected. ...
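A minimal sketch of the permutation feature importance idea mentioned in the excerpt (illustrative assumptions only; this is not code from [15]): importance is measured as the drop in a score when one feature column is shuffled, severing its relationship with the target.

```python
# Permutation feature importance: average score drop when a column is shuffled.
# Illustrative sketch; `model` is any fitted estimator with a .predict method.
import numpy as np

def permutation_importance(model, X, y, score_fn, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    baseline = score_fn(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # shuffle column j only
            drops.append(baseline - score_fn(y, model.predict(X_perm)))
        importances[j] = np.mean(drops)                    # mean score degradation
    return importances
```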
Article
Full-text available
Data-driven decision-making has become pervasive in interpretive machine learning and explainable AI (XAI). Some existing interpretable machine learning and explainable AI methods have utilized a forward problem to derive how the prediction and estimation output results of a black-box model change with respect to the input. However, when seeking explanations for black-box models, it is often crucial to address the inverse problem of understanding why the prediction and estimation output results are derived for a given input. Because some existing methods of generating explanations for the forward problem lead to non-intuitive explanations, we hypothesize that solving the inverse problem of the black-box model would yield more intuitive explanations. We propose approximate inverse model explanations (AIME), which provide unified global and local feature importance by deriving the approximate inverse operators of the black-box model. We also propose a representative instance similarity distribution plot that uses representative estimation instances selected by the inverse operator to understand the predictive behavior of the model and the target dataset. It also visualizes the similarity distribution with the target dataset to demonstrate how a particular prediction is related to other predictions. Furthermore, AIME can estimate the local and global feature importance and provide a new interpretation by visualizing the similarity distribution between representative estimation instances and the target dataset.
... The Shapley value assumes all the players collaborate. According to (Štrumbelj and Kononenko, 2014), Shapley values used in DL predictive models (e.g., RNN and GRU) can quantify the contribution of each feature in the predictive model. Overall, the Shapley value for feature \(X_j\) in a predictive model can be written as follows: ...
Article
Gully erosion poses a serious hazard to critical resources such as soil, water, and vegetation cover within watersheds. Therefore, spatial maps of gully erosion hazards can be instrumental in mitigating its negative consequences. Among the various methods used to explore and map gully erosion, advanced learning techniques, especially deep learning (DL) models, are highly capable of spatial mapping and can provide accurate predictions for generating spatial maps of gully erosion at different scales (e.g., local, regional, continental, and global). In this paper, we applied two DL models, namely a simple recurrent neural network (RNN) and a gated recurrent unit (GRU), to map land susceptibility to gully erosion in the Shamil-Minab plain, Hormozgan province, southern Iran. To address the inherent black box nature of DL models, we applied three novel interpretability methods consisting of SHapley Additive exPlanation (SHAP), ceteris paribus and partial dependence (CP-PD) profiles and permutation feature importance (PFI). Using the Boruta algorithm, we identified seven important features that control gully erosion: soil bulk density, clay content, elevation, land use type, vegetation cover, sand content, and silt content. These features, along with an inventory map of gully erosion (based on a 70 % training dataset and 30 % test dataset), were used to generate spatial maps of gully erosion using DL models. According to the Kolmogorov-Smirnov (KS) statistic performance assessment measure, the simple RNN model (with KS = 91.6) outperformed the GRU model (with KS = 66.6). Based on the results from the simple RNN model, 7.4 %, 14.5 %, 18.9 %, 31.2 % and 28 % of the total area of the plain were classified as very-low, low, moderate, high and very-high hazard classes, respectively. According to SHAP plots, CP-PD profiles, and PFI measures, soil silt content, vegetation cover (NDVI) and land use type had the highest impact on the model's output. Overall, the DL modelling techniques and interpretation methods used in this study proved to be helpful in generating spatial maps of soil erosion hazard, especially gully erosion. Their interpretability can support watershed sustainable management.
... The final feature subset could be determined by (1) selecting the top 10 features from the Boruta-RF rank based on the ranked ANOVA F-values, resulting in a narrow spectral band (ensemble SB) or (2) selecting the top 10 least collinear features out of the Boruta-RF rank, which was computed via the same collinearity removal process in the PCA FS framework (ensemble LC). SHAP, [54][55][56][57][58][59] one of the state-of-the-art tools in ML explainability, was performed to explain and quantify individual contributions of selected features to the model prediction based on cooperative game theory. The SHAP value was proposed by Lundberg and Lee as a unified measure to represent additive feature importance, which is the average outcome of marginal contributions from individual features over all possible feature permutations. ...
Article
Significance: Wavelength selection from a large diffuse reflectance spectroscopy (DRS) dataset enables removal of spectral multicollinearity and thus leads to improved understanding of the feature domain. Feature selection (FS) frameworks are essential to discover the optimal wavelengths for tissue differentiation in DRS-based measurements, which can facilitate the development of compact multispectral optical systems with suitable illumination wavelengths for clinical translation. Aim: The aim was to develop an FS methodology to determine wavelengths with optimal discriminative power for orthopedic applications, while providing the frameworks for adaptation to other clinical scenarios. Approach: An ensemble framework for FS was developed, validated, and compared with frameworks incorporating conventional algorithms, including principal component analysis (PCA), linear discriminant analysis (LDA), and backward interval partial least squares (biPLS). Results: Via the one-versus-rest binary classification approach, a feature subset of 10 wavelengths was selected from each framework yielding comparable balanced accuracy scores (PCA: 94.8±3.47%, LDA: 98.2±2.02%, biPLS: 95.8±3.04%, and ensemble: 95.8±3.16%) to those of using all features (100%) for cortical bone versus the rest class labels. One hundred percent balanced accuracy scores were generated for bone cement versus the rest. Different feature subsets achieving similar outcomes could be identified due to spectral multicollinearity. Conclusions: Wavelength selection frameworks provide a means to explore domain knowledge and discover important contributors to classification in spectroscopy. The ensemble framework generated a model with improved interpretability and preserved physical interpretation, which serves as the basis to determine illumination wavelengths in optical instrumentation design.
... [116]. They used insights from the earlier work of Štrumbelj and Kononenko [117] to design a new set of values called the expectation Shapley (ES) values, which were able to unify and justify a broad spectrum of approaches (e.g., LIME, DeepLift, and Layer-Wise Relevance Propagation) for black-box model interpretations. ...
Article
Full-text available
Recent years have seen a tremendous growth in Artificial Intelligence (AI)-based methodological development in a broad range of domains. In this rapidly evolving field, a large number of methods are being reported using machine learning (ML) and Deep Learning (DL) models. The majority of these models are inherently complex and lack explanations of the decision-making process, causing these models to be termed 'black-box'. One of the major bottlenecks to adopting such models in mission-critical application domains, such as banking, e-commerce, healthcare, and public services and safety, is the difficulty in interpreting them. Due to the rapid proliferation of these AI models, explaining their learning and decision-making processes is getting harder, which requires transparency and easy predictability. Aiming to collate the current state-of-the-art in interpreting the black-box models, this study provides a comprehensive analysis of the explainable AI (XAI) models. To reduce false negative and false positive outcomes of these black-box models, finding flaws in them is still difficult and inefficient. In this paper, the development of XAI is reviewed meticulously through careful selection and analysis of the current state-of-the-art of XAI research. It also provides a comprehensive and in-depth evaluation of the XAI frameworks and their efficacy to serve as a starting point of XAI for applied and theoretical researchers. Towards the end, it highlights emerging and critical issues pertaining to XAI research to showcase major, model-specific trends for better explanation, enhanced transparency, and improved prediction accuracy.
... While saliency maps offer some qualitative explanation of CNN function, connecting NeuriteNet's image classification and scoring to quantified neuron measurements would further corroborate and strengthen the use of CNNs for investigations into neurite growth morphology [27,28]. However, while this kind of analysis can be relatively intuitive when performed on deep learning models trained using tabular data, where the input features are discrete values that can be directly examined (e.g., SHAP, LIME) [29,30], it is much more difficult to do so for models trained using image data, where the input features can be millions of pixels from which the model has learned to extract higher-order features. Thus, in this study, we seek to further validate NeuriteNet by first using concept vectors (TCAV) to determine the relationship between these higher-order features and the morphological measurements generated from semi-automated, full-neurite tracing [31,32]. ...
Article
Full-text available
Quantitative analysis of neurite growth and morphology is essential for understanding the determinants of neural development and regeneration, however, it is complicated by the labor-intensive process of measuring diverse parameters of neurite outgrowth. Consequently, automated approaches have been developed to study neurite morphology in a high-throughput and comprehensive manner. These approaches include computer-automated algorithms known as 'convolutional neural networks' (CNNs)—powerful models capable of learning complex tasks without the biases of hand-crafted models. Nevertheless, their complexity often relegates them to functioning as 'black boxes.' Therefore, research in the field of explainable AI is imperative to comprehend the relationship between CNN image analysis output and predefined morphological parameters of neurite growth in order to assess the applicability of these machine learning approaches. In this study, drawing inspiration from the field of automated feature selection, we investigate the correlation between quantified metrics of neurite morphology and the image analysis results from NeuriteNet—a CNN developed to analyze neurite growth. NeuriteNet accurately distinguishes images of neurite growth based on different treatment groups within two separate experimental systems. These systems differentiate between neurons cultured on different substrate conditions and neurons subjected to drug treatment inhibiting neurite outgrowth. By examining the model's function and patterns of activation underlying its classification decisions, we discover that NeuriteNet focuses on aspects of neuron morphology that represent quantifiable metrics distinguishing these groups. Additionally, it incorporates factors that are not encompassed by neuron morphology tracing analyses. NeuriteNet presents a novel tool ideally suited for screening morphological differences in heterogeneous neuron groups while also providing impetus for targeted follow-up studies.
... eSHAP analysis of the PrePROTAC model identifies key residues contributing to PROTAC activities. SHAP (SHapley Additive exPlanations) [80][81][82] values are widely used to explain machine learning models. TreeExplainer in the shap Python package was applied to get the SHAP values for the features used in the PrePROTAC model. ...
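For context, a typical TreeExplainer call with the shap package looks like the sketch below; the model and data here are placeholders, not the PrePROTAC model or its features.

```python
# Hypothetical usage sketch of shap.TreeExplainer on a stand-in tree model.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(200, 8)                      # placeholder feature matrix
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)       # placeholder labels
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)           # tree-specific explainer
shap_values = explainer.shap_values(X)          # per-sample, per-feature attributions
                                                # (one array per class for classifiers)
```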
Article
Full-text available
Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules that induce the degradation of target proteins by recruiting an E3 ligase. PROTACs have the potential to inactivate disease-related genes that are considered undruggable by small molecules, making them a promising therapy for the treatment of incurable diseases. However, only a few hundred proteins have been experimentally tested for their amenability to PROTACs, and it remains unclear which other proteins in the entire human genome can be targeted by PROTACs. In this study, we have developed PrePROTAC, an interpretable machine learning model based on a transformer-based protein sequence descriptor and random forest classification. PrePROTAC predicts genome-wide targets that can be degraded by CRBN, one of the E3 ligases. In the benchmark studies, PrePROTAC achieved a ROC-AUC of 0.81, an average precision of 0.84, and over 40% sensitivity at a false positive rate of 0.05. When evaluated by an external test set which comprised proteins from different structural folds than those in the training set, the performance of PrePROTAC did not drop significantly, indicating its generalizability. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method, which extends conventional SHAP analysis for original features to an embedding space through in silico mutagenesis. This method allowed us to identify key residues in the protein structure that play critical roles in PROTAC activity. The identified key residues were consistent with existing knowledge. Using PrePROTAC, we identified over 600 novel understudied proteins that are potentially degradable by CRBN and proposed PROTAC compounds for three novel drug targets associated with Alzheimer's disease.
... In recent years, the development of explainable artificial intelligence techniques has provided many tools to help model users understand the behavior of the trained model. Researchers tend to focus on the degree to which each feature of the model input affects the model output, often referred to as feature importance [33], and in linear regression models, feature importance is measured in the form of variable coefficients. Although decision tree-based feature importance quantification methods have been proposed for a long time, the traditional interpretation methods based on the importance of decision tree features are often misleading [34]. ...
Article
Full-text available
Analyzing monitoring data to recognize structural anomalies is a typical intelligent application of structural safety monitoring, which is of great significance to hydraulic engineering operational management. Many regression modeling methods have been developed to describe the complex statistical relationships between engineering safety monitoring points, which in turn can be used to recognize abnormal data. However, existing studies are devoted to introducing the correlation between adjacent response points to improve prediction accuracy, ignoring the detrimental effects on anomaly recognition, especially the pseudo-regression problem. In this paper, an anomaly recognition method is proposed from the perspective of causal inference to realize the best exploitation of various types of monitoring information in model construction, including four steps of constructing causal graph, regression modeling, model interpretation, and anomaly recognition. In regression modeling stage, two deconfounding machine learning models, two-stage boosted regression trees and copula debiased boosted regression trees, are proposed for recovering the causal effects of correlated response points. The validation was carried out with Shanmen River culvert monitoring data, and experiment results showed that the proposed method in this paper has better anomaly recognition compared to existing regression modeling methods, as shown by lower false alarm rates and lower averaged missing alarm rates under different structural anomaly scenarios.
... Taking the test temperature as an example, it is impossible to know whether the test temperature leads to a decrease or an increase in the final performance. Therefore, the SHAP (Shapley Additive Explanation) value is introduced to analyze how features specifically affect performance [36][37][38][39]. The principle of the SHAP value is based on the concept of "Shapley values" in the theory of cooperative games. ...
Article
Full-text available
As the fourth paradigm of materials research and development, the materials genome paradigm can significantly improve the efficiency of research and development for austenitic stainless steel. In this study, by collecting experimental data of austenitic stainless steel, the chemical composition of austenitic stainless steel is optimized by machine learning and a genetic algorithm, so that the production cost is reduced, and the research and development of new steel grades is accelerated without reducing the mechanical properties. Specifically, four machine learning prediction models were established for different mechanical properties, with the gradient boosting regression (gbr) algorithm demonstrating superior prediction accuracy compared to other commonly used machine learning algorithms. Bayesian optimization was then employed to optimize the hyperparameters in the gbr algorithm, resulting in the identification of the optimal combination of hyperparameters. The mechanical properties prediction model established at this stage had good prediction accuracy on the test set (yield strength: R2 = 0.88, MAE = 4.89 MPa; ultimate tensile strength: R2 = 0.99, MAE = 2.65 MPa; elongation: R2 = 0.84, MAE = 1.42%; reduction in area: R2 = 0.88, MAE = 1.39%). Moreover, feature importance and Shapley Additive Explanation (SHAP) values were utilized to analyze the interpretability of the performance prediction models and to assess how the features influence the overall performance. Finally, the NSGA-III algorithm was used to simultaneously maximize the mechanical property prediction models within the search space, thereby obtaining the corresponding non-dominated solution set of chemical composition and achieving the optimization of austenitic stainless-steel compositions.
... SHAP is a game theory-based machine learning interpretation technique that unifies common additive feature attribution methods with better computational performance and better alignment with human intuition [30,50]. ...
Article
Full-text available
Accurate estimation of terrestrial water storage (TWS) and understanding its driving factors are crucial for effective hydrological assessment and water resource management. The launches of the Gravity Recovery and Climate Experiment (GRACE) satellites and their successor, GRACE Follow-On (GRACE-FO), combined with deep learning algorithms, have opened new avenues for such investigations. In this study, we employed a long short-term memory (LSTM) neural network model to simulate TWS anomaly (TWSA) in the Pearl River Basin (PRB) from 2003 to 2020, using precipitation, temperature, runoff, evapotranspiration, and leaf area index (LAI) data. The performance of the LSTM model was rigorously evaluated, achieving a high average correlation coefficient (r) of 0.967 and an average Nash–Sutcliffe efficiency (NSE) coefficient of 0.912 on the testing set. To unravel the relative importance of each driving factor and assess the impact of different lead times, we employed the SHapley Additive exPlanations (SHAP) method. Our results revealed that precipitation exerted the most significant influence on TWSA in the PRB, with a one-month lead time exhibiting the greatest impact. Evapotranspiration, runoff, temperature, and LAI also played important roles, with interactive effects among these factors. Moreover, we observed an accumulation effect of precipitation and evapotranspiration on TWSA, particularly with shorter lead times. Overall, the SHAP method provides an alternative approach for the quantitative analysis of natural driving factors at the basin scale, shedding light on the natural dominant influences on TWSA in the PRB. The combination of satellite observations and deep learning techniques holds promise for advancing our understanding of TWS dynamics and enhancing water resource management strategies.
... Explainability and interpretability have become important topics within machine learning. Explanations can take many forms, including sets of input features [307], linear combinations of input features [289], or generated natural language sentences [332]. Given that transparency is one of the key strengths of KR systems, it should come as no surprise that ideas from KR often play a central role in this context. ...
Preprint
Knowledge Representation and Reasoning is a central, longstanding, and active area of Artificial Intelligence. Over the years it has evolved significantly; more recently it has been challenged and complemented by research in areas such as machine learning and reasoning under uncertainty. In July 2022 a Dagstuhl Perspectives workshop was held on Knowledge Representation and Reasoning. The goal of the workshop was to describe the state of the art in the field, including its relation with other areas, its shortcomings and strengths, together with recommendations for future progress. We developed this manifesto based on the presentations, panels, working groups, and discussions that took place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge Representation: its origins, goals, milestones, and current foci; its relation to other disciplines, especially to Artificial Intelligence; and on its challenges, along with key priorities for the next decade.
... where \(E(w_i X_i)\) is the mean effect estimate for feature i [21]. This influence value \(\phi_i(x)\) depends on the instance x and is clearly not the same as the importance \(w_i\). ...
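To make the distinction concrete, for a linear model \(f(x) = \sum_i w_i x_i + b\) with independent features, the influence of feature i at instance x is (a standard result, stated here only for illustration)

\[
\phi_i(x) \;=\; w_i x_i - E(w_i X_i) \;=\; w_i\bigl(x_i - E(X_i)\bigr),
\]

which changes with the instance x, while the importance-style coefficient \(w_i\) does not.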
Preprint
Full-text available
When used in the context of decision theory, feature importance expresses how much changing the value of a feature can change the model outcome (or the utility of the outcome), compared to other features. Feature importance should not be confused with the feature influence used by most state-of-the-art post-hoc Explainable AI methods. Contrary to feature importance, feature influence is measured against a reference level or baseline. The Contextual Importance and Utility (CIU) method provides a unified definition of global and local feature importance that is applicable also for post-hoc explanations, where the value utility concept provides instance-level assessment of how favorable or not a feature value is for the outcome. The paper shows how CIU can be applied to both global and local explainability, assesses the fidelity and stability of different methods, and shows how explanations that use contextual importance and contextual utility can provide more expressive and flexible explanations than when using influence only.
... They visualize feature relevance by comparing the network output for an input and a modified copy of that input [22]. Occlusion sensitivity [22] and Shapley value sampling [140] are two of the most commonly used perturbation-based techniques. ...
Preprint
Full-text available
Artificial Intelligence in Medicine has made significant progress with emerging applications in medical imaging, patient care, and other areas. While these applications have proven successful in retrospective studies, very few of them were applied in practice. The field of Medical AI faces various challenges in terms of building user trust, complying with regulations, and using data ethically. Explainable AI (XAI) aims to enable humans to understand AI and trust its results. This paper presents a literature review on the recent developments of XAI solutions for medical decision support, based on a representative sample of 198 articles published in recent years. The systematic synthesis of the relevant articles resulted in several findings. (1) model-agnostic XAI techniques were mostly employed in these solutions, (2) deep learning models are utilized more than other types of machine learning models, (3) explainability was applied to promote trust, but very few works reported the physicians' participation in the loop, (4) a visual and interactive user interface is more useful in understanding the explanation and the recommendation of the system. More research is needed in collaboration between medical and AI experts, which could guide the development of suitable frameworks for the design, implementation, and evaluation of XAI solutions in medicine.
... Thus, SHAP analysis aims to rationalize a prediction of a specific value by considering the feature contributions. The individual values of features in the data set can be interpreted as players in a coalition game (Lundberg and Lee, 2017; Štrumbelj and Kononenko, 2014). In more detail, the algorithm works as follows (Lundberg and Lee, 2017). ...
Article
Full-text available
We present explainable machine learning approaches for gaining deeper insights into the solubilization processes of inclusion bodies. The machine learning model with the highest prediction accuracy for the protein yield is further evaluated with regard to Shapley additive explanation (SHAP) values in terms of feature importance studies. Our results highlight an inverse fractional relationship between the protein yield and total protein concentration. Further correlations can also be observed for the dominant influences of the urea concentration and the underlying pH values. All findings are used to develop an analytical expression that is in reasonable agreement with experimental data. The resulting master curve highlights the benefits of explainable machine learning approaches for the detailed understanding of certain biopharmaceutical manufacturing steps.
Article
Full-text available
The practice of crime risk mapping, enabled by the utilization of geospatial big data such as street view images, has received significant research attention. However, in situations where available data is scarce, mapping models may suffer from underfitting and generate inaccurate spatial pattern estimations of crime risk. The covert nature of pickpocketing crimes results in limited observed areas relevant to such criminal events, leading to insufficient coverage of geospatial data. Moreover, the location of crime is also influenced by socio-economic characteristics that may introduce biases into crime risk estimates. These factors render it challenging for the model to capture a valid crime risk pattern, potentially yielding misleading conclusions. Therefore, effectively extracting crime risk with limited data remains a challenge, especially when relying on easily accessible, widespread, and unbiased geospatial data. To address this challenge, we propose a novel crime risk assessment framework based on deep anomaly detection techniques, assuming that urban landscape anomalies carry deep crime risk information. We take Shenzhen as the study area and map the distribution of pickpocketing risk using street view images, accurately revealing the spatial aggregation of pickpocketing crime risk. Our findings indicate that pickpocketing crime in China is caused by regional economic conditions, built environment factors, and human routine activities. This study provides valuable insights for policing and prevention strategies aimed at addressing pickpocketing crimes in large Chinese cities. By leveraging our proposed crime risk assessment framework, decision-makers can allocate resources more efficiently and develop targeted interventions to mitigate crime risks.
Article
Background: Early prediction of dementia risk is crucial for effective interventions. Given the known etiologic heterogeneity, machine learning methods leveraging multimodal data, such as clinical manifestations, neuroimaging biomarkers, and well-documented risk factors, could predict dementia more accurately than single modal data. Objective: This study aims to develop machine learning models that capitalize on neuropsychological (NP) tests, magnetic resonance imaging (MRI) measures, and clinical risk factors for 10-year dementia prediction. Methods: This study included participants from the Framingham Heart Study, and various data modalities such as NP tests, MRI measures, and demographic variables were collected. CatBoost was used with Optuna hyperparameter optimization to create prediction models for 10-year dementia risk using different combinations of data modalities. The contribution of each modality and feature for the prediction task was also quantified using Shapley values. Results: This study included 1,031 participants with normal cognitive status at baseline (age 75±5 years, 55.3% women), of whom 205 were diagnosed with dementia during the 10-year follow-up. The model built on three modalities demonstrated the best dementia prediction performance (AUC 0.90±0.01) compared to single modality models (AUC range: 0.82–0.84). MRI measures contributed most to dementia prediction (mean absolute Shapley value: 3.19), suggesting the necessity of multimodal inputs. Conclusion: This study shows that a multimodal machine learning framework had a superior performance for 10-year dementia risk prediction. The model can be used to increase vigilance for cognitive deterioration and select high-risk individuals for early intervention and risk management.
Chapter
The widespread use of AI in various industries has been facilitated by advancements in machine learning and neural networks. To shed light on the workings of opaque data-driven algorithms, several mathematical methods have emerged, such as the Shapley value, tree models, and Taylor expansion. Among these, the Shapley value stands out as a popular perturbation method, garnering significant attention. While calculating Shapley values is known to be an NP-hard problem, some researchers have introduced approximate techniques to alleviate this challenge. However, striking a balance between accuracy and time cost remains difficult, particularly as the number of players involved increases. In this paper, we propose a novel approach that efficiently computes Shapley values using fewer high-quality coalition samples, relying on the relationship map.
Chapter
Interpretable machine learning has demonstrated impressive performance while preserving explainability. In particular, neural additive models (NAM) offer the interpretability to the black-box deep learning and achieve state-of-the-art accuracy among the large family of generalized additive models. In order to empower NAM with feature selection and improve the generalization, we propose the sparse neural additive models (SNAM) that employ the group sparsity regularization (e.g. Group LASSO), where each feature is learned by a sub-network whose trainable parameters are clustered as a group. We study the theoretical properties for SNAM with novel techniques to tackle the non-parametric truth, thus extending from classical sparse linear models such as the LASSO, which only works on the parametric truth. Specifically, we show that SNAM with subgradient and proximal gradient descents provably converges to zero training loss as \(t\rightarrow \infty \), and that the estimation error of SNAM vanishes asymptotically as \(n\rightarrow \infty \). We also prove that SNAM, similar to LASSO, can have exact support recovery, i.e. perfect feature selection, with appropriate regularization. Moreover, we show that the SNAM can generalize well and preserve the ‘identifiability’, recovering each feature’s effect. We validate our theories via extensive experiments and further testify to the good accuracy and efficiency of SNAM (Appendix can be found at https://arxiv.org/abs/2202.12482.).
Article
Full-text available
The Long Short-Term Memory (LSTM) neural network model is an effective deep learning approach for predicting streamflow, and the investigation of the interpretability of deep learning models in streamflow prediction is of great significance for model transfer and improvement. In this study, four key hydrological stations in the Xijiang River Basin (XJB) in South China are taken as examples, and the performance of the LSTM model and its variant models in runoff prediction were evaluated under the same foresight period, and the impacts of different foresight periods on the prediction results were investigated based on the SHapley Additive exPlanations (SHAP) method to explore the interpretability of the LSTM model in runoff prediction. The results showed that (1) LSTM was the optimal model among the four models in the XJB; (2) the predicted results of the LSTM model decreased with the increase in foresight period, with the Nash–Sutcliffe efficiency coefficient (NSE) decreasing by 4.7% when the foresight period increased from one month to two months, and decreasing by 3.9% when the foresight period increased from two months to three months; (3) historical runoff had the greatest impact on streamflow prediction, followed by precipitation, evaporation, and the North Pacific Index (NPI); except evaporation, all the others were positively correlated. The results can provide a reference for monthly runoff prediction in the XJB.
Chapter
There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lack deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values. Keywords: Causal Inference, Explainable AI, MDPs
Article
Diffusion tensor imaging (DTI) and diffusion kurtosis imaging (DKI) have been previously used to explore white matter related to human immunodeficiency virus (HIV) infection. While DTI and DKI suffer from low specificity, the Combined Hindered and Restricted Model of Diffusion (CHARMED) provides additional microstructural specificity. We used these three models to evaluate microstructural differences between 35 HIV-positive patients without neurological impairment and 20 healthy controls who underwent diffusion-weighted imaging using three b-values. While significant group effects were found in all diffusion metrics, CHARMED and DKI analyses uncovered wider involvement (80% vs. 20%) of all white matter tracts in HIV infection compared with DTI. In restricted fraction (FR) analysis, we found significant differences in the left corticospinal tract, middle cerebellar peduncle, right inferior cerebellar peduncle, right corticospinal tract, splenium of the corpus callosum, left superior cerebellar peduncle, left superior cerebellar peduncle, pontine crossing tract, left posterior limb of the internal capsule, and left/right medial lemniscus. These are involved in language, motor, equilibrium, behavior, and proprioception, supporting the functional integration that is frequently impaired in HIV-positivity. Additionally, we employed a machine learning algorithm (XGBoost) to discriminate HIV-positive patients from healthy controls using DTI and CHARMED metrics on an ROIwise basis, and unique contributions to this discrimination were examined using Shapley Explanation values. The CHARMED and DKI estimates produced the best performance. Our results suggest that biophysical multishell imaging, combining additional sensitivity and built-in specificity, provides further information about the brain microstructural changes in multimodal areas involved in attentive, emotional and memory networks often impaired in HIV patients.
Article
This study aimed to develop machine learning-based quantitative structure-biodegradability relationship (QSBR) models for predicting the primary and ultimate biodegradation rates of organic chemicals, which are essential parameters for environmental risk assessment. For this purpose, experimental primary and ultimate biodegradation rates of high consistency were compiled for 173 organic compounds. A large number of descriptors were calculated with a collection of quantum/computational chemistry software and tools to achieve comprehensive representation and interpretability. Following a pre-screening process, multiple QSBR models were developed for both the primary and ultimate endpoints using three algorithms: extreme gradient boosting (XGBoost), support vector machine (SVM), and multiple linear regression (MLR). Furthermore, a unified QSBR model was constructed using a knowledge transfer technique and XGBoost. Results demonstrated that all QSBR models developed in this study performed well. In particular, the SVM models exhibited a high level of goodness of fit (coefficient of determination on the training set of 0.973 for primary and 0.980 for ultimate), robustness (leave-one-out cross-validated coefficient of 0.953 for primary and 0.967 for ultimate), and external predictive ability (external explained variance of 0.947 for primary and 0.958 for ultimate). The knowledge transfer technique enhanced model performance by learning from the properties of the two biodegradation endpoints. Williams plots were used to visualize the applicability domains of the models. Through SHapley Additive exPlanations (SHAP) analysis, the study identified key features affecting biodegradation rates. Notably, MDEO-12, APC2D1_C_O, and other features contributed to primary biodegradation, while AATS0v, AATS2v, and others inhibited it. For ultimate biodegradation, features such as No. of Rotatable Bonds, APC2D1_C_O, and minHBa were contributors, while C1SP3, Halogen Ratio, GGI4, and others hindered the process. The study also quantified each feature's contribution to the predictions for individual chemicals. This research provides valuable tools for predicting both primary and ultimate biodegradation rates while offering insights into the underlying mechanisms.
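A hedged sketch of the leave-one-out validation statistic (Q2) reported above, computed for an SVM regressor; the descriptor matrix and rate values are synthetic placeholders, not the study's data.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 12))                                  # molecular descriptors (synthetic)
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.2, size=80)   # biodegradation rate (synthetic)

model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
y_loo = cross_val_predict(model, X, y, cv=LeaveOneOut())

# Leave-one-out Q2: 1 - PRESS / total sum of squares around the mean of y.
q2 = 1 - np.sum((y - y_loo) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"Q2_LOO = {q2:.3f}")
```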
Article
Full-text available
An in-depth understanding of a key element such as lake evaporation is particularly beneficial in developing the optimal management approach for reservoirs. In this study, we first evaluate the applicability of the Random Forest (RF), Gradient Boosting (GB), Decision Tree (DT), K-Nearest Neighbor (kNN), and XGBoost regressors for predicting the daily lake evaporation of five reservoirs in the Awash River basin, Ethiopia. The best-performing models, Gradient Boosting and XGBoost, are then explained through an explanatory framework using daily climate datasets. The interpretability of the models was evaluated using SHapley Additive exPlanations (SHAP). The GB model performed better, with (RMSE = 0.045, MSE = 0.031, MAE = 0.002, NSE = 0.997, KGF = 0.991, RRMSE = 0.011) for Metehara Station, (RMSE = 0.032, MSE = 0.024, MAE = 0.001, NSE = 0.998, KGF = 0.999, RRMSE = 0.008) at Melkasa Station, and (RMSE = 0.13, MSE = 0.09, MAE = 0.017, NSE = 0.982, KGF = 0.977, RRMSE = 0.022) at Dubti Station, the same as XGBoost. For the GB and XGBoost models, the factors with the greatest overall impact on daily evaporation were SH, month, Tmax, and Tmin at Metehara and Melkasa, while Tmax, Tmin, and month had the greatest impact at Dubti. Furthermore, the interpretability analysis showed good agreement between the machine learning simulations and the actual hydro-climatic evaporation process. This result allows decision makers not only to rely on the results of an algorithm, but to make more informed decisions by using interpretable results for better control of the basin's reservoir operating rules.
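A minimal sketch of the goodness-of-fit metrics used to compare the regressors above (RMSE, NSE, relative RMSE), applied to placeholder observed/simulated evaporation series rather than the Awash basin data.

```python
import numpy as np

def rmse(obs, sim):
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means "no better than
    # predicting the mean of the observations".
    return float(1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def rrmse(obs, sim):
    # RMSE relative to the mean observed value.
    return rmse(obs, sim) / float(obs.mean())

obs = np.array([4.1, 4.8, 5.3, 6.0, 5.5, 4.9])   # observed daily evaporation (mm), placeholder
sim = np.array([4.0, 4.9, 5.1, 6.2, 5.4, 5.0])   # model simulation, placeholder
print(rmse(obs, sim), nse(obs, sim), rrmse(obs, sim))
```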
Chapter
Machine learning is now commonly used to model complex phenomena, providing robust predictions and data exploration analysis. However, the lack of explanations for predictions leads to a black-box effect, which the field of explainability (XAI) attempts to overcome. In particular, XAI local attribution methods quantify the contribution of each attribute to each instance's prediction; these contributions are called influences. This type of explanation is the most precise, as it focuses on each instance of the dataset and allows the detection of individual differences. Moreover, all local explanations can be aggregated for further analysis of the underlying data. In this context, influences can be seen as a new data space for understanding and revealing complex data patterns. We hypothesise that influences obtained through ML modelling are more informative than the original raw data, particularly for identifying homogeneous groups. The most efficient way to identify such groups is a clustering approach. We thus compare clusters based on raw data against those based on influences (computed through several XAI local attribution methods). Our results indicate that clusters based on influences perform better than those based on raw data, even with low-accuracy models. Keywords: Explainable Artificial Intelligence (XAI), Instance clustering, Prediction explanation, Machine learning explanation
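A hedged sketch of the "cluster the influences instead of the raw data" idea: SHAP values from a fitted model give a new representation of each instance, which is clustered and compared against clustering the raw features. The synthetic data, model choice, and evaluation metric are assumptions for illustration.

```python
import numpy as np
import shap
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import adjusted_rand_score

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Local attributions (influences) for the positive class.
sv = shap.TreeExplainer(model).shap_values(X)
if isinstance(sv, list):            # older shap versions: one array per class
    influences = sv[1]
elif sv.ndim == 3:                  # newer versions: (n_samples, n_features, n_classes)
    influences = sv[:, :, 1]
else:
    influences = sv

labels_raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
labels_inf = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(influences)

# How well does each clustering recover the (here, known) group structure?
print("raw data:  ", adjusted_rand_score(y, labels_raw))
print("influences:", adjusted_rand_score(y, labels_inf))
```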
Article
Full-text available
The traditional support vector machine (SVM) requires manual feature extraction to improve classification performance and relies on the expressive power of the manually extracted features. This characteristic is limiting in complex Industrial Internet of Things (IIoT) environments: manual feature extraction may fail to capture all relevant information, restricting the effectiveness of SVM in IIoT settings. A CNN-RNN, a deep learning network capable of extracting spatial and temporal features simultaneously, can alleviate this burden. In this paper, we propose a novel anomaly-based intrusion detection system (IDS) framework called CRSF. The framework's pre-training stage employs a dimension transformation function to convert input data into two-dimensional images. Two-dimensional convolutional kernels then extract spatial features, and the feature sequences are passed to an RNN to capture richer temporal features. After sufficient pre-training, an SVM is used as a classifier to map the pre-training data from the feature space to a high-dimensional space and learn nonlinear decision boundaries, enabling the framework to accurately differentiate the feature representations of different classes. Simulation experiments on the TON_IoT datasets demonstrate the effectiveness of the CRSF framework in intrusion detection. With a linear kernel in the SVM, the framework achieves an accuracy, F1-score, and AUC of 0.9959, 0.9959, and 0.9977, respectively, indicating its capability and superiority in intrusion detection.
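A hypothetical sketch of a CNN-RNN feature extractor feeding a linear SVM, in the spirit of the CRSF framework described above. The layer sizes, the 8x8 "image" reshaping, and the synthetic data are assumptions, not the paper's exact configuration.

```python
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64)).astype("float32")    # flattened flow records (synthetic)
y = (X[:, 0] + X[:, 7] > 0).astype(int)              # synthetic labels
X_img = X.reshape(-1, 8, 8, 1)                       # "dimension transformation" into images

inputs = tf.keras.Input(shape=(8, 8, 1))
x = tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
x = tf.keras.layers.Reshape((8, 8 * 16))(x)          # treat image rows as a sequence
x = tf.keras.layers.SimpleRNN(32)(x)                 # temporal features
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
net = tf.keras.Model(inputs, outputs)
net.compile(optimizer="adam", loss="binary_crossentropy")
net.fit(X_img, y, epochs=3, batch_size=64, verbose=0)  # "pre-training"

# Reuse the penultimate layer as a feature extractor and classify with a linear SVM.
extractor = tf.keras.Model(inputs, net.layers[-2].output)
features = extractor.predict(X_img, verbose=0)
svm = SVC(kernel="linear").fit(features[:800], y[:800])
print("held-out accuracy:", accuracy_score(y[800:], svm.predict(features[800:])))
```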
Chapter
In the field of machine learning, a crucial task is understanding the relative importance of the different input features in a predictive model. There is an approach in the literature whose aim is to analyze the predictive capacity of some features with respect to others. Can we explain a feature of the input space with others? Can we quantify this capacity? We propose a practical approach for analyzing the importance of features in a model and the explanatory capacity of some features over others. It is based on the adaptation of existing definitions from the literature that use the Shapley value and fuzzy measures. Our new approach aims to facilitate the understanding and application of these concepts by starting from a simple idea and considering well known methods. The main objective of this work is to provide a useful and practical approach for analyzing feature importance in real world cases. Keywords: Fuzzy Measures, Machine Learning, Features Importance, Explainable Artificial Intelligence
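A small, self-contained sketch of the exact Shapley value for a set function (the kind of game a fuzzy measure defines) over a handful of features; enumerating all subsets is only feasible for small numbers of features, which is the setting the chapter targets. The toy game below is an assumption for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """value: function mapping a frozenset of players to a real number."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                # Classic Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (value(S | {i}) - value(S))
        phi[i] = total
    return phi

# Toy game: features A and B are individually weak but strong together.
v = {frozenset(): 0, frozenset("A"): 1, frozenset("B"): 1, frozenset("AB"): 10}
print(shapley_values(["A", "B"], lambda S: v[frozenset(S)]))
```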
Article
Optimizing the activities and properties of lead compounds is an essential step in the drug discovery process. Despite recent advances in machine learning-aided drug discovery, most of the existing methods focus on making predictions for the desired objectives directly while ignoring the explanations for predictions. Although several techniques can provide interpretations for machine learning-based methods such as feature attribution, there are still gaps between these interpretations and the principles commonly adopted by medicinal chemists when designing and optimizing molecules. Here, we propose an interpretation framework, named MolSHAP, for quantitative structure-activity relationship analysis by estimating the contributions of R-groups. Instead of attributing the activities to individual input features, MolSHAP regards the R-group fragments as the basic units of interpretation, which is in accordance with the fragment-based modifications in molecule optimization. MolSHAP is a model-agnostic method that can interpret activity regression models with arbitrary input formats and model architectures. Based on the evaluations of numerous representative activity regression models on a specially designed R-group ranking task, MolSHAP achieved significantly better interpretation power compared with other methods. In addition, we developed a compound optimization algorithm based on MolSHAP and illustrated the reliability of the optimized compounds using an independent case study. These results demonstrated that MolSHAP can provide a useful tool for accurately interpreting the quantitative structure-activity relationships and rationally optimizing the compound activities in drug discovery.
Article
Full-text available
Acoustic wave features, including velocity dispersion and attenuation, induced by fluid flow in porous media have attracted significant attention in reservoir exploration. To enhance the quantitative understanding of these features, various wave propagation mechanisms have been developed. It has been found that wave dispersion and attenuation are associated with multiple reservoir parameters, each with different sensitivity, and it is difficult to distinguish the impacts of individual physical parameters on acoustic features using traditional wave equations. Considering the ability of deep neural networks (DNNs) to establish relationships between two datasets, a fully connected DNN has been employed as a surrogate rock physics model, and the SHapley Additive exPlanations (SHAP) method based on this DNN has been introduced to evaluate the contributions of different parameters. In this study, the classic White model is used to generate datasets for training the DNN. The datasets include seven parameters (bulk modulus, shear modulus, and density of the solid matrix, frequency, porosity, fluid saturation, and permeability), along with velocity dispersion and attenuation. By embedding SHAP into the trained DNN, the presented ShaRock algorithm allows a clear quantification of the contributions of various reservoir parameters to acoustic features. Furthermore, we analyse the underlying interactions between pairs of parameters by utilizing their combined quantified contributions to the features. The application of this proposed algorithm, which is grounded in wave propagation mechanisms, demonstrates its potential for providing valuable insights for parameter inversion in hydrocarbon exploration.
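A hedged sketch of the surrogate-model workflow: fit a small neural network to an analytic function and attribute its output to the input parameters with SHAP. The toy function merely stands in for the White model, and the parameter names and data ranges are assumptions.

```python
import numpy as np
import shap
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
names = ["bulk_mod", "shear_mod", "density", "frequency",
         "porosity", "saturation", "permeability"]
X = rng.uniform(0.0, 1.0, size=(2000, 7))
# Placeholder response with interacting parameters (not the White model).
y = X[:, 4] * X[:, 5] + 0.5 * np.log1p(X[:, 3]) - 0.2 * X[:, 6]

# Train the surrogate network on the generated dataset.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0).fit(X, y)

# Model-agnostic SHAP attribution of the surrogate's output to the parameters.
explainer = shap.KernelExplainer(surrogate.predict, shap.sample(X, 100))
sv = explainer.shap_values(X[:20], nsamples=200)
print(dict(zip(names, np.abs(sv).mean(axis=0).round(3))))
```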
Chapter
With the proliferation of misinformation on the web, automatic methods for detecting misinformation are becoming an increasingly important subject of study. If automatic misinformation detection is applied in a real-world setting, it is necessary to validate the methods being used. Large language models (LLMs) have produced the best results among text-based methods. However, fine-tuning such a model requires a significant amount of training data, which has led to the automatic creation of large-scale misinformation detection datasets. In this paper, we explore the biases present in one such dataset for misinformation detection in English, NELA-GT-2019. We find that models are at least partly learning the stylistic and other features of different news sources rather than the features of unreliable news. Furthermore, we use SHAP to interpret the outputs of a fine-tuned LLM and validate the explanation method using our inherently interpretable baseline. We critically analyze the suitability of SHAP for text applications by comparing the outputs of SHAP to the most important features from our logistic regression models. Keywords: misinformation detection, dataset bias, LLM, XAI, SHAP
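A sketch of the "inherently interpretable baseline" idea: a TF-IDF + logistic regression classifier whose coefficients can be read off directly and then compared against SHAP explanations of a larger model. The toy corpus and labels are illustrative assumptions, not NELA-GT-2019.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["shocking miracle cure they hide from you",
         "officials publish quarterly economic report",
         "you won't believe this secret trick",
         "study released in peer reviewed journal"]
labels = [1, 0, 1, 0]                      # 1 = unreliable, 0 = reliable (toy labels)

vec = TfidfVectorizer()
clf = LogisticRegression()
pipe = make_pipeline(vec, clf).fit(texts, labels)

# Words with the largest positive coefficients push toward "unreliable".
coefs = clf.coef_[0]
terms = np.array(vec.get_feature_names_out())
top = np.argsort(coefs)[::-1][:5]
print(list(zip(terms[top], coefs[top].round(3))))
```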
Conference Paper
Full-text available
This paper reviews methods for evaluating and analyzing the understandability of classification models in the context of data mining. The motivation for this study is that the majority of previous work on the evaluation and optimization of classification models has focused on assessing or increasing the accuracy of the models, and thus user-oriented properties such as comprehensibility and understandability have been largely overlooked. We conduct a quantitative survey to examine the concept of understandability from the user's point of view. The survey results are analyzed using the analytic hierarchy process (AHP) to rank models according to their understandability. The results indicate that decision tree models are perceived as more understandable than rule-based models. Using the survey results regarding the understandability of a number of models in conjunction with quantitative measurements of the complexity of the models, we establish a negative correlation between the complexity and understandability of the classification models, at least for one of the two studied data sets.
Article
Full-text available
Context-aware intelligent systems employ implicit inputs and make decisions based on complex rules and machine learning models that are rarely clear to users. Such lack of system intelligibility can lead to loss of user trust, satisfaction, and acceptance of these systems. However, automatically providing explanations about a system's decision process can help mitigate this problem. In this paper we present results from a controlled study with over 200 participants in which the effectiveness of different types of explanations was examined. Participants were shown examples of a system's operation along with various automatically generated explanations, and were then tested on their understanding of the system. We show, for example, that explanations describing why the system behaved a certain way resulted in better understanding and stronger feelings of trust. Explanations describing why the system did not behave a certain way resulted in lower understanding yet adequate performance. We discuss implications for the use of our findings in real-world context-aware applications.
Article
Full-text available
The process of automatically extracting novel, useful and ultimately comprehensible information from large databases, known as data mining, has become of great importance due to the ever-increasing amounts of data collected by large organizations. In particular, emphasis is placed on heuristic search methods able to discover patterns that are hard or impossible to detect using standard query mechanisms and classical statistical techniques. In this paper an evolutionary system capable of extracting explicit classification rules is presented. Special interest is dedicated to finding easily interpretable rules that may be used to make crucial decisions. A comparison with the findings achieved by other methods on a real problem, breast cancer diagnosis, is performed.
Article
Full-text available
In this paper, we describe the first practical application of two methods, which bridge the gap between the non-expert user and machine learning models. The first is a method for explaining classifiers' predictions, which provides the user with additional information about the decision-making process of a classifier. The second is a reliability estimation methodology for regression predictions, which helps the users to decide to what extent to trust a particular prediction. Both methods are successfully applied to a novel breast cancer recurrence prediction data set and the results are evaluated by expert oncologists. Keywords: Data mining, Machine learning, Breast cancer, Classification explanation, Prediction reliability
Conference Paper
Full-text available
Machine-learned classifiers are important components of many data mining and knowledge discovery systems. In several application domains, an explanation of the classifier's reasoning is critical for the classifier's acceptance by the end-user. We describe a framework, ExplainD, for explaining decisions made by classifiers that use additive evidence. ExplainD applies to many widely used classifiers, including linear discriminants and many additive models. We demonstrate our ExplainD framework using implementations of naïve Bayes, linear support vector machine, and logistic regression classifiers on example applications. ExplainD uses a simple graphical explanation of the classification process to provide visualizations of the classifier decisions, visualization of the evidence for those decisions, the capability to speculate on the effect of changes to the data, and the capability, wherever possible, to drill down and audit the source of the evidence. We demonstrate the effectiveness of ExplainD in the context of a deployed web-based system (Proteome Analyst) and using a downloadable Python-based implementation.
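A hedged sketch of the additive-evidence idea behind frameworks like the one described above: for a linear model the decision is a sum of per-feature terms (weight times feature value), so each term can be displayed as evidence for or against the predicted class. The data set and model here are placeholders, not the paper's applications.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)
clf = LogisticRegression(max_iter=1000).fit(X, data.target)

i = 0                                           # instance to explain
evidence = clf.coef_[0] * X[i]                  # per-feature additive evidence
order = np.argsort(np.abs(evidence))[::-1][:5]  # strongest pieces of evidence
for j in order:
    print(f"{data.feature_names[j]:>25s}: {evidence[j]:+.3f}")
# The evidence terms plus the intercept reconstruct the decision score exactly.
print("intercept:", clf.intercept_[0], " total score:", evidence.sum() + clf.intercept_[0])
```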
Conference Paper
Full-text available
Besides good predictive performance, the naive Bayesian classifier can also offer a valuable insight into the structure of the training data and effects of the attributes on the class probabilities. This structure may be effectively revealed through visualization of the classifier. We propose a new way to visualize the naive Bayesian model in the form of a nomogram. The advantages of the proposed method are simplicity of presentation, clear display of the effects of individual attribute values, and visualization of confidence intervals. Nomograms are intuitive and when used for decision support can provide a visual explanation of predicted probabilities. And finally, with a nomogram, a naive Bayesian model can be printed out and used for probability prediction without the use of computer or calculator.
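A minimal sketch of the quantity a naive Bayes nomogram plots: the log odds ratio contributed by each attribute value, estimated here from a toy categorical data set (the data are an assumption for illustration).

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Two categorical attributes, binary class (toy data).
X = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 0], [0, 0], [1, 1]])
y = np.array([1, 1, 0, 0, 1, 0, 0, 1])

nb = CategoricalNB(alpha=1.0).fit(X, y)

# For attribute j with value v, the nomogram "points" are
# log P(x_j = v | class 1) - log P(x_j = v | class 0).
for j, log_probs in enumerate(nb.feature_log_prob_):
    points = log_probs[1] - log_probs[0]        # one entry per attribute value
    print(f"attribute {j}: log odds ratio per value = {np.round(points, 3)}")
```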
Article
Full-text available
More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Article
Full-text available
We propose a simple yet potentially very effective way of visualizing trained support vector machines. Nomograms are an established model visualization technique that can graphically encode the complete model on a single page. The dimensionality of the visualization does not depend on the number of attributes, but merely on the properties of the kernel. To represent the effect of each predictive feature on the log odds ratio scale as required for the nomograms, we employ logistic regression to convert the distance from the separating hyperplane into a probability. Case studies on selected data sets show that for a technique thought to be a black-box, nomograms can clearly expose its internal structure. By providing an easy-to-interpret visualization the analysts can gain insight and study the effects of predictive factors.
Article
Full-text available
On account of the enormous amounts of rules that can be produced by data mining algorithms, knowledge post-processing is a difficult stage in an association rule discovery process. In order to find relevant knowledge for decision making, the user (a decision maker specialized in the data studied) needs to rummage through the rules. To assist him/her in this task, we here propose the rule-focusing methodology, an interactive methodology for the visual post-processing of association rules. It allows the user to explore large sets of rules freely by focusing his/her attention on limited subsets. This new approach relies on rule interestingness measures, on a visual representation, and on interactive navigation among the rules. We have implemented the rule-focusing methodology in a prototype system called ARVis. It exploits the user's focus to guide the generation of the rules by means of a specific constraint-based rule-mining algorithm.
Conference Paper
Full-text available
This paper presents a method to interpret the output of a classification (or regression) model. The interpretation is based on two concepts: the importance of a variable and the importance of the variable's value. Unlike most state-of-the-art interpretation methods, our approach allows the interpretation of the model output for every instance. Understanding the score given by a model for one instance can, for example, lead to an immediate decision in a customer relationship management (CRM) system. Moreover, the proposed method does not depend on a particular model and can therefore be used with any model or software that produces the scores.
Article
Full-text available
We present a method for explaining predictions for individual instances. The presented approach is general and can be used with all classification models that output probabilities. It is based on the decomposition of a model's predictions on individual contributions of each attribute. Our method works for the so-called black box models such as support vector machines, neural networks, and nearest neighbor algorithms, as well as for ensemble methods such as boosting and random forests. We demonstrate that the generated explanations closely follow the learned models and present a visualization technique that shows the utility of our approach and enables the comparison of different prediction methods.
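A hedged sketch of the kind of prediction decomposition this abstract describes: the contribution of an attribute is taken as the model's predicted probability for the instance minus the prediction obtained when that attribute is replaced by values drawn from the data, approximating "not knowing" it. The data set and model are placeholders.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def prediction_difference(model, X_background, x, target_class):
    p_full = model.predict_proba(x[None, :])[0, target_class]
    contributions = np.zeros(len(x))
    for i in range(len(x)):
        X_perturbed = np.tile(x, (len(X_background), 1))
        X_perturbed[:, i] = X_background[:, i]       # "forget" attribute i
        p_without = model.predict_proba(X_perturbed)[:, target_class].mean()
        contributions[i] = p_full - p_without
    return contributions

x = X[100]
print(np.round(prediction_difference(model, X, x, target_class=int(y[100])), 3))
```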
Article
Many quantitative problems in science, engineering, and economics are nowadays solved via statistical sampling on a computer. Such Monte Carlo methods can be used in three different ways: (1) to generate random objects and processes in order to observe their behavior, (2) to estimate numerical quantities by repeated sampling, and (3) to solve complicated optimization problems through randomized algorithms.
Article
Corporate credit rating analysis has attracted considerable research interest in the literature. Recent studies have shown that Artificial Intelligence (AI) methods achieve better performance than traditional statistical methods. This article introduces a relatively new machine learning technique, support vector machines (SVM), to the problem in an attempt to provide a model with better explanatory power. We used a backpropagation neural network (BNN) as a benchmark and obtained prediction accuracy of around 80% for both the BNN and SVM methods for the United States and Taiwan markets. However, only a slight improvement with SVM was observed. Another direction of the research is to improve the interpretability of AI-based models. We applied recent research results in neural network model interpretation and obtained the relative importance of the input financial variables from the neural network models. Based on these results, we conducted a comparative analysis of the differences in determining factors between the United States and Taiwan markets.
Article
Credit card fraud is a serious and growing problem. While predictive models for credit card fraud detection are in active use in practice, reported studies on the use of data mining approaches for credit card fraud detection are relatively few, possibly due to the lack of available data for research. This paper evaluates two advanced data mining approaches, support vector machines and random forests, together with the well-known logistic regression, as part of an attempt to better detect (and thus control and prosecute) credit card fraud. The study is based on real-life data of transactions from an international credit card operation.
Article
Appropriate guidelines for controls in B2C (business-to-consumer) applications (hereafter B2C controls) should be provided so that these controls are efficient in the context of specific system environments, given that many resources and skills are required to implement such controls. This study uses a two-step process for the assessment of B2C controls: efficiency analysis and recommendation of controls. First, using a data envelopment analysis (DEA) model, the study analyzes the efficiency of B2C controls installed by three groups of organizations: financial firms, retail firms, and information service providers. The B2C controls comprise controls for system continuity, access controls, and communication controls. The DEA model uses B2C controls as input and three variables describing the implementation of B2C applications (volume, sophistication, and information contents) as output. Second, decision trees are used to determine efficient firms and generate rules for recommending levels of controls. The results of the DEA model indicate that retail firms and information service providers implement B2C controls more efficiently than financial firms do. Controls for system continuity are implemented more efficiently than access controls. In financial firms, controls for system continuity, communication controls, and access controls, in descending order, are efficiently adopted in B2C applications. Every company can determine its relative level of reduction in each component of controls in order to make the control system efficient. The firms that efficiently implement B2C controls are identified using a decision tree model. The decision tree model is further used to recommend the level of controls and suggest rules for controls recommendation. This suggests the possibility of using decision trees for controls assessment in B2C applications.
Article
In this paper we develop a polynomial method based on sampling theory that can be used to estimate the Shapley value (or any semivalue) for cooperative games. Besides analyzing the complexity problem, we examine some desirable statistical properties of the proposed approach and provide some computational results.
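A sketch of the sampling estimator the article above concerns: draw random permutations of the players and average each player's marginal contribution when it joins the coalition of its predecessors. The example weighted voting game is a toy assumption.

```python
import random

def sampled_shapley(players, value, n_samples=10_000, seed=0):
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = players[:]
        rng.shuffle(order)                 # random permutation of the players
        coalition = set()
        prev = value(coalition)
        for p in order:
            coalition.add(p)
            curr = value(coalition)
            phi[p] += curr - prev          # marginal contribution of p
            prev = curr
    return {p: s / n_samples for p, s in phi.items()}

# Toy weighted voting game: a coalition "wins" (value 1) if its weight exceeds 5.
weights = {"a": 4, "b": 3, "c": 2, "d": 1}
value = lambda S: 1.0 if sum(weights[p] for p in S) > 5 else 0.0
print(sampled_shapley(list(weights), value))
```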
Conference Paper
We propose a method for explaining regression models and their predictions for individual instances. The method successfully reveals how individual features influence the model and can be used with any type of regression model in a uniform way. We used different types of models and data sets to demonstrate that the method is a useful tool for explaining, comparing, and identifying errors in regression models.
Article
Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two different approaches to machine learning in medical applications are compared: the system for inductive learning of decision trees Assistant, and the naive Bayesian classifier. Both methodologies were tested in four medical diagnostic problems: localization of primary tumor, prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of automatically acquired diagnostic knowledge from stored data records is compared, and the interpretation of the knowledge and the explanation ability of the classification process of each system is discussed. Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In addition, two extensions to naive Bayesian classifier are briefly described: dealing with continuous attributes, and discovering the dependencies among attributes.
Article
We present a general method for explaining individual predictions of classification models. The method is based on fundamental concepts from coalitional game theory and predictions are explained with contributions of individual feature values. We overcome the method's initial exponential time complexity with a sampling-based approximation. In the experimental part of the paper we use the developed method on models generated by several well-known machine learning algorithms on both synthetic and real-world data sets. The results demonstrate that the method is efficient and that the explanations are intuitive and useful.
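A minimal sketch in the spirit of the sampling-based approximation described above, applied to a trained classifier: for each sampled permutation and background instance, a feature's contribution is the change in predicted probability when that feature's value is switched from the background instance to the explained instance. The data set, model, and sample sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_wine(return_X_y=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def shapley_sampling(model, X_bg, x, target_class, n_samples=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    phi = np.zeros(n)
    for _ in range(n_samples):
        order = rng.permutation(n)
        z = X_bg[rng.integers(len(X_bg))].copy()     # random background instance
        for pos, i in enumerate(order):
            preceding = order[:pos]
            with_i = z.copy()
            with_i[preceding] = x[preceding]
            with_i[i] = x[i]
            without_i = z.copy()
            without_i[preceding] = x[preceding]
            p_with = model.predict_proba(with_i[None, :])[0, target_class]
            p_without = model.predict_proba(without_i[None, :])[0, target_class]
            phi[i] += p_with - p_without             # sampled marginal contribution
    return phi / n_samples

x = X[0]
print(np.round(shapley_sampling(model, X, x, target_class=int(y[0])), 3))
```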
Article
While there is growing professional interest in the application of Benford's law and "digit analysis" in financial fraud detection, there has been relatively little academic research demonstrating its efficacy as a decision support tool in the context of an analytical review procedure pertaining to a financial audit. We conduct a numerical study using a genetically optimized artificial neural network. Building on earlier work of a similar nature by others, we assess the benefits of Benford's law as a classifier for segregating naturally occurring (i.e., non-concocted) numbers from those that are made up. Alongside the frequency of the first and second significant digits and their mean and standard deviation, a posited set of 'non-digit' input variables, categorized as "information theoretic", "distance-based" and "goodness-of-fit" measures, helps to minimize the critical classification errors that can lead to an audit failure. We determine the optimal network structure for every instance of a 3×3 Manipulation–Involvement matrix, drawn to depict the different combinations of the level of sophistication in data manipulation by the perpetrators of a financial fraud and the extent of collusive involvement.
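A small sketch of the digit-analysis idea underlying the study: compare observed first-significant-digit frequencies against Benford's law. The sample amounts below are placeholders, not audit data.

```python
import math
from collections import Counter

def first_digit(x):
    # Scientific notation puts the first significant digit in the first character.
    return int(f"{abs(x):.10e}"[0])

amounts = [1243.5, 982.1, 110.0, 3021.7, 187.4, 1650.0, 246.9, 99.5, 1420.3, 315.8]
counts = Counter(first_digit(a) for a in amounts)
n = len(amounts)

print(" d   observed   Benford")
for d in range(1, 10):
    observed = counts.get(d, 0) / n
    benford = math.log10(1 + 1 / d)      # Benford's expected frequency for digit d
    print(f" {d}   {observed:.3f}      {benford:.3f}")
```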
Article
The application of a diagnostic or prognostic Multiple Logistic Function (MLF) in medical practice may, depending on the complexity of the model, require considerable arithmetic. Various methods for the elimination of such arithmetic, e.g. sets of tables or nomograms, have been proposed. An alternative method of eliminating the necessary arithmetic is described here. It is based on the principle of the familiar slide-rule. As an example, the design of a slide-rule for the evaluation of a diagnostic model for acute myocardial infarction is described. The slide-rule method allows for the evaluation of logistic models with complex linear combinations in the exponent. Adequate devices can be produced at low cost.
Article
Few published studies have combined clinical prognostic factors into risk profiles that can be used to predict the likelihood of recurrence or metastatic progression in patients following treatment of prostate cancer. We developed a nomogram that allows prediction of disease recurrence through use of preoperative clinical factors for patients with clinically localized prostate cancer who are candidates for treatment with a radical prostatectomy. By use of Cox proportional hazards regression analysis, we modeled the clinical data and disease follow-up for 983 men with clinically localized prostate cancer whom we intended to treat with a radical prostatectomy. Clinical data included pretreatment serum prostate-specific antigen levels, biopsy Gleason scores, and clinical stage. Treatment failure was recorded when there was clinical evidence of disease recurrence, a rising serum prostate-specific antigen level (two measurements of 0.4 ng/mL or greater and rising), or initiation of adjuvant therapy. Validation was performed on a separate sample of 168 men, also from our institution. Treatment failure (i.e., cancer recurrence) was noted in 196 of the 983 men, and the patients without failure had a median follow-up of 30 months (range, 1-146 months). The 5-year probability of freedom from failure for the cohort was 73% (95% confidence interval = 69%-76%). The predictions from the nomogram appeared accurate and discriminating, with a validation sample area under the receiver operating characteristic curve (i.e., comparison of the predicted probability with the actual outcome) of 0.79. A nomogram has been developed that can be used to predict the 5-year probability of treatment failure among men with clinically localized prostate cancer treated with radical prostatectomy.
Becker B, Kohavi R, Sommerfield D (1997) Visualizing the simple Bayesian classifier. In: KDD workshop on issues in the integration of data mining and data visualization
Shapley LS (1953) A value for n-person games. In: Contributions to the theory of games, vol II. Princeton University Press, Princeton
Štrumbelj E, Bosnić B, Grašič-Kuhar C, Kononenko I (2010) Explanation and reliability of breast cancer recurrence predictions. Knowl Inf Syst 24(2):305–324
Melli G. The datgen dataset generator
Frank A. UCI machine learning repository