Figure 16
Images used as input for a Fisher Vector (FV) model trained to detect horses, next to the corresponding relevance map. a) shows an image from the Pascal VOC 2007 dataset containing a copyright watermark, which causes a strong response in the model. In b), the watermark has been edited out. The artificially created images c) and d) show a sports car on a lush green meadow with and without an added copyright watermark. In samples a) and c) the presence of class "horse" is detected, whereas in samples b) and d) it is not.

Source publication
Article
Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-sol...

Citations

... Layer-Wise Relevance Propagation (LRP) is a representative backpropagation-based method that explains classification results by computing the contributions of individual pixels. In general, it starts from the output layer of the network, moves in the opposite direction, and redistributes the relevance scores until it reaches the input layer [22,23]. Throughout, it obeys the global conservation property, which states that whatever relevance a neuron receives must be redistributed in equal amount to the lower layer. ...
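To make the conservation property above concrete, the basic redistribution rule (the standard LRP-0/z-rule from the LRP literature, not quoted from the cited text) can be written as follows: the relevance of a neuron k in the upper layer is split among the neurons j of the lower layer in proportion to their contributions to k's pre-activation.

```latex
% a_j: activation of lower-layer neuron j, w_{jk}: weight from j to k,
% R_k: relevance already assigned to upper-layer neuron k.
R_j = \sum_k \frac{a_j w_{jk}}{\sum_{j'} a_{j'} w_{j'k}} \, R_k,
\qquad \text{which guarantees} \qquad
\sum_j R_j = \sum_k R_k .
```

Summing the rule over j makes each inner fraction sum to one, which is exactly the layer-wise conservation of relevance mentioned above.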
Article
Deep learning has demonstrated remarkable performance in the medical domain, with accuracy that rivals or even exceeds that of human experts. However, these models suffer from a significant problem: they are "black-box" structures, meaning they are opaque, non-intuitive, and difficult for people to understand. This creates a barrier to the application of deep learning models in clinical practice due to the lack of interpretability, trust, and transparency. To overcome this problem, several studies on interpretability have been proposed. Therefore, in this paper, we comprehensively review the interpretability of deep learning in medical diagnosis based on the current literature, including common interpretability methods used in the medical domain, various applications of interpretability for disease diagnosis, prevalent evaluation metrics, and several disease datasets. In addition, the challenges of interpretability and future research directions are also discussed. To the best of our knowledge, this is the first time that various applications of interpretability methods for disease diagnosis have been summarized.
... Such networks seemingly perform well under laboratory conditions, which makes the flaw difficult to spot before deployment. The Clever Hans problem (CHP) arises when a network focuses on features that are logically irrelevant for inference [77]. These could be watermarks or other text on images, or background details instead of the objects of interest. ...
Article
Neural networks for deep-learning applications, also called artificial neural networks, are important tools in science and industry. While their widespread use was limited in the past because of inadequate hardware, their popularity increased dramatically starting in the early 2000s, when it became possible to train increasingly large and complex networks. Today, deep learning is widely used in biomedicine, from image analysis to diagnostics. This also includes special topics, such as forensics. In this review, we discuss the latest networks and how they work, with a focus on the analysis of biomedical data, particularly biomarkers in bioimage data. We provide a summary of numerous technical aspects, such as activation functions and frameworks. We also present a data analysis of publications about neural networks to provide quantitative insight into the use of network types and the number of journals per year, and thus into their usage across different scientific fields.
... We refer readers to [18] for more information about this. It is also shown in [19,20] that one of the winning methods of the PASCAL VOC competition [21] was recognizing boats by the presence of water, wolves by the presence of snow, and trains by the presence of rails in the image. It was also recognizing horses by the presence of a copyright watermark, simply because this watermark tag appeared in all horse images in the training dataset. ...
Preprint
Self-Supervised vision learning has revolutionized deep learning, becoming the next big challenge in the domain and rapidly closing the gap with supervised methods on large computer vision benchmarks. With current models and training data growing exponentially, explaining and understanding these models becomes pivotal. We study the problem of explainable artificial intelligence in the domain of self-supervised learning for vision tasks, and present methods to understand networks trained with self-supervision and their inner workings. Given the huge diversity of self-supervised vision pretext tasks, we narrow our focus to understanding paradigms which learn from two views of the same image, and mainly aim to understand the pretext task. Our work focuses on explaining similarity learning, and is easily extendable to all other pretext tasks. We study two popular self-supervised vision models: SimCLR and Barlow Twins. We develop a total of six methods for visualizing and understanding these models: perturbation-based methods (conditional occlusion, context-agnostic conditional occlusion, and pairwise occlusion), Interaction-CAM, Feature Visualization, Model Difference Visualization, Averaged Transforms, and Pixel Invariance. Finally, we evaluate these explanations by translating well-known evaluation metrics tailored towards supervised image classification systems involving a single image into the domain of self-supervised learning, where two images are involved. Code is at: https://github.com/fawazsammani/xai-ssl
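As a rough illustration of the perturbation-based idea for similarity learning mentioned above, the following is a minimal sketch (not the authors' implementation): patches of one image are occluded and the resulting drop in embedding similarity to the second view is recorded as a relevance map. The `encoder` function and the image tensors are assumed to be given.

```python
import torch
import torch.nn.functional as F

def occlusion_similarity_map(encoder, img_a, img_b, patch=32, stride=32):
    """Heatmap of how much occluding each patch of img_a reduces its
    embedding similarity to img_b (sketch; encoder is assumed given)."""
    with torch.no_grad():
        z_a = F.normalize(encoder(img_a.unsqueeze(0)), dim=-1)
        z_b = F.normalize(encoder(img_b.unsqueeze(0)), dim=-1)
        base_sim = (z_a * z_b).sum()          # cosine similarity of the two views

        _, H, W = img_a.shape
        heat = torch.zeros(H, W)
        for y in range(0, H - patch + 1, stride):
            for x in range(0, W - patch + 1, stride):
                occluded = img_a.clone()
                occluded[:, y:y + patch, x:x + patch] = 0.0   # blank out one patch
                z_occ = F.normalize(encoder(occluded.unsqueeze(0)), dim=-1)
                sim = (z_occ * z_b).sum()
                heat[y:y + patch, x:x + patch] += base_sim - sim  # similarity drop
    return heat
```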
... However, they are traditionally seen as black-box models, i.e., given the network model, it is unclear to the user, and even to the engineer designing the algorithm, what has been most important for reaching a particular output prediction. This can pose serious obstacles for applications since, say, networks using spurious image features that are only present in the training data (Clever Hans effect [17,18]) might go unnoticed. Such undesired behaviour, which hampers the network's generalization ability, is particularly problematic in safety-critical areas. ...
Preprint
Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class. We propose two methods to leverage generative models to construct such suitable coordinate systems that are either exactly or approximately diffeomorphic. We analyze the generation process theoretically using Riemannian differential geometry and validate the quality of the generated counterfactuals using various qualitative and quantitative measures.
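A heavily simplified sketch of the general idea described above, performing gradient ascent in the coordinates provided by a generative model rather than the paper's exact diffeomorphic construction, might look as follows. The `generator`, `classifier`, and starting latent code `z0` are assumptions, not names from the cited work.

```python
import torch

def latent_counterfactual(generator, classifier, z0, target_class,
                          steps=200, lr=0.05):
    """Ascend the target-class logit in the generator's latent coordinates
    to obtain a counterfactual image (sketch only)."""
    z = z0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = generator(z)                       # decode latent code to image space
        logit = classifier(x)[0, target_class]
        (-logit).backward()                    # maximize the target-class logit
        opt.step()
    return generator(z).detach()               # candidate counterfactual
```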
... While more data helps to build better models, it becomes increasingly infeasible to inspect this overwhelming abundance of training data, which in consequence can lead to models that focus on (unknown or undetected) spurious correlations in the training data (e.g. [35,1]). This may heavily compromise their broad generalization and may thus ultimately lead to unrepresentative, unfair, or even unsafe resp. ...
... Numerous different explanation approaches have been introduced to illustrate the decision-making process of machines: either on a 'local' level, i.e., explaining the model's prediction for individual samples, or on a 'global' level, i.e., explaining the general discriminative concepts learned by the model across the whole data set. In particular, with the help of local explanation methods, undesirable behaviors of trained networks have been found that were based on learned artifactual data or data containing spurious correlations [35]. ...
... Recently, local XAI methods have shown the potential to reveal predictions that are made based on artifacts learned by the model, such as Clever Hans or Backdoor artifacts. One step toward a more universal understanding of the learned decision strategies, aiming to detect undesired behavior of a model at scale, was established by the Spectral Relevance Analysis method (SpRAy), which was introduced to unmask the presence of Clever Hans behavior in a model [35]. SpRAy aims to provide a global explanation of the model by analyzing local explanations over the dataset: after collecting local attribution maps, they are clustered by the Spectral Clustering algorithm, and the clusters are forwarded for manual inspection. ...
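A minimal sketch of this SpRAy-style pipeline, assuming the per-sample attribution maps have already been computed (e.g. with LRP); array shapes and parameter values below are illustrative, not taken from the cited work.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def spray_clusters(attribution_maps, n_clusters=8):
    """Cluster local attribution maps to surface groups of samples that the
    model treats with a similar strategy (e.g. a watermark-based cluster)."""
    n = attribution_maps.shape[0]
    flat = attribution_maps.reshape(n, -1)        # one flattened vector per heatmap
    labels = SpectralClustering(
        n_clusters=n_clusters,
        affinity="nearest_neighbors",
        assign_labels="discretize",
        random_state=0,
    ).fit_predict(flat)
    return labels                                 # cluster ids for manual inspection

# Usage: labels = spray_clusters(heatmaps); then inspect the samples per label.
```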
Preprint
Deep Neural Networks (DNNs) draw their power from the representations they learn. In recent years, however, researchers have found that DNNs, while being incredibly effective in learning complex abstractions, also tend to be infected with artifacts, such as biases, Clever Hanses (CH), or Backdoors, due to spurious correlations inherent in the training data. So far, existing methods for uncovering such artifactual and malicious behavior in trained models focus on finding artifacts in the input data, which requires both the availability of a data set and human intervention. In this paper, we introduce DORA (Data-agnOstic Representation Analysis): the first automatic data-agnostic method for the detection of potentially infected representations in Deep Neural Networks. We further show that contaminated representations found by DORA can be used to detect infected samples in any given dataset. We qualitatively and quantitatively evaluate the performance of our proposed method in both controlled toy scenarios and real-world settings, where we demonstrate the benefit of DORA in safety-critical applications.
... Under (ii) model explanation, one can group methods that enable interpretations of what a trained model has learned (characteristic representations of an entire class). An important method here is Spectral Relevance Analysis (SpRAy) [103]. ...
Thesis
This synopsis examines the use of artificial intelligence (machine learning) in the context of biomechanical data. The potential of these methods is elaborated, and selected practice-relevant limitations are addressed on the basis of five publications. Among other things, promising perspectives are demonstrated through the use of ensemble feature selection, explainable artificial intelligence, and metric learning, as well as the development of a pathology-independent classifier.
... during decision making [109,112]: The reasoning behind their decisions is generally not obvious, and as such, they are simply not trustworthy enough, as their decisions may be (and often are) biased towards unintended or undesired features, as shown in, e.g., [127,69,114,6]. This in turn hampers the transferability of ML models to many application domains of interest, e.g., due to the risks involved in high-stakes decision making [142,109], or the requirements set in governmental regulatory frameworks [45,131] and guidelines brought forward [31]. ...
... As a post-hoc XAI method, CRP can be applied to (almost) any ML model with no extra requirements on the data, model or training process. We demonstrate on multiple datasets, model architectures and application domains, that CRP-based analyses allow one to (1) gain insights into the representation and composition of concepts in the model as well as quantitatively investigate their role in prediction, (2) identify and counteract Clever Hans filters [69] focusing on spurious correlations in the data, and (3) analyze whole concept subspaces and their contributions to fine-grained decision making. Analogously to Activation Maximization [90], we also propose the Relevance Maximization (RelMax) approach, which uses CRP in order to search for the most relevant samples in the training dataset, and show its advantages when "explaining by example". ...
... Thus, activation indicates which general pattern a filter responds to, whereas relevance clarifies its specific usage. In this particular case, RelMax sorting has revealed a Clever Hans [69] feature that extends to several classes of the ImageNet dataset, as further analysis in Section 4.3 will verify. The stripe concept, however, is used to detect white letters, as the relevance-based samples suggest. ...
Preprint
The emerging field of eXplainable Artificial Intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models. While local XAI methods explain individual predictions in the form of attribution maps, thereby identifying where important features occur (but not providing information about what they represent), global explanation techniques visualize what concepts a model has generally learned to encode. Both types of methods thus only provide partial insights and leave the burden of interpreting the model's reasoning to the user. Only a few contemporary techniques aim at combining the principles behind both local and global XAI for obtaining more informative explanations. Those methods, however, are often limited to specific model architectures or impose additional requirements on training regimes or data and label availability, which renders the post-hoc application to arbitrarily pre-trained models practically impossible. In this work we introduce the Concept Relevance Propagation (CRP) approach, which combines the local and global perspectives of XAI and thus allows answering both the "where" and "what" questions for individual predictions, without imposing additional constraints. We further introduce the principle of Relevance Maximization for finding representative examples of encoded concepts based on their usefulness to the model. We thereby lift the dependency on the common practice of Activation Maximization and its limitations. We demonstrate the capabilities of our methods in various settings, showcasing that Concept Relevance Propagation and Relevance Maximization lead to more human-interpretable explanations and provide deep insights into the model's representations and reasoning through concept atlases, concept composition analyses, and quantitative investigations of concept subspaces and their role in fine-grained decision making.
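At its core, the Relevance Maximization idea described above amounts to ranking training samples by how much relevance a chosen concept (filter) receives, rather than by how strongly it activates. A minimal sketch, assuming the per-sample relevance scores for that filter have already been computed (the variable names are illustrative):

```python
import numpy as np

def relevance_maximization(relevance_per_sample, top_k=9):
    """Return indices of the training samples for which the chosen filter was
    most *relevant* to the prediction (RelMax-style reference samples),
    as opposed to ranking by raw activation (Activation Maximization)."""
    order = np.argsort(relevance_per_sample)[::-1]   # descending relevance
    return order[:top_k]
```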
... There is considerable evidence that context is a useful cue [27,35]. However, many of the associated background elements and foreground shortcuts are only relevant to the specific data distribution, which leads to a lack of robustness to distribution shifts [32,38]. ...
... With the rapid advancements in object recognition, the measuring stick is being narrowed down to a single number, accuracy [47]. By relying solely on accuracy, classifiers introduce biases, since they utilize shortcuts to select the right class [27,18]. The shortcomings of models that rely on shortcuts have been demonstrated for domains other than vision, such as natural language processing [14] and multi-modal learning [17]. ...
Preprint
It has been observed that visual classification models often rely mostly on the image background, neglecting the foreground, which hurts their robustness to distribution changes. To alleviate this shortcoming, we propose to monitor the model's relevancy signal and manipulate it such that the model is focused on the foreground object. This is done as a finetuning step, involving relatively few samples consisting of pairs of images and their associated foreground masks. Specifically, we encourage the model's relevancy map (i) to assign lower relevance to background regions and (ii) to consider as much information as possible from the foreground, and (iii) we encourage the decisions to have high confidence. When applied to Vision Transformer (ViT) models, a marked improvement in robustness to domain shifts is observed. Moreover, the foreground masks can be obtained automatically from a self-supervised variant of the ViT model itself; therefore no additional supervision is required.
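One plausible way to express the three objectives listed above as a finetuning loss is sketched below, under the assumption that a differentiable relevancy map and a foreground mask are available per image; the paper's exact formulation may differ, and all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def relevance_guided_loss(relevancy, fg_mask, logits, labels,
                          w_bg=1.0, w_fg=1.0, w_cls=1.0):
    """relevancy: (B, H, W) differentiable relevance map in [0, 1]
    fg_mask:   (B, H, W) binary foreground mask
    logits:    (B, C) classifier outputs, labels: (B,) ground-truth classes."""
    # (i) penalize relevance that falls on the background
    bg_penalty = (relevancy * (1.0 - fg_mask)).mean()
    # (ii) reward relevance coverage of the foreground
    fg_coverage = 1.0 - (
        (relevancy * fg_mask).sum(dim=(1, 2))
        / fg_mask.sum(dim=(1, 2)).clamp(min=1.0)
    )
    # (iii) keep decisions confident and correct (cross-entropy as a proxy)
    confidence = F.cross_entropy(logits, labels)
    return w_bg * bg_penalty + w_fg * fg_coverage.mean() + w_cls * confidence
```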
... The field of Explainable AI (XAI) has thus put forth a number of interpretability approaches, among them saliency maps [36] and mechanistic interpretability [38]. These have had some successes, such as detecting biases in established datasets [26], or connecting individual neurons to understandable features [9]. However, they are motivated purely by heuristics and come without any theoretical guarantees. ...
... For example, consider as φ the water in the boat images of the PASCAL VOC dataset [26]. The feature is a strong predictor for the "boat" class in the test data distribution D, but it should not be indicative of the real-world distribution, which includes images of water without boats and vice versa. ...
Preprint
We present a new theoretical framework for making black box classifiers such as Neural Networks interpretable, basing our work on clear assumptions and guarantees. In our setting, which is inspired by the Merlin-Arthur protocol from Interactive Proof Systems, two functions cooperate to achieve a classification together: the \emph{prover} selects a small set of features as a certificate and presents it to the \emph{classifier}. Including a second, adversarial prover allows us to connect a game-theoretic equilibrium to information-theoretic guarantees on the exchanged features. We define notions of completeness and soundness that enable us to lower bound the mutual information between features and class. To demonstrate good agreement between theory and practice, we support our framework by providing numerical experiments for Neural Network classifiers, explicitly calculating the mutual information of features with respect to the class.
... Despite all these advantages, ProtoPNets are prone, like regular neural networks, to picking up confounds from the training data (e.g., class-correlated watermarks), thus suffering from compromised generalization and out-of-distribution performance [7,8]. This occurs even with well-known data sets, as we will show, and it is especially alarming as it can impact high-stakes applications like COVID-19 diagnosis [9] and scientific analysis [10]. ...
... Explanations excel at exposing confounds picked up by models from data [7,8]; hence, constraining or supervising the explanations can effectively dissuade the model from acquiring those confounds. This observation lies at the heart of recent approaches for debugging machine learning models. ...
Preprint
Part-prototype Networks (ProtoPNets) are concept-based classifiers designed to achieve the same performance as black-box models without compromising transparency. ProtoPNets compute predictions based on similarity to class-specific part-prototypes learned to recognize parts of training examples, making it easy to faithfully determine what examples are responsible for any target prediction and why. However, like other models, they are prone to picking up confounds and shortcuts from the data, thus suffering from compromised prediction accuracy and limited generalization. We propose ProtoPDebug, an effective concept-level debugger for ProtoPNets in which a human supervisor, guided by the model's explanations, supplies feedback in the form of what part-prototypes must be forgotten or kept, and the model is fine-tuned to align with this supervision. An extensive empirical evaluation on synthetic and real-world data shows that ProtoPDebug outperforms state-of-the-art debuggers for a fraction of the annotation cost.
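To illustrate the kind of concept-level feedback described above, the following is a hypothetical sketch of a "forgetting" penalty, assuming the patches flagged by the human supervisor as confounds are available; the paper's actual loss may be defined differently.

```python
import torch

def forgetting_loss(prototype_similarity):
    """prototype_similarity: (P, M) similarity of each of the P part-prototypes
    to M patches the supervisor marked as confounds ('to be forgotten').
    Penalizing the strongest remaining response pushes the prototypes away
    from the confound during finetuning (hypothetical sketch)."""
    return prototype_similarity.max(dim=1).values.mean()
```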