A Rate-Distortion Framework for Explaining Black-box
Model Decisions
Stefan Kolek1, Duc Anh Nguyen1, Ron Levie1, Joan Bruna2, and Gitta Kutyniok1
1Department of Mathematics, Ludwig Maximilian University, Munich
2Courant Institute of Mathematical Sciences, New York University, New York
We present the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method
for explaining black-box model decisions. The framework is based on perturbations of the target input sig-
nal and applies to any differentiable pre-trained model such as neural networks. Our experiments demon-
strate the framework’s adaptability to diverse data modalities, particularly images, audio, and physical
simulations of urban environments.
1 Introduction
Powerful machine learning models such as deep neural networks are inherently opaque, which has motivated
numerous explanation methods that the research community developed over the last decade [1, 24, 26, 20,
15, 16, 7, 2]. The meaning and validity of an explanation depends on the underlying principle of the expla-
nation framework. Therefore, a trustworthy explanation framework must align intuition with mathematical
rigor while maintaining maximal flexibility and applicability. We believe the Rate-Distortion Explanation
(RDE) framework, first proposed by [16], then extended by [9], as well as the similar framework in [2],
meets the desired qualities. In this chapter, we aim to present the RDE framework in a revised and holistic
manner. Our generalized RDE framework can be applied to any model (not just classification tasks), sup-
ports in-distribution interpretability (by leveraging in-painting GANs), and admits interpretation queries (by
considering suitable input signal representations).
The typical setting of a (local) explanation method is given by a pre-trained model Φ : R^n → R^m and
a data instance x ∈ R^n. The model Φ can be either a classifier with m class labels or a regression
model with m-dimensional output. The model decision Φ(x) is to be explained. In the original RDE
framework [16], an explanation for Φ(x) is a set of feature components S ⊂ {1, . . . , n} of x that are deemed
relevant for the decision Φ(x). The core principle behind the RDE framework is that a set S ⊂ {1, . . . , n}
contains all the relevant components if Φ(x) remains (approximately) unchanged after modifying x_{S^c}, i.e.,
the components of x that are not deemed relevant. In other words, S contains all relevant features if they are
sufficient for producing the output Φ(x). To convey concise explanatory information, one aims to find the
minimal set S ⊂ {1, . . . , n} with all the relevant components. As demonstrated in [16] and [28], the minimal
relevant set S ⊂ {1, . . . , n} cannot be found combinatorially in an efficient manner for large input sizes.
A meaningful approximation can nevertheless be found by optimizing a sparse continuous mask s ∈ [0,1]^n
that has no significant effect on the output Φ(x), in the sense that Φ(x) ≈ Φ(x ⊙ s + (1 − s) ⊙ v) should
hold for appropriate perturbations v ∈ R^n, where ⊙ denotes the componentwise multiplication. Suppose
d(Φ(x), Φ(y)) is a measure of distortion (e.g., the ℓ2-norm) between the model outputs for x, y ∈ R^n and V
is a distribution over appropriate perturbations v ∼ V. An explanation in the RDE framework can be found
as a solution mask s* to the following minimization problem:

s* := arg min_{s ∈ [0,1]^n} E_{v∼V} [ d(Φ(x), Φ(x ⊙ s + (1 − s) ⊙ v)) ] + λ‖s‖₁,

where λ > 0 is a hyperparameter controlling the sparsity of the mask.
arXiv:2110.08252v1 [cs.LG] 12 Oct 2021
We further generalize the RDE framework to abstract input signal representations x = f(h), where f
is a data representation function with input h. The philosophy of the generalized RDE framework is that
an explanation for a generic input signal x = f(h) should be some simplified version of the signal that
is interpretable to humans. This is achieved by demanding sparsity in a suitable representation system h,
which ideally optimally represents the class of explanations that are desirable for the underlying domain
and interpretation query. This philosophy underpins our experiments on image classification in the wavelet
domain, on audio signal classification in the Fourier domain, and on radio map estimation in an urban
environment domain. Therein we demonstrate the versatility of our generalized RDE framework.
2 Related works
To our knowledge, the explanation principle of optimizing a mask s ∈ [0,1]^n was first proposed in [7].
Fong et al. [7] explained image classification decisions by considering one of two “deletion games”: (1)
optimizing for the smallest deletion mask that causes the class score to drop significantly, or (2) optimizing
for the largest deletion mask that has no significant effect on the class score. The original RDE approach
[16] is based on the second deletion game and connects the deletion principle to rate-distortion theory, which
studies lossy data compression. Deleted entries in [7] were replaced with either constants, noise, or blurring,
and deleted entries in [16] were replaced with noise.
Explanation methods introduced before the “deletion games” principle from [7] were typically based
upon gradients [24, 26], propagation of neuron activations [1, 23], surrogate models [20], or game
theory [15]. Gradient-based methods such as SmoothGrad [24] lack a principle of relevance beyond local
sensitivity. Reference-based methods such as Integrated Gradients [26] and DeepLIFT [23] depend on a
reference value, for which no clear optimal choice exists. DeepLIFT and LRP assign relevance by propagating
neuron activations, which makes them dependent on the implementation of Φ. LIME [20] uses an
interpretable surrogate model that approximates Φ in a neighborhood around x. Surrogate-model
explanations are inherently limited for complex models Φ (such as image classifiers), as they only admit
very local approximations. Generally, explanations that only depend on the model behavior on a small
neighborhood U_x of x offer limited insight. Lastly, Shapley-value-based explanations [15] are grounded
in Shapley values from game theory. They assign relevance scores as weighted averages of marginal
contributions of the respective features. Though Shapley values are mathematically well-founded, relevance
scores cannot be computed exactly for common input sizes such as n ≥ 50, since one exact relevance
score generally requires O(2^n) evaluations of Φ [27].
A notable difference between the RDE method and additive feature explanations [15] is that the values
in the mask s do not add up to the model output. The additive property in [15] takes the view that
features individually contribute to the model output and that relevance should be reflected by these contributions.
We emphasize that the RDE method is designed to look for a set of relevant features, not an estimate of
individual relative contributions. This is particularly desirable when only groups of features are interpretable,
as, for example, in image classification tasks, where individual pixels do not carry any interpretable meaning.
Similarly to Shapley values, the explanation in the RDE framework cannot be computed exactly, as it requires
solving a non-convex minimization problem. However, the RDE method can take full advantage of modern
optimization techniques. Furthermore, the RDE method is a model-agnostic explanation technique with a
mathematically principled and intuitive notion of relevance, as well as enough flexibility to incorporate the
model behavior on meaningful input regions of Φ.
The meaning of an explanation based on deletion masks s ∈ [0,1]^n depends on the nature of the perturbations
that replace the deleted regions. Random [16, 7] or blurred [7] replacements v ∈ R^n may result
in a data point x ⊙ s + (1 − s) ⊙ v that falls outside the natural data manifold on which Φ was trained.
This is a subtle though important problem, since such an explanation may depend on evaluations of Φ on
data points from undeveloped decision regions. The latter motivates in-distribution interpretability, which
considers meaningful perturbations that keep x ⊙ s + (1 − s) ⊙ v in the data manifold. [2] was the first work
to suggest using an inpainting-GAN to generate meaningful perturbations for the “deletion games”. The
authors of [9] then applied in-distribution interpretability to the RDE method in the challenging modalities
of music and physical simulations of urban environments. Moreover, they demonstrated that the RDE method
in [16] can be extended to answer so-called “interpretation queries”. For example, the RDE method was
applied in [9] to an instrument classifier to answer the global interpretation query “Is magnitude or phase in
the signal more important for the classifier?”. Most recently, in [11], we introduced CartoonX as a novel
explanation method for image classifiers, answering the interpretation query “What is the relevant piece-wise
smooth part of an image?” by applying RDE in the wavelet basis of images.
3 Rate-distortion explanation framework
Based on the original RDE approach from [16], in this section we present a general formulation of the
RDE framework and discuss several implementations. While [16] focuses merely on image classification
with explanations in the pixel representation, we apply the RDE framework not only to more challenging
domains but also to different input signal representations. Not surprisingly, the combinatorial optimization
problem in the RDE framework, even in simpler form, is extremely hard to solve [16, 28]. This motivates
heuristic solution strategies, which will be discussed in Subsection 3.2.
3.1 General formulation
It is well-known that in practice there are different ways to describe a signal x ∈ R^n. Generally speaking, x
can be represented by a data representation function f : ∏_{i=1}^k R^{d_i} → R^n,

x = f(h_1, . . . , h_k),   (1)

for some inputs h_i ∈ R^{d_i}, d_i ∈ N, i ∈ {1, . . . , k}, k ∈ N. Note that we do not restrict ourselves to linear data
representation functions f. To briefly illustrate the generality of this abstract representation, we consider the
following examples.
Example 1 (Pixel representation) An arbitrary (vectorized) image x ∈ R^n can simply be represented pixel-wise as

x = f(h_1, . . . , h_n),

with h_i := x_i being the individual pixel values and f : R^n → R^n being the identity transform.
Due to its simplicity, this standard basis representation is a reasonable choice when explaining image clas-
sification models. However, in many other applications, one requires more sophisticated representations of
the signals, such as through a possibly redundant dictionary.
Example 2 Let {ψ_j}_{j=1}^k, k ∈ N, be a dictionary in R^n, e.g., a basis. A signal x ∈ R^n is represented as

x = Σ_{j=1}^k h_j ψ_j,

where h_j ∈ R, j ∈ {1, . . . , k}, are appropriate coefficients. In terms of the abstract representation (1),
we have d_j = 1 for j ∈ {1, . . . , k} and f is the function that yields the weighted sum over the ψ_j. Note that
Example 1 can be seen as a special case of this representation.
The following gives an example of a non-linear representation function f.
Example 3 Consider the discrete inverse Fourier transform, defined as

f(m_1, . . . , m_n, ω_1, . . . , ω_n)_l := (1/n) Σ_{j=1}^n m_j e^{iω_j} e^{i2πl(j−1)/n},   l ∈ {1, . . . , n},

where m_j and ω_j are respectively the magnitude and the phase of the j-th discrete Fourier coefficient c_j = m_j e^{iω_j}.
Thus every signal x ∈ R^n ⊂ C^n can be represented in terms of (1) with f being the discrete inverse Fourier
transform and h_j, j = 1, . . . , k (with k = 2n), being specified as m_{j'} and ω_{j'}, j' = 1, . . . , n.
Further examples of dictionaries {ψ_j}_{j=1}^k include the discrete wavelet [21], cosine [19], or shearlet [12]
representation systems and many more. In these cases, the coefficients h_i are given by the forward transform
and f is referred to as the backward transform. Note that in the above examples we have d_i = 1, i.e., the inputs
h_i are real-valued. In many situations, one is also interested in representations x = f(h_1, . . . , h_k)
with h_i ∈ R^{d_i}, where d_i > 1.
Example 4 Let k = 2 and define f again as the discrete inverse Fourier transform, but as a function of two
components: (1) the entire magnitude spectrum and (2) the entire phase spectrum, namely

f(m, ω)_l := (1/n) Σ_{j=1}^n m_j e^{iω_j} e^{i2πl(j−1)/n},   l ∈ {1, . . . , n}.
Similarly, instead of individual pixel values, one can consider patches of pixels in an image x ∈ R^n from
Example 1 as the input vectors h_i to the identity transform f. We will come back to these examples in the
experiments in Section 4.
Finally, we would like to remark that our abstract representation

x = f(h_1, . . . , h_k)

also covers the cases where the signal is the output of a decoder or generative model f with inputs h_1, . . . , h_k
as the code or the latent variables.
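To make Examples 3 and 4 concrete, the following minimal NumPy sketch (our own illustration, not code from the paper; the function name is hypothetical) reconstructs a signal from the magnitudes and phases of its discrete Fourier coefficients:

```python
import numpy as np

def inverse_dft_from_polar(m, omega):
    """Example 3 as code: reconstruct a signal from the magnitudes m and
    phases omega of its discrete Fourier coefficients c_j = m_j * e^(i*omega_j)."""
    c = m * np.exp(1j * omega)          # recombine magnitude and phase
    return np.fft.ifft(c).real          # np.fft.ifft applies the 1/n normalization

# Round trip: x -> (magnitude, phase) -> x
x = np.array([1.0, 2.0, 0.5, -1.0])
c = np.fft.fft(x)
m, omega = np.abs(c), np.angle(c)
x_rec = inverse_dft_from_polar(m, omega)
```

Masking entries of m or omega separately is exactly what allows the magnitude-versus-phase interpretation query discussed in Section 2.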
As was discussed in previous sections, the main idea of the RDE framework is to extract the relevant
features of the signal based on the optimization over its perturbations defined through masks. The ingredients
of this idea are formally defined below.
Definition 1 (Obfuscations and expected distortion) Let Φ : R^n → R^m be a model and x ∈ R^n a data
point with a data representation x = f(h_1, . . . , h_k) as discussed above. For every mask s ∈ [0,1]^k, let V_s be
a probability distribution over ∏_{i=1}^k R^{d_i}. Then the obfuscation of x with respect to s and V_s is defined as
the random vector

y := f(s ⊙ h + (1 − s) ⊙ v),

where v ∼ V_s, (s ⊙ h)_i = s_i h_i ∈ R^{d_i}, and ((1 − s) ⊙ v)_i = (1 − s_i) v_i ∈ R^{d_i} for i ∈ {1, . . . , k}. Furthermore,
the expected distortion of x with respect to the mask s and the perturbation distribution V_s is defined as

D(x, s, V_s, Φ) := E_{v∼V_s} [ d(Φ(x), Φ(y)) ],

where d : R^m × R^m → R_+ is a measure of distortion between two model outputs.
In the RDE framework, the explanation is given by a mask that minimizes distortion while remaining rela-
tively sparse. The rate-distortion-explanation mask is defined in the following.
Definition 2 (The RDE mask) In the setting of Definition 1, we define the RDE mask as a solution s*(ℓ) to
the minimization problem

min_{s ∈ {0,1}^k} D(x, s, V_s, Φ)   s.t.   ‖s‖₀ ≤ ℓ,   (2)

where ℓ ∈ {1, . . . , k} is the desired level of sparsity.
Here, the RDE mask is defined as the binary mask that minimizes the expected distortion while keeping the
sparsity smaller than a certain threshold. Besides this, one could obviously also define the RDE mask as the
sparsest binary mask that keeps the distortion lower than a given threshold, as defined in [16]. Geometrically,
one can interpret the RDE mask as defining a subspace that is stable under Φ. If x = f(h) is the input signal and s is
the RDE mask for Φ(x) on the coefficients h, then the associated subspace R_Φ(s) is defined as the space of
feasible obfuscations of x with s under V_s, i.e.,

R_Φ(s) := { f(s ⊙ h + (1 − s) ⊙ v) | v ∈ supp V_s },

where supp V_s denotes the support of the distribution V_s. The model Φ will act similarly on signals in R_Φ(s)
due to the low expected distortion D(x, s, V_s, Φ), making the subspace stable under Φ. Note that RDE
directly optimizes towards a subspace that is stable under Φ. If, instead, one chose the mask s based
on information from the gradient ∇Φ(x) and Hessian ∇²Φ(x), then only a local neighborhood around x would
tend to be stable under Φ, due to the local nature of the gradient and Hessian. Before discussing practical
algorithms to approximate the RDE mask in Subsection 3.2, we review frequently used obfuscation
strategies, i.e., choices of the distribution V_s, and measures of distortion.
3.1.1 Obfuscation strategies and in-distribution interpretability.
The meaning of an explanation in RDE depends greatly on the nature of the perturbations v ∼ V_s. A
particular choice of V_s defines an obfuscation strategy. Obfuscations are either in-distribution, i.e., the
obfuscation

f(s ⊙ h + (1 − s) ⊙ v)

lies on the natural data manifold that Φ was trained on, or out-of-distribution otherwise. Out-of-distribution
obfuscations pose the following problem. The RDE mask (see Definition 2) depends on evaluations of Φ on
obfuscations f(s ⊙ h + (1 − s) ⊙ v). If f(s ⊙ h + (1 − s) ⊙ v) is not on the natural data manifold that
Φ was trained on, then it may lie in undeveloped regions of Φ. In practice, we are interested in explaining
the behavior of Φ on realistic data, and an explanation can be corrupted if Φ did not develop the region
of out-of-distribution points f(s ⊙ h + (1 − s) ⊙ v). One can guard against this by choosing V_s so that
f(s ⊙ h + (1 − s) ⊙ v) is in-distribution. Choosing V_s in-distribution boils down to modeling the conditional
data distribution, a non-trivial task.
Example 5 (In-distribution obfuscation strategy) In light of the recent success of generative adversarial
networks (GANs) in generative modeling [8], one can train an in-painting GAN [29]

G(h, s, z),

where z are random latent variables of the GAN, such that the obfuscation f(s ⊙ h + (1 − s) ⊙ G(h, s, z))
lies on the natural data manifold (see also [2]). In other words, one can choose V_s as the distribution of
v := G(h, s, z), where the randomness comes from the random latent variables z.
Example 6 (Out-of-distribution obfuscation strategies) A very simple obfuscation strategy is Gaussian
noise. In that case, one defines V_s for every s ∈ [0,1]^k as

V_s := N(µ, Σ),

where µ and Σ denote a pre-defined mean vector and covariance matrix. In Section 4.1, we give an example
of a reasonable choice for µ and Σ for image data. Alternatively, for images with pixel representation (see
Example 1), one can mask out the deleted pixels by blurred inputs, v = K ∗ x, where K is a suitable blur
kernel.
Obfuscation strategy    Perturbation formula    In-distribution
Constant                v ∈ R^d constant        ✗
Noise                   v ∼ N(µ, Σ)             ✗
Blurring                v = K ∗ x               ✗
Inpainting-GAN          v = G(h, s, z)          ✓

Table 1: Common obfuscation strategies with their perturbation formulas.
We summarize common obfuscation strategies for a given target signal in Table 1.
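The obfuscation mechanism of Definition 1 is a one-liner in code. The following NumPy sketch (our own illustration; the setup with the identity representation and the noise strategy from Table 1 is a hypothetical example) makes explicit that masked-in components are kept and masked-out components are replaced:

```python
import numpy as np

rng = np.random.default_rng(0)

def obfuscate(f, h, s, v):
    """Obfuscation y = f(s * h + (1 - s) * v) from Definition 1."""
    return f(s * h + (1 - s) * v)

# Hypothetical setup: identity representation f (Example 1) with the
# Gaussian noise strategy from Table 1.
h = rng.normal(size=8)                               # signal coefficients
s = np.array([1, 1, 0, 0, 1, 0, 0, 0], dtype=float)  # a binary mask
v = rng.normal(loc=h.mean(), scale=h.std(), size=8)  # noise perturbation
y = obfuscate(np.asarray, h, s, v)
```

For the in-painting strategy, one would replace the sampling of `v` by a call to the trained generator G(h, s, z).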
3.1.2 Measure of distortion.
Various options exist for the measure d : R^m × R^m → R_+ of the distortion between model outputs. The
measure of distortion should be chosen according to the task of the model Φ : R^n → R^m and the objective
of the explanation.
Example 7 (Measure of distortion for classification task) Consider a classification model Φ : R^n → R^m
and a target input signal x ∈ R^n. The model Φ assigns to each class j ∈ {1, . . . , m} a (pre-softmax) score
Φ_j(x), and the predicted label is given by j* := arg max_{j∈{1,...,m}} Φ_j(x). One commonly used measure of
the distortion between the outputs at x and another data point y ∈ R^n is given as

d₁(Φ(x), Φ(y)) := (Φ_{j*}(x) − Φ_{j*}(y))².

On the other hand, the score vector [Φ_j(x)]_{j=1}^m is usually normalized to a probability vector [Φ̃_j(x)]_{j=1}^m by
applying the softmax function, namely Φ̃_j(x) := exp(Φ_j(x)) / Σ_{i=1}^m exp(Φ_i(x)). This, in turn, gives another
measure of the distortion between Φ(x), Φ(y) ∈ R^m, namely

d₂(Φ(x), Φ(y)) := (Φ̃_{j*}(x) − Φ̃_{j*}(y))²,

where j* := arg max_{j∈{1,...,m}} Φ_j(x) = arg max_{j∈{1,...,m}} Φ̃_j(x). An important property of the softmax
function is its invariance under translation by a vector [c, . . . , c]^T ∈ R^m, where c ∈ R is a constant. By
definition, only d₂ respects this invariance while d₁ does not.
Example 8 (Measure of distortion for regression task) Consider a regression model Φ : R^n → R^m and
an input signal x ∈ R^n. One can then define the measure of distortion between the outputs of x and another
data point y ∈ R^n as

d₃(Φ(x), Φ(y)) := ‖Φ(x) − Φ(y)‖₂².

Sometimes it is reasonable to consider a certain subset of components J ⊆ {1, . . . , m} of the output vectors
instead of all m entries. Denoting the vector formed by the corresponding entries by Φ_J(x), the measure of
distortion between the outputs can be defined as

d₄(Φ(x), Φ(y)) := ‖Φ_J(x) − Φ_J(y)‖₂².

The measure d₄ will be used in our experiments for radio maps in Subsection 4.3.
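As a concrete illustration of Examples 7 and 8, the following sketch implements one natural choice for d₁ through d₄ (squared differences at the predicted label, respectively squared ℓ2 distances; the exact exponents are an assumption of this sketch, not prescribed by the framework):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def d1(phi_x, phi_y):
    """Pre-softmax score distortion at the predicted label j*."""
    j = np.argmax(phi_x)
    return (phi_x[j] - phi_y[j]) ** 2

def d2(phi_x, phi_y):
    """Post-softmax probability distortion at the predicted label j*."""
    j = np.argmax(phi_x)
    return (softmax(phi_x)[j] - softmax(phi_y)[j]) ** 2

def d3(phi_x, phi_y):
    """Regression: squared l2 distance between full outputs."""
    return float(np.sum((phi_x - phi_y) ** 2))

def d4(phi_x, phi_y, J):
    """Regression restricted to the output components in J."""
    return float(np.sum((phi_x[J] - phi_y[J]) ** 2))

# Softmax is invariant under translation by [c, ..., c]:
# d2 is unaffected by such a shift, d1 is not.
phi = np.array([2.0, 1.0, 0.0])
shifted = phi + 5.0
```

Running `d2(phi, shifted)` yields zero while `d1(phi, shifted)` does not, which is exactly the invariance property noted in Example 7.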
3.2 Implementation
The RDE mask from Definition 2 was defined as a solution to

min_{s ∈ {0,1}^k} D(x, s, V_s, Φ)   s.t.   ‖s‖₀ ≤ ℓ.

In practice, we need to relax this problem. We offer the following three approaches.
3.2.1 ℓ1-relaxation with Lagrange multiplier.
The RDE mask can be approximately computed by finding an approximate solution to the following relaxed
minimization problem:

min_{s ∈ [0,1]^k} D(x, s, V_s, Φ) + λ‖s‖₁,   (P1)

where λ > 0 is a hyperparameter for the sparsity level. Note that the optimization problem is not necessarily
convex, thus the solution might not be unique.
The expected distortion D(x, s, V_s, Φ) can typically be approximated with simple Monte-Carlo estimates,
i.e., by averaging over i.i.d. samples from V_s. After estimating D(x, s, V_s, Φ), one can optimize the mask
s with stochastic gradient descent (SGD) to solve the optimization problem (P1).
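The procedure can be sketched end-to-end on a toy problem. The following NumPy illustration is our own sketch, not the paper's implementation: it uses a hypothetical linear model `Phi` that depends only on the first three inputs, a Monte-Carlo distortion estimate with a fixed perturbation batch per step, and finite-difference gradients in place of the automatic differentiation and Adam optimizer used in Section 4:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model Phi(x) = W x that only uses the first three components.
k = 10
W = np.zeros((3, k))
W[:, :3] = 2.0 * np.eye(3)
Phi = lambda x: W @ x
x = rng.normal(size=k)

def distortion(s, V):
    """Monte-Carlo estimate of D(x, s, V_s, Phi) over a fixed batch V of
    Gaussian perturbations, with squared l2 output distortion."""
    Y = s * x + (1 - s) * V                 # batch of obfuscations (Definition 1)
    return np.mean(np.sum((Y @ W.T - Phi(x)) ** 2, axis=1))

def rde_mask(lam=0.5, steps=300, lr=0.05, eps=1e-5, n_samples=32):
    """Projected (sub)gradient descent on (P1); distortion gradients are taken
    by finite differences for illustration only."""
    s = np.ones(k)
    for _ in range(steps):
        V = rng.normal(loc=x.mean(), scale=x.std(), size=(n_samples, k))
        base = distortion(s, V)
        grad = np.zeros(k)
        for i in range(k):
            s_eps = s.copy()
            s_eps[i] += eps
            grad[i] = (distortion(s_eps, V) - base) / eps
        # add the l1 subgradient (s >= 0 here) and project back onto [0, 1]^k
        s = np.clip(s - lr * (grad + lam), 0.0, 1.0)
    return s

mask = rde_mask()
```

On this toy model, the mask retains the three components the model actually depends on and drives the remaining entries to zero, which is the qualitative behavior (P1) is designed to produce.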
3.2.2 Bernoulli relaxation.
By viewing the binary mask as Bernoulli random variables s ∼ Ber(θ) and optimizing over θ, one can
guarantee that the expected distortion D(x, s, V_s, Φ) is evaluated only on binary masks s ∈ {0,1}^k. To
encourage sparsity of the resulting mask, one can still apply ℓ1-regularization on s, giving rise to the following
optimization problem:

min_{θ ∈ [0,1]^k} E_{s∼Ber(θ)} [ D(x, s, V_s, Φ) + λ‖s‖₁ ].   (P2)

Optimizing the parameter θ requires a continuous relaxation to apply SGD. This can be done using the
concrete distribution [17], which samples s from a continuous relaxation of the Bernoulli distribution.
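For intuition, the sampling step of this relaxation can be sketched as follows (a sketch of binary concrete sampling only; the temperature value is a hypothetical choice, and [17] gives the full reparametrized objective):

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_concrete_sample(theta, temperature=0.1):
    """Sample from a continuous relaxation of Ber(theta), the binary concrete
    distribution [17]; samples approach {0, 1} as temperature -> 0."""
    u = rng.uniform(1e-8, 1 - 1e-8, size=np.shape(theta))
    # logistic reparametrization: logit(theta) + logistic noise, then a
    # tempered sigmoid
    logits = np.log(theta) - np.log1p(-theta) + np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-logits / temperature))

theta = np.array([0.95, 0.05, 0.5])
s = binary_concrete_sample(theta)
```

Because the sample is a differentiable function of theta, the expectation in (P2) can be optimized with SGD through the sampling step.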
3.2.3 Matching pursuit.
As an alternative, one can also perform matching pursuit [18]. Here, the non-zero entries of s ∈ {0,1}^k are
determined sequentially in a greedy fashion to minimize the resulting distortion in each step. More precisely,
we start with the zero mask s⁰ = 0 and gradually build up the mask by updating s^t at step t by the rule

s^{t+1} = s^t + arg min_{e_j} D(x, s^t + e_j, V_{s^t + e_j}, Φ).

Here, the minimization is taken over all standard basis vectors e_j ∈ R^k with s^t_j = 0. The algorithm
terminates when reaching some desired error tolerance or after a prefixed number of iterations. While this
means that in each iteration we have to test every remaining entry of s, the method is applicable when k is
small or when we are only interested in very sparse masks.
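The greedy rule above can be sketched on a toy linear model (our own illustration; `Phi`, the sample size, and the iteration budget are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model that only depends on the first two input components.
k = 6
W = np.zeros((2, k))
W[:, 0] = [3.0, -3.0]
W[:, 1] = [2.0, 2.0]
Phi = lambda x: W @ x
x = rng.normal(size=k)

def distortion(s, n_samples=512):
    """Monte-Carlo estimate of the expected distortion for mask s, with
    standard Gaussian perturbations and squared l2 output distortion."""
    V = rng.normal(size=(n_samples, k))
    Y = s * x + (1 - s) * V
    return np.mean(np.sum((Y @ W.T - Phi(x)) ** 2, axis=1))

def matching_pursuit(max_iters=2):
    """Greedily add the coordinate whose inclusion reduces distortion most."""
    s = np.zeros(k)
    for _ in range(max_iters):
        candidates = [j for j in range(k) if s[j] == 0]
        best = min(candidates, key=lambda j: distortion(s + np.eye(k)[j]))
        s[best] = 1.0
    return s

mask = matching_pursuit()
```

With a budget of two iterations, the greedy selection picks exactly the two components the model depends on, illustrating why the method is attractive for very sparse masks despite testing every remaining entry per iteration.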
4 Experiments
With our experiments, we demonstrate the broad applicability of the generalized RDE framework. Moreover,
our experiments illustrate how different choices of obfuscation strategies, optimization procedures, measures
of distortion, and input signal representations, discussed in Section 3.1, can be leveraged in practice. We
explain model decisions on various challenging data modalities and tailor the input signal representation and
measure of distortion to the domain and interpretation query. In Section 4.1, we focus on image classifi-
cation, a common baseline task in the interpretability literature. In Sections 4.2 and 4.3, we consider two
other data modalities that are often unexplored. Section 4.2 focuses on audio data, where the underlying
task is to classify acoustic instruments based on a short audio sample of distinct notes, while in Section 4.3,
the underlying task is a regression with data in the form of physical simulations in urban environments. We
also believe our explanation framework supports applications beyond interpretability tasks. An example is
given in Section 4.3.2, where we add an RDE-inspired regularizer to the training objective of a radio map
estimation model.
4.1 Images
We begin with the most common domain in the interpretability literature: image classification tasks. The
authors of [16] previously applied RDE to image data by considering pixel-wise perturbations. We refer to this
method as Pixel RDE. Other explanation methods [20], [1], [2], and [3] have also previously operated exclusively
in the pixel domain. In [11], we challenged this customary practice by successfully applying RDE
in a wavelet basis, where sparsity translates into piece-wise smooth images (also called cartoon-like images).
The novel explanation method was coined CartoonX [11] and extracts the relevant piece-wise smooth part
of an image. First, we review the Pixel RDE method and present experiments on the ImageNet dataset [4],
which is commonly considered a challenging classification task. Finally, we present CartoonX and discuss
its advantages. For all the ImageNet experiments, we use the pre-trained MobileNetV3-Small [10], which
achieved a top-1 accuracy of 67.668% and a top-5 accuracy of 87.402%, as the classifier.
Figure 1: Top row: original images correctly classified as (a) snail, (b) male duck, and (c) airplane. Middle
row: Pixel RDEs. Bottom row: CartoonX. Notably, CartoonX is roughly piece-wise smooth and overall
more interpretable than the jittery Pixel RDEs.
4.1.1 Pixel RDE.
Consider the following pixel-wise representation of an RGB image x ∈ R^{3×n}:

f : ∏_{i=1}^n R^3 → R^{3×n},   x = f(h_1, . . . , h_n),

where h_i ∈ R^3 contains the three color channel values of the i-th pixel of the image x. In Pixel RDE, a
sparse mask s ∈ [0,1]^n with n entries, one for each pixel, is optimized to achieve low
expected distortion D(x, s, V_s, Φ). The obfuscation of an image x with the pixel mask s and a distribution
v ∼ V_s on ∏_{i=1}^n R^3 is defined as f(s ⊙ h + (1 − s) ⊙ v). In our experiments, we initialize the mask with
ones, i.e., s_i = 1 for every i ∈ {1, . . . , n}, and consider Gaussian noise perturbations V_s = N(µ, Σ). We
set the noise mean µ ∈ R^{3×n} as the pixel value mean of the original image x and the covariance matrix
Σ := σ²Id ∈ R^{3n×3n} as a diagonal matrix with σ > 0 defined as the pixel value standard deviation of the
original image x. We then optimize the pixel mask s for 2000 gradient descent steps on the ℓ1-relaxation of
the RDE objective (see Section 3.2.1). We computed the distortion d(Φ(x), Φ(y)) in D(x, s, V_s, Φ) in the
post-softmax activation of the predicted label multiplied by a constant C = 100, i.e.,

d(Φ(x), Φ(y)) := C · (Φ̃_{j*}(x) − Φ̃_{j*}(y))².

The expected distortion D(x, s, V_s, Φ) was approximated as a simple Monte-Carlo estimate after sampling
64 noise perturbations. For the sparsity level, we set the Lagrange multiplier to λ = 0.6. All images were
resized to 256 × 256 pixels. The mask was optimized for 2000 steps using the Adam optimizer with step
size 0.003. In the middle row of Figure 1, we show three example explanations with Pixel RDE for an image
of a snail, a male duck, and an airplane, all from the ImageNet dataset. Pixel RDE highlights as relevant
both the snail's inner shell and part of its head, the lower segment of the male duck along with various lines
in the water, and the airplane's fuselage and part of its rudder.
4.1.2 CartoonX.
Formally, we represent an RGB image x ∈ [0,1]^{3×n} in its wavelet coefficients h = {h_i}_{i=1}^n ∈ ∏_{i=1}^n R^3
with J ∈ {1, . . . , ⌊log₂ n⌋} scales as x = f(h), where f is the discrete inverse wavelet transform. Each
h_i = (h_{i,c})_{c=1}^3 ∈ R^3 contains three wavelet coefficients of the image, one for each color channel, and is
associated with a scale k_i ∈ {1, . . . , J} and a position in the image. Low scales describe high frequencies
and high scales describe low frequencies at the respective image position. We briefly illustrate the wavelet
coefficients in Figure 2, which visualizes the discrete wavelet transform of an image.
Figure 2: Discrete Wavelet Transform of an image: (a) original image (b) discrete wavelet transform. The
coefficients of the largest quadrant in (b) correspond to the lowest scale and coefficients of smaller quad-
rants gradually build up to the highest scales, which are located in the four smallest quadrants. Three nested
L-shaped quadrants represent horizontal, vertical and diagonal edges at a resolution determined by the asso-
ciated scale.
CartoonX [11] is a special case of the generalized RDE framework, in particular a special case of
Example 2, and optimizes a sparse mask s ∈ [0,1]^n on the wavelet coefficients (see Figure 3c) so that the
expected distortion D(x, s, V_s, Φ) remains small. The obfuscation of an image x with a wavelet mask s
and a distribution v ∼ V_s on the wavelet coefficients is f(s ⊙ h + (1 − s) ⊙ v). In our experiments, we
used Gaussian noise perturbations and chose the standard deviation and mean adaptively for each scale: the
standard deviation and mean for wavelet coefficients of scale j ∈ {1, . . . , J} were chosen as the standard
deviation and mean of the wavelet coefficients of scale j of the original image. Figure 3d
shows the obfuscation f(s ⊙ h + (1 − s) ⊙ v) with the final wavelet mask s after the RDE optimization
procedure. In Pixel RDE, the mask itself is the explanation, as it lies in pixel space (see the middle row in Figure
1), whereas the CartoonX mask lies in the wavelet domain. To go back to the natural image domain, we
multiply the wavelet mask element-wise with the wavelet coefficients of the original greyscale image and
invert this product back to pixel space with the discrete inverse wavelet transform. The inversion is finally
clipped to [0,1], as are obfuscations during the RDE optimization, to avoid overflow (we assume here that the
pixel values in x are normalized to [0,1]). The clipped inversion in pixel space is the final CartoonX
explanation (see Figure 3e).
The following points should be kept in mind when interpreting the final CartoonX explanation, i.e., the
inversion of the wavelet coefficient mask: (1) CartoonX provides the relevant piece-wise smooth part of the
image. (2) The inversion of the wavelet coefficient mask was not optimized to be sparse in pixel space but in
the wavelet basis. (3) A region that is black in the inversion could nevertheless be relevant if it was already
black in the original image. This is due to the multiplication of the mask with the wavelet coefficients
of the greyscale image before taking the discrete inverse wavelet transform. (4) Bright high-resolution
regions are relevant in high resolution, and bright low-resolution regions are relevant in low resolution. (5)
It is inexpensive for CartoonX to mark large regions in low resolution as relevant. (6) It is expensive for
CartoonX to mark large regions in high resolution as relevant.
In Figure 1, we compare CartoonX to Pixel RDE. The piece-wise smooth wavelet explanations are more
Figure 3: CartoonX machinery: (a) image classified as park-bench, (b) discrete wavelet transform of the
image, (c) final mask on the wavelet coefficients after the RDE optimization procedure, (d) obfuscation with
final wavelet mask and noise, (e) final CartoonX, (f) Pixel RDE for comparison.
interpretable than the jittery Pixel RDEs. In particular, CartoonX asserts that the snail’s shell without the
head suffices for the classification, unlike Pixel RDE, which insinuated that both the inner shell and part
of the head are relevant. Moreover, CartoonX shows that the water gives the classifier context for the
classification of the duck, which one could have only guessed from the Pixel RDE. Both Pixel RDE and
CartoonX state that the head of the duck is not relevant. Lastly, CartoonX, like Pixel RDE, confirms that the
wings play a subordinate role in the classification of the airplane.
4.1.3 Why explain in the wavelet basis?
Wavelets provide an optimal representation for piece-wise smooth 1D functions [5] and also efficiently
represent 2D piece-wise smooth images, also called cartoon-like images [12], [21]. Indeed, sparse vectors in
the wavelet coefficient space encode cartoon-like images reasonably well [25], certainly better than sparse
pixel representations. Moreover, the optimization process underlying CartoonX produces sparse vectors in
the wavelet coefficient space. Hence CartoonX typically generates cartoon-like images as explanations. This
is the fundamental difference to Pixel RDE, which produces rough, jittery, and pixel-sparse explanations.
Cartoon-like images are more interpretable and provide a natural model of simplified images. Since the goal
of the RDE explanation is to generate an easy-to-interpret simplified version of the input signal, we argue
that CartoonX explanations are more appropriate for image classification than Pixel RDEs. Our experiments
confirm that CartoonX explanations are roughly piece-wise smooth and overall more
interpretable than Pixel RDEs (see Figure 1).
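The sparsity claim can be sanity-checked in one dimension: a single level of the orthonormal Haar wavelet transform, sketched below on a piece-wise constant toy signal (our own illustration; the function name is ours), already concentrates the signal in few coefficients:

```python
import numpy as np

def haar_level(x):
    """One level of the orthonormal 1D Haar wavelet transform."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation (low-pass) coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail (high-pass) coefficients
    return a, d

# A piece-wise constant ("cartoon-like") toy signal: all 16 entries are
# non-zero in the standard basis, yet every Haar detail coefficient at this
# level vanishes.
x = np.concatenate([np.full(8, 1.0), np.full(8, 3.0)])
a, d = haar_level(x)
```

A sparse mask in the wavelet domain therefore suffices to reproduce such a signal, whereas a comparably sparse pixel mask cannot.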
4.1.4 CartoonX implementation.
Throughout our CartoonX experiments, we chose the Daubechies 3 wavelet system, J = 5 levels of scales,
and zero padding for the discrete wavelet transform. For the implementation of the discrete wavelet transform,
we used the Pytorch Wavelets package, which supports gradient computation in PyTorch. Distortion
0.00 0.02 0.04 0.06 0.08 0.10
DWT based mask
Pixel based mask
Center of mass
Center of mass
Figure 4: Scatter plot of rate-distortion in pixel basis and wavelet basis. Each point is an explanation of a
distinct image in the ImageNet dataset with distortion and normalized `1-norm measured for the final mask.
The wavelet mask achieves lower distortion than the pixel mask, while using less coefficients.
was computed as in the Pixel RDE experiments. The perturbations v∼ Vson the wavelet coefficients were
chosen as Gaussian noise with standard deviation and mean computed adaptively per scale. As in the Pixel
RDE experiments, the wavelet mask was optimized for 2000 steps with the Adam optimizer to minimize the
`1-relaxation of the RDE objective. We used λ= 3 for CartoonX.
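The optimization loop above can be sketched in a few lines. The following is a minimal, self-contained numpy illustration of the ℓ1-relaxed RDE objective in a wavelet basis; it is not the actual implementation. As stand-ins, it uses a single-level Haar transform instead of the multi-level Daubechies-3 DWT, a fixed linear functional `phi` instead of the classifier's pre-softmax score, and a hand-derived gradient instead of autograd; the hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_fwd(x):
    """Single-level Haar DWT (orthonormal): pairwise averages and differences."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return np.concatenate([a, d])

def haar_inv(c):
    """Inverse of haar_fwd; for an orthonormal transform this is its transpose."""
    n = c.size // 2
    a, d = c[:n], c[n:]
    x = np.empty(2 * n)
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

# Stand-in for the pre-softmax score of a classifier: a fixed linear functional.
w = rng.standard_normal(16)
phi = lambda x: w @ x

x = rng.standard_normal(16)      # target signal
h = haar_fwd(x)                  # its wavelet-coefficient representation
s = np.full(h.size, 0.5)         # relaxed mask, entries in [0, 1]
lam, lr, n_mc = 0.05, 0.05, 8    # hyperparameters (illustrative)
w_coef = haar_fwd(w)             # phi pulled through the inverse DWT (orthonormality)

for step in range(500):
    grad = np.zeros_like(s)
    for _ in range(n_mc):        # Monte Carlo estimate of the expected distortion
        v = rng.standard_normal(h.size) * h.std()   # Gaussian perturbation
        y = haar_inv(s * h + (1 - s) * v)           # obfuscated signal
        resid = phi(x) - phi(y)
        grad += -2.0 * resid * w_coef * (h - v)     # d distortion / d s
    grad = grad / n_mc + lam                        # + d(lam * ||s||_1) / ds
    s = np.clip(s - lr * grad, 0.0, 1.0)            # projected gradient step
```

In the actual experiments, the PyTorch Wavelets transforms are differentiable, so Adam and autograd replace the hand-derived gradient and projection.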
4.1.5 Efficiency of CartoonX.
Finally, we compare Pixel RDE to CartoonX quantitatively by analyzing the distortion and sparsity associated
with the final explanation mask. Intuitively, we expect the CartoonX method to have an efficiency
advantage, since the discrete wavelet transform already encodes natural images sparsely, and hence fewer
wavelet coefficients than pixel coefficients are required to represent images. Our experiments confirmed this
intuition, as can be seen in the scatter plot in Figure 4.
4.2 Audio
We consider the NSynth dataset [6], a library of short audio samples of distinct notes played on a variety of
instruments. We pre-process the data by computing the power-normalized magnitude spectrum and the phase
information using the discrete Fourier transform on a logarithmic scale from 20 to 8000 Hz. Each data
instance is then represented by the magnitude and the phase of its Fourier coefficients, together with the discrete
inverse Fourier transform (see Example 2).
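The pre-processing step can be illustrated with a short numpy sketch. The sample rate, the number of log-frequency bins (1024), the use of `np.interp` for resampling, and the normalization by the maximum are assumptions for illustration; the chapter does not fix these details.

```python
import numpy as np

sr = 16000                                 # sample rate (assumption)
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)          # one-second 440 Hz test tone

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(x.size, d=1.0 / sr)

# Resample onto a logarithmic frequency axis from 20 Hz to 8000 Hz.
log_f = np.geomspace(20.0, 8000.0, 1024)
magnitude = np.interp(log_f, freqs, np.abs(X))
magnitude /= magnitude.max()               # power normalization (illustrative)
phase = np.interp(log_f, freqs, np.unwrap(np.angle(X)))

# Each datum is then represented by (magnitude, phase); the time-domain signal
# is recovered (up to resampling error) with the inverse DFT.
peak_hz = log_f[np.argmax(magnitude)]      # should sit near the 440 Hz tone
```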
4.2.1 Explaining the classifier.
Our model Φ is a network trained to classify acoustic instruments. We compute the distortion with respect to
the pre-softmax scores, i.e., we deploy d1 from Example 7 as the measure of distortion. We follow the obfuscation
strategy described in Example 5 and train an inpainter G to generate the obfuscation G(h, s, z). Here, h
corresponds to the representation of a signal, s is a binary mask, and z is a normally distributed seed to the
inpainter. We use a residual CNN architecture for G with added noise in the input and deep features. More details
can be found in Section 4.2.3. We train G until the outputs are found to be satisfactory, exemplified by the
outputs in Figure 5.
To compute the explanation maps, we numerically solve (P2) as discussed in Subsection 3.2. In particular,
s is a binary mask indicating whether the phase and magnitude information of a certain frequency
should be dropped, and it is specified as a Bernoulli variable s ∼ Ber(θ). We chose a regularization parameter
of λ = 50 and minimized the corresponding objective using the Adam optimizer with a step size of 10^-5
in 10^6 iterations. For the concrete distribution, we used a temperature of 0.1. Two examples resulting from
this process can be seen in Figure 6.

Figure 5: Inpainted Bass: Example inpainting from G. The bottom plot depicts phase versus frequency and
the top plot depicts magnitude versus frequency. The random binary mask is represented by the green parts.
The axes for the inpainted signal (black) and the original signal (blue, dashed) are offset to improve visibility.
Note how the inpainter generates plausible peaks in the magnitude and phase spectra, especially with regard
to rapid (>600 Hz) versus smooth (<270 Hz) changes in phase.
Notice that the method shows a strong reliance of the classifier on low frequencies (30–60 Hz)
to classify the top sample in Figure 6 as a guitar, as only the guitar samples have this low-frequency
slope in the spectrum. In contrast, classifying the bass sample relies more on the
continuous signal between 100 Hz and 230 Hz.
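The concrete (relaxed Bernoulli) distribution [17] that makes the mask differentiable in θ can be sketched as follows. This is a minimal numpy illustration, not the training code; θ = 0.9 and the warm temperature are illustrative, while the cold temperature 0.1 matches the value used above.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_concrete(theta, temperature, size, rng):
    """Relaxed Bernoulli(theta) sample [17]: differentiable in theta and
    concentrating on {0, 1} as the temperature goes to zero."""
    u = rng.uniform(1e-6, 1 - 1e-6, size)
    logistic = np.log(u) - np.log1p(-u)           # Logistic(0, 1) noise
    logits = np.log(theta) - np.log1p(-theta)
    return 1.0 / (1.0 + np.exp(-(logits + logistic) / temperature))

s_cold = binary_concrete(0.9, 0.1, 100_000, rng)  # temperature used in 4.2.1
s_warm = binary_concrete(0.9, 2.0, 100_000, rng)  # for comparison

# At low temperature almost all samples are (numerically) binary.
frac_binary = np.mean((s_cold < 0.05) | (s_cold > 0.95))
```

During optimization, θ is the trainable parameter; the mask s is resampled at every step, and gradients flow through the sample via the reparameterization above.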
4.2.2 Magnitude vs. Phase.
In the above experiment, we have represented the signals by the magnitude and phase information at each
frequency, hence the mask s acts on each frequency. Now we consider the interpretation query of whether
the entire magnitude spectrum or the entire phase spectrum is more relevant for the prediction. Accordingly,
we consider the representation discussed in Example 4 and apply the mask s to turn the whole magnitude
spectrum or the whole phase information on or off. Furthermore, we can optimize s not only for one datum but
for all samples from a class. This extracts the information of whether magnitude or phase is more important
for predicting samples from a specific class.
For this, we again minimized (P2) (averaged over all samples of a class) with θ as the Bernoulli parameter,
using the Adam optimizer for 2×10^5 iterations with a step size of 10^-4 and the regularization parameter
λ = 30. Again, a temperature of t = 0.1 was used for the concrete distribution.
Instrument   Magnitude   Phase
Organ        0.829       1.0
Guitar       0.0         0.999
Flute        0.092       1.0
Bass         1.0         1.0
Reed         0.136       1.0
Vocal        1.0         1.0
Mallet       0.005       0.217
Brass        0.999       1.0
Keyboard     0.003       1.0
String       1.0         0.0

Table 2: Magnitude importance versus phase importance per instrument class.
From the results of these computations, which can be seen in Table 2, we observe a clear difference across
instruments in what the classifier bases its decision on. The classification of most instruments is largely
based on phase information. For the mallet, the values are low for both magnitude and phase, which means
that the expected distortion is very low compared to the ℓ1-norm of the mask, even when the signal is
completely inpainted. This underlines that the regularization parameter λ may have to be adjusted for
different data instances, especially when measuring distortion in the pre-softmax scores.

(a) Guitar
(b) Bass
Figure 6: Interpreting the NSynth model: The optimized importance parameter θ (green) overlayed on top of
the DFT (blue). For each of guitar and bass, the top graph shows the power-normalized magnitude and the
bottom graph the phase. Notice the solid peaks between 30 Hz and 60 Hz for the guitar and between 100 Hz
and 230 Hz for the bass. These occur because the model relies on those parts of the spectra for the
classification. Notice also how many parts of the spectrum are important even when the magnitude is near
zero, indicating that the model pays attention to whether those frequencies are missing.
4.2.3 Architecture of the inpainting network G.
Here, we briefly describe the architecture of the inpainting network G that was used to generate obfuscations
of the target signals. In particular, Figure 7 shows a diagram of the network G and Table 3 lists
its layers.
4.3 Radio Maps
In this subsection, we assume a set of transmitting devices (Tx) broadcasting a signal within a city. The
received strength varies with location and depends on physical factors such as line of sight, reflection, and
diffraction. We consider the regression problem of estimating a function that assigns the proper signal
strength to each location in the city. Our dataset D is RadioMapSeer [14], containing 700 maps, 80 Tx
per map, and a corresponding grayscale label encoding the signal strength at every location. Our model Φ
receives as input x = [x(0), x(1), x(2)], where x(0) is a binary map of the Tx locations, x(1) is a noisy binary
map of the city (where a few buildings are missing), and x(2) is a grayscale image representing a number of
ground truth measurements of the strength of the signal at the measured locations and zero elsewhere. We
apply the UNet [22, 14, 13] architecture and train Φ to output an estimation of the signal strength throughout
the city that interpolates the input measurements.

Layer                Kernel   Output shape      Param #
Conv1d-1             21       [-1, 32, 1024]    4,736
ReLU-2                        [-1, 32, 1024]    0
Conv1d-3             21       [-1, 64, 502]     43,072
ReLU-4                        [-1, 64, 502]     0
BatchNorm1d-5                 [-1, 64, 502]     128
Conv1d-6             21       [-1, 128, 241]    172,160
ReLU-7                        [-1, 128, 241]    0
BatchNorm1d-8                 [-1, 128, 241]    256
Conv1d-9             21       [-1, 16, 112]     43,024
ReLU-10                       [-1, 16, 112]     0
BatchNorm1d-11                [-1, 16, 112]     32
ConvTranspose1d-12   21       [-1, 64, 243]     43,072
ReLU-13                       [-1, 64, 243]     0
BatchNorm1d-14                [-1, 64, 243]     128
ConvTranspose1d-15   21       [-1, 128, 505]    172,160
ReLU-16                       [-1, 128, 505]    0
BatchNorm1d-17                [-1, 128, 505]    256
ConvTranspose1d-18   20       [-1, 64, 1024]    163,904
ReLU-19                       [-1, 64, 1024]    0
BatchNorm1d-20                [-1, 64, 1024]    128
Skip connection               [-1, 103, 1024]   0
Conv1d-21            7        [-1, 128, 1024]   92,416
ReLU-22                       [-1, 128, 1024]   0
Conv1d-23            7        [-1, 2, 1024]     1,794
ReLU-24                       [-1, 2, 1024]     0

Table 3: Layer table of the inpainting model for the NSynth task.
Apart from the model Φ, we also have a simpler model Φ0, which receives only the city map and the Tx
locations as inputs and is trained with unperturbed input city maps. This second model Φ0 will be deployed
to inpaint measurements as input to Φ. See Figures 8a, 8b, and 8c for examples of a ground truth map and
estimations by Φ and Φ0, respectively.

Figure 7: Diagram of the inpainting network for NSynth. Inputs: the magnitude and phase spectrum, a
binary mask, and Gaussian noise (also injected into deep features); the network uses two skip connections.

(a) Ground Truth (b) Φ Estimation (c) Φ0 Estimation
Figure 8: Radio map estimations: The radio map (gray), input buildings (blue), and input measurements (red).
4.3.1 Explaining the Radio Map Model Φ.
Observe that in Figure 8a there is a missing building in the input (the black one), and in Figure 8b, Φ in-fills
this building with a shadow. As a black-box method, it is unclear why Φ made this decision. Did it rely
on signal measurements or on building patterns? To address this, we consider each building as a cluster of
pixels and each measurement as a potential target for our mask s = [s(1), s(2)], where s(1) acts on buildings
and s(2) acts on measurements. We then apply matching pursuit (see Subsection 3.2.3) to find a minimal
mask s of critical components (buildings and measurements).
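The greedy selection underlying matching pursuit can be sketched as follows. This is an illustrative numpy toy, not the procedure of Subsection 3.2.3: it treats components as scalar entries of a vector, uses a hypothetical linear model `phi`, and measures distortion as the absolute difference of scores.

```python
import numpy as np

def greedy_mask(phi, x, budget):
    """Greedily select components (matching-pursuit style): at each step, add
    the component whose inclusion most reduces the distortion between the
    prediction on the full input and on the masked input."""
    ref = phi(x)
    chosen = set()
    for _ in range(budget):
        candidates = [i for i in range(x.size) if i not in chosen]

        def distortion(i):
            s = np.zeros(x.size)
            s[list(chosen | {i})] = 1.0       # unchosen entries are zeroed out
            return abs(ref - phi(s * x))

        chosen.add(min(candidates, key=distortion))
    return chosen

# Toy "model": a weighted sum, so the components with the largest |w_i * x_i|
# are selected first.
w = np.array([0.1, 0.2, 3.0, 0.5])
phi = lambda x: float(w @ x)
x = np.array([1.0, 1.0, 1.0, 1.0])
mask = greedy_mask(phi, x, budget=2)          # selects the two dominant entries
```

For the radio map task, the components would be whole buildings and measurements rather than scalar entries, and zeroing an unchosen entry corresponds to the constant-zero perturbation described below.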
To be precise, suppose we are given a target input signal x = [x(0), x(1), x(2)]. Let k1 denote the
number of buildings in x(1) and k2 the number of measurements in x(2). Consider the function
f1 that takes as input vectors in {0, 1}^k1, which indicate the existence of buildings in x(1), and maps
them to the corresponding city map in the original city map format. Analogously, consider the function f2
that takes as input the measurements in R^k2 and maps them to the corresponding grayscale image in the
original measurement format. Then, f1 and f2 encode the locations of the buildings and measurements in
the target signal x = [x(0), f1(h(1)), f2(h(2))], where h(1) and h(2) denote the building and measurement
representations of x under f1 and f2. When s(1) has a zero entry, i.e., a building in h(1) was not selected, we
replace the value in the obfuscation with zero (this corresponds to a constant perturbation equal to zero).
Then, the obfuscation of the target signal x with a mask s = [s(1), s(2)] and perturbations v = [v(1), v(2)] :=
[0, v(2)] becomes:

y := [x(0), f1(s(1) ⊙ h(1)), f2(s(2) ⊙ h(2) + (1 − s(2)) ⊙ v(2))].
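The construction of y can be sketched on a toy grid. The layout (one pixel per "building", measurements along the top row), the grid size, and the masks below are hypothetical; the sketch uses the zero-completion strategy v(2) = 0, with the elementwise products written out in numpy.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = (16, 16)
k1, k2 = 4, 6                                   # toy numbers of buildings / measurements

# Fixed, non-overlapping locations (hypothetical layout for illustration).
b_pos = [(2, 2), (2, 10), (10, 2), (10, 10)]    # one pixel per toy "building"
m_pos = [(0, i) for i in range(k2)]             # measurement pixels

def f1(h1):
    """Map a building indicator vector to the city-map image format."""
    city = np.zeros(grid)
    for keep, (i, j) in zip(h1, b_pos):
        city[i, j] = keep
    return city

def f2(h2):
    """Map a measurement vector to the grayscale measurement image format."""
    meas = np.zeros(grid)
    for val, (i, j) in zip(h2, m_pos):
        meas[i, j] = val
    return meas

x0 = np.zeros(grid); x0[8, 8] = 1.0             # Tx location map
h1 = np.ones(k1)                                # all buildings present
h2 = rng.uniform(0.2, 1.0, k2)                  # measured signal strengths

s1 = np.array([1.0, 0.0, 1.0, 0.0])             # building mask s(1)
s2 = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])   # measurement mask s(2)
v2 = np.zeros(k2)                               # zero-completion strategy

# y := [x(0), f1(s(1) ⊙ h(1)), f2(s(2) ⊙ h(2) + (1 − s(2)) ⊙ v(2))]
y = [x0, f1(s1 * h1), f2(s2 * h2 + (1 - s2) * v2)]
```

Replacing v2 with samples of the radio map estimated by Φ0 at the unchosen locations yields the inpainting strategy discussed next.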
While it is natural to model masking out a building by simply zeroing out the corresponding cluster of
pixels, i.e., choosing v(1) = 0, we also need to properly choose v(2) for the entries where the mask s(2)
takes the value 0, in order to obtain appropriate obfuscations. For this, we can deploy the second model Φ0 as
an inpainter. We consider the following two extreme obfuscation strategies. The first is to set v(2) to
zero as well, i.e., to simply remove the unchosen measurements from the input, with the underlying assumption
that any subset of measurements is valid for a city map. In the other extreme case, we inpaint all unchosen
measurements by sampling at their locations the estimated radio map obtained by Φ0 based on the buildings
selected by s(1).
The two extreme measurement-completion methods correspond to two extremes of the interpretation
query. Filling in the missing measurements with Φ0 tends to overestimate the strength of the signal because
there are fewer buildings to obstruct the transmissions. The empty mask will complete all measurements to
the maximal possible signal strength, i.e., the free-space radio map. The overestimation in signal strength is
reduced when more measurements and buildings are chosen, resulting in darker estimated radio maps. Thus,
this strategy is related to the query of which measurements and buildings are important to darken the free-space
radio map, turning it into the radio map produced by Φ. In the other extreme, adding more measurements
to the mask with a fixed set of buildings typically brightens the resulting radio map. This allows us to answer
which measurements are most important for brightening the radio map.
Between these two extreme strategies lies a continuum of completion methods, where a random subset of
the unchosen measurements is sampled from Φ0 while the rest are set to zero. Examples of explanations of
a prediction Φ(x) according to these methods are presented in Figure 9. Since we only care about specific
small patches, exemplified by the green boxes, the distortion here is measured with respect to the ℓ2 distance
between the output images restricted to the corresponding region (see also Example 8).
(a) Estimated map. (b) Explanation: inpaint all unchosen measurements. (c) Explanation: inpaint 2.5% of
unchosen measurements.
Figure 9: Radio map queries and explanations: The radio map (gray), input buildings (blue), input measurements
(red), and area of interest (green box). The middle panel represents the query "How to fill in the image
with shadows?", while the right panel represents the query "How to fill in the image with both shadows and
bright spots?". We inpaint with Φ0.
When the query is how to darken the free-space radio map (Figure 9b), the optimized mask s suggests
that samples in the shadow of the missing building are the most influential in the prediction. These dark
measurements are supposed to be in line-of-sight of a Tx, which indicates that the network deduced that
there is a missing building. When the query is how to fill in the image with both shadows and bright spots
(Figure 9c), both samples in the shadow of the missing building and samples right before the building are
influential. This indicates that the network used the bright measurements in line-of-sight and avoided
predicting an overly large building. To understand the chosen buildings, note that Φ is based on a composition
of UNets and can thus be interpreted as a procedure that extracts high-level and global information from the
inputs to synthesize the output. The locations of the chosen buildings in Figure 9 reflect this global nature.
4.3.2 Interpretation-Driven Training.
We now discuss an example application of the explanation obtained by the RDE approach described above,
called interpretation-driven training. When a missing building is in line-of-sight of a Tx, we would like Φ
to reconstruct this building by relying on samples in the shadow of the building rather than on patterns in the
city. To reduce the reliance of Φ on the city information in this situation, one can add a regularization term
to the training loss which promotes explanations relying on measurements. Suppose x = [x(0), x(1), x(2)]
contains a missing input building in line-of-sight of the Tx location, and denote the subset of pixels of the
missing building in the city map by Jx. Denote the prediction of Φ restricted to the subset Jx by ΦJx.
Moreover, define x̃ := [x(0), 0, x(2)] to be the modification of x with all input buildings masked out. We then
define the interpretation loss for x as

ℓint(x, x̃) := ‖ΦJx(x) − ΦJx(x̃)‖2.

The interpretation-driven training objective then regularizes Φ during training by adding the interpretation
loss for all inputs x that contain a missing input building in line-of-sight of the Tx location. An example
comparison between explanations of the vanilla RadioUNet Φ and the interpretation-driven network Φint is
given in Figure 10.
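The interpretation loss is straightforward to compute; the following numpy sketch illustrates it on a toy grid. The three-channel averaging `phi` is a hypothetical stand-in for RadioUNet, and the patch Jx is an arbitrary illustrative region.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = (8, 8)

def phi(x):
    """Toy stand-in for RadioUNet: averages the three input channels."""
    x0, x1, x2 = x
    return (x0 + x1 + x2) / 3.0

x0 = np.zeros(grid); x0[4, 4] = 1.0             # Tx location map
x1 = rng.integers(0, 2, grid).astype(float)     # noisy binary city map
x2 = rng.uniform(0.0, 1.0, grid)                # measurement image

Jx = np.zeros(grid, dtype=bool)                 # pixels of the missing building
Jx[2:4, 2:4] = True

x = [x0, x1, x2]
x_tilde = [x0, np.zeros(grid), x2]              # all input buildings masked out

# l_int(x, x~) := || Phi_Jx(x) - Phi_Jx(x~) ||_2
l_int = np.linalg.norm(phi(x)[Jx] - phi(x_tilde)[Jx])
```

Minimizing ℓint during training makes the prediction on the missing-building pixels insensitive to the city-map channel, so Φ must rely on the measurements there.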
(a) Vanilla Φ estimation (b) Interpretation-driven Φint estimation (c) Vanilla Φ explanation
(d) Interpretation-driven Φint explanation
Figure 10: Radio map estimations, interpretation-driven training vs. vanilla training: The radio map (gray),
input buildings (blue), input measurements (red), and domain of the missing building (green box).
5 Conclusion
In this work, we presented the Rate-Distortion Explanation (RDE) framework in a revised and comprehensive
manner. Our framework is flexible enough to answer various interpretation queries by considering
suitable data representations tailored to the underlying domain and query. We demonstrated this flexibility
and the overall efficacy of the RDE framework on an image classification task, an audio signal classification
task, and a radio map estimation task, a seldom-explored regression task.
References

[1] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller,
and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise
relevance propagation. PLoS ONE, 10(7):e0130140, 2015.
[2] Chun-Hao Chang, Elliot Creager, Anna Goldenberg, and David Duvenaud. Explaining image classi-
fiers by counterfactual generation. In Proceedings of the 7th International Conference on Learning
Representations, ICLR, 2019.
[3] Piotr Dabkowski and Yarin Gal. Real time image saliency for black box classifiers. In Proceed-
ings of the 31st International Conference on Neural Information Processing Systems, NeurIPS, page
6970–6979, 2017.
[4] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale
hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and
Pattern Recognition, CVPR, pages 248–255, 2009.
[5] Ronald A. DeVore. Nonlinear approximation. Acta Numerica, 7:51–150, 1998.
[6] Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Mohammad Norouzi, Douglas Eck,
and Karen Simonyan. Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings
of the 34th International Conference on Machine Learning, ICML, volume 70, pages 1068–1077, 2017.
[7] R. C. Fong and A. Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pages 3449–3457, 2017.
[8] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair,
Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Proceedings of the 27th Interna-
tional Conference on Neural Information Processing Systems, NeurIPS, page 2672–2680, 2014.
[9] Cosmas Heiß, Ron Levie, Cinjon Resnick, Gitta Kutyniok, and Joan Bruna. In-distribution inter-
pretability for challenging modalities. Preprint arXiv:2007.00758, 2020.
[10] Andrew Howard, Mark Sandler, Bo Chen, Weijun Wang, Liang-Chieh Chen, Mingxing Tan, Grace
Chu, Vijay Vasudevan, Yukun Zhu, Ruoming Pang, Hartwig Adam, and Quoc Le. Searching for
MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision
(ICCV), pages 1314–1324, 2019.
[11] Stefan Kolek, Duc Anh Nguyen, Ron Levie, Joan Bruna, and Gitta Kutyniok. Cartoon explanations of
image classifiers. arXiv: 2110.03485, 2021.
[12] Gitta Kutyniok and Wang-Q Lim. Compactly supported shearlets are optimally sparse. Journal of
Approximation Theory, 163(11):1564–1589, 2011.
[13] Ron Levie, Cagkan Yapar, Gitta Kutyniok, and Giuseppe Caire. Pathloss prediction using deep learning
with applications to cellular optimization and efficient d2d link scheduling. In ICASSP 2020 - 2020
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8678–
8682, 2020.
[14] Ron Levie, Cagkan Yapar, Gitta Kutyniok, and Giuseppe Caire. RadioUNet: Fast radio map estimation
with convolutional neural networks. IEEE Transactions on Wireless Communications, 20(6):4001–
4015, 2021.
[15] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Proceed-
ings of the 31st International Conference on Neural Information Processing Systems, NeurIPS, page
4768–4777, 2017.
[16] Jan Macdonald, Stephan Wäldchen, Sascha Hauch, and Gitta Kutyniok. A rate-distortion framework
for explaining neural network decisions. Preprint arXiv:1905.11092, 2019.
[17] Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relax-
ation of discrete random variables. Preprint arXiv:1611.00712, 2016.
[18] S.G. Mallat and Zhifeng Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transac-
tions on Signal Processing, 41(12):3397–3415, 1993.
[19] M. Narasimha and A. Peterson. On the computation of the discrete cosine transform. IEEE Transac-
tions on Communications, 26(6):934–936, 1978.
[20] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining
the predictions of any classifier. In Proceedings of the 22nd International Conference on Knowledge
Discovery and Data Mining, ACM SIGKDD, pages 1135–1144. Association for Computing Machinery, 2016.
[21] Justin K. Romberg, Michael B. Wakin, and Richard G. Baraniuk. Wavelet-domain approximation and
compression of piecewise smooth images. IEEE Trans. Image Processing, 15:1071–1087, 2006.
[22] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation.
In Medical Image Computing and Computer-Assisted Intervention (MICCAI), volume 9351 of
LNCS, pages 234–241, 2015.
[23] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through prop-
agating activation differences. In Proceedings of the 34th International Conference on Machine Learn-
ing, ICML, volume 70, page 3145–3153, 2017.
[24] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad:
removing noise by adding noise. In Workshop on Visualization for Deep Learning, ICML, 2017.
[25] Stéphane Mallat. Chapter 11.3. In Stéphane Mallat, editor, A Wavelet Tour of Signal Processing (Third
Edition), pages 535–610. Academic Press, Boston, third edition, 2009.
[26] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Proceedings
of the 34th International Conference on Machine Learning, ICML, volume 70, pages 3319–3328, 2017.
tions. Preprint arXiv:2104.06164, 2021.
[28] Stephan Wäldchen, Jan Macdonald, Sascha Hauch, and Gitta Kutyniok. The computational complexity
of understanding network decisions. Preprint arXiv:1905.09163, 2019.
[29] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. Generative image
inpainting with contextual attention. In Proceedings of the 2018 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, CVPR, pages 5505–5514, 2018.