NIF: A Framework for Quantifying Neural Information Flow in Deep Networks

Brian Davis, Umang Bhatt, Kartikeya Bhardwaj, Radu Marculescu, José Moura
Carnegie Mellon University, Pittsburgh, Pennsylvania 15213-3890
{briandavis, umang, kbhardwa, radum, moura}
In this paper, we present a new approach to interpreting deep
learning models. More precisely, by coupling mutual infor-
mation with network science, we explore how information
flows through feed forward networks. We show that efficiently
approximating mutual information via the dual representation
of Kullback-Leibler divergence allows us to create an infor-
mation measure that quantifies how much information flows
between any two neurons of a deep learning model. To that
end, we propose NIF, Neural Information Flow, a new metric
for codifying information flow which exposes the internals of
a deep learning model while providing feature attributions.
As deep learning gains popularity, there has been an influx
in methods that attempt to explain how deep learning begets
its predictive power. Most approaches to interpreting deep
learning models are model agnostic and make local approxi-
mations in the feature space region around the datapoints to
be explained (Ribeiro, Singh, and Guestrin 2016). However,
such techniques fail to capture global model-specific behavior
that is crucial to understanding if the function learned by a
deep learning model aligns well with a user's intention. Moreover,
current noisy approximations neglect the topological
structure of the model used for prediction (Sundararajan, Taly,
and Yan 2017).
We note that it is easy to forget the network structure
of deep learning models, particularly feed forward models,
which resemble directed acyclic graphs. However, under-
standing the topological structure of different models can not
only help decide the architecture best suited for the task at
hand, but also help expose the internal interactions between
neurons at inference time. While the existing interpretability
techniques (Chen et al. 2018) shed light on which input features
are responsible for a given prediction, prior art still fails
to quantify how information flows through a deep network
at the neuron-level. This prevents answering one of the most
fundamental questions in deep learning: How much informa-
tion flows through a deep network model from input features
to each of its intermediate neurons?
Equal Contribution
2019, Association for the Advancement of Artificial
Intelligence. All rights reserved.
To address this question, we consider two types of inter-
pretability notions: (i) model interpretability via attribution
to input features, and (ii) network architecture interpretability
with respect to how information flows from neuron to neuron
within a given pretrained model. We believe that addressing
notion (ii) from a fundamental information-theoretic
standpoint will automatically reveal insights about the precise
decision-making process followed by the model (i.e.,
notion (i)).
To that end, using an information theoretic measure, we
propose to model the flow of information via Neural Infor-
mation Flow (NIF) between neurons in consecutive layers
to expose how simple deep learning models can learn com-
plex functions of their input features. We further analyze
this flow of information between neurons from a network
science (Barabási and Bonabeau 2003) perspective, where
each neuron in the deep network essentially becomes a node
in the network of information flow. Eventually, the NIF can
help recover an information-theoretic feature attribution, a
rank of feature importance to a given class.
Combining an information measure with the ability to
propagate information through the network can help us vi-
sualize the information flow. Feature attributions not only
expose which features are important to a model (just like
current feature attribution techniques (Ribeiro, Singh, and
Guestrin 2016)), but also which information flow paths in the
network are crucial to a model’s prediction; the latter will
allow us to study how information flow is amplified or thwart-
ed when we introduce current state of the art deep learning
building blocks: shortcuts, residuals, dropout, etc. To the best
of our knowledge, we are the first to propose an information-
and network-theoretic model for explaining how information
flows through a deep learning model while accounting for its
network structure.
Network Science
Network science has gained a lot of interest for many bio-
logical and social science applications. However, to the best
of our knowledge, network concepts have not been used to
understand the inner workings of deep neural networks. To
that end, several ideas from network science can be used for
better understanding deep network architectures.
Betweenness Centrality
Given a network $G = (V, E)$, the betweenness centrality of a node $v \in V$
is a measure of how central a node is in the network. Specifically, the
betweenness centrality computes how many shortest paths between different
pairs of nodes in the network pass through node $v$. Mathematically,
betweenness can be computed as:
$$B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
where $\sigma_{st}$ is the number of shortest paths between nodes
$s, t \in V$, and $\sigma_{st}(v)$ is the number of those shortest paths
passing through $v$.
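As a rough illustration of the shortest-path counting behind betweenness centrality, the sketch below implements Brandes' algorithm for a small unweighted directed graph. The toy graph, the node names, and the function name `betweenness_centrality` are assumptions made for this example and are not taken from the paper; edge weights (such as information-flow values) are ignored here.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted directed graph.

    adj maps every node to a list of its successors (sinks map to []).
    Returns B(v): the sum over ordered pairs (s, t), s != v != t, of the
    fraction of shortest s-t paths that pass through v.
    """
    nodes = list(adj)
    centrality = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and remembering shortest-path predecessors of each node.
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    return centrality


# Hypothetical feed-forward graph: two inputs, two hidden nodes, one output.
toy_graph = {
    "x1": ["h1"],
    "x2": ["h1", "h2"],
    "h1": ["y"],
    "h2": ["y"],
    "y": [],
}
scores = betweenness_centrality(toy_graph)
```

In this toy graph, `h1` lies on the only shortest path from `x1` to `y` and on one of the two shortest paths from `x2` to `y`, so it receives a higher score than `h2`.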
Community Structure
Communities in a network refer to groups of tightly connected nodes.
Intuitively, a group of nodes forms a community if the number of
connections within the group is significantly higher than what we would
expect at random. Mathematically, communities can be computed by
maximizing a modularity function as follows (Newman 2006):
$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{1}{\gamma} \frac{k_i k_j}{2m} \right] \delta(g_i, g_j) \quad (1)$$
where $m$ is the number of edges, $k_i$ is the degree (number of
connections) of node $i$, $A_{ij}$ is the weight of the link between
nodes $i$ and $j$, and $\delta$ is the Kronecker delta. The idea is to
find groups of tightly connected nodes, $g = \{g_1, g_2, \ldots, g_k\}$,
which map the nodes to $k$ communities. The $k_i k_j / 2m$ factor
represents the number of links one would expect in a randomly connected
network. Finally, $\gamma$ controls the resolution of communities: a
lower $\gamma$ will detect a larger number of smaller communities (i.e.,
the number of communities depends on the resolution, $\gamma$).
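To make the modularity objective concrete, the following sketch evaluates the modularity of a given partition at the standard resolution $\gamma = 1$, where the resolution term drops out. The function name `modularity`, the adjacency-matrix representation, and the six-node example graph are assumptions for illustration only; the sketch scores a fixed partition and does not search for the maximizing one.

```python
def modularity(adj, groups):
    """Newman modularity of a partition at the standard resolution (gamma = 1).

    adj    : symmetric weight matrix as a list of lists, adj[i][j] = A_ij
    groups : groups[i] is the community label g_i of node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]  # k_i
    two_m = sum(degree)                 # 2m, twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:  # Kronecker delta(g_i, g_j)
                # Observed weight minus the weight expected at random.
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Hypothetical graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1.0

q_two_communities = modularity(adjacency, [0, 0, 0, 1, 1, 1])
q_one_community = modularity(adjacency, [0, 0, 0, 0, 0, 0])
```

Splitting the graph into its two triangles scores higher than placing all nodes in one community, which matches the intuition that a community has more internal links than expected at random.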
Current interpretability techniques fall into two classes. The
first class consists of gradient-based methods, which compute
the gradient of the output with respect to the input, treating
gradient flow as a saliency map (Sundararajan, Taly, and
Yan 2017). The other type of research leverages perturbation-
based techniques to approximate a complex model using a
locally additive model, thus explaining the difference be-
tween test output-input pair and some reference output-input
pair. Lundberg and Lee proposed SHAP, a class of methods
which randomly draws points from a kernel centered at the
test point and fits a sparse linear model to locally approximate
the decision boundary (Lundberg and Lee 2017). Approxi-
mating Shapley values to quantify the importance of features
of a given input, kernel SHAP can learn a feature attribution.
While gradient-based techniques like (Sundararajan, Taly,
and Yan 2017) consider infinitesimal regions on the decision
surface and take the first-order term in the Taylor expansion
as the additive model, perturbation-based additive models
consider the finite difference between an input vector and a
reference vector.
Information Theory
Mutual information has proven to be a valuable tool for fea-
ture selection at training time leveraging dimensionality re-
duction (Bollacker and Ghosh 1996). More recent work has
represented deep neural networks as Markovian chains to
create an information bottleneck theory for deep learning
(Shwartz-Ziv and Tishby 2017). However, these works do
not tackle the interpretability problem directly.
Other works look to find $I(X_S; Y)$, the mutual information
between a subset of the input vector and the output vector.
In order to explain the conditional distribution of the output
vector given the input vector, Chen et al. develop an efficient
variational approximation to mutual information (Chen et al.
2018). However, this model fails to recover the per-feature
mutual information, a requisite of our model to explain how
information flows through all possible paths.
Additionally, Belghazi et al. (2018) propose to estimate mutual
information via a neural information measure, $I_\Theta(X, Z)$: this
quantity is grounded in the dual representation of the Kullback-Leibler
divergence between the joint and the product of the marginals,
parameterized by a statistics network
$T_\Theta : \mathcal{X} \times \mathcal{Z} \to \mathbb{R}$: a deep neural
network used to estimate the neural information measure from empirical
samples from the joint ($P_{XZ}$) and the product of the marginal
distributions ($P_X \otimes P_Z$). The empirical neural information
measure ($\hat{I}$) is defined as follows:
$$\hat{I}(X, Z) = \sup_{\theta \in \Theta} \mathbb{E}_{P_{XZ}}[T_\theta] - \log\left(\mathbb{E}_{P_X \otimes P_Z}[e^{T_\theta}]\right)$$
We use this approximation to start our detailed understanding
of information flow in deep networks. We control $X$ and $Z$
to be different quantities of interest within our model,
namely a specific input feature, a hidden neuron, etc.
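The dual (Donsker-Varadhan) representation can be checked numerically on a small discrete example. The sketch below is an illustration under stated assumptions, not the estimator used in the paper: it replaces the neural statistics network with an explicit table `T[x][z]` and evaluates the expectations exactly from a hypothetical 2x2 joint distribution rather than from samples. With the optimal statistic (the log density ratio) the bound equals the true mutual information, and any other statistic gives a smaller value.

```python
import math


def marginals(joint):
    """Row and column marginals of a joint probability table joint[x][z]."""
    px = [sum(row) for row in joint]
    pz = [sum(col) for col in zip(*joint)]
    return px, pz


def mutual_information(joint):
    """Exact I(X; Z) in nats for a joint probability table."""
    px, pz = marginals(joint)
    return sum(p * math.log(p / (px[x] * pz[z]))
               for x, row in enumerate(joint)
               for z, p in enumerate(row) if p > 0)


def dv_bound(joint, T):
    """Donsker-Varadhan value E_joint[T] - log E_product[exp(T)]."""
    px, pz = marginals(joint)
    cells = [(x, z) for x in range(len(px)) for z in range(len(pz))]
    e_joint = sum(joint[x][z] * T[x][z] for x, z in cells)
    e_product = sum(px[x] * pz[z] * math.exp(T[x][z]) for x, z in cells)
    return e_joint - math.log(e_product)


# Hypothetical joint distribution over two binary variables.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px, pz = marginals(joint)

# Optimal statistic: the log ratio of the joint to the product of marginals.
T_star = [[math.log(joint[x][z] / (px[x] * pz[z])) for z in range(2)]
          for x in range(2)]
T_other = [[1.0, 0.0],
           [0.0, 1.0]]  # an arbitrary, suboptimal statistic

true_mi = mutual_information(joint)
tight = dv_bound(joint, T_star)    # matches true_mi up to rounding
loose = dv_bound(joint, T_other)   # strictly below true_mi
```

In the neural estimator, the table is replaced by a network $T_\theta$ and the supremum is approached by gradient ascent on the same objective, with the expectations estimated from minibatches.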
Our proposed approach, NIF, transforms a traditional deep
learning model into a representation that actually captures
the information-theoretic relationship between nodes learned
by the model (Fig 1).
Figure 1: Traditional Model to NIF Network. Color of nodes
corresponds to communities. Size of nodes corresponds to
betweenness centrality
Our approach extends the work of Belghazi et al. (2018)
and decomposes their approximation of $I(X; Z)$ to give us
$I(X_i; Q_k)$, where $X_i$ is a dimension of $X$ (specifically the
$i$-th feature of the input vector) and $Q_k$ is any quantity of
interest (perhaps the $k$-th neuron in a hidden layer or a class
of the output vector).
Assuming that mutual information is composable and entropy is
non-decreasing, we can calculate the mutual information for any
feature $X_i$ by leveraging a tractable approximation from
(Bollacker and Ghosh 1996):
$$I(X_i; Q_k) = I(X; Q_k) - \beta \sum_{j \neq i} I(X_i; X_j) \quad (2)$$
where $\beta$ can be used to tune the interactive effect of mutual
information between features. The first term is referred to
as the relevance of $X_i$ and the second term is called
redundancy, as it removes interactions between dimensions of
the input. We desire a tractable approximation to Equation 2
using the statistics network $T_\Theta$, which calculates the mutual
information between two empirical distributions (in this case,
$X$ to $Q_k$). We can find the relevance term via:
$$\hat{I}(X, Q_k, T_\Theta) = \sup_{\theta \in \Theta} \mathbb{E}_{P_{X Q_k}}[T_\theta] - \log\left(\mathbb{E}_{P_X \otimes P_{Q_k}}[e^{T_\theta}]\right)$$
Similarly, the redundancy term is as follows:
$$\hat{I}(X_i, X_j, T_\Theta) = \sup_{\theta \in \Theta} \mathbb{E}_{P_{X_i X_j}}[T_\theta] - \log\left(\mathbb{E}_{P_{X_i} \otimes P_{X_j}}[e^{T_\theta}]\right)$$
Combining both relevance and redundancy, we get the following
estimate of neural information:
$$\hat{I}(X_i, Q_k, T_\Theta) = \hat{I}(X, Q_k, T_\Theta) - \beta \sum_{j \neq i} \hat{I}(X_i, X_j, T_\Theta)$$
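The relevance-minus-redundancy combination can be sketched as plain arithmetic once the individual mutual information estimates are available. In the sketch below, the function name `per_feature_information`, the numeric values, and the choice to sum the redundancy over all features $j \neq i$ are assumptions made for illustration; the estimates themselves would come from the statistics network and are simply given as numbers here.

```python
def per_feature_information(relevance, redundancy, i, beta):
    """Combine a relevance estimate with pairwise redundancy estimates.

    relevance  : estimate of I(X; Q_k) for the whole input vector
    redundancy : matrix with redundancy[i][j] an estimate of I(X_i; X_j)
    i          : index of the feature of interest
    beta       : weight on the redundancy penalty
    """
    # Assumption: the penalty sums over every other feature j != i.
    penalty = sum(redundancy[i][j] for j in range(len(redundancy)) if j != i)
    return relevance - beta * penalty


# Hypothetical estimates for a three-feature input.
relevance_estimate = 1.0
redundancy_estimates = [
    [0.9, 0.2, 0.1],
    [0.2, 0.8, 0.3],
    [0.1, 0.3, 0.7],
]
flow_feature_0 = per_feature_information(
    relevance_estimate, redundancy_estimates, i=0, beta=0.5)
```

Setting `beta=0` recovers the relevance term alone, while larger values penalize features that share more information with the other input dimensions.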
Since we share model parameters between the redundancy
and relevance components, we derive a weaker least upper
bound that allows us to get granular about distributional
interactions. First, let the following hold:
$$A = \mathbb{E}_{P_{X Q_k}}[T_\theta] - \log\left(\mathbb{E}_{P_X \otimes P_{Q_k}}[e^{T_\theta}]\right)$$
To that end, we propose NIF, a new metric for neural information flow.
$$\mathrm{NIF} = \sup_{\theta \in \Theta} \hat{I}(X_i, Q, T_\Theta) \quad (3)$$
By jointly training $T_\Theta$, we can approximate the mutual
information between a feature and a quantity of interest; for
concreteness, let's assume the quantity of interest is the first
hidden neuron of a hidden layer. Solving Equation 3 for
all possible pairs $(X_i, Q_k)$ will place a weight on every edge
between a feature and a quantity of interest.$^2$
In order to test the fidelity of NIF, we run a few experiments
that not only validate our proposed metric, but also lead
to novel interpretations of deep learning models. We run
all experiments on UCI datasets, namely Iris and Banknote
authentication, both of which provide us with a small enough
feature space for us to interpret and visualize (Dheeru and
Karra Taniskidou 2017).
For a thorough derivation of the statistics network as a valid
measure of mutual information, see (Belghazi et al. 2018).
$^2$ Note we can scale $X_i$ and $Q_k$ to be any two model internals.
One Layer Perceptron
We start by visualizing NIF for a
one layer perceptron trained on the Iris dataset with ReLU
activations and optimized via ADAM. Note that we make a
feature independence assumption for the Iris dataset, since
its low number of samples hinders NIF convergence.
Figure 2: One layer perceptron for the Iris dataset with ReLU
activation and trained with ADAM (a) NIF network model
(b) Activation distribution of the original model
In Figure 2, we show the NIF network created using Equation 3
and a distribution of activations, as a sanity check.
Particularly, in Figure 2(a), we normalize the information
flow per layer to ease visualization of the edges. The thick-
ness of an edge denotes how much information is flowing
between any two nodes: the thicker the connection, the more
information travels from one node to the next. The size of
the node denotes its centrality: the bigger the node, the more
central it is for information to freely propagate through the
network. The color of the node denotes which community the
node is a member of: using a standard resolution of $\gamma = 1$,
we use Equation 1 to find three distinct communities in the
network. At first glance, it is clear that of the five hidden
neurons in the single hidden layer, only three are central to the
model's final prediction. This result makes intuitive sense, as the
ReLU activation at the other two nodes is zero (see Figure 2(b)): thus,
we can reason that ReLU effectively stifles information from
flowing through the network. Moreover, Figure 2(b) confirms
that the activations at nodes three and five are zero; these
nodes therefore have no connections in the NIF model.
Figure 3: One layer perceptron for the Banknote dataset with
ReLU activation and trained with ADAM (a) NIF network
(b) Activation distribution of the original model
We perform similar analysis for the Banknote dataset and
report results in Figure 3. We see a strong information propa-
gation from feature one to hidden layer node five, so much so
that both nodes belong to their own community. The
activation distribution in Figure 3(b) confirms the equal
importance of all central nodes to the model's prediction.
Two Layered Network
To show the initial ability of NIF
to generalize to larger networks, we train a two layered network
with ReLU activation on the Banknote dataset. As shown in
Figure 4, we find that the activations of two nodes per layer are
zero, which means there are information pathways that are inherently
stifled due to the use of ReLU activation.
Figure 4: NIF network for a two layer MLP for the Banknote
dataset with ReLU activation and trained with ADAM
Accuracy Recovery
It is worthwhile to note that all of the
models described above achieved upwards of 96% accuracy
on a held-out test set. The NIF models shown in Figure 2
and in Figure 4 reveal that ReLU can zero out activations
at certain neurons in the hidden layer of a network while
still passing enough information through the rest of the
neurons to maintain predictive accuracy. We ran another set of
experiments wherein we zero out the weights and biases of
the original model for zero-activation neurons. That is, we
use NIF to identify useless weights and then act upon that
learning to zero out the corresponding elements in the weight
matrix. To our surprise, we did not find a drop in accuracy
when we zeroed out the weights in the matrix. This will have
massive implications as we scale to larger networks.
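One way to see what zeroing out a zero-activation neuron does to the forward pass is the small sketch below. It is an illustration only, not the experimental code: the weights, the inputs, and the helper names (`forward`, `prune_dead_neurons`) are hypothetical, and dead neurons are found directly from activations here rather than from NIF values. A hidden neuron whose ReLU output is zero on every probe input contributes nothing to the next layer, so zeroing its parameters leaves the outputs on those inputs unchanged.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def affine(W, b, x):
    """Compute W x + b, where each row of W holds one unit's weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(params, x):
    """One-hidden-layer ReLU network; returns (outputs, hidden activations)."""
    W1, b1, W2, b2 = params
    hidden = relu(affine(W1, b1, x))
    return affine(W2, b2, hidden), hidden


def prune_dead_neurons(params, inputs):
    """Zero all parameters of hidden neurons that are inactive on every input."""
    W1, b1, W2, b2 = params
    dead = [k for k in range(len(b1))
            if all(forward(params, x)[1][k] == 0.0 for x in inputs)]
    W1p = [[0.0] * len(row) if k in dead else list(row)
           for k, row in enumerate(W1)]
    b1p = [0.0 if k in dead else bk for k, bk in enumerate(b1)]
    W2p = [[0.0 if k in dead else w for k, w in enumerate(row)] for row in W2]
    return (W1p, b1p, W2p, list(b2)), dead


# Hypothetical network: 2 inputs, 3 hidden neurons, 2 outputs.
params = (
    [[1.0, -1.0], [-1.0, -1.0], [0.5, 0.5]],  # W1
    [0.5, -0.1, 0.0],                         # b1
    [[1.0, 2.0, -1.0], [0.5, -3.0, 1.0]],     # W2
    [0.1, -0.2],                              # b2
)
probe_inputs = [[1.0, 2.0], [0.5, 0.0], [3.0, 1.0]]

pruned_params, dead_neurons = prune_dead_neurons(params, probe_inputs)
```

The guarantee holds only for inputs on which the pruned neurons were inactive; a neuron that is dead on the probe set could still fire on unseen data, which is why accuracy should be re-checked on a held-out set after pruning.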
Feature Attribution
NIF naturally recovers a feature attribution
which we calculate in the following manner. We
find all the possible paths between a feature of interest $X_i$
and any of the outputs $y_1, \ldots, y_c$. To find out the value of a
path, we take the product of all NIF calculations along the
path. We then sum over all of the possible values to find
our desired feature attribution for feature $X_i$ and class $y_j$.
Mathematically, the element $A_{ij}$ of our attribution matrix
$A \in \mathbb{R}^{n \times c}$ ($n$ is the number of features and $c$ is the
number of classes) can be given as:
$$A_{ij} = \sum_{p \in P_{ij}} \prod_{l \in L_p} \mathrm{NIF}_p(l)$$
where $P_{ij}$ is the set of all directed paths from input $X_i$ to class
$y_j$ in the neural information flow network, and $L_p$ is the set of
links on each path $p \in P_{ij}$.
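The path-based attribution described above can be sketched for a layered feed-forward network as follows. The function name `path_attribution`, the nested-list representation of edge weights, and the numeric values are assumptions for illustration; the edge weights stand in for NIF values that would be estimated separately.

```python
def path_attribution(layers):
    """Sum, over all directed input-to-class paths, the product of edge weights.

    layers[d][u][v] is the weight on the edge from node u in layer d to
    node v in layer d + 1; a weight of 0.0 is treated as a missing edge.
    Returns A with A[i][j] the attribution of input i to class j.
    """
    n_inputs = len(layers[0])
    n_classes = len(layers[-1][0])
    A = [[0.0] * n_classes for _ in range(n_inputs)]

    def walk(depth, node, product, source):
        if depth == len(layers):
            # Reached the output layer: `node` is a class index.
            A[source][node] += product
            return
        for nxt, weight in enumerate(layers[depth][node]):
            if weight != 0.0:
                walk(depth + 1, nxt, product * weight, source)

    for i in range(n_inputs):
        walk(0, i, 1.0, i)
    return A


# Hypothetical edge weights: 2 inputs -> 2 hidden nodes -> 2 classes.
nif_layers = [
    [[0.5, 0.0], [0.25, 1.0]],  # input-to-hidden edges
    [[1.0, 0.5], [0.0, 2.0]],   # hidden-to-class edges
]
attribution = path_attribution(nif_layers)
```

For a strictly layered network such as this one, the sum over paths of edge-weight products coincides with the product of the per-layer weight matrices, so explicit path enumeration is needed only when the graph has shortcut connections.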
We compare NIF to current feature attribution techniques
SHAP (Lundberg and Lee 2017) and Integrated Gradients
(Sundararajan, Taly, and Yan 2017) in Table 1. Using the two
sample Kolmogorov-Smirnov test for goodness of fit between
two empirical distributions (in this case, the raw mutual in-
formation attribution between the input and output classes
and the attribution in question), we find that NIF surpasses
current benchmarks, which suggests that the NIF attribution is
likely drawn from the same distribution as the raw mutual
information between the input and output classes. This leads us
to believe that information theoretic feature attribution is viable.
NIF 1.0 0.011
SHAP 0.75 0.107
Table 1: Feature attribution comparison
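For reference, the two-sample Kolmogorov-Smirnov statistic used in this kind of comparison is the largest gap between the empirical cumulative distribution functions of the two samples. The sketch below computes only that statistic (not the associated p-value); the function name `ks_statistic` and the sample values are hypothetical and are not the attributions reported in Table 1.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max over x of |F_a(x) - F_b(x)|."""
    points = sorted(set(sample_a) | set(sample_b))
    largest_gap = 0.0
    for x in points:
        # Empirical CDFs at x: fraction of each sample that is <= x.
        cdf_a = sum(1 for v in sample_a if v <= x) / len(sample_a)
        cdf_b = sum(1 for v in sample_b if v <= x) / len(sample_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical normalized attribution vectors for four features.
reference_attribution = [0.1, 0.4, 0.7, 0.9]
candidate_attribution = [0.2, 0.4, 0.8, 0.9]
gap = ks_statistic(reference_attribution, candidate_attribution)
```

A statistic of 0 means the two empirical distributions coincide, while a statistic of 1 means the samples do not overlap at all.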
Conclusion and Future Work
We have proposed NIF, Neural Information Flow, a new met-
ric for measuring information flow through deep learning
models. Merging a dual representation of Kullback-Leibler
divergence and classical feature selection literature, we find
that NIF not only provides insight into which information
pathways are crucial within a network but also allows us to
leverage fewer parameters at inference time, since we can
remove parameters deemed useless by the NIF without loss
of accuracy. Finally, we have shown how NIF recovers an
information theoretic feature attribution that aligns with ex-
isting benchmarks. In our future work, we plan to apply NIF
to larger architectures.
Barabási, A.-L., and Bonabeau, E. 2003. Scale-free networks.
Scientific American 288:60–69.
Belghazi, M. I.; Baratin, A.; Rajeshwar, S.; Ozair, S.; Bengio, Y.;
Courville, A.; and Hjelm, D. 2018. Mutual information neural
estimation. In Proc. ICML, volume 80, 531–540. PMLR.
Bollacker, K. D., and Ghosh, J. 1996. Linear feature extractors
based on mutual information. In Proc. ICPR, 720–724 vol.2.
Chen, J.; Song, L.; Wainwright, M. J.; and Jordan, M. I. 2018.
Learning to explain: An information-theoretic perspective on model
interpretation. ICML 2018.
Dheeru, D., and Karra Taniskidou, E. 2017. UCI machine learning
repository.
Lundberg, S. M., and Lee, S.-I. 2017. A unified approach to
interpreting model predictions. In Advances in Neural Information
Processing Systems 30. 4765–4774.
Newman, M. E. 2006. Modularity and community structure
in networks. Proceedings of the National Academy of Sciences
103(23):8577–8582.
Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. Why should I trust
you?: Explaining the predictions of any classifier. In Proc. KDD,
Shwartz-Ziv, R., and Tishby, N. 2017. Opening the black box of
deep neural networks via information. CoRR.
Sundararajan, M.; Taly, A.; and Yan, Q. 2017. Axiomatic attribution
for deep networks. In Proc. ICML, volume 70, 3319–3328. PMLR.