Comparing Neural Networks via Generalized
Persistence
Generalized persistence for comparing neural networks
Mattia G. Bergomi, Pietro Vertechi
Abstract Artificial neural networks are often used as black boxes to solve supervised tasks. At each layer, the network updates its representation of the dataset in order to minimize a given error function, which depends on the correct assignment of predetermined labels to each observed data point. On the other end of the spectrum, topological persistence is commonly used to compare hand-crafted low-dimensional data representations. Here, we provide an application of rank-based persistence, a generalized persistence framework that allows us to characterize the data representation generated by each layer of an artificial neural network, and compare different neural architectures.
Abstract Artificial neural networks are often used as black boxes to solve supervised tasks. During training, the network updates its representation of the dataset at each layer, so as to minimize a given error function that depends on the assignment of predetermined labels to each analyzed sample. At the opposite end of the spectrum, persistent homology is used to rapidly compare data on the basis of user-specified features. We propose to use rank-based persistence, a generalization of topological persistence theory, to characterize the data representation obtained at each layer of a neural network and to compare different neural architectures.
Key words: Generalized persistence, rank-based persistence, interpretable neural
networks.
Mattia G. Bergomi
Veos Digital, Via Gustavo Fara 20, 20124 Milano, e-mail: mattia.bergomi@veos.digital
Pietro Vertechi
Champalimaud Research, Av. Brasília, 1400-038 Lisboa, e-mail: pietro.vertechi@neuro.fchampalimaud.org
1 Introduction
Topological persistence allows for swift quantitative comparison of topological
spaces [1]. However, often data are not organized as, or easily mappable to such
spaces. In [2], we generalize persistence to work with a broader set of categories
and functors than topological spaces and homology.
Here, we exemplify one of the use cases discussed in [2], where persistent homology is used to characterize labeled point clouds, by working on the category of metric spaces and the poset induced by inclusion on the non-empty subsets of the label set. We focus on point clouds generated by each layer of an artificial neural network when solving a supervised classification task. We use the generalized persistence framework both to evaluate the layer-wise representation of different subsets of labels and to compare neural architectures.
In section 2, we give an intuition about the classical and generalized persistence frameworks. Section 3 shows how data represented at each layer of a feed-forward neural network can be summarized via multicolored persistence diagrams. Afterwards, two neural architectures are compared by computing the multicolored bottleneck distance between their diagrams.
2 From classical to generalized persistence
Classical persistent homology is based on three main ingredients: 1. a filtered topological space; 2. the homology functor H_k, mapping topological spaces to finite-dimensional vector spaces; 3. a notion of rank, such as the dimension of the vector space, or the cardinality in the category of sets [3]. See fig. 1 for an example and an intuitive introduction to persistent homology, and [4] for details.
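These ingredients can be made concrete in a small sketch. Using the well-known fact that degree-0 Vietoris-Rips persistence coincides with single-linkage merge scales, a 0-dimensional persistence diagram of a point cloud can be computed with a union-find pass over the minimum spanning tree (the function name and representation below are ours, not from the paper):

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Degree-0 persistence of a Vietoris-Rips filtration on a point
    cloud: every point is born at scale 0, and connected components
    merge (die) along minimum-spanning-tree edges (Kruskal's algorithm
    with union-find)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for scale, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(scale)  # one component dies at this scale
    # n - 1 finite deaths, plus one essential class living forever.
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]
```

For three collinear points at mutual distances 1 and 4, this yields two finite cornerpoints, (0, 1) and (0, 4), plus one essential class.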
In [2], we introduce a framework that extends classical persistence to new categories, functors and rank-like functions. We will not discuss the technical details of rank-based persistence here; see [2, Table 1] for an intuitive summary.
The main theoretical tool used in the following applications is what we called multicolored persistence. Multicolored persistence allows one to use persistence in semisimple categories (which can have more than one non-isomorphic indecomposable object), while preserving the fundamental properties of persistent homology: flexibility (dependence on the filtering function), stability [5], and resistance to occlusions [6]. In [2, Section 4] we discuss the stability conditions of such a construction; we define multicolored persistence diagrams (MPD), and adapt the bottleneck distance to the semisimple case. These results allow us to use the classical Vietoris-Rips construction to study the interactions, at the homological level, of cycles generated by labeled points in a metric space.
Vietoris-Rips filtration with labeled data. Let us consider a metric space X, a finite set of points d ⊆ X, and a labeling function l : X → L = {l_1, ..., l_n} associating each point with a label in the finite set L. Let {X_i = l^{-1}(l_i)}_{i ∈ {1,...,n}} be the family of subdatasets corresponding to each label.
[Figure 1 panels: a) Filtration, b) Persistence diagram, c) Matching]
Fig. 1 Let (X, f) be a topological space X equipped with a continuous function f : X → R. The (homological) critical values {c_1, ..., c_6} of f induce a sub-level set filtration of X, depicted in panel a. The change in the number of generators of the k-th homology groups is represented as a persistence diagram (panel b). Births and deaths of homological classes are represented as points and half-lines, named cornerpoints and cornerlines, color-coded depending on their associated degree: connected components in green, and the void obtained at the last sublevel set as the blue half-line. Two persistence diagrams can be compared by computing the optimal cornerpoint matching (panel c). Unmatched points are associated with their projection on the diagonal.
In the classical framework, it would be straightforward to build a filtration of X through the Vietoris-Rips construction and compute its persistent homology. However, this procedure cannot address the problem of quantifying how points belonging to different subdatasets contribute to the evolution of homological classes throughout the filtration. To do this, we consider the poset (P_n, ⊆) of non-empty subsets of {1, ..., n} ordered by inclusion, and the functor F : P_n → Met mapping {1, ..., k} ↦ ⊔_{i ∈ {1,...,k}} X_i, where Met is the category of metric spaces and k ≤ n. The Vietoris-Rips construction allows us to build a (P_n, ⊆)-indexed diagram (see [2, Remark 3.8]) and consequently a multicolored persistence diagram, i.e. a persistence diagram in which the information concerning the subset of labels contributing to a homological class is retained and color-coded.
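The functor F can be sketched in code: each non-empty subset of labels is sent to the union of the corresponding subdatasets, so that inclusions of label subsets correspond to inclusions of point clouds (the function name is ours, for illustration only):

```python
from itertools import combinations

def label_poset_diagram(points, labels):
    """Map each non-empty subset S of the label set to the union of the
    subdatasets X_i = l^{-1}(l_i) with l_i in S, i.e. a (P_n, subset)-indexed
    diagram of metric subspaces."""
    label_set = sorted(set(labels))
    subdatasets = {l: [p for p, q in zip(points, labels) if q == l]
                   for l in label_set}
    diagram = {}
    for r in range(1, len(label_set) + 1):
        for subset in combinations(label_set, r):
            # Union of the subdatasets selected by this label subset.
            diagram[subset] = [p for l in subset for p in subdatasets[l]]
    return diagram
```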
Implementation. Let C = {c_1, ..., c_k} be a neural network classifier composed of k layers, and (d, l_d) and (t, l_t) the training and test labeled datasets. We compute the Vietoris-Rips multicolored persistence as follows: 1. We train C on (d, l_d), using the cross-entropy loss function and stochastic gradient descent. 2. We obtain the space X = ⊔_i X_i by considering the activation of the j-th layer on the samples belonging to the test dataset (250 randomly chosen samples per label). 3. We filter X with the classical Vietoris-Rips construction, labeling each simplex according to the labels associated with its vertices. 4. We sort the simplices of the filtration first according to their sublevel set, and then by considering the mode of their associated labels. 5. Finally, we compute the persistent homology of the sorted filtration and retrieve the labeling information associated with each homology class.
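Steps 3 and 4 can be sketched as follows, under the standard convention that a simplex appears at the scale of its longest edge; within equal scales we let faces precede cofaces before grouping by label mode (the names and this tie-breaking rule are our assumptions, not the paper's):

```python
import math
from collections import Counter
from itertools import combinations

def vietoris_rips(points, labels, max_dim=1):
    """Enumerate the simplices of a Vietoris-Rips filtration on a labeled
    point cloud. Each simplex carries its appearance scale, the mode of
    its vertex labels (the secondary sort key, as in step 4), and the set
    of labels touching it."""
    def dist(i, j):
        return math.dist(points[i], points[j])

    n = len(points)
    simplices = []
    for dim in range(max_dim + 1):
        for simplex in combinations(range(n), dim + 1):
            # Appearance scale: longest pairwise distance (0 for vertices).
            scale = max((dist(i, j) for i, j in combinations(simplex, 2)),
                        default=0.0)
            vertex_labels = [labels[v] for v in simplex]
            mode = Counter(vertex_labels).most_common(1)[0][0]
            simplices.append((scale, mode, simplex, frozenset(vertex_labels)))
    # Sort by sublevel set first; at equal scale, faces precede cofaces,
    # and simplices are then grouped by their label mode.
    simplices.sort(key=lambda s: (s[0], len(s[2]), s[1]))
    return simplices
```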
[Figure 2 panels: a) Classifier: images → vectorized images → layer 0 (100 units) → layer 1 (10 units); b) Label poset, FinVec^(P, ⊆); c) Multicolored persistence diagram for labels 0, 8; d) Multicolored persistence diagram for labels 0, 2; e) Multicolored persistence diagram for labels 2, 7. Classification accuracies: 97.75%, 97.33%, 94.73%]
Fig. 2 Multicolored persistence. We encode the data representation obtained by considering the activity of a layer of an artificial neural network (panel a), via the multicolored persistence diagrams induced by the labeling defined on the dataset (panel b). In panels c, d and e, we show the multicolored persistence diagrams computed by considering the activation of the last layer of the trained network on the corresponding MNIST test samples. Cornerpoints originated by the interaction of simplices associated with multiple labels are highlighted (red circles).
3 Applications
In the following applications we use a shallow feed-forward neural network, composed only of fully-connected layers, to classify the images of handwritten digits of MNIST [7]. The neural network is composed of two layers of 100 and 10 units, where the first layer is followed by a ReLU [8] nonlinearity.
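The architecture can be sketched as follows, with randomly initialized weights standing in for the trained ones (in the paper the network is trained with cross-entropy loss and stochastic gradient descent); the per-layer activations are the point clouds analyzed below:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights: an illustration of the shapes only.
W0 = rng.standard_normal((784, 100)) * 0.01  # layer 0: 100 units
W1 = rng.standard_normal((100, 10)) * 0.01   # layer 1: 10 units

def forward(x):
    """Forward pass of the shallow classifier on vectorized 28x28 MNIST
    images; returns the activations of both layers."""
    h0 = np.maximum(x @ W0, 0.0)  # ReLU after the first layer
    h1 = h0 @ W1                  # one output unit per digit
    return h0, h1
```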
Layer-wise embedding evaluation. As a first application, we want to show how
homological classes arising from the interaction of different labels carry information
concerning the separability of those labels at a given layer. We consider samples
belonging to pairs of labels in the MNIST dataset, train the neural network described
above for 20 epochs, and evaluate it on the test set. We then consider the label poset
of fig. 2, panel (b) and compute the multicolored persistence diagram by following
the algorithm described in section 2.
The 1-multicolored persistence diagrams corresponding to the pairs of labels (0, 8), (0, 2) and (2, 7), obtained by considering the activity of the last layer of the network on the samples belonging to the test dataset, are shown in panels (c, d, e)
of fig. 2. An artificial neural network applies nonlinear transformations in order to
separate samples belonging to different classes. Indeed, the cornerpoints associated
with cycles generated by samples belonging to a single label appear overlapping
and with low persistence. Observe how, although present in all the examples we
[Figure 3 panels: a) Multicolored persistence diagram for labels 0, 1, 6, layer 0 embedding; b) Multicolored persistence diagram for labels 0, 1, 6, layer 1 embedding; c) UMAP for labels 0, 1, 6, layer 0 embedding; d) UMAP for labels 0, 1, 6, layer 1 embedding]
Fig. 3 Multicolored persistence diagrams obtained by considering the point clouds generated by
the activation of the first (panel a) and last (panel b) layers of the neural network described in fig. 2,
after training to distinguish the MNIST samples labeled with 0, 1 and 6. Panels c and d are obtained
by reducing the dimensionality of the same point clouds.
showcase, this effect modulates depending on the considered pair of labels, reflecting the validation accuracy of the classifier (97.75%, 94.73% and 97.33% for the considered pairs of labels, respectively). However, cornerpoints associated with multiple classes are born later along the filtration (they correspond to larger holes) and have larger persistence, again correlating with the classifier's score.
The same analysis for samples with labels (0, 1, 6) is reported in fig. 3. There, however, we considered the point clouds generated by the first and second (last) layers of the network. The corresponding multicolored persistence diagrams are shown in panels (a) and (b) of fig. 3. As a comparison, we reduced the dimensionality of the point clouds by using UMAP [9].
A distance for neural architectures. One of the main advantages of using stable data representations such as multicolored persistence diagrams is that they can be compared through the multicolored bottleneck distance. This distance is essentially a color-wise version of the classical bottleneck distance used in persistent homology [10], i.e. it only admits matchings between cornerpoints that respect their coloring (labeling).
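A brute-force sketch of this color-wise comparison: diagrams are matched color by color, with the usual trick of padding each diagram with the diagonal projections of the other's cornerpoints. The exhaustive search is exponential and only meant to illustrate the definition on tiny diagrams; real computations use efficient bottleneck algorithms, and the function names here are ours:

```python
from itertools import permutations

def bottleneck(d1, d2):
    """Naive bottleneck distance between two small persistence diagrams,
    given as lists of (birth, death) pairs."""
    def proj(p):  # diagonal projection of a cornerpoint
        m = (p[0] + p[1]) / 2
        return (m, m, True)

    # Pad each diagram with the projections of the other's points.
    a = [(x, y, False) for x, y in d1] + [proj(q) for q in d2]
    b = [(x, y, False) for x, y in d2] + [proj(p) for p in d1]
    if not a:
        return 0.0

    def cost(p, q):
        if p[2] and q[2]:
            return 0.0  # matching diagonal to diagonal is free
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    # Minimize, over perfect matchings, the largest matched distance.
    return min(max(cost(a[i], b[perm[i]]) for i in range(len(a)))
               for perm in permutations(range(len(b))))

def multicolored_bottleneck(mpd1, mpd2):
    """Color-wise bottleneck distance: cornerpoints are only matched
    within the same label subset (color), yielding one value per color."""
    colors = set(mpd1) | set(mpd2)
    return {c: bottleneck(mpd1.get(c, []), mpd2.get(c, [])) for c in colors}
```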
As a proof of concept, we compared the neural architecture described above with a second one, identical in structure but with 10 units in the first layer. The classification accuracies of the two models after 20 training epochs are 98.6% and 99%, respectively. Although the difference in accuracy is small, the multicolored bottleneck distance between the 1-MPDs computed by considering the activity of the last layer of the two trained architectures can be used to discriminate between them.
The values of the multicolored bottleneck distance (label subset → distance) are: {0} → 0.6, {1} → 0.2, {6} → 0.3, {1, 6} → 0.5, {0, 6} → 1. Consistently with our previous applications, cycles produced by multiple labels or associated with labels that are easily misclassified (0 and 6) have higher distances.
4 Discussion
After providing a short introduction to persistent homology and its rank-based generalization, we showed how the latter technique can be used to represent and compare, in a robust and stable fashion, the transformations learned by artificial neural networks in supervised classification tasks.
This work is intended as an exemplification of the theoretical framework described in [2], and in particular of the generalization of persistence to semisimple categories. In a forthcoming work, we plan to apply this technique to the evaluation and selection of more complex architectures (e.g. convolutional neural networks) and biological neural networks.
References
1. M. Ferri, Persistent topology for natural data analysis - A survey, arXiv preprint arXiv:1706.00411.
2. M. G. Bergomi, P. Vertechi, Rank-based persistence, Theory and applications of categories 35
(2020) 34.
3. M. G. Bergomi, M. Ferri, P. Vertechi, L. Zuffi, Beyond topological persistence: Starting from
networks, arXiv preprint arXiv:1901.08051.
4. H. Edelsbrunner, D. Letscher, A. Zomorodian, Topological persistence and simplification, in:
Proceedings 41st Annual Symposium on Foundations of Computer Science, 2000, pp. 454–
463. doi:10.1109/SFCS.2000.892133.
5. D. Cohen-Steiner, H. Edelsbrunner, J. Harer, Stability of Persistence Diagrams, Discrete & Computational Geometry 37 (1) (2007) 103-120. doi:10.1007/s00454-006-1276-5.
6. B. Di Fabio, C. Landi, A Mayer-Vietoris formula for persistent homology with an application to shape recognition in the presence of occlusions, Foundations of Computational Mathematics 11 (5) (2011) 499.
7. L. Deng, The MNIST database of handwritten digit images for machine learning research [best of the web], IEEE Signal Processing Magazine 29 (6) (2012) 141-142.
8. R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, H. S. Seung, Digital selec-
tion and analogue amplification coexist in a cortex-inspired silicon circuit, Nature 405 (6789)
(2000) 947–951.
9. L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, arXiv preprint arXiv:1802.03426.
10. M. d'Amico, P. Frosini, C. Landi, Using matching distance in size theory: A survey, International Journal of Imaging Systems and Technology 16 (5) (2006) 154-161.