Comparing Neural Networks via Generalized
Persistence
Generalized persistence for the comparison of neural networks
Mattia G. Bergomi, Pietro Vertechi
Abstract Artificial neural networks are often used as black boxes to solve super-
vised tasks. At each layer, the network updates its representation of the dataset in or-
der to minimize a given error function, which depends on the correct assignment of
predetermined labels to each observed data point. On the other end of the spectrum,
topological persistence is commonly used to compare hand-crafted low-dimensional
data representations. Here, we provide an application of rank-based persistence, a generalized persistence framework, which allows us to characterize the data representation generated by each layer of an artificial neural network and to compare different neural architectures.
Abstract Artificial neural networks are often used as black boxes to solve supervised tasks. During training, the network updates its representation of the dataset at each layer, so as to minimize a given error function that depends on the assignment of predetermined labels to each analyzed sample. At the opposite end of the spectrum, persistent homology is used to rapidly compare data according to user-specified features. We propose to use rank-based persistence, a generalization of the theory of topological persistence, to characterize the data representation obtained at each layer of a neural network and to compare different neural architectures.
Key words: Generalized persistence, rank-based persistence, interpretable neural
networks.
Mattia G. Bergomi
Veos Digital, Via Gustavo Fara 20, 20124 Milano, e-mail: mattia.bergomi@veos.digital
Pietro Vertechi
Champalimaud Research, Av. Brasília, 1400-038 Lisboa, e-mail: pietro.vertechi@neuro.fchampalimaud.org
1 Introduction
Topological persistence allows for swift quantitative comparison of topological spaces [1]. However, data are often not organized as, or easily mappable to, such spaces. In [2], we generalize persistence to work with a broader set of categories and functors than topological spaces and homology.
Here, we exemplify one of the use cases discussed in [2], where persistent ho-
mology is used to characterize labeled point clouds, by working on the category
of metric spaces and the poset induced by the inclusion on non-empty subsets of
the label set. We focus on point clouds generated by each layer of an artificial neu-
ral network when solving a supervised classification task. We use the generalized persistence framework both to evaluate the layer-wise representation of different subsets of labels and to compare neural architectures.
In section 2, we give an intuition about the classical and generalized persistence
frameworks. Section 3 shows how data represented at each layer of a feed-forward
neural network can be summarized via multicolored persistence diagrams. After-
wards, two neural architectures are compared by computing the multicolored bottle-
neck distance between their diagrams.
2 From classical to generalized persistence
Classical persistent homology is based on three main ingredients: 1. a filtered topological space; 2. the homology functor $H_k$, mapping topological spaces to finite vector spaces; 3. a notion of rank, such as the dimension of the vector space, or the cardinality in the category of sets [3]. See fig. 1 for an example and an intuitive introduction to persistent homology, and [4] for details.
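As an illustration of these three ingredients, the following is a minimal sketch, assuming the GUDHI Python library; the point cloud and all parameters (number of points, noise scale, maximal edge length) are hypothetical.

```python
# A minimal sketch, assuming the GUDHI library (pip install gudhi).
import numpy as np
import gudhi

# Sample a noisy circle: one persistent 1-dimensional class is expected.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
points = np.c_[np.cos(theta), np.sin(theta)]
points += rng.normal(scale=0.05, size=points.shape)

# Ingredient 1: a filtered space, here the Vietoris-Rips filtration.
rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)

# Ingredients 2-3: homology with field coefficients and its rank,
# summarized as a list of (degree, (birth, death)) pairs. The class of
# the circle appears as a 1-dimensional point far from the diagonal.
diagram = simplex_tree.persistence()
print(diagram)
```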
In [2], we introduce a framework that extends classical persistence to new cate-
gories, functors and rank-like functions. We will not discuss the technical details of
rank-based persistence here. See [2, Table 1] for an intuitive summary.
The main theoretical tool used in the following applications is what we called
multicolored persistence. Multicolored persistence allows one to use persistence in
semisimple categories (which can have more than one non-isomorphic indecompos-
able object), while preserving the fundamental properties of persistent homology: flexibility (dependence on the filtering function), stability [5], and resistance to occlusions [6]. In [2, Section 4] we discuss the stability conditions of this construction, define multicolored persistence diagrams (MPDs), and adapt the bottleneck distance to the semisimple case. These results allow us to use the classical Vietoris-Rips construction to study, at the homological level, the interactions of cycles generated by labeled points in a metric space.
Vietoris-Rips filtration with labeled data. Let us consider a metric space $X$, a finite set of points $d \subset X$, and a labeling function $l\colon X \to L = \{l_1, \dots, l_n\}$ associating each point with a label in the finite set $L$. Let $\{X_i = l^{-1}(l_i)\}_{i \in \{1, \dots, n\}}$ be the family of subdatasets corresponding to each label.
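In code, the family of subdatasets is simply a partition of the point cloud by label. A minimal sketch, assuming the points and labels are stored as NumPy arrays (the names `d` and `labels`, and the synthetic data, are ours):

```python
import numpy as np

# Hypothetical labeled point cloud: rows of `d` are points, `labels[i]`
# is the label of the i-th point.
d = np.random.default_rng(0).normal(size=(500, 2))
labels = np.random.default_rng(1).integers(0, 3, size=500)

# The family {X_i = l^{-1}(l_i)} of subdatasets, one per label.
subdatasets = {lab: d[labels == lab] for lab in np.unique(labels)}
```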
a) Filtration b) Persistence diagram c) Matching
Fig. 1 Let $(X, f)$ be a pair consisting of a topological space $X$ and a continuous function $f\colon X \to \mathbb{R}$. The (homological) critical values $\{c_1, \dots, c_6\}$ of $f$ induce a sub-level set filtration of $X$, depicted in panel a. The change in the number of generators of the $k$-th homology groups is represented as a persistence diagram (panel b). Births and deaths of homological classes are represented as points and half-lines, named cornerpoints and cornerlines, color-coded according to their associated degree: connected components in green, and the void obtained at the last sub-level set as the blue half-line. Two persistence diagrams can be compared by computing the optimal matching of their cornerpoints (panel c). Unmatched points are associated with their projection on the diagonal.
In the classical framework, it would be straightforward to build a filtration of $X$ through the Vietoris-Rips construction and compute its persistent homology. However, this procedure cannot address the problem of quantifying how points belonging to different subdatasets contribute to the evolution of homological classes throughout the filtration. To do this, we consider the poset $(\mathcal{P}_n, \subseteq)$ of non-empty subsets of $\{1, \dots, n\}$ ordered by inclusion, and the functor $F\colon \mathcal{P}_n \to \mathbf{Met}$ mapping $\{1, \dots, k\} \mapsto \bigsqcup_{i \in \{1, \dots, k\}} X_i$, where $\mathbf{Met}$ is the category of metric spaces and $k \le n$. The Vietoris-Rips construction allows us to build a $(\mathcal{P}_n, \subseteq)$-indexed diagram (see [2, Remark 3.8]) and consequently a multicolored persistence diagram, i.e. a persistence diagram in which the information concerning the subset of labels contributing to a homological class is retained and color-coded.
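A minimal sketch of this construction on objects, assuming the `subdatasets` dictionary from the snippet above: the poset is enumerated as the non-empty subsets of the label set, and $F$ sends each subset to the union of the corresponding subdatasets, with the metric inherited from the ambient space.

```python
from itertools import chain, combinations

import numpy as np

def nonempty_subsets(label_set):
    """All non-empty subsets of the label set, ordered by cardinality."""
    return chain.from_iterable(
        combinations(label_set, r) for r in range(1, len(label_set) + 1)
    )

def F(subset, subdatasets):
    """The functor on objects: a label subset maps to the union of its
    subdatasets (stacking rows models the disjoint union, since the
    subdatasets do not share points)."""
    return np.vstack([subdatasets[lab] for lab in subset])

# Inclusions of label subsets induce inclusions of the unions, giving
# the (P_n, ⊆)-indexed diagram of metric spaces.
for subset in nonempty_subsets(sorted(subdatasets)):
    print(subset, F(subset, subdatasets).shape)
```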
Implementation. Let $C = \{c_1, \dots, c_k\}$ be a neural network classifier composed of $k$ layers, and $(d, l_d)$ and $(t, l_t)$ the training and test labeled datasets. We compute the Vietoris-Rips multicolored persistence as follows: 1. we train $C$ on $(d, l_d)$, using the cross-entropy loss function and stochastic gradient descent; 2. we obtain the space $X = \bigsqcup_i X_i$ by considering the activation of the $j$-th layer on the samples belonging to the test dataset (250 randomly chosen samples per label); 3. we filter $X$ with the classical Vietoris-Rips construction, labeling each simplex according to the labels associated with its vertices; 4. we sort the simplices of the filtration first according to their sub-level set, and then by considering the mode of their associated labels; 5. finally, we compute the persistent homology of the sorted filtration and retrieve the labeling information associated with each homology class.
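A minimal sketch of steps 3 and 4, assuming the GUDHI library and arrays `acts` (the layer activations from step 2) and `labs` (one label per sample); the file names and the maximal filtration scale `max_scale` are hypothetical. Step 5 requires the matrix-reduction algorithm of [4] augmented with label bookkeeping, and is omitted here.

```python
from collections import Counter

import gudhi
import numpy as np

# Hypothetical inputs: activations of one layer on the test samples.
acts = np.load("layer_activations.npy")   # shape (n_samples, n_features)
labs = np.load("sample_labels.npy")       # shape (n_samples,)
max_scale = 10.0

# Step 3: Vietoris-Rips filtration; each simplex inherits the labels of
# its vertices.
rips = gudhi.RipsComplex(points=acts, max_edge_length=max_scale)
simplex_tree = rips.create_simplex_tree(max_dimension=2)

labeled_filtration = []
for simplex, value in simplex_tree.get_filtration():
    vertex_labels = [labs[v] for v in simplex]
    mode = Counter(vertex_labels).most_common(1)[0][0]  # most frequent label
    labeled_filtration.append((value, mode, simplex, frozenset(vertex_labels)))

# Step 4: sort first by sub-level set, then by the mode of the labels.
labeled_filtration.sort(key=lambda entry: (entry[0], entry[1]))
```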
[Fig. 2 panels: a) classifier (images, vectorized images, layer 0 with 100 units, layer 1 with 10 units); b) label poset; c) multicolored persistence diagram for labels 0, 8; d) multicolored persistence diagram for labels 0, 2; e) multicolored persistence diagram for labels 2, 7; classification accuracies: 97.75%, 97.33%, 94.73%.]
Fig. 2 Multicolored persistence. We encode the data representation obtained by considering the activity of a layer of an artificial neural network (panel a) via the multicolored persistence diagrams induced by the labeling defined on the dataset (panel b). In panels c, d and e, we show the multicolored persistence diagrams computed by considering the activation of the last layer of the trained network on the corresponding MNIST test samples. Cornerpoints originated by the interaction of simplices associated with multiple labels are highlighted (red circles).
3 Applications
In the following applications, we use a shallow feed-forward neural network, composed only of fully connected layers, to classify the images of handwritten digits of MNIST [7]. The network is composed of two layers of 100 and 10 units, where the first layer is followed by a ReLU [8] nonlinearity.
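A minimal PyTorch sketch of this architecture and of the training procedure of section 2; the batch size and learning rate are assumptions not specified in the text.

```python
import torch
from torch import nn
from torchvision import datasets, transforms

# Two fully connected layers of 100 and 10 units; ReLU after the first.
model = nn.Sequential(
    nn.Flatten(),             # vectorized 28x28 MNIST images
    nn.Linear(28 * 28, 100),  # layer 0
    nn.ReLU(),
    nn.Linear(100, 10),       # layer 1
)

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
loss_fn = nn.CrossEntropyLoss()  # cross-entropy, as in section 2
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):  # 20 epochs, as in the text
    for images, targets in loader:
        optimizer.zero_grad()
        loss_fn(model(images), targets).backward()
        optimizer.step()

# Step 2 of section 2: slicing the Sequential model yields the activation
# of a given layer, e.g. after layer 0 (Flatten, Linear, ReLU):
# acts = model[:3](test_images).detach().numpy()
```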
Layer-wise embedding evaluation. As a first application we want to show how
homological classes arising from the interaction of different labels carry information
concerning the separability of those labels at a given layer. We consider samples
belonging to pairs of labels in the MNIST dataset, train the neural network described
above for 20 epochs, and evaluate it on the test set. We then consider the label poset
of fig. 2, panel (b) and compute the multicolored persistence diagram by following
the algorithm described in section 2.
The 1-multicolored persistence diagrams corresponding to the pairs of labels $(0, 8)$, $(0, 2)$ and $(2, 7)$, obtained by considering the activity of the last layer of the network on the samples belonging to the test dataset, are shown in panels (c, d, e) of fig. 2. An artificial neural network applies nonlinear transformations in order to separate samples belonging to different classes. Indeed, the cornerpoints associated with cycles generated by samples belonging to a single label appear overlapping and with low persistence.
[Fig. 3 panels: a) multicolored persistence diagram for labels 0, 1, 6, layer 0 embedding; b) multicolored persistence diagram for labels 0, 1, 6, layer 1 embedding; c) UMAP for labels 0, 1, 6, layer 0 embedding; d) UMAP for labels 0, 1, 6, layer 1 embedding.]
Fig. 3 Multicolored persistence diagrams obtained by considering the point clouds generated by the activation of the first (panel a) and last (panel b) layers of the neural network described in fig. 2, after training to distinguish the MNIST samples labeled 0, 1 and 6. Panels c and d are obtained by reducing the dimensionality of the same point clouds.
Although present in all the examples we showcase, this effect modulates depending on the considered pair of labels, reflecting the validation accuracy of the classifier: 97.75%, 94.73% and 97.33% for the considered pairs of labels, respectively. However, cornerpoints associated with multiple classes are born later along the filtration (they correspond to larger holes) and have larger persistence, again correlating with the classifier's score.
The same analysis for samples with labels $(0, 1, 6)$ is reported in fig. 3. There, however, we considered the point clouds generated by the first and second (last) layers of the network. The corresponding multicolored persistence diagrams are shown in panels (a) and (b) of fig. 3. As a comparison, we reduced the dimensionality of the same point clouds using UMAP [9] (panels c and d).
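The dimensionality reduction of panels (c) and (d) can be reproduced in a few lines, assuming the umap-learn package and the activation matrix `acts` from the sketch above:

```python
import umap

# Project the layer activations to the plane; the points can then be
# scattered and colored by their MNIST label.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(acts)
```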
A distance for neural architectures. One of the main advantages of using stable data representations such as multicolored persistence diagrams is that they can be compared through the multicolored bottleneck distance. This distance is essentially a color-wise version of the classical bottleneck distance used in persistent homology [10], i.e. it only admits matchings between cornerpoints that respect their coloring (labeling).
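A minimal sketch of this color-wise comparison, assuming each multicolored persistence diagram is stored as a dictionary mapping a label subset (the color) to an array of (birth, death) pairs; `gudhi.bottleneck_distance` computes the classical distance within each color.

```python
import gudhi

def multicolored_bottleneck(mpd_a, mpd_b):
    """Bottleneck distance computed color by color: cornerpoints are only
    matched within the same label subset. Colors missing from one diagram
    would have to be matched to the diagonal; for simplicity, this sketch
    restricts to the colors shared by both diagrams."""
    return {
        color: gudhi.bottleneck_distance(mpd_a[color], mpd_b[color])
        for color in mpd_a.keys() & mpd_b.keys()
    }
```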
As a proof of concept, we compared the neural architecture described above with a second one, identical in structure but with 10 units in the first layer. The classification accuracies of the two models after 20 training epochs are 98.6% and 99%, respectively. Although the difference in accuracy is small, the multicolored bottleneck distance between the 1-MPDs computed by considering the activity of the last layer of the two trained architectures can be used to discriminate between them.
The values of the multicolored bottleneck distance (label subset $\to$ distance) are: $\{0\} \to 0.6$, $\{1\} \to 0.2$, $\{6\} \to 0.3$, $\{1, 6\} \to 0.5$, $\{0, 6\} \to 1$. Consistently with our previous applications, cycles produced by multiple labels, or associated with labels that are easily misclassified (0 and 6), have higher distances.
4 Discussion
After providing a short introduction to persistent homology and its rank-based generalization, we showed how the latter technique can be used to represent and compare, in a robust and stable fashion, the transformations learned by artificial neural networks in supervised classification tasks.
This work is intended as an exemplification of the theoretical framework de-
scribed in [2], and in particular of the generalization of persistence to semisimple
categories. In a forthcoming work, we plan to apply this technique to the evaluation
and selection of more complex architectures (e.g. convolutional neural networks)
and biological neural networks.
References
1. M. Ferri, Persistent topology for natural data analysis - A survey, arXiv:1706.00411 [math]. URL http://arxiv.org/abs/1706.00411
2. M. G. Bergomi, P. Vertechi, Rank-based persistence, Theory and applications of categories 35
(2020) 34.
3. M. G. Bergomi, M. Ferri, P. Vertechi, L. Zuffi, Beyond topological persistence: Starting from
networks, arXiv preprint arXiv:1901.08051.
4. H. Edelsbrunner, D. Letscher, A. Zomorodian, Topological persistence and simplification, in:
Proceedings 41st Annual Symposium on Foundations of Computer Science, 2000, pp. 454–
463. doi:10.1109/SFCS.2000.892133.
5. D. Cohen-Steiner, H. Edelsbrunner, J. Harer, Stability of persistence diagrams, Discrete & Computational Geometry 37 (1) (2007) 103–120. doi:10.1007/s00454-006-1276-5.
6. B. Di Fabio, C. Landi, A Mayer–Vietoris formula for persistent homology with an application to shape recognition in the presence of occlusions, Foundations of Computational Mathematics 11 (5) (2011) 499.
7. L. Deng, The MNIST database of handwritten digit images for machine learning research [best of the web], IEEE Signal Processing Magazine 29 (6) (2012) 141–142.
8. R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, H. S. Seung, Digital selec-
tion and analogue amplification coexist in a cortex-inspired silicon circuit, Nature 405 (6789)
(2000) 947–951.
9. L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, arXiv preprint arXiv:1802.03426.
10. M. d’Amico, P. Frosini, C. Landi, Using matching distance in size theory: A survey, Interna-
tional Journal of Imaging Systems and Technology 16 (5) (2006) 154–161.