
Comparing Neural Networks via Generalized

Persistence

Generalized persistence for comparing neural networks

Mattia G. Bergomi, Pietro Vertechi

Abstract Artificial neural networks are often used as black boxes to solve supervised tasks. At each layer, the network updates its representation of the dataset in order to minimize a given error function, which depends on the correct assignment of predetermined labels to each observed data point. On the other end of the spectrum, topological persistence is commonly used to compare hand-crafted low-dimensional data representations. Here, we provide an application of rank-based persistence, a generalized persistence framework that allows us to characterize the data representation generated by each layer of an artificial neural network, and compare different neural architectures.

Abstract Artificial neural networks are often used as black boxes to solve supervised tasks. During training, the network updates its representation of the dataset at each layer, so as to minimize a given error function that depends on the assignment of predetermined labels to each analyzed sample. At the opposite end of the spectrum, persistent homology is used to rapidly compare data on the basis of user-specified features. We propose to use rank-based persistence, a generalization of the theory of topological persistence, to characterize the data representation obtained at each layer of a neural network and to compare different neural architectures.

Key words: Generalized persistence, rank-based persistence, interpretable neural

networks.

Mattia G. Bergomi

Veos Digital, Via Gustavo Fara 20, 20124 Milano, e-mail: mattia.bergomi@veos.digital

Pietro Vertechi
Champalimaud Research, Av. Brasília, 1400-038 Lisboa, e-mail: pietro.vertechi@neuro.fchampalimaud.org


1 Introduction

Topological persistence allows for swift quantitative comparison of topological spaces [1]. However, data are often not organized as, or easily mappable to, such spaces. In [2], we generalize persistence to work with a broader set of categories and functors than topological spaces and homology.

Here, we exemplify one of the use cases discussed in [2], where persistent homology is used to characterize labeled point clouds, by working on the category of metric spaces and the poset induced by inclusion on non-empty subsets of the label set. We focus on the point clouds generated by each layer of an artificial neural network when solving a supervised classification task. We use the generalized persistence framework both to evaluate the layer-wise representation of different subsets of labels and to compare neural architectures.

In section 2, we give an intuition of the classical and generalized persistence frameworks. Section 3 shows how the data represented at each layer of a feed-forward neural network can be summarized via multicolored persistence diagrams. Afterwards, two neural architectures are compared by computing the multicolored bottleneck distance between their diagrams.

2 From classical to generalized persistence

Classical persistent homology is based on three main ingredients: 1. a filtered topological space; 2. the homology functor H_k mapping topological spaces to finite-dimensional vector spaces; 3. a notion of rank, such as the dimension of the vector space, or the cardinality in the category of sets [3]. See fig. 1 for an example and an intuitive introduction to persistent homology, and [4] for details.
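As an intuition for these three ingredients, in the simplest case (degree 0, i.e. connected components of a Vietoris-Rips filtration) the persistence diagram can be computed with a union-find structure and the elder rule. The sketch below is our own illustrative code, not part of [2]; the function name `persistence_0d` is ours.

```python
import math
from itertools import combinations

def persistence_0d(points):
    """Degree-0 persistence diagram of the Vietoris-Rips filtration:
    every point is born at scale 0; when two components merge, the
    younger one dies (elder rule)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Each edge enters the filtration at its length (the VR scale).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    diagram = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            diagram.append((0.0, d))  # a component dies at scale d
            parent[ri] = rj
    diagram.append((0.0, math.inf))  # the last component never dies
    return diagram
```

On three collinear points at distances 1 and 9, the diagram contains two finite cornerpoints, (0, 1) and (0, 9), plus one essential class (0, ∞): short-lived classes correspond to noise, long-lived ones to stable structure.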

In [2], we introduce a framework that extends classical persistence to new categories, functors and rank-like functions. We will not discuss the technical details of rank-based persistence here; see [2, Table 1] for an intuitive summary.

The main theoretical tool used in the following applications is what we called multicolored persistence. Multicolored persistence allows one to use persistence in semisimple categories (which can have more than one non-isomorphic indecomposable object), while preserving the fundamental properties of persistent homology: flexibility (dependence on the filtering function), stability [5], and resistance to occlusions [6]. In [2, Section 4] we discuss the stability conditions of such a construction; we define multicolored persistence diagrams (MPD), and adapt the bottleneck distance to the semisimple case. These results allow us to use the classical Vietoris-Rips construction to study the interactions, at the homological level, of cycles generated by labeled points in a metric space.

Vietoris-Rips filtration with labeled data. Let us consider a metric space X, a finite set of points d ⊂ X, and a labeling function l: X → L = {l_1, ..., l_n} associating each point with a label in the finite set L. Let {X_i = l^{-1}(l_i)}_{i ∈ {1,...,n}} be the family of subdatasets corresponding to each label.
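In code, the family of subdatasets {X_i = l^{-1}(l_i)} is simply a grouping of the points by their label; a minimal sketch (the function name `subdatasets` is ours):

```python
from collections import defaultdict

def subdatasets(points, labels):
    """Group a labeled point cloud into the family {X_i = l^{-1}(l_i)}."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return dict(groups)
```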



a) Filtration b) Persistence diagram c) Matching

Fig. 1 Let (X, f) be a topological space X and a continuous function f: X → R, respectively. The (homological) critical values {c_1, ..., c_6} of f induce a sub-level set filtration of X, depicted in panel a. The change in the number of generators of the k-th homology groups is represented as a persistence diagram (panel b). Births and deaths of homological classes are represented as points and half-lines, named cornerpoints and cornerlines, color-coded according to their associated degree: connected components in green, and the void obtained at the last sub-level set as the blue half-line. Two persistence diagrams can be compared by computing the optimal matching of cornerpoints (panel c). Unmatched points are associated with their projection on the diagonal.

In the classical framework, it would be straightforward to build a filtration of X through the Vietoris-Rips construction and compute its persistent homology. However, this procedure cannot address the problem of quantifying how points belonging to different subdatasets contribute to the evolution of homological classes throughout the filtration. To do this, we consider the poset (P_n, ⊆) of non-empty subsets of {1, ..., n} ordered by inclusion, and the functor F: P_n → Met mapping {1, ..., k} ↦ ⊔_{i ∈ {1,...,k}} X_i, where Met is the category of metric spaces and k ≤ n. The Vietoris-Rips construction allows us to build a (P_n, ⊆)-indexed diagram (see [2, Remark 3.8]) and consequently a multicolored persistence diagram, i.e. a persistence diagram in which the information concerning the subset of labels contributing to a homological class is retained and color-coded.
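The indexing poset (P_n, ⊆) can be enumerated explicitly; a small sketch (the function name `label_poset` is our own choice, and labels are assumed to be given as a plain Python list or set):

```python
from itertools import chain, combinations

def label_poset(labels):
    """Non-empty subsets of the label set together with the inclusion
    relation, i.e. the poset (P_n, ⊆) indexing the diagram of metric spaces."""
    labels = sorted(labels)
    subsets = [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(labels, r) for r in range(1, len(labels) + 1)
        )
    ]
    # a <= b on frozensets is exactly the subset relation
    relation = {(a, b) for a in subsets for b in subsets if a <= b}
    return subsets, relation
```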

Implementation. Let C = {c_1, ..., c_k} be a neural network classifier composed of k layers, and (d, l_d) and (t, l_t) the training and test labeled datasets. We compute the Vietoris-Rips multicolored persistence as follows: 1. We train C on (d, l_d), using the cross-entropy loss function and stochastic gradient descent. 2. We obtain the space X = ⊔_i X_i by considering the activation of the j-th layer on the samples belonging to the test dataset (250 randomly chosen samples per label). 3. We filter X with the classical Vietoris-Rips construction, labeling each simplex according to the labels associated with its vertices. 4. We sort the simplices of the filtration first by their sub-level set, and then by the mode of their associated labels. 5. Finally, we compute the persistent homology of the sorted filtration and retrieve the labeling information associated with each homology class.
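Steps 3 and 4 above can be sketched as follows. This is a simplified illustration (brute-force enumeration of simplices up to dimension 2, with helper names of our own), not the implementation used for the experiments:

```python
import math
from collections import Counter
from itertools import combinations

def labeled_rips_filtration(points, labels, max_dim=2):
    """Enumerate Vietoris-Rips simplices up to max_dim. Each simplex
    enters the filtration at the maximum pairwise distance of its
    vertices and carries the set of labels of its vertices; simplices
    are sorted by filtration value first, then by the mode (most
    common) of their vertex labels."""
    def diameter(simplex):
        return max(
            (math.dist(points[i], points[j]) for i, j in combinations(simplex, 2)),
            default=0.0,
        )

    simplices = []
    for dim in range(max_dim + 1):
        for simplex in combinations(range(len(points)), dim + 1):
            value = diameter(simplex)
            mode = Counter(labels[i] for i in simplex).most_common(1)[0][0]
            label_set = {labels[i] for i in simplex}
            simplices.append((value, mode, simplex, label_set))
    simplices.sort(key=lambda s: (s[0], s[1]))
    return simplices
```

Retaining the label set of each simplex is what makes it possible, after reduction, to attribute each homology class to the subset of labels that generated it.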


[Fig. 2 panels: a) Classifier: images → vectorised images → Layer 0 (100 units) → Layer 1 (10 units). b) Label poset. c) Multicolored persistence diagram for labels 0, 8. d) Multicolored persistence diagram for labels 0, 2. e) Multicolored persistence diagram for labels 2, 7. Classification accuracies: 97.75%, 97.33%, 94.73%.]

Fig. 2 Multicolored persistence. We encode the data representation obtained by considering the activity of a layer of an artificial neural network (panel a), via the multicolored persistence diagrams induced by the labeling defined on the dataset (panel b). In panels c, d and e, we show the multicolored persistence diagrams computed by considering the activation of the last layer of the trained network on the corresponding MNIST test samples. Cornerpoints originated by the interaction of simplices associated with multiple labels are highlighted (red circles).

3 Applications

In the following applications we use a shallow feed-forward neural network, composed only of fully connected layers, to classify the images of handwritten digits of MNIST [7]. The network consists of two layers of 100 and 10 units, where the first layer is followed by a ReLU [8] nonlinearity.
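A forward pass of such a two-layer network can be written in a few lines of plain Python. This is only a sketch of the architecture just described: the weights below are random placeholders, whereas the actual model is trained with cross-entropy and stochastic gradient descent as described in section 2.

```python
import random

def mlp_forward(x, w1, b1, w2, b2):
    """Forward pass: fully connected layer + ReLU, then a second
    fully connected layer producing one score per class."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU
        for row, b in zip(w1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]

random.seed(0)
n_in, n_hidden, n_out = 784, 100, 10  # MNIST images are 28 x 28 = 784 pixels
w1 = [[random.gauss(0, 0.01) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.gauss(0, 0.01) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out
scores = mlp_forward([0.5] * n_in, w1, b1, w2, b2)
```

The activations of either layer (the `hidden` vector or the output scores) are the point clouds analyzed in the remainder of this section.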

Layer-wise embedding evaluation. As a ﬁrst application we want to show how

homological classes arising from the interaction of different labels carry information

concerning the separability of those labels at a given layer. We consider samples

belonging to pairs of labels in the MNIST dataset, train the neural network described

above for 20 epochs, and evaluate it on the test set. We then consider the label poset

of ﬁg. 2, panel (b) and compute the multicolored persistence diagram by following

the algorithm described in section 2.

The 1-multicolored persistence diagrams corresponding to the pairs of labels (0, 8), (0, 2) and (2, 7), obtained by considering the activity of the last layer of the network on the samples belonging to the test dataset, are shown in panels (c, d, e) of fig. 2. An artificial neural network applies nonlinear transformations in order to separate samples belonging to different classes. Indeed, the cornerpoints associated with cycles generated by samples belonging to a single label appear overlapping and with low persistence. Observe how, although present in all the examples we


a) Multicolored persistence diagram for labels 0, 1, 6.

Layer 0 embedding

b) Multicolored persistence diagram for labels 0, 1, 6.

Layer 1 embedding

c) UMAP for labels 0, 1, 6. Layer 0 embedding d) UMAP for labels 0, 1, 6. Layer 1 embedding

Fig. 3 Multicolored persistence diagrams obtained by considering the point clouds generated by the activation of the first (panel a) and last (panel b) layers of the neural network described in fig. 2, after training to distinguish the MNIST samples labeled 0, 1 and 6. Panels c and d are obtained by reducing the dimensionality of the same point clouds.

showcase, this pattern modulates depending on the considered pair of labels, reflecting the validation accuracy of the classifier: 97.75%, 94.73% and 97.33% for the considered pairs of labels, respectively. However, cornerpoints associated with multiple classes are born later along the filtration (they correspond to larger holes) and have larger persistence, again correlating with the classifier's score.

The same analysis for samples with labels (0, 1, 6) is reported in fig. 3. There, however, we considered the point clouds generated by the first and second (last) layers of the network. The corresponding multicolored persistence diagrams are shown in panels (a) and (b) of fig. 3. As a comparison, we reduced the dimensionality of the point clouds by using UMAP [9].

A distance for neural architectures. One of the main advantages of using stable data representations such as multicolored persistence diagrams is that they can be compared through the multicolored bottleneck distance. This distance is essentially a color-wise version of the classical bottleneck distance used in persistent homology [10], i.e. it only admits matchings between cornerpoints that respect their coloring (labeling).
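For small diagrams, the bottleneck distance can be computed by brute force over matchings, and the multicolored version applies it color by color. The sketch below is our own simplified rendering (function names ours), under the assumption that matchings never cross colors:

```python
from itertools import permutations

def bottleneck(diagram_a, diagram_b):
    """Brute-force bottleneck distance between two small persistence
    diagrams (lists of (birth, death) cornerpoints). Unmatched points
    are matched to their projection on the diagonal."""
    def projection(p):
        m = (p[0] + p[1]) / 2.0
        return (m, m)

    # Augment each diagram with the diagonal projections of the other's points.
    a = list(diagram_a) + [projection(p) for p in diagram_b]
    b = list(diagram_b) + [projection(p) for p in diagram_a]
    if not a:
        return 0.0

    def cost(p, q):
        if p[0] == p[1] and q[0] == q[1]:
            return 0.0  # matching diagonal to diagonal is free
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    return min(
        max(cost(p, q) for p, q in zip(a, matching))
        for matching in permutations(b)
    )

def multicolored_bottleneck(mpd_a, mpd_b):
    """Color-wise bottleneck: cornerpoints may only be matched within
    the same label subset (color); returns one distance per color."""
    colors = set(mpd_a) | set(mpd_b)
    return {c: bottleneck(mpd_a.get(c, []), mpd_b.get(c, [])) for c in colors}
```

The factorial cost of `permutations` restricts this sketch to diagrams with a handful of cornerpoints; practical implementations solve the underlying bottleneck assignment problem instead.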

As a proof of concept, we compared the neural architecture described above with a second one, identical in structure but with 10 units in the first layer. The classification accuracies of the two models after 20 training epochs are 98.6% and 99%, respectively. Although the difference in accuracy is small, the multicolored bottleneck distance between the 1-MPDs computed by considering the activity of the last layer of the two trained architectures can be used to discriminate between them.


The values of the multicolored bottleneck distance (label subset → distance) are: {0} → 0.6, {1} → 0.2, {6} → 0.3, {1, 6} → 0.5, {0, 6} → 1. Consistently with our previous applications, cycles produced by multiple labels, or associated with labels that are easily misclassified (0 and 6), have higher distances.

4 Discussion

After providing a short introduction to persistent homology and its rank-based generalization, we showed how the latter technique can be used to represent and compare, in a robust and stable fashion, the transformations learned by artificial neural networks in supervised classification tasks.

This work is intended as an exemplification of the theoretical framework described in [2], and in particular of the generalization of persistence to semisimple categories. In a forthcoming work, we plan to apply this technique to the evaluation and selection of more complex architectures (e.g. convolutional neural networks) and biological neural networks.

References

1. M. Ferri, Persistent topology for natural data analysis - a survey, arXiv preprint arXiv:1706.00411. URL http://arxiv.org/abs/1706.00411
2. M. G. Bergomi, P. Vertechi, Rank-based persistence, Theory and Applications of Categories 35 (2020) 34.
3. M. G. Bergomi, M. Ferri, P. Vertechi, L. Zuffi, Beyond topological persistence: Starting from networks, arXiv preprint arXiv:1901.08051.
4. H. Edelsbrunner, D. Letscher, A. Zomorodian, Topological persistence and simplification, in: Proceedings 41st Annual Symposium on Foundations of Computer Science, 2000, pp. 454-463. doi:10.1109/SFCS.2000.892133.
5. D. Cohen-Steiner, H. Edelsbrunner, J. Harer, Stability of persistence diagrams, Discrete & Computational Geometry 37 (1) (2007) 103-120. doi:10.1007/s00454-006-1276-5. URL https://doi.org/10.1007/s00454-006-1276-5
6. B. Di Fabio, C. Landi, A Mayer–Vietoris formula for persistent homology with an application to shape recognition in the presence of occlusions, Foundations of Computational Mathematics 11 (5) (2011) 499.
7. L. Deng, The MNIST database of handwritten digit images for machine learning research [best of the web], IEEE Signal Processing Magazine 29 (6) (2012) 141-142.
8. R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, H. S. Seung, Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit, Nature 405 (6789) (2000) 947-951.
9. L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, arXiv preprint arXiv:1802.03426.
10. M. d'Amico, P. Frosini, C. Landi, Using matching distance in size theory: A survey, International Journal of Imaging Systems and Technology 16 (5) (2006) 154-161.
