Multilayer representation of collaboration networks with higher-order interactions

Springer Nature
Scientific Reports

Collaboration patterns offer important insights into how scientific breakthroughs and innovations emerge in small and large research groups. However, links in traditional networks account only for pairwise interactions, thus making the framework best suited for the description of two-person collaborations, but not for collaborations in larger groups. We therefore study higher-order scientific collaboration networks where a single link can connect more than two individuals, which is a natural description of collaborations entailing three or more people. We also consider different layers of these networks depending on the total number of collaborators, from one upwards. By doing so, we obtain novel microscopic insights into the representativeness of researchers within different teams and their links with others. In particular, we can follow the maturation process of the main topological features of collaboration networks, as we consider the sequence of graphs obtained by progressively merging collaborations from smaller to bigger sizes starting from the single-author ones. We also perform the same analysis by using publications instead of researchers as network nodes, obtaining qualitatively the same insights and thus confirming their robustness. We use data from the arXiv to obtain results specific to the fields of physics, mathematics, and computer science, as well as to the entire coverage of research fields in the database.

Scientific Reports | (2021) 11:5666
www.nature.com/scientificreports
E. Vasilyeva1,2, A. Kozlov1, K. Alfaro‑Bittner3,4*, D. Musatov1,5,6, A. M. Raigorodskii1,6,7,8,
M. Perc9,10,11 & S. Boccaletti1,3,12,13
Scientific collaboration networks are an important subset of complex social networks [1–4]. They document patterns of collaboration that we have formed to do research, and to arrive at new scientific discoveries and breakthroughs that drive technological progress and innovation in our societies. The outstanding importance of science and progress for the wellbeing of modern human societies, together with the consistent definition of scientific collaboration that is accurately documented in published research [5], has given rise to a rich plethora of research dedicated to the determination of structure and function of scientific collaboration networks [6–12]. Along the same lines, citation networks [13–15], bipartite author-publication networks [16–19], hypergraphs of scientific output [20], as well as simplicial descriptions of publications and corresponding topological methods [21,22], have also been considered and studied in much detail.
However, despite the fact that traditional complex networks have come a long way in improving our understanding of economic, infrastructural, technological, as well as social and computer networks [23–26], the past decade has witnessed the rise of the narrative that the majority of these networks do not exist in isolation. Rather, many are coupled together and therefore should be best described as interdependent or multilayer networks [27,28]. Indeed, it has been shown that even tiny changes or a failure in one network layer can lead to a catastrophic cascade of much more significant failures across many other network layers [29]. It was a seminal discovery, and while some
argued that processes in different network layers could simply be added up and described as a conglomerate process on a single-layer network, it soon became clear that, as is in general true for complex systems, the whole is not simply the sum of its parts [30–32]. Multilayer networks have since found applications for better understanding epidemic spreading [33,34], vaccination [35], evolution of cooperation [36], and biological organization at different scales [37].
We here map published scientific papers in a multilayer network: scientists are nodes in all layers, and a link between two nodes in the $j$th ($j > 1$) layer stands for the participation of the corresponding two scientists in a publication jointly written by $j$ co-authors. This way, single-author publications form the first layer, two-author publications form the second layer, three-author publications form the third layer, and so on. In doing so, the layers themselves already hold important information about the collaboration. Indeed, it is easy to argue that two researchers who are the only two authors of a publication have a much stronger link than two researchers who have co-written a paper with several hundred authors, as is often the case in high-energy physics publications. Multilayer collaboration networks defined in this way thus naturally take into account the problems that are commonly associated with unweighted single-layer collaboration networks [12,38–41]. Moreover, if we aggregate all the layers, we simply obtain the complete scientific collaboration network, but with the added value that, as we coalesce the layers obtained with ever larger collaboration sizes, we obtain novel microscopic insights into the representativeness of researchers within different teams and their links with others, and we can follow the maturation of topological features and the relevance each particular layer has in this process.
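The layer construction described in this paragraph can be sketched in a few lines. The following is only a minimal illustration under stated assumptions: the function names (`build_layers`, `layer_links`) and the toy author labels are hypothetical and do not come from the study, and each paper is assumed to be given simply as a list of author names.

```python
from collections import defaultdict
from itertools import combinations

def build_layers(papers):
    """Group papers by number of distinct authors: layer j holds the j-author papers."""
    layers = defaultdict(list)
    for authors in papers:
        team = frozenset(authors)
        layers[len(team)].append(team)
    return dict(layers)

def layer_links(layer_papers):
    """Pairwise links induced inside one layer (empty for the single-author layer)."""
    links = set()
    for team in layer_papers:
        links.update(combinations(sorted(team), 2))
    return links

# Hypothetical toy data: each entry is the author list of one paper.
papers = [["A"], ["A", "B"], ["B", "C"], ["A", "B", "C"], ["A", "B", "C", "D"]]
layers = build_layers(papers)
for j in sorted(layers):
    print(j, len(layers[j]), sorted(layer_links(layers[j])))
```

In this toy case the pair (A, B) appears in layers 2, 3 and 4, so the layer index itself records whether the two authors worked alone or inside a larger team.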
Another important distinction of our research from traditional scientific collaboration networks is that we consider higher-order interactions to describe the networks. This is irrelevant for the first and second layer, but becomes theoretically much more convenient for the subsequent layers, where three or more coauthors are naturally connected by a single higher-order link (a hyperlink) rather than a series of 2nd-order links connecting pairs of researchers consecutively with one another. Although the value of higher-order interactions has been recognized already in the early 70s by Atkin [42,43] and Berge [44], the interest peaked only recently with mounting inability to converge on what constitutes a group or how to define it consistently in the realm of social network analysis [45–49], and the interested reader can find a comprehensive account of the role of higher-order interactions in networked systems in Ref. [49].
Here we use the formalisms of multilayer and higher-order networks, often also called hypergraphs, to study the maturation of different topological characteristics of collaboration networks in physics, mathematics, and computer science by using the arXiv database [50]. We also consider the entire coverage of research fields in the same database. The question that we seek to answer is: how many layers does one need to obtain a proper and robust description of the collaboration network? Or equivalently, is it possible to describe the collaboration network by taking into account publications with only a few authors, for example up to layer four or five?
Results
We refer to the information publicly available in the arXiv database [50] (https://arxiv.org/, https://github.com/mattbierbaum/arxiv-public-datasets/). Data parsing was also performed according to Ref. [50]. From the database, metadata on 1,679,779 articles were downloaded. Then, information about 1,068,043 unique authors was parsed.
Let $N$ be the number of authors in the database. The main idea is to represent the data-set as a primal co-authorship hypergraph $H = (V, E_H)$, in which $V = \{v_1, \ldots, v_N\}$ is the set of nodes (authors) and $E_H$ is a set of hyperedges accounting for articles. In this representation, an article co-authored by $d$ authors corresponds to a hyperedge grouping the $d$ authors of the paper, as schematically depicted in Fig. 1a. In Fig. 1a nodes are therefore labeled with the names of the authors, whereas coloured hyperlinks are labeled by the corresponding paper identifier in the arXiv (with different colours, moreover, standing for different numbers of coauthors). Notice that this representation allows one to distinguish the case of two (or a limited group of) researchers who are the only authors of a publication, and who therefore supposedly have strong ties, from that of two (or a limited group of) researchers who just participate in huge collaboration projects giving rise to papers that have several hundred authors.
Moreover, the primal hypergraph $H = (V, E_H)$ can be associated with a dual hypergraph $H^* = (V^*, E^*_H)$ in which $V^*$ is the set of articles and $E^*_H$ groups papers written by the same author (in collaboration with others, or individually), as schematically depicted in Fig. 1b. One can also introduce a kind of “pairwise approximation” of $H$, given by an undirected graph $G = (V, E_G)$ where an edge between authors reflects the existence of a joint paper (independently of the number of coauthors). Therefore, each hyperedge of $H$ corresponds to a clique in $G$. In the same spirit, the dual graph $G^* = (V^*, E^*_G)$ is the pairwise approximation of $H^*$, where nodes correspond to articles and the existence of an edge indicates that two articles have at least one joint author.
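The four objects introduced so far (primal hypergraph, dual hypergraph, and their pairwise approximations) can be illustrated on a toy example. This is a sketch under assumptions, not the authors' implementation: the hypergraph is stored as a plain dictionary from hyperedge label to member set, and the helper names and paper/author labels are hypothetical.

```python
from itertools import combinations

def dual_hypergraph(hyperedges):
    """Swap roles: every author becomes a hyperedge grouping the papers they wrote."""
    dual = {}
    for paper, authors in hyperedges.items():
        for author in authors:
            dual.setdefault(author, set()).add(paper)
    return dual

def pairwise_projection(hyperedges):
    """Pairwise approximation: each hyperedge becomes a clique among its members."""
    edges = set()
    for members in hyperedges.values():
        edges.update(combinations(sorted(members), 2))
    return edges

# Hypothetical primal hypergraph: paper identifier -> set of authors.
primal = {"p1": {"A"}, "p2": {"A", "B"}, "p3": {"B", "C", "D"}, "p4": {"E"}}

dual = dual_hypergraph(primal)          # author -> set of papers
G_edges = pairwise_projection(primal)   # edges between authors
G_dual_edges = pairwise_projection(dual)  # edges between papers sharing an author
```

A hyperedge with $d$ members contributes a clique of $d(d-1)/2$ edges to the projection, which is why a single paper with thousands of authors can dominate the edge count of the author graph, whereas in the dual graph the clique size is set by how many papers one author has written.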
The hypergraph $H$ (and its dual $H^*$) as well as the graphs $G$ and $G^*$ can be viewed as multilayer networks with the layering index defined by the number of an article's coauthors, and represented in Fig. 1 by different colours assigned to different papers (yellow denoting manuscripts authored by a single scholar, green papers co-authored by two scholars, etc.). Then, one can operate a progressive fusion of such layers, and obtain the hypergraph (graph, dual hypergraph and dual graph, respectively) $H(n)$ ($G(n)$, $H^*(n)$, $G^*(n)$), where only papers with no more than $n$ coauthors are considered. Let $\bar{n}$ be the index of the maximal layer in the statistics, and let us simplify the notation further by writing $H(\bar{n})$ ($G(\bar{n})$, $H^*(\bar{n})$, $G^*(\bar{n})$) as $H$ ($G$, $H^*$, $G^*$). $H$, $G$, $H^*$ and $G^*$ are the “asymptotic” graphs, and they are actually the “classical” representations given to collaboration data, where all levels of co-authorship (those involving just a few scholars as much as those involving thousands of scholars) are mixed together, and whose main properties have been largely characterized by the definition and calculation of a wealth of topological measures.
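The progressive fusion of layers can be made concrete with a short sketch. It assumes, for illustration only, that papers are given as author lists; the function name `fused_graph` and the toy data are hypothetical. The sketch builds the author graph $G(n)$ from all papers with at most $n$ coauthors and prints how the node and edge counts grow with $n$.

```python
from itertools import combinations

def fused_graph(papers, n):
    """Pairwise author graph G(n): keep only papers with at most n distinct authors."""
    nodes, edges = set(), set()
    for authors in papers:
        team = set(authors)
        if len(team) <= n:
            nodes |= team
            edges.update(combinations(sorted(team), 2))
    return nodes, edges

# Hypothetical toy data: one author list per paper.
papers = [["A"], ["A", "B"], ["B", "C", "D"], ["A", "C", "D", "E", "F"]]
n_bar = max(len(set(p)) for p in papers)  # index of the maximal layer

for n in range(1, n_bar + 1):
    nodes, edges = fused_graph(papers, n)
    print(n, len(nodes), len(edges))
```

At $n = \bar{n}$ the function returns the asymptotic graph, so the same routine covers both the intermediate fusion stages and the classical aggregated network.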
Our idea is, instead, that such topological measures actually maturate as one progressively fuses the distinct layers. In other words, we suggest that there exists a given $\tilde{n}$ at which each specific topological property of the network maturates, i.e. it assumes the asymptotic value which is calculated on $H$, $G$, $H^*$ and $G^*$. Obviously, such a maturation level may be different for different fields of cooperation (as processes of scientific collaboration formation vary from field to field) and for different topological measures as well, and it is of great interest to study how distinct topological properties emerge at distinct levels of fusion (i.e. taking into account only proper subsets of the original data, where the number of coauthors of a given manuscript is limited).
Finally, it has to be noticed that all articles in the database are related to eight main areas (physics, mathematics, quantitative biology, computer science, quantitative finance, statistics, electrical engineering and systems science, economics). In the present study we give our new representation of co-authorship networks for the following fields (in parentheses we report the notation for each one of the obtained asymptotic graphs):
physics ($H_{phys}$, $G_{phys}$ and the dual ones),
math ($H_{math}$, $G_{math}$ and the dual ones),
computer science ($H_{cs}$, $G_{cs}$ and the dual ones),
all eight areas together ($H$, $G$ and the dual ones).
A rst characterization of the hypergraphs. A rst rough characterization of the primal and dual
graphs is shown in Fig.2, where we report the complementary cumulative distribution function (CCDF) for
H(
Hphys,Hmath ,Hcs
) in panel (a) and for G(
Gphys,Gmath ,Gcs
) in panel (b).
e CCDF is dened with the following expression:
where F(x) is the cumulative distribution function. If the tail of the distribution is tting a the power-law, then
where
xm
is a proper parameter, and
γ
can be estimated as the slope of the linear t in a log–log scale. In Fig.2a,b
we report the CCDF for nodes’ and hyperedges’ degree distributions of H,
Hphys,Hmath ,Hcs
and of their dual
graphs. From the gures it is apparent that the dierent graphs deviate from a power law in their tails. e
distributions in physics (red curves) can be seen as consisting of two dierent parts which actually seems to
correspond to dierent power law exponents. Most likely, such a property is due to experimental works in huge
collaborations. Hyperedges’ degree distributions in math and CS deviate from the power law only in tails. e
(1)
CCDF(x)=1F(x),
CCDF(x)x+1),x>xm,γ>0
Figure1. Schematic illustration of the co-authorship hypergraph (a) and of the dual hypergraph (b). In panel
(a) nodes are authors, and hyperlinks are co-authored Manuscript. e hyperlinks are labeled with letters and
colours. e legend at the bottom of the Figure reports for each letter the corresponding Manuscript’s identier
in the ArXiv. In the legend, moreover, Manuscripts are grouped in coloured boxes, and dierent colours stand
for a dierent number of coauthors: yellow papers are authored by a single Scholar, whereas green, red and blue
Manuscripts are co-authored by two, three and four Scholars, respectively. Panel (b) contains a sketch of the dual
representation, where nodes are now papers [labeled with the same colours and letters than in panel (a)], and
links are labeled with the name of the authors who participated in the co-authorship of the Manuscripts.
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Vol:.(1234567890)
Scientic Reports | (2021) 11:5666 | 
www.nature.com/scientificreports/
distributions corresponding to the entire database display the same features as those in physics, as papers related
to this science prevail in the arXiv collection.
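As an illustration of how an empirical CCDF and its tail slope in log–log scale might be obtained, consider the following minimal sketch. It is not the procedure used in the study: the ordinary least-squares fit, the function names and the cutoff argument `x_min` (standing in for $x_m$) are assumptions made here for illustration. The tail exponent is then read off from the fitted slope through the power-law relation for the CCDF given in the text.

```python
import math
from collections import Counter

def empirical_ccdf(values):
    """Return the sorted distinct values x and CCDF(x) = 1 - F(x) = P(X > x)."""
    n = len(values)
    counts = Counter(values)
    xs = sorted(counts)
    ccdf, seen = [], 0
    for x in xs:
        seen += counts[x]
        ccdf.append(1.0 - seen / n)
    return xs, ccdf

def loglog_slope(xs, ys, x_min):
    """Least-squares slope of log(y) against log(x), restricted to x > x_min and y > 0."""
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > x_min and y > 0]
    mean_x = sum(px for px, _ in pts) / len(pts)
    mean_y = sum(py for _, py in pts) / len(pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    return sxy / sxx

degrees = [1, 1, 2, 3]  # hypothetical toy degree sequence
xs, ccdf = empirical_ccdf(degrees)
```

A least-squares fit on the CCDF is simple but sensitive to the choice of the cutoff, so in practice the fitted slope should be checked against several values of `x_min`.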
The results shown in Fig. 2 point to the fact that there are papers with an extremely high number of coauthors. However, as already discussed in the Introduction, real patterns of authors' interactions are unlikely to be determined by such huge collaborations. Therefore, it seems reasonable to analyse how papers with large numbers of authors affect the network properties, or equivalently to analyse the maturation properties of the multilayer networks defined in the previous sub-section.
Figure 2. (a) Complementary cumulative distribution functions (CCDF, see text for definition) for the primal graphs obtained from the data-set. The distributions are functions of the nodes' degree distributions for $H$ ($H_{phys}$, $H_{math}$, $H_{cs}$) and of the hyperedges' degree distributions for the respective dual hypergraphs. (b) CCDF for the dual graphs, which are functions of the hyperedges' degree distribution in $H$ ($H_{phys}$, $H_{math}$, $H_{cs}$) and of the nodes' degree distribution in the respective dual hypergraphs. Curves are coloured according to the different speciality from which papers are extracted from the data-set (see the colour code at the top right of each panel).

The maturation process of topological features in the multilayer graph. The main objective of our study is to compare stabilization and maturation patterns of co-authorship networks describing scientific cooperation in different fields. To this purpose, we analyze how different topological properties change when the layer index $n$ changes.
Let $x(n)$ be some property (i.e., some topological measure) of a graph $G(n)$. To simplify the notation, we omit the argument for the case of the maximal layer $\bar{n}$, and we write $x = x(\bar{n})$. We say that the specific property $x(n)$ is maturated at the layer $\tilde{n}(x)$ if:
$$\tilde{n}(x) = \arg\min_{n} \left\{ n : \forall k \ge n \rightarrow \frac{|x(k) - x|}{x} \le \varepsilon \right\}, \tag{2}$$
where $\varepsilon$ is a small constant accounting for an acceptable accuracy (i.e., a tolerable difference). In all our calculations, we use $\varepsilon = 0.05$.
In order to illustrate the concept of maturation, Fig. 3 anticipates some of the major points and conclusions of our manuscript, and reports three panels, each one displaying the maturation behavior (or the absence of maturation) of important topological features, as the fusion index $n$ of layers increases.
Precisely, Fig. 3a compares the behavior of the average degree $\langle k \rangle$ versus $n/\bar{n}$ for the areas of mathematics (light red curve) and computer science (light blue curve). Normalization in the horizontal axis is needed because the two areas have actually distinct maximum numbers $\bar{n}$ of layers. It is clearly seen that $\langle k \rangle$ maturates rather early in the area of mathematics: $\langle k \rangle (n/\bar{n})$ is a monotonically increasing curve which attains its asymptotic value (the value at $n = \bar{n}$) already at layer $\tilde{n} = 8$. The horizontal light red bar in panel (a), indeed, stands for the (plus or minus) $\varepsilon = 0.05$ error around the asymptotic value $\langle k \rangle (\bar{n})$, and it is evident that the curve $\langle k \rangle (n/\bar{n})$ stays inside the error area for all values of $\tilde{n} \le n \le \bar{n}$. At variance, the average degree never maturates in the area of computer science, as witnessed by the light blue line in Fig. 3a: once again the horizontal light blue bar indicates the (plus or minus) $\varepsilon = 0.05$ error around the asymptotic value $\langle k \rangle (\bar{n})$, but now the curve $\langle k \rangle (n/\bar{n})$ never enters the error area before attaining its asymptotic value at $n = \bar{n}$.
Different topological features may maturate at different values of $\tilde{n}$, as illustrated in panel (b) of Fig. 3. Namely, the upper (lower) part of panel (b) reports the evolution of the diameter $d$ (of the shortest path $L$) in the areas of mathematics (light red curve) and computer science (light blue curve). $d$ maturates at layer 3 in the area of mathematics and at layer 10 in the area of computer science; $L$ instead maturates at layer 4 in mathematics and at layer 8 in computer science. It is seen, moreover, that the different fusion stages at which maturation takes place in different areas are not the simple consequence of the normalization of the fusion index $n$ to the relative maximum number of layers in the area.
Finally, panel (c) of Fig. 3 anticipates another important conclusion of our study: in some cases dual graphs, where hyperlinks connect publications instead of coauthors, may represent a better rendering of collaboration networks, in that some topological features maturate in dual graphs, whilst they never maturate in the direct graphs. This is illustrated with reference to the average degree $\langle k \rangle$ in the area of physics: it is clearly seen that the curve $\langle k \rangle (n)$ for the direct graph $G_{phys}$ (light blue line) does not display any maturation feature, whereas $\langle k \rangle (n)$ (light red line) maturates at layer 574 in $G^*_{phys}$.
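The maturation criterion defined earlier in this subsection can be sketched as a backward scan over the layers. The code below is an illustrative reading of that definition, not the authors' code: the function name is hypothetical, the property values are assumed to be available as a dictionary from layer index to $x(n)$, and the tolerance is applied as $|x(k) - x| \le \varepsilon |x|$ to avoid dividing by zero. The sketch returns $\bar{n}$ when only the final layer satisfies the criterion; treating that outcome as "does not maturate" is an assumption made here.

```python
def maturation_layer(x_by_layer, eps=0.05):
    """Smallest n such that every layer k >= n satisfies |x(k) - x| <= eps * |x|."""
    layers = sorted(x_by_layer)
    x_final = x_by_layer[layers[-1]]      # asymptotic value x = x(n_bar)
    n_tilde = layers[-1]
    for n in reversed(layers):            # scan from the last layer backwards
        if abs(x_by_layer[n] - x_final) <= eps * abs(x_final):
            n_tilde = n
        else:
            break                         # first violation ends the stable run
    return n_tilde

# Hypothetical property values x(n) for n = 1..6.
smooth = {1: 1.0, 2: 2.0, 3: 2.9, 4: 3.05, 5: 2.96, 6: 3.0}
bumpy = {1: 3.0, 2: 1.0, 3: 3.0}
print(maturation_layer(smooth), maturation_layer(bumpy))
```

The second toy series shows why the condition must hold for all subsequent layers: the value at layer 1 already equals the asymptotic one, but the excursion at layer 2 means that maturation is only reached at the final layer.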
It is essential to remark that the calculation of some topological measures of a graph for all layers $n$ may have a very high computational demand. Therefore, the networks $G(n)$ and $G^*(n)$ are here analysed using a rather sparse grid, and after that the dependencies are interpolated using splines. The procedure, however, does not affect (nor distort) the conclusions which we are offering below.
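To illustrate the idea of evaluating a measure on a sparse grid of layers and interpolating in between, here is a self-contained sketch of a natural cubic spline. The text does not specify which spline variant or library was used, so the natural boundary conditions, the function name and the sample grid are all assumptions made here.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline.

    xs must be strictly increasing and contain at least two points.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the second-derivative terms.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the polynomial coefficients of each interval.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # interval containing x
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t * t + d[j] * t ** 3

    return evaluate

# Hypothetical sparse grid of layers and measured values of some property x(n).
grid = [1, 2, 5, 10]
measured = [3.0, 5.0, 11.0, 21.0]
x_of_n = natural_cubic_spline(grid, measured)
print(x_of_n(3))
```

The interpolated curve passes through every measured grid point, so maturation layers that fall between grid points are estimates whose precision depends on how dense the grid is near the transition.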
General properties. The natural starting point is the analysis of the networks' global substructures.
Let $CC$ be the set of the network's connected components, and $LCC$ be the set of vertices in the largest connected component. The following notation can be introduced: $m = |E_G|$, $N_{CC} = |CC|$, $s_{LCC} = |LCC| / N$. Table 1 reports the maximal number of layers ($\bar{n}$), the number of nodes ($N$), the number of edges ($m$), the number of connected components ($N_{CC}$), the relative size of the largest connected component ($s_{LCC}$), and the maturation layers for all these features ($\tilde{n}(\cdot)$).
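The quantities $N_{CC}$ and $s_{LCC}$ can be computed with a plain breadth-first search. The sketch below is only illustrative: the function name and the toy graph are hypothetical, and nodes without edges (for example, authors with only single-author papers) are counted as components of size one, which is an assumption about the convention.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of node sets."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

# Hypothetical toy graph: D and E are isolated authors.
nodes = {"A", "B", "C", "D", "E"}
edges = {("A", "B"), ("B", "C")}

components = connected_components(nodes, edges)
N_CC = len(components)                                    # number of components
s_LCC = max(len(c) for c in components) / len(nodes)      # relative size of the LCC
print(N_CC, s_LCC)
```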
e rst signicant feature which should be noticed is the dierentiation in
¯n
for the dierent disciplines.
Namely, physics corresponds to the highest value of
¯n
(2831 layers). Moreover, the number of nodes N in physics
maturates quite late if compared with math and CS, therefore a consistent number of Scholars in this eld write
papers only in rather big collaborations. In contrast, in math one has see the smallest number of layers (67), and
not only N. Furthermore, not only N maturates early (already at level 5) in this eld, but even the edges’ number
maturates at level eight, which implies that focusing only on papers with no more than eight authors one has
an almost complete description of the graph representing the math discipline. For the other graphs, one sees
instead that the number of edges signicantly changes at all levels of the fusion process, up to the nal layers.
Figure 3. Illustration of the maturation process of different topological features. Panel (a): the average degree $\langle k \rangle$ vs. the normalized fusion index $n/\bar{n}$ (see text for definitions), for the areas of mathematics (light red curve) and computer science (light blue curve). The horizontal light red and light blue bars stand for the (plus or minus) $\varepsilon = 0.05$ errors around the respective asymptotic values $\langle k \rangle (\bar{n})$. Panel (b): the upper (lower) sub-panel reports the evolution of the diameter $d$ (of the shortest path $L$) in the areas of mathematics (light red curve) and computer science (light blue curve). $d$ maturates at layer 3 in the area of mathematics and at layer 10 in the area of computer science; $L$ instead maturates at layer 4 in mathematics and at layer 8 in computer science. Notice that different topological features maturate at different fusion stages. Panel (c): the average degree $\langle k \rangle$ in the area of physics vs. the fusion index $n$, for the direct graph $G_{phys}$ (light blue line) and for the dual graph $G^*_{phys}$ (light red line).

Another notable feature which appears from Table 1 is related to the number of connected components. This property maturates relatively early for all fields, as well as for the whole graph. Therefore, besides the largest connected component, the general backbone of the other part of the graph is formed by many clusters (connected components), each one containing a relatively small number of papers. On the other hand, the largest connected component consists of about 80% of nodes for the fields of math and CS and 93% of nodes in physics. The relative size of the LCC in the whole graph is 90%, which means that the LCC of the whole graph contains all authors from the LCCs of the different fields' graphs. This follows from the fact that, if we suppose that the smallest LCC from Table 1 (the one of math) is not included in the LCC of the whole graph, then the relative size of the LCC of the whole graph should be no more than $(1.07 - 0.76 \cdot 0.21)/1.07 = 0.85$. As a conclusion, interdisciplinary links connecting authors from different fields play an important role. In Table 1 we also report the same properties for the dual graphs. The remarkable result is that even the edge number maturates for all fields in the dual graphs. Most likely this occurs because an extremely large number of authors on a paper is much more frequent than an extremely large number of papers written by a particular author, and moreover such papers must have different sets of coauthors in order to contribute to the number of edges. Therefore, papers with large numbers of authors contribute large cliques in $G$ but not in its dual graph, and therefore, in this context, the dual graph constitutes a better representation of the social collaborations than its primal counterpart.
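The bound on the relative LCC size quoted in the preceding paragraph can be checked step by step. The following worked version uses the values of $N$ and $s_{LCC}$ from Table 1 (in units of $10^6$ authors) and assumes, as in the text, that the math LCC lies entirely outside the LCC of the whole graph; the superscript labels "math" and "all" are notation introduced here for illustration.

```latex
\begin{align*}
|LCC^{math}| &= s_{LCC}^{math} \, N^{math} = 0.76 \cdot 0.21 \approx 0.16, \\
|LCC^{all}| &\le N^{all} - |LCC^{math}| = 1.07 - 0.16 = 0.91, \\
s_{LCC}^{all} &\le \frac{1.07 - 0.76 \cdot 0.21}{1.07} \approx 0.85 < 0.90 .
\end{align*}
```

Since the observed value is 0.90, the assumption cannot hold, so at least part of the math LCC must belong to the LCC of the whole graph.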
Degree distribution. e second step of our analysis is the description of the local networks’ properties,
and we start with the study of the degree distributions. Let
ki(n),i=1, ...,N(n)
be the degree of node i in G(n).
Our results show that, for all graphs analysed in the current study, the probability distribution functions (PDFs)
of the degree k are fat-tailed, with tails well described by a power law scaling with exponent
γ
:
Table2 reports the values of the mean degree in the four graphs studied (
k
) and the estimated tail exponents
γ
for the corresponding degree distributions. e only eld in which we see a maturation of the mean degree is
(3)
p
(k)
1
k
γ
,
Table 1. Maturation indices and maturation values of the main general properties of primal and dual graphs.
All notations and denitions are reported in the text. e symbol “–” reects the fact that the property does
not maturate, implying that signicant changes in the property’s value occur at all fusion indices, up to the nal
layer (the reported values are therefore the “asymptotic” ones obtained by fusing all layers).
All Math CS Phys
¯n
2831 67 427 2831
G
N,×106
1.07 0.21 0.28 0.71
˜n(N)
26 5 9 44
m,×107
4.11 0.05 0.13 3.96
˜n(m)
8 – –
NCC,×104
5.18 2.74 2.08 2.48
˜n(NCC)
11 4 6 23
sLCC
0.90 0.76 0.79 0.93
˜n(sLCC)
7 4 6 7
G
N,×106
1.68 0.44 0.26 1.08
˜n(N)
8 4 6 10
m,×107
10.11 0.82 0.59 8.27
˜n(m)
522 5 9 574
NCC,×104
5.18 2.7 2.08 2.48
˜n(NCC)
11 4 6 23
sLCC
0.94 0.86 0.86 0.95
˜n(sLCC)
4 3 5 4
Table 2. Maturation indices and maturation values of the degree distribution’s properties for the primal and
dual graphs. All notations and definitions are reported in the text. The symbol “–” reflects the fact that the
property does not maturate, implying that significant changes in the property’s value occur at all fusion indices,
up to the final layer (the reported values are therefore the “asymptotic” ones obtained by fusing all layers).

                                   All      Math     CS       Phys
$\bar{n}$                          2831     67       427      2831
Primal graph
$\langle k \rangle$                77.01    4.62     8.97     111.51
$\tilde{n}(\langle k \rangle)$     –        8        –        –
$\gamma$                           1.7      3.6      2.6      1.6
$\tilde{n}(\gamma)$                498      –        –        1411
Dual graph
$\langle k \rangle$                120.54   36.65    44.33    153.54
$\tilde{n}(\langle k \rangle)$     522      5        8        574
$\gamma$                           2.8      3.3      3.9      2.6
$\tilde{n}(\gamma)$                756      2        7        495
Scientic Reports | (2021) 11:5666 | 
www.nature.com/scientificreports/
math, which is also characterized by the highest tail exponent. The fat-tailed nature of the degree distribution is
the most likely reason for the absence of maturation in the mean degree, as well as in the tail exponent estimation.
In the cases of the whole graph and of the graph of physics, one sees that maturation of the tail exponent does occur,
but only at a very high value of the fusion index. Even when such a distribution maturates, sample estimates of these
quantities are often very sensitive to additional observations or data. Table 2 also reports the results for the dual
graphs. One immediately sees that the mean degrees of all graphs under consideration maturate and, moreover, the exponents
of the respective power-law distributions are significantly higher. Therefore, once again the dual graph estimates
seem to provide a more accurate characterization.
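As a rough illustration of how a tail exponent such as $\gamma$ in Eq. (3) might be estimated from a degree sequence, the sketch below uses the continuous maximum-likelihood (Hill-type) estimator $\hat{\gamma} = 1 + n / \sum_i \ln(k_i / k_{\min})$. The text does not state which estimator was used for Table 2, so the estimator itself, the function names `mean_degree`, `tail_exponent` and `synthetic_tail`, and the cutoff parameter `k_min` are all assumptions made for illustration only.

```python
import math


def mean_degree(degrees):
    """Mean degree <k> of a degree sequence."""
    return sum(degrees) / len(degrees)


def tail_exponent(degrees, k_min):
    """Hill-type maximum-likelihood estimate of gamma for p(k) ~ k^(-gamma).

    Only degrees k >= k_min (the assumed start of the power-law tail) are used.
    This is the continuous approximation; it is an illustrative choice,
    not necessarily the estimator used in the paper.
    """
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / k_min) for k in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one degree strictly above k_min")
    return 1.0 + len(tail) / log_sum


def synthetic_tail(gamma, k_min, size):
    """Deterministic power-law sample built from evenly spaced quantiles."""
    return [k_min * (1.0 - (i + 0.5) / size) ** (-1.0 / (gamma - 1.0))
            for i in range(size)]


# Synthetic check: the estimate should land close to the exponent we put in.
sample = synthetic_tail(gamma=2.6, k_min=5.0, size=10_000)
estimate = tail_exponent(sample, k_min=5.0)
```

Because a few very large degrees dominate the sums involved, both `mean_degree` and `tail_exponent` can shift noticeably when new high-degree nodes are added, which is in line with the sensitivity noted above.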
Network clustering. One of the most important properties of a graph is clustering. Such a measure, indeed,
accounts for the network's transitivity, and in the context of a co-authorship graph it describes how often two
coauthors of one particular author are coauthors themselves in other papers. The effects of clustering can
be quantified by measuring two different coefficients: the global and the local clustering coefficients. The global
clustering coefficient is defined by the following expression:

$$C = \frac{3\,\#K_3}{\#P_2}, \tag{4}$$

where $\#K_3$ is the number of triangles in the graph and $\#P_2$ is the number of connected chains of length two.
The local clustering coefficient of vertex $i$ is instead calculated as

$$c_i = \frac{|\{(j,k) \in E_G : j,k \in N_i\}|}{C^{2}_{|N_i|}}, \tag{5}$$

where $E_G$ is the set of edges of graph $G$, $N_i$ is the set of $i$'s neighbors, and $C^{2}_{|N_i|} = |N_i|(|N_i|-1)/2$ is the
number of pairs of neighbors of $i$. That is, the local clustering coefficient measures the fraction of pairs of
neighbors of node $i$ that are themselves connected. The overall graph clustering property $\langle c \rangle$ can be obtained by
averaging the local clustering coefficient of Eq. (5) over all nodes:

$$\langle c \rangle = \frac{1}{N} \sum_{i \in V} c_i. \tag{6}$$

One can easily see that the expression in Eq. (4) can be rewritten as

$$C = \frac{\sum_{i \in V} C^{2}_{|N_i|}\, c_i}{\sum_{i \in V} C^{2}_{|N_i|}}. \tag{7}$$

From Eq. (7) it follows that, in calculating the global clustering coefficient, the higher the degree of a node,
the higher its weight in the average, whereas $\langle c \rangle$ weights all nodes equally. Therefore, the larger the difference
between $C$ and $\langle c \rangle$, the more non-uniformly clustering is distributed among nodes.
Table 3 shows the clustering coefficients' estimates and maturation for all primal and dual graphs. The first
notable feature is that, in the primal graphs, the global clustering coefficient never maturates, while the averages
of the local clustering coefficient always do. This naturally follows from the fact that papers from the last layers
are associated with larger numbers of additional triangles, and they also contribute a huge number of edges, thus
enlarging nodes' degrees significantly, which are then used to calculate weights in the average of the global clustering
coefficient [see Eq. (7)]. The smallest values of the clustering coefficients are in the field of math, which also stands
out for the significant difference between $C$ and $\langle c \rangle$. Namely, in math the global clustering is half the
averaged local one. Therefore, in math, nodes with high degree are less clustered than the ones with small degree.
In dual graphs, both global and local clustering coefficients maturate. Moreover, the averaged local clustering
coefficients maturate earlier than the ones calculated in the primal graphs. Furthermore, the levels of maturation
Table 3. Maturation indices and maturation values of the graphs’ clustering properties. All notations and
definitions are reported in the text. The symbol “–” reflects the fact that the property does not maturate,
implying that significant changes in the property’s value occur at all fusion indices, up to the final layer (the
reported values are therefore the “asymptotic” ones obtained by fusing all layers).

                                   All      Math     CS       Phys
$\bar{n}$                          2831     67       427      2831
Primal graph
$C$                                0.57     0.24     0.70     0.57
$\tilde{n}(C)$                     –        –        –        –
$\langle c \rangle$                0.65     0.48     0.69     0.68
$\tilde{n}(\langle c \rangle)$     10       5        6        15
Dual graph
$C$                                0.27     0.78     0.62     0.26
$\tilde{n}(C)$                     546      4        7        522
$\langle c \rangle$                0.72     0.76     0.68     0.70
$\tilde{n}(\langle c \rangle)$     5        2        3        7
in the whole graph, in physics and in math are the same as those corresponding to the maturation of the number
of edges ($\tilde{n}(m)$ in Table 1). In CS, maturation of the global clustering occurs at the 8th level, while the number of
edges maturates at the 9th level. In the physics dual graph, there is a significant difference between the values of
the global and the averaged local clustering: the former is less than half the latter. Most likely, this property
is the consequence of the existence of collaborative papers with large degree connected with other papers written
by an extremely large number of authors. However, such “connecting” authors may not have a close collaborative
relation with each other, and therefore papers authored by them are not necessarily neighbors in the
dual graph.
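The relation between the two clustering coefficients of Eqs. (4)–(7) can be checked on a toy graph. The sketch below is a minimal pure-Python illustration, not the code used for Table 3; the adjacency-dictionary representation, the function names, and the convention $c_i = 0$ for nodes with fewer than two neighbors are assumptions.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Eq. (5): fraction of pairs of i's neighbors that are themselves linked."""
    neighbors = adj[i]
    pairs = len(neighbors) * (len(neighbors) - 1) // 2
    if pairs == 0:
        return 0.0  # assumed convention for nodes with fewer than 2 neighbors
    linked = sum(1 for j, k in combinations(neighbors, 2) if k in adj[j])
    return linked / pairs


def mean_local_clustering(adj):
    """Eq. (6): plain average of c_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def global_clustering(adj):
    """Eq. (7): average of c_i weighted by the number of neighbor pairs."""
    weights = {i: len(adj[i]) * (len(adj[i]) - 1) // 2 for i in adj}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[i] * local_clustering(adj, i) for i in adj) / total


# Toy graph: a triangle (0, 1, 2) with one pendant node 3 attached to node 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
C = global_clustering(toy)            # 3 * (1 triangle) / (5 chains of length two)
c_mean = mean_local_clustering(toy)   # (1/3 + 1 + 1 + 0) / 4
```

In this toy graph the highest-degree node (node 0) has the lowest non-trivial local clustering and also the largest weight in Eq. (7), so the two averages differ.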
Diameter and characteristic path length. The essential measure describing closeness between two particular
authors (papers) is the shortest path. Based on this measure, two important characteristics of a graph can
be calculated. The first is the diameter ($d$): the maximum shortest path over all pairs of nodes in the LCC. The
second is the characteristic path length ($L$): the mean shortest path over all pairs of nodes in the LCC.
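Both quantities can be obtained from breadth-first searches restricted to the largest connected component. The following sketch assumes an unweighted, undirected graph stored as a dictionary of neighbor sets; the function names are hypothetical, and the all-pairs approach shown here is only practical for small graphs, not for networks with millions of nodes.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def largest_component(adj):
    """Node set of the largest connected component (LCC)."""
    seen, best = set(), set()
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


def diameter_and_path_length(adj):
    """Diameter d and characteristic path length L, both computed on the LCC."""
    lcc = largest_component(adj)
    if len(lcc) < 2:
        raise ValueError("the LCC must contain at least two nodes")
    longest, total, pairs = 0, 0, 0
    for source in lcc:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs


# Toy graph: a path 0-1-2-3 (the LCC) plus a separate pair 4-5.
toy = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
d, L = diameter_and_path_length(toy)
```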
The maturation analysis for $d$ and $L$ is presented in Table 4. The characteristic path length in
physics (and, as a consequence, in the whole graph) differs significantly from all other fields: the value of $L$ is
less than half those in math and CS. However, this value changes significantly on the last layers; therefore, this
property is highly dependent on collaborative papers. Interestingly, the graphs' diameters maturate in all fields.
The maturation indices in math and CS are close to the values obtained for the number of nodes. Therefore, in
these fields papers with a relatively large number of authors are mostly written jointly by those who are already in the
same community. The difference in physics, instead, indicates that large collaborative papers may influence the
network's community structure.
Similar conclusions can be drawn from the results of the dual graphs, for which even in the case of physics
(and the whole network) the characteristic path length maturates. Its maturation appears quite late, but it should
be noted that it happens much earlier than the maturation of the number of edges. In CS, maturation of both the
diameter and the characteristic path length appears earlier in the dual graph than in the primal one. The same is
true for the diameter in the field of math. However, the characteristic path length in the math dual graph maturates
later than in the primal graph of this field.
Centrality and efficiency. As nodes in the networks have very different importance or relevance, various
measures of nodes' centrality have been proposed in the literature. As the distribution of nodes' centralities in
the network (the so-called centrality vector) contains very relevant information on the graph's structure and
function, maturation of the centrality vectors is an important signal of the maturation of the network as a whole. We
here report the maturation properties of the mean betweenness and closeness centrality measures, which will
be defined momentarily. In addition, we also focus here on the network's efficiency, which in real social
networks describes the so-called “small-world” property: the fact that information transfer is very efficient in
such networks [51].
Node $i$'s betweenness centrality $b_i$ is defined as

$$b_i = \frac{2}{(N-1)(N-2)} \sum_{j \neq i,\, k \neq i} \frac{|P(j,k,i)|}{|P(j,k)|}, \tag{8}$$

where $|P(j,k)|$ is the total number of shortest paths between nodes $j$ and $k$, and $|P(j,k,i)|$ is the number of shortest
paths between $j$ and $k$ which pass through node $i$. The mean betweenness $\langle b \rangle$ of the graph is obtained by averaging
over all nodes, and in the paper we calculate it only for nodes belonging to the LCC.
Node $i$'s closeness centrality $q_i$ is defined as
Table 4. Maturation indices and maturation values of the graphs’ diameter and characteristic path length. All
notations and definitions are reported in the text. The symbol “–” reflects the fact that the property does not
maturate, implying that significant changes in the property’s value occur at all fusion indices, up to the final
layer (the reported values are therefore the “asymptotic” ones obtained by fusing all layers).

                      All      Math     CS       Phys
$\bar{n}$             2831     67       427      2831
Primal graph
$d$                   21       25       26       21
$\tilde{n}(d)$        436      3        10       425
$L$                   3.1      7.3      6.1      2.8
$\tilde{n}(L)$        –        4        8        –
Dual graph
$d$                   21       24       26       20
$\tilde{n}(d)$        402      4        6        434
$L$                   5.4      8.8      5.2      4.7
$\tilde{n}(L)$        329      9        8        430
where d(i,j) is the length of the shortest path between i and j. Once again, the mean closeness
q
is obtained by
averaging over all nodes, and limiting ourselves to the set of nodes in the LCC.
Network’s eciency is dened by
Table5 shows the results for E,
q
and
b
. In co-authorship graphs of math and CS papers with extremely large
number of authors do not aect the values of the listed properties and, moreover, maturation appears relatively
early in both disciplines. is is in agreement with the results of the previous sub-section, where characteristics
path length’s maturation was analysed. Moreover, the maturation levels of eciency and betweenness for these
two elds are close to
˜n(L)
. In the dual graphs, the same conclusion can be made only for math and physics.
In the case of CS, the dual graph does not instead maturate, and this is the only case in which the dual graph
representation seems to provide a less accurate representation of the data. It has to be noticed that, for the CS
dual graph and the one for all elds, maturation of centrality and eciency (not reported here) occurs when
ε
is slightly increased (i.e., when
ε=0.1
).
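For concreteness, the three measures of Eqs. (8)–(10) can be sketched in a few lines of pure Python. This is a simple all-pairs illustration under stated assumptions, not the implementation used for Table 5: the graph is assumed connected (for instance, already restricted to the LCC), the sum in Eq. (8) is read as running over unordered pairs of nodes other than $i$, which is consistent with the normalization $2/((N-1)(N-2))$, and all function names are hypothetical. Its cost grows roughly as $N^3$, so it is suitable only for small graphs.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Distances from source and numbers of shortest paths to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def centrality_and_efficiency(adj):
    """Betweenness b_i (Eq. 8), closeness q_i (Eq. 9) and efficiency E (Eq. 10)."""
    nodes = list(adj)
    n = len(nodes)
    if n < 3:
        raise ValueError("Eq. (8) needs at least three nodes")
    dist, sigma = {}, {}
    for s in nodes:
        dist[s], sigma[s] = bfs_counts(adj, s)
        if len(dist[s]) != n:
            raise ValueError("graph must be connected (e.g. restrict to the LCC)")
    betweenness = {}
    for i in nodes:
        others = [v for v in nodes if v != i]
        total = 0.0
        for j, k in combinations(others, 2):
            # i lies on a shortest j-k path iff the two legs add up to d(j, k)
            if dist[j][i] + dist[i][k] == dist[j][k]:
                total += sigma[j][i] * sigma[i][k] / sigma[j][k]
        betweenness[i] = 2.0 * total / ((n - 1) * (n - 2))
    closeness = {i: 1.0 / sum(dist[i][j] for j in nodes if j != i) for i in nodes}
    efficiency = sum(1.0 / dist[i][j]
                     for i in nodes for j in nodes if i != j) / (n * (n - 1))
    return betweenness, closeness, efficiency


# Toy graph: the path 0-1-2, where node 1 lies on the only 0-2 shortest path.
path = {0: {1}, 1: {0, 2}, 2: {1}}
b, q, E = centrality_and_efficiency(path)
```

With this definition of $q_i$, closeness scales roughly as the inverse of the number of nodes times the typical distance, which is why the mean closeness values of large graphs are very small.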
Discussion
In summary, we have studied patterns of collaboration in the arXiv database by using the formalism of multilayer
higher-order networks, where each layer corresponds to the number of collaborators on publications that are
considered for that layer. For layer three, corresponding to three-author publications, and onwards, we have
also used higher-order links to connect groups of authors as a much more convenient and theoretically elegant
description of group interactions. By doing so, we were able to monitor separately how each relevant topological
feature of the network matures toward the value that was measured for the complete classical collaboration
network. We have also demonstrated that our representation reveals the true nature of collaborations among
researchers, which is fundamentally different when they coauthor a paper in a small group, implying an intense
and meaningful research relationship, as opposed to a collaboration in a huge group of coauthors where only very
few actually share any noteworthy contact.
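The layer-fusion idea summarized above can be sketched on a toy data set. The code below assumes that the graph at fusion index $n$ is built from all publications with at most $n$ authors, that the primal graph links authors who share such a publication, and that the dual graph links publications that share an author. The function names, the dictionary-based input format and the toy author names are hypothetical; the precise construction is the one given in the methods part of the text.

```python
from itertools import combinations


def fused_primal_graph(papers, n):
    """Author graph built from all papers with at most n authors."""
    adj = {}
    for authors in papers.values():
        if len(authors) > n:
            continue
        for a in authors:
            adj.setdefault(a, set())  # single-author papers still add a node
        for a, b in combinations(sorted(set(authors)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def fused_dual_graph(papers, n):
    """Paper graph: papers with at most n authors, linked when they share an author."""
    kept = {p: set(a) for p, a in papers.items() if len(a) <= n}
    adj = {p: set() for p in kept}
    for p, r in combinations(sorted(kept), 2):
        if kept[p] & kept[r]:
            adj[p].add(r)
            adj[r].add(p)
    return adj


# Toy data set: one single-author, one two-author and one three-author paper.
papers = {
    "p1": ["ann"],
    "p2": ["ann", "bob"],
    "p3": ["bob", "cat", "dan"],
}
primal_2 = fused_primal_graph(papers, n=2)  # layers 1 and 2 fused
primal_3 = fused_primal_graph(papers, n=3)  # layers 1, 2 and 3 fused
dual_3 = fused_dual_graph(papers, n=3)
```

Increasing `n` one layer at a time and recomputing a property on each fused graph yields the sequence of values whose convergence is tracked by the maturation analysis.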
In terms of implications for specific research fields, our research shows that different topological features
mature at different fusion indices for different research fields: earlier for fields where the number of authors on
a particular publication is traditionally low, as in mathematics, and later for fields where large collaborations are
more common, as in physics. Either way, our representation allows us to progressively follow how the final values
that determine the topological features of collaboration networks emerge as the fusion index, i.e., the number
of layers that have been fused together, increases. This thus offers a completely new microscopic view
into the collaboration patterns of researchers across different disciplines and depths of contact.
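The notion of a feature maturing at a given fusion index can be made concrete with a small sketch. It assumes that a property is considered mature from the smallest fusion index after which its value stays within a relative tolerance $\varepsilon$ of the value obtained by fusing all layers, and that a property for which only the final layer qualifies is reported as not maturing (the “–” entries in the tables). Both the relative form of the tolerance and the function name `maturation_index` are assumptions; the exact definition is the one given earlier in the text.

```python
def maturation_index(values, eps=0.05):
    """Fusion index at which a property matures, or None if it does not.

    values[n - 1] is the property measured on the graph fused up to layer n.
    The property is taken to be mature from the smallest index after which it
    stays within a relative tolerance eps of its final value (an assumed
    reading of the criterion, not necessarily the paper's exact one).
    """
    final = values[-1]
    index = len(values)
    # Walk backwards from the last layer while the value stays inside the band.
    for n in range(len(values), 0, -1):
        if abs(values[n - 1] - final) <= eps * abs(final):
            index = n
        else:
            break
    return index if index < len(values) else None


# Toy sequence of a property measured at fusion indices 1..6.
series = [1.0, 2.0, 3.6, 3.9, 4.1, 4.0]
early = maturation_index(series, eps=0.05)   # band is 4.0 +/- 0.2
looser = maturation_index(series, eps=0.2)   # band is 4.0 +/- 0.8
never = maturation_index([1.0, 2.0, 3.0, 4.0], eps=0.05)
```

Loosening the tolerance can only move the index earlier, which mirrors the remark that some properties mature once $\varepsilon$ is raised from 0.05 to 0.1.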
It is also worth noting that our research confirms, in line with previous research [20, 52], that the alternative
representation of collaboration networks, where hyperlinks connect publications instead of coauthors, yields a
better representation, in that for this type of collaboration network all topological features eventually mature
as layers are coalesced, whilst in the classical representation some topological features never mature.
$$q_i = \frac{1}{\sum_{j \in V,\, j \neq i} d(i,j)}, \tag{9}$$

$$E = \frac{1}{N(N-1)} \sum_{i,j \in V,\, i \neq j} \frac{1}{d(i,j)}. \tag{10}$$
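Table 5. Maturation indices and maturation values of the graphs’ centrality and efficiency indicators. All
notations and definitions are reported in the text. The symbol “–” reflects the fact that the property does not
maturate, implying that significant changes in the property’s value occur at all fusion indices, up to the final
layer (the reported values are therefore the “asymptotic” ones obtained by fusing all layers).

                                   All fields            Math                  CS                    Physics
$\bar{n}$                          2831                  67                    427                   2831
Primal graph
$E$                                0.40                  0.14                  0.17                  0.43
$\tilde{n}(E)$                     –                     4                     7                     –
$\langle q \rangle$                $3.4 \cdot 10^{-3}$   $8.8 \cdot 10^{-7}$   $7.4 \cdot 10^{-7}$   $2.5 \cdot 10^{-3}$
$\tilde{n}(\langle q \rangle)$     –                     6                     15                    –
$\langle b \rangle$                $2.4 \cdot 10^{-5}$   $4.0 \cdot 10^{-5}$   $2.3 \cdot 10^{-5}$   $1.9 \cdot 10^{-5}$
$\tilde{n}(\langle b \rangle)$     –                     4                     18                    –
Dual graph
$E$                                0.24                  0.13                  0.21                  0.27
$\tilde{n}(E)$                     –                     9                     –                     540
$\langle q \rangle$                $3.2 \cdot 10^{-3}$   $4.5 \cdot 10^{-4}$   $4.5 \cdot 10^{-4}$   $2.4 \cdot 10^{-3}$
$\tilde{n}(\langle q \rangle)$     –                     9                     –                     444
$\langle b \rangle$                $4.7 \cdot 10^{-4}$   $9 \cdot 10^{-4}$     $4.7 \cdot 10^{-4}$   $3.8 \cdot 10^{-4}$
$\tilde{n}(\langle b \rangle)$     –                     9                     –                     447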
These insights create many possible directions for future research. For example, one viable avenue worth
exploring is to customize growth models of hypergraphs that would take into account the fact that a given
topological feature must mature at a given stage of fusion. We would thereby obtain a more apt theoretical
description of scientific collaboration, which would in turn promise a better understanding of this vital process
that sustains modern human societies. It would also be interesting to look at the maturation of other network
properties, such as the community structure and various other centrality measures. Lastly, it would also be worthwhile
exploring how the proposed multilayer higher-order network formalism works in other forms of documented
collaboration, such as on patents and legal proceedings. We hope our research will prove inspirational towards
these goals in the near future.
Received: 28 December 2020; Accepted: 19 February 2021
References
1. Newman, M. E. J. Networks: An Introduction (Oxford University Press, 2010).
2. Estrada, E. The Structure of Complex Networks: Theory and Applications (Oxford University Press, 2011).
3. Barabási, A.-L. Network Science (Cambridge University Press, 2016).
4. Latora, V., Nicosia, V. & Russo, G. Complex Networks: Principles, Methods and Applications (Cambridge University Press, 2017).
5. Newman, M. E. J. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. U.S.A. 98, 404 (2001a).
6. Newman, M. E. J. Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E 64, 016131
(2001b).
7. Newman, M. E. J. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E 64, 016132
(2001c).
8. Newman, M. E. J. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002).
9. Fan, Y. et al. Network of econophysicists: A weighted network to investigate the development of econophysics. Int. J. Mod. Phys.
B 18, 2505 (2004).
10. Perc, M. Growth and structure of Slovenia’s scientific collaboration network. J. Informetrics 4, 475 (2010).
11. Krumov, L., Fretter, C., Müller-Hannemann, M., Weihe, K. & Hütt, M.-T. Motifs in co-authorship networks and their relation to
the impact of scientific publications. Eur. Phys. J. B 84, 535 (2011).
12. Pan, R. K. & Saramäki, J. The strength of strong ties in scientific collaboration networks. EPL 97, 18007 (2012).
13. Redner, S. How popular is your paper? An empirical study of the citation distribution. Eur. Phys. J. B 4, 131 (1998).
14. Lehmann, S., Lautrup, B. & Jackson, A. D. Citation networks in high energy physics. Phys. Rev. E 68, 026113 (2003).
15. Kuhn, T., Perc, M. & Helbing, D. Inheritance patterns in citation networks reveal scientific memes. Phys. Rev. X 4, 041036 (2014).
16. Goldstein, M. L., Morris, S. A. & Yen, G. G. Group-based Yule model for bipartite author-paper networks. Phys. Rev. E 71, 026108
(2005).
17. Peltomäki, M. & Alava, M. Correlations in bipartite collaboration networks. J. Stat. Mech. 6, P01010 (2006).
18. Tian, L., He, Y., Liu, H. & Du, R. A general evolving model for growing bipartite networks. Phys. Lett. A 376, 1827 (2012).
19. Zhou, Y.-B., Lü, L. & Li, M. Quantifying the influence of scientists and their publications: distinguishing between prestige and
popularity. New J. Phys. 14, 033033 (2012).
20. Lung, R. I., Gaskó, N. & Suciu, M. A. A hypergraph model for representing scientific output. Scientometrics 117, 1361 (2018).
21. Moore, T. J., Drost, R. J., Basu, P., Ramanathan, R. & Swami, A. Analyzing collaboration networks using simplicial complexes: A
case study. In 2012 Proceedings IEEE INFOCOM Workshops 238–243 (IEEE, 2012).
22. Patania, A., Petri, G. & Vaccarino, F. The shape of collaborations. EPJ Data Sci. 6, 18 (2017).
23. Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47 (2002).
24. Newman, M. E. J. The structure and function of complex networks. SIAM Rev. 45, 167 (2003).
25. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D. Complex networks: Structure and dynamics. Phys. Rep. 424, 175
(2006).
26. Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75 (2010).
27. Boccaletti, S. et al. The structure and dynamics of multilayer networks. Phys. Rep. 544, 1 (2014).
28. Kivelä, M. et al. Multilayer networks. J. Complex Netw. 2, 203 (2014).
29. Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E. & Havlin, S. Catastrophic cascade of failures in interdependent networks.
Nature 464, 1025 (2010).
30. Gómez, S. et al. Diffusion dynamics on multiplex networks. Phys. Rev. Lett. 110, 028701 (2013).
31. De Domenico, M. et al. Mathematical formulation of multilayer networks. Phys. Rev. X 3, 041022 (2013).
32. De Domenico, M., Solé-Ribalta, A., Omodei, E., Gómez, S. & Arenas, A. Ranking in interconnected multilayer networks reveals
versatile nodes. Nat. Commun. 6, 6868 (2015).
33. Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys.
87, 925 (2015).
34. de Arruda, G. F., Rodrigues, F. A. & Moreno, Y. Fundamentals of spreading processes in single and multilayer complex networks.
Phys. Rep. 756, 1 (2018).
35. Wang, Z. et al. Statistical physics of vaccination. Phys. Rep. 664, 1 (2016).
36. Wang, Z., Wang, L., Szolnoki, A. & Perc, M. Evolutionary games on multilayer networks: A colloquium. Eur. Phys. J. B 88, 124
(2015).
37. Gosak, M. et al. Network science of biological systems at different scales: A review. Phys. Life Rev. 24, 118 (2018).
38. Barrat, A., Barthelemy, M., Pastor-Satorras, R. & Vespignani, A. The architecture of complex weighted networks. Proc. Natl. Acad.
Sci. U.S.A. 101, 3747 (2004).
39. Opsahl, T., Colizza, V., Panzarasa, P. & Ramasco, J. J. Prominence and control: The weighted rich-club effect. Phys. Rev. Lett. 101,
168702 (2008).
40. Ramasco, J. J. & Morris, S. A. Social inertia in collaboration networks. Phys. Rev. E 73, 016122 (2006).
41. Ke, Q. & Ahn, Y.-Y. Tie strength distribution in scientific collaboration networks. Phys. Rev. E 90, 032804 (2014).
42. Atkin, R. H. From cohomology in physics to Q-connectivity in social science. Int. J. Man-Mach. Stud. 4, 139 (1972).
43. Atkin, R. H. Mathematical Structure in Human Affairs (Heinemann Educational Publishers, 1974).
44. Berge, C. Graphs and Hypergraphs (North-Holland Pub. Co., 1973).
45. Estrada, E. & Rodríguez-Velázquez, J. A. Subgraph centrality and clustering in complex hyper-networks. Physica A 364, 581 (2006).
46. Benson, A. R., Gleich, D. F. & Leskovec, J. Higher-order organization of complex networks. Science 353, 163 (2016).
47. Perc, M. et al. Statistical physics of human cooperation. Phys. Rep. 687, 1 (2017).
48. Alvarez-Rodriguez, U. et al. Evolutionary dynamics of higher-order interactions in social networks. Nat. Hum. Behav. 1, 1 (2020)
(in press).
49. Battiston, F. et al. Networks beyond pairwise interactions: structure and dynamics. Phys. Rep. 874, 1 (2020).
50. Clement, C. B., Bierbaum, M., O’Keeffe, K. P. & Alemi, A. A. On the Use of ArXiv as a Dataset. arXiv:1905.00075 (2019).
51. Latora, V. & Marchiori, M. Efficient behavior of small-world networks. Phys. Rev. Lett. 87, 198701 (2001).
52. Gaskó, N., Lung, R. I. & Suciu, M. A. A new network model for the study of scientific collaborations: Romanian computer science
and mathematics co-authorship networks. Scientometrics 108, 613 (2016).
Acknowledgements
E.V. acknowledges the project “Post-crisis world order: challenges and technologies, competition and coopera-
tion” supported by Ministry of Science and Higher Education of the Russian Federation (agreement number
075-15-2020-783). M.P. was supported by the Slovenian Research Agency (Grant Nos. P1-0403 and J1-2457). The
research of D. Musatov and A. M. Raigorodskii was supported by the Russian Federation Government (Grant
number 075-15-2019-1926).
Author contributions
S.B. conceived the study; D.M. and A.M.R. suggested to consider both primal and dual hypergraphs; E.V. and
A.K. performed all data analyses; E.V., A.K., and K. A.-B. made all graphical representations; K. A.-B., D.M.,
A.M.R., M.P., and S.B. discussed and analyzed the results. All authors drew the main conclusions and wrote the
manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to K.A.-B.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2021
... There, the contribution of links of different orders can be indicative of how smaller or larger author teams contribute to the connectivity and integration of global science. [52][53][54] First methodological steps towards determining the relevance of the ties of different orders were taken by Vasilyeva et al., 55 who proposed a multi-layer network representation to identify the smallest size of group interactions that contribute significantly to the network structure. Recently, a filtering procedure to remove small, large or specific orders was proposed by Landry et al., 56 to investigate how particular global and local network properties are affected when specific orders are preserved or filtered out. ...
... Moreover, the precise contribution of different orders to the network structure could vary depending on which topological network measure is considered. 55 This underlines the need of investigating the contribution of hyperlinks with different sizes to the network structure at different levels of analysis. One can focus on the contribution of different orders to the structure of the traditional pairwise (or projected) representation of the network, where each pair of nodes is connected by a link if they are connected by a hyperlink of any order. ...
... . Hypergraph H − d is equivalent to the one proposed by Vasilyeva et al. 55 and obtained by the lower or equal (LEQ) filtering of Landry et al. 56 Differently, H + d is obtained by including only hyperlinks with order d ′ > d, i.e., ...
Preprint
Higher-order networks effectively represent complex systems with group interactions. Existing methods usually overlook the relative contribution of group interactions (hyperlinks) of different sizes to the overall network structure. Yet, this has many important applications, especially when the network has meaningful node labels. In this work, we propose a comprehensive methodology to precisely measure the contribution of different orders to the overall network structure. First, we propose the order contribution measure, which quantifies the contribution of hyperlinks of different orders to the link weights (local scale), number of triangles (mesoscale) and size of the largest connected component (global scale) of the pairwise weighted network. Second, we propose the measure of order relevance, which gives insights in how hyperlinks of different orders contribute to the considered network property. Most interestingly, it enables an assessment of whether this contribution is synergistic or redundant with respect to that of hyperlinks of other orders. Third, to account for labels, we propose a metric of label group balance to assess how hyperlinks of different orders connect label-induced groups of nodes. We applied these metrics to a large-scale board interlock network and scientific collaboration network, in which node labels correspond to geographical location of the nodes. Experiments including a comparison with randomized null models reveal how from the global level perspective, we observe synergistic contributions of orders in the board interlock network, whereas in the collaboration network there is more redundancy. The findings shed new light on social scientific debates on the role of busy directors in global business networks and the connective effects of large author teams in scientific collaboration networks.
... Also in case of disease infection, one healthy unit can be infected in touch with multiple infected units, and takes the form of higher-order interactions [42,43]. One can also take the example of a collaboration network, where the dynamics of multiauthor collaboration can not be described by the combination of pairwise collaborations [44]. In such systems, the multiauthor interactions can be represented as higher-order interactions which cannot always be expressed as a sum of pairwise interactions. ...
... Now it can be easily checked that λ = 0 is the only solution of Eqs. (44) and (45) produces the solution 2λ = K 1 r a 1 a + K 2 r b+2 ...
... An example of a network that captures higher-order interactions between different entity types is the bipartite network, where nodes are partitioned by two separate groups, and edges only connect nodes from different groups. Bipartite networks are particularly suitable for modeling systems where two types of entities interact, such as authors and papers in a collaboration network [18], recommendation systems [19] where nodes represent users and the recommending items, and specifically in ...
Preprint
Analysis of single-cell RNA sequencing data is often conducted through network projections such as coexpression networks, primarily due to the abundant availability of network analysis tools for downstream tasks. However, this approach has several limitations: loss of higher-order information, inefficient data representation caused by converting a sparse dataset to a fully connected network, and overestimation of coexpression due to zero-inflation. To address these limitations, we propose conceptualizing scRNA-seq expression data as hypergraphs, which are generalized graphs in which the hyperedges can connect more than two vertices. In the context of scRNA-seq data, the hypergraph nodes represent cells and the edges represent genes. Each hyperedge connects all cells where its corresponding gene is actively expressed and records the expression of the gene across different cells. This hypergraph conceptualization enables us to explore multi-way relationships beyond the pairwise interactions in coexpression networks without loss of information. We propose two novel clustering methods: (1) the Dual-Importance Preference Hypergraph Walk (DIPHW) and (2) the Coexpression and Memory-Integrated Dual-Importance Preference Hypergraph Walk (CoMem-DIPHW). They outperform established methods on both simulated and real scRNA-seq datasets. The improvement brought by our proposed methods is especially significant when data modularity is weak. Furthermore, CoMem-DIPHW incorporates the gene coexpression network, cell coexpression network, and the cell-gene expression hypergraph from the single-cell abundance counts data altogether for embedding computation. This approach accounts for both the local level information from single-cell level gene expression and the global level information from the pairwise similarity in the two coexpression networks.
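The hypergraph construction described in this abstract (cells as nodes, one hyperedge per gene connecting the cells where that gene is expressed) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `expression_hypergraph`, the cells-by-genes matrix layout, and the `threshold` parameter for "actively expressed" are all assumptions.

```python
import numpy as np


def expression_hypergraph(counts, threshold=0.0):
    """Build one hyperedge per gene from a cells-by-genes count matrix.

    Assumptions (not from the source): rows are cells, columns are genes,
    and a gene is "actively expressed" in a cell when its count exceeds
    `threshold`. Each hyperedge maps cell index -> expression value.
    """
    counts = np.asarray(counts)
    hyperedges = {}
    for gene in range(counts.shape[1]):
        cells = np.flatnonzero(counts[:, gene] > threshold)
        if cells.size > 0:
            # Store the expression values alongside membership.
            hyperedges[gene] = {int(c): float(counts[c, gene]) for c in cells}
    return hyperedges


# Toy matrix: 3 cells x 3 genes (hypothetical data); gene 1 is never expressed.
toy_counts = np.array([
    [3, 0, 1],
    [0, 0, 2],
    [5, 0, 4],
])
hg = expression_hypergraph(toy_counts)
```

Because only nonzero entries are stored, the representation stays sparse. A gene that is never expressed, such as gene 1 here, produces no hyperedge.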
... In systems composed of multiple particles, interactions may go beyond pairwise relations and involve the collective action of groups of agents that cannot be decomposed. A classic example is collaboration networks, where more than two people can participate in a project or coauthor a paper [1]. In physics, the Einstein-Infeld-Hoffmann equations of motion, which incorporate small general-relativistic effects into many-body newtonian mechanics, lead to gravitational forces that are proportional to the product of several different masses [2,3]. ...
Preprint
Higher-order interactions can lead to new equilibrium states and bifurcations in systems of coupled oscillators described by the Kuramoto model. However, even in the simplest case of 3-body interactions there is more than one possible functional form, depending on how exactly the bodies are coupled. Which of these forms is better suited to describe the dynamics of the oscillators depends on the specific system under consideration. Here we show that, for a particular class of interactions, reduced equations for the Kuramoto order parameter can be derived for arbitrarily many bodies. Moreover, the contribution of a given term to the reduced equation does not depend on its order, but on a certain effective order, which we define. We give explicit examples where bi- and tri-stability is found and discuss a few exotic cases where synchronization happens via a third-order phase transition.
... Similarly, Ref. 37 investigates higher-order interactions in a memristive Rulkov model network, using master stability functions to analyze synchronization patterns, and demonstrates that incorporating higher-order interactions lowers the required coupling parameters for synchronization while also showing that larger network sizes enhance synchronization dynamics and facilitate cluster synchronization under specific coupling conditions. Many other intriguing studies on higher-order interactions [38][39][40][41][42][43][44][45][46][47][48] exist; however, most of them primarily emphasize long-term behaviors. ...
Article
Understanding how species interactions shape biodiversity is a core challenge in ecology. While much focus has been on long-term stability, there is rising interest in transient dynamics—the short-lived periods when ecosystems respond to disturbances and adjust toward stability. These transitions are crucial for predicting ecosystem reactions and guiding effective conservation. Our study introduces a model that uses convex combinations to blend pairwise and higher-order interactions (HOIs), offering a more realistic view of natural ecosystems. We find that pairwise interactions slow the journey to stability, while HOIs speed it up. Employing global stability analysis and numerical simulations, we establish that as the proportion of HOIs increases, mean transient times exhibit a significant reduction, thereby underscoring the essential role of HOIs in enhancing biodiversity stabilization. Our results reveal a robust correlation between the most negative real part of the eigenvalues of the Jacobian matrix associated with the linearized system at the coexistence equilibrium and the mean transient times. This indicates that a more negative leading eigenvalue correlates with accelerated convergence to stable coexistence abundances. This insight is vital for comprehending ecosystem resilience and recovery, emphasizing the key role of HOIs in promoting stabilization. Amid growing interest in transient dynamics and its implications for biodiversity and ecological stability, our study enhances the understanding of how species interactions affect both transient and long-term ecosystem behavior. By addressing a critical gap in ecological theory and offering a practical framework for ecosystem management, our work advances knowledge of transient dynamics, ultimately informing effective conservation strategies.
... Similarly, Ref. [37] investigates higher-order interactions in a memristive Rulkov model network, using master stability functions to analyze synchronization patterns, and demonstrates that incorporating higher-order interactions lowers the required coupling parameters for synchronization while also showing that larger network sizes enhance synchronization dynamics and facilitate cluster synchronization under specific coupling conditions. Many other intriguing studies on higher-order interactions [38][39][40][41][42][43][44][45][46][47][48] exist; however, most of them primarily emphasize long-term behaviors. ...
Preprint
Understanding how species interactions shape biodiversity is a core challenge in ecology. While much focus has been on long-term stability, there is rising interest in transient dynamics, the short-lived periods when ecosystems respond to disturbances and adjust toward stability. These transitions are crucial for predicting ecosystem reactions and guiding effective conservation. Our study introduces a model that blends pairwise and higher-order interactions, offering a more realistic view of natural ecosystems. We find that pairwise interactions slow the journey to stability, while higher-order interactions speed it up. This model provides fresh insights into ecosystem resilience and recovery, helping improve strategies for managing species and ecological disruptions.
... However, not all interactions in complex systems are alike; they may differ in nature, type, and scope. This observation led researchers to introduce the concept of multilayer and multiplex networks (Boccaletti 2014; Kivelä 2014) (Vasilyeva 2021) and a significant potential, however, multiplex hypergraphs remain relatively unexplored, and a general set of tools for their analysis is still missing. ...
Article
A wide variety of complex systems are characterized by interactions of different types involving varying numbers of units. Multiplex hypergraphs serve as a tool to describe such structures, capturing distinct types of higher-order interactions among a collection of units. In this work, we introduce a comprehensive set of measures to describe structural connectivity patterns in multiplex hypergraphs, considering scales from node and hyperedge levels to the system’s mesoscale. We validate our measures with three real-world datasets: scientific co-authorship in physics, movie collaborations, and high school interactions. This validation reveals new collaboration patterns, identifies trends within and across movie subfields, and provides insights into daily interaction dynamics. Our framework aims to offer a more nuanced characterization of real-world systems marked by both multiplex and higher-order interactions.
Article
Investigating the maximum independent set in stochastic multilayer graphs provides critical insights into the structural and dynamical properties of complex networks. Stochastic multilayer graphs have recently been used to model the intricate interactions and interdependencies inherent in real-world systems, including social, biological, and transportation networks. The identification of a maximum independent set (a set of nodes without direct connections) offers a significant understanding of phenomena such as information diffusion, resource allocation, and epidemic spread within complex social networks. For instance, independent sets play a crucial role in identifying influencers, that is, individuals who profoundly impact their peers, propagating information or opinions widely. In this paper, we introduce the stochastic version of the maximum independent set and propose five algorithms based on learning automata to identify maximum independent sets in stochastic multilayer graphs. Our approach utilizes learning automata to provide guided sampling from candidate independent sets of the stochastic multilayer graph, aiming to identify the independent set with the maximum expected value while using fewer vertex samples than standard methods that do not incorporate learning. In addition to proving several mathematical properties of the proposed approach, simulations conducted across diverse stochastic multilayer graphs demonstrate that our learning automata-based algorithms outperform traditional approaches, achieving higher convergence rates and requiring fewer samples.
Article
The involvement of a memristive term and additive physical variables, including magnetic flux and charge, can enhance the physical description of biophysical neurons. Neural circuits coupled with memristors can be built and tamed to mimic the intrinsic biophysical characteristics and dynamical properties of biological neurons, and these memristive oscillator models are effective in predicting the mode transition in neural activities and self-organization in collective electric behaviors of neural networks. Any proposal of memristive map neurons requires a reliable physical description. For example, the energy definition and self-adaptive working mechanism are crucial to verify the reliability of memristive maps. A capacitive variable is useful to describe the membrane potential, while the complexity of ion channels requires careful evaluation and description by using inductive variables relative to the electromagnetic field. In this work, a charge-controlled memristor is connected to an inductor in series for building a hybrid ion channel, and then a capacitor and a nonlinear resistor are combined to couple the ion channel. As a result, a simple memristive neural circuit is designed to discern the inner effect of electric field and magnetic field synchronously. The energy function is defined and verified with theoretical proof. Furthermore, a linear transformation is applied to convert this memristive neuron into a memristive map with an exact energy description, in which its dynamics and mode transition will be controlled by an adaptive law when its energy is beyond the threshold. Additive noise is imposed to induce coherence resonance, which can be detected by using the statistical analysis and average value for Hamilton energy function during changes in noise intensity. This scheme provides guidance for energy definition in memristive maps, and the intrinsic energy regulation mechanism in neural activities is explained from physical and dynamical aspects.
Article
We live and cooperate in networks. However, links in networks only allow for pairwise interactions, thus making the framework suitable for dyadic games, but not for games that are played in larger groups. Here, we study the evolutionary dynamics of a public goods game in social systems with higher-order interactions. First, we show that the game on uniform hypergraphs corresponds to the replicator dynamics in the well-mixed limit, providing a formal theoretical foundation to study cooperation in networked groups. Second, we unveil how the presence of hubs and the coexistence of interactions in groups of different sizes affects the evolution of cooperation. Finally, we apply the proposed framework to extract the actual dependence of the synergy factor on the size of a group from real-world collaboration data in science and technology. Our work provides a way to implement informed actions to boost cooperation in social groups.
Article
The complexity of many biological, social and technological systems stems from the richness of the interactions among their units. Over the past decades, a great variety of complex systems has been successfully described as networks whose interacting pairs of nodes are connected by links. Yet, in face-to-face human communication, chemical reactions and ecological systems, interactions can occur in groups of three or more nodes and cannot be simply described just in terms of simple dyads. Until recently, little attention has been devoted to the higher-order architecture of real complex systems. However, a mounting body of evidence is showing that taking the higher-order structure of these systems into account can greatly enhance our modeling capacities and help us to understand and predict their emerging dynamical behaviors. Here, we present a complete overview of the emerging field of networks beyond pairwise interactions. We first discuss the methods to represent higher-order interactions and give a unified presentation of the different frameworks used to describe higher-order systems, highlighting the links between the existing concepts and representations. We review both the measures designed to characterize the structure of these systems, and the models proposed in the literature to generate synthetic structures, such as random and growing simplicial complexes, bipartite graphs and hypergraphs. We then introduce and discuss the rapidly growing research on higher-order dynamical systems and on dynamical topology. We focus on novel emergent phenomena characterizing landmark dynamical processes, such as diffusion, spreading, synchronization and games, when extended beyond pairwise interactions. We elucidate the relations between higher-order topology and dynamical properties, and conclude with a summary of empirical applications, providing an outlook on current modeling and conceptual frontiers.
Article
Networks form the backbone of many complex systems, ranging from the Internet to human societies. Accordingly, not only is the range of our interactions limited and thus best described and modeled by networks, it is also a fact that the networks that are an integral part of such models are often interdependent or even interconnected. Networks of networks or multilayer networks are therefore a more apt description of social systems. This colloquium is devoted to evolutionary games on multilayer networks, and in particular to the evolution of cooperation as one of the main pillars of modern human societies. We first give an overview of the most significant conceptual differences between single-layer and multilayer networks, and we provide basic definitions and a classification of the most commonly used terms. Subsequently, we review fascinating and counterintuitive evolutionary outcomes that emerge due to different types of interdependencies between otherwise independent populations. The focus is on coupling through the utilities of players, through the flow of information, as well as through the popularity of different strategies on different network layers. The colloquium highlights the importance of pattern formation and collective behavior for the promotion of cooperation under adverse conditions, as well as the synergies between network science and evolutionary game theory.
Article
Spreading processes have been widely studied in the literature, both analytically and by means of large-scale numerical simulations. These processes mainly include the propagation of diseases, rumors and information on top of a given population. In the last two decades, with the advent of modern network science, we have witnessed significant advances in this field of research. Here we review the main theoretical and numerical methods developed for the study of spreading processes on complex networked systems. Specifically, we formally define epidemic processes on single and multilayer networks and discuss in detail the main methods used to perform numerical simulations. Throughout the review, we classify spreading processes (disease and rumor models) into two classes according to the nature of time: (i) continuous-time and (ii) cellular automata approach, where the second one can be further divided into synchronous and asynchronous updating schemes. Our review includes the heterogeneous mean-field, the quenched mean-field, and the pair quenched mean-field approaches, as well as their respective simulation techniques, emphasizing similarities and differences among the different techniques. The content presented here offers a whole suite of methods to study epidemic-like processes in complex networks, both for researchers without previous experience in the subject and for experts.
Article
The structure of scientific collaborations has been the object of intense study both for its importance for innovation and scientific advancement, and as a model system for social group coordination and formation thanks to the availability of authorship data. Over the last years, complex network approaches to this problem have yielded important insights and shaped our understanding of scientific communities. In this paper we propose to complement the picture provided by network tools with that coming from using simplicial descriptions of publications and the corresponding topological methods. We show that it is natural to extend the concept of triadic closure to simplicial complexes and show the presence of strong simplicial closure. Focusing on the differences between scientific fields, we find that, while categories are characterized by different collaboration size distributions, the distribution of the number of collaborations in which an author is able to participate is conserved across fields, pointing to underlying attentional and temporal constraints. We then show that homological cycles, which can intuitively be thought of as holes in the network fabric, are an important part of the underlying community linking structure.
Article
Extensive cooperation among unrelated individuals is unique to humans, who often sacrifice personal benefits for the common good and work together to achieve what they are unable to execute alone. The evolutionary success of our species is indeed due, to a large degree, to our unparalleled other-regarding abilities. Yet, a comprehensive understanding of human cooperation remains a formidable challenge. Recent research in social science indicates that it is important to focus on the collective behavior that emerges as the result of the interactions among individuals, groups, and even societies. Non-equilibrium statistical physics, in particular Monte Carlo methods and the theory of collective behavior of interacting particles near phase transition points, has proven to be very valuable for understanding counterintuitive evolutionary outcomes. By studying models of human cooperation as classical spin models, a physicist can draw on familiar settings from statistical physics. However, unlike pairwise interactions among particles that typically govern solid-state physics systems, interactions among humans often involve group interactions, and they also involve a larger number of possible states even for the most simplified description of reality. The complexity of solutions therefore often surpasses that observed in physical systems. Here we review experimental and theoretical research that advances our understanding of human cooperation, focusing on spatial pattern formation, on the spatiotemporal dynamics of observed solutions, and on self-organization that may either promote or hinder socially favorable states.
Article
Representation and analysis of publication data in the form of a network has become a common method of illustrating and evaluating the scientific output of a group or of a scientific field. Co-authorship networks also reveal patterns and collaboration practices. In this paper we propose the use of a hypergraph model (a generalized network) to represent publication data by considering papers as hypergraph nodes. Hyperedges, connecting the nodes, represent the authors, each connecting all of their papers. We show that this representation is more straightforward than other authorship network models. Using the hypergraph model we propose a collaboration measure of an author that reflects the influence of that author over the collaborations of their co-authors. We illustrate the introduced concepts by analyzing publishing data of computer scientists and mathematicians in Romania over a 10-year period.
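The representation described in this abstract, with papers as nodes and each author as a hyperedge over the papers they wrote, can be sketched in a few lines. This is only an illustration of the data structure. The function name `author_hyperedges` and the toy records are hypothetical, and the paper's collaboration measure is not reproduced because the abstract does not define it.

```python
from collections import defaultdict


def author_hyperedges(papers):
    """Turn {paper_id: [authors]} into {author: set of paper_ids}.

    Each author is a hyperedge whose members are the papers (nodes)
    that the author has written.
    """
    edges = defaultdict(set)
    for paper_id, authors in papers.items():
        for author in authors:
            edges[author].add(paper_id)
    return dict(edges)


# Toy publication records (hypothetical data).
toy_papers = {
    "p1": ["Ana", "Ben"],
    "p2": ["Ana"],
    "p3": ["Ben", "Cy"],
}
edges = author_hyperedges(toy_papers)

# Papers co-authored by two people are the intersection of their hyperedges.
shared = edges["Ana"] & edges["Ben"]
```

In this view a co-authored paper is simply a node shared by several hyperedges, so co-authorship can be read off with set intersections rather than by expanding each paper into pairwise author links.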
Article
Network science is today established as a backbone for the description of the structure and function of various physical, chemical, biological, technological, and social systems. Here we review recent advances in the study of complex biological systems that were inspired and enabled by methods of network science. First, we present research highlights ranging from determination of the molecular interaction network within a cell to studies of architectural and functional properties of brain networks and biological transportation networks. Second, we focus on synergies between network science and data analysis, which enable us to determine functional connectivity patterns in multicellular systems. Until now, this intermediate scale of biological organization has received the least attention from the network perspective. As an example, we review the methodology for the extraction of functional beta cell networks in pancreatic islets of Langerhans by means of advanced imaging techniques. Third, we concentrate on the emerging field of multilayer networks and review the first endeavors and novel perspectives offered by this framework in exploring biological complexity. We conclude by outlining challenges and directions for future research that encompass utilization of the multilayer network formalism in exploring intercellular communication patterns in tissues, and we advocate for network science being one of the key pillars for assessing physiological function of complex biological systems, from organelles to organs, in health and disease.