Preprint

Community detection in multi-layer networks by regularized debiased spectral clustering


Abstract

Community detection is a crucial problem in the analysis of multi-layer networks. In this work, we introduce a new method, called regularized debiased sum of squared adjacency matrices (RDSoS), to detect latent communities in multi-layer networks. RDSoS is developed based on a novel regularized Laplacian matrix that regularizes the debiased sum of squared adjacency matrices. In contrast, the classical regularized Laplacian matrix typically regularizes the adjacency matrix of a single-layer network. Therefore, at a high level, our regularized Laplacian matrix extends the classical regularized Laplacian matrix to multi-layer networks. We establish the consistency property of RDSoS under the multi-layer stochastic block model (MLSBM) and further extend RDSoS and its theoretical results to the degree-corrected version of the MLSBM. The effectiveness of the proposed methods is evaluated and demonstrated through synthetic and real datasets.
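The abstract does not give formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: the debiased sum of squares is taken as the sum over layers of A_l² minus the diagonal degree matrix of A_l, the regularized Laplacian as D_τ^{-1/2} S D_τ^{-1/2} with D_τ = D + τI, and the default τ (mean row sum) is an arbitrary heuristic. The function names, the toy k-means, and the simulated two-community network are all hypothetical illustrations and may differ from the authors' actual RDSoS algorithm.

```python
import numpy as np

def debiased_sum_of_squares(layers):
    """Sum of A_l @ A_l over layers, minus each layer's diagonal degree matrix."""
    n = layers[0].shape[0]
    S = np.zeros((n, n))
    for A in layers:
        S += A @ A - np.diag(A.sum(axis=1))
    return S

def regularized_laplacian(S, tau=None):
    """D_tau^{-1/2} S D_tau^{-1/2}, with D_tau = diag(row sums of S) + tau * I."""
    d = S.sum(axis=1)
    if tau is None:
        tau = d.mean()  # assumed heuristic default, not taken from the paper
    inv_sqrt = 1.0 / np.sqrt(d + tau)
    return inv_sqrt[:, None] * S * inv_sqrt[None, :]

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def rdsos_sketch(layers, k, tau=None):
    """Cluster the k leading eigenvectors (by |eigenvalue|) of the regularized Laplacian."""
    L = regularized_laplacian(debiased_sum_of_squares(layers), tau)
    vals, vecs = np.linalg.eigh(L)
    top = np.argsort(-np.abs(vals))[:k]
    return kmeans(vecs[:, top], k)

# Toy multi-layer SBM: 60 nodes, 2 communities, 4 layers (illustrative values only).
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 30)
B = np.array([[0.6, 0.1], [0.1, 0.6]])
P = B[np.ix_(truth, truth)]
layers = []
for _ in range(4):
    upper = np.triu((rng.random(P.shape) < P).astype(float), k=1)
    layers.append(upper + upper.T)  # symmetric, no self-loops

est = rdsos_sketch(layers, k=2)
agreement = max(np.mean(est == truth), np.mean(est != truth))  # up to label swap
```

For symmetric binary layers without self-loops, the diagonal of A_l² equals the node degrees, so subtracting the degree matrix zeroes the diagonal of the aggregate; this is the sense in which the sum of squares is "debiased" in this sketch.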

Article
The development of models and methodology for the analysis of data from multiple heterogeneous networks is of importance both in statistical network theory and across a wide spectrum of application domains. Although single-graph analysis is well-studied, multiple graph inference is largely unexplored, in part because of the challenges inherent in appropriately modeling graph differences and yet retaining sufficient model simplicity to render estimation feasible. This paper addresses exactly this gap, by introducing a new model, the common subspace independent-edge multiple random graph model, which describes a heterogeneous collection of networks with a shared latent structure on the vertices but potentially different connectivity patterns for each graph. The model encompasses many popular network representations, including the stochastic blockmodel. The model is both flexible enough to meaningfully account for important graph differences, and tractable enough to allow for accurate inference in multiple networks. In particular, a joint spectral embedding of adjacency matrices-the multiple adjacency spectral embedding-leads to simultaneous consistent estimation of underlying parameters for each graph. Under mild additional assumptions, the estimates satisfy asymptotic normality and yield improvements for graph eigenvalue estimation. In both simulated and real data, the model and the embedding can be deployed for a number of subsequent network inference tasks, including dimensionality reduction, classification, hypothesis testing, and community detection. Specifically, when the embedding is applied to a data set of connectomes constructed through diffusion magnetic resonance imaging, the result is an accurate classification of brain scans by human subject and a meaningful determination of heterogeneity across scans of different individuals.
Article
Community detection is one of the most popular research topics in a variety of complex systems, ranging from biology to sociology. In recent years, there has been an increasing focus on the rapid development of more complicated networks, namely multilayer networks. Communities in a single-layer network are groups of nodes that are more strongly connected among themselves than to the others, while in multilayer networks, a group of well-connected nodes is shared across multiple layers. Most traditional algorithms rarely perform well on a multilayer network without modifications. Thus, in this paper, we offer overall comparisons of existing works and analyze several representative algorithms, providing a comprehensive understanding of community detection methods in multilayer networks. The comparison results indicate that improving algorithm efficiency and extending methods to general multilayer networks are expected directions for forthcoming studies.
Article
Finding functional modules in gene regulation networks is an important task in systems biology. Many methods have been proposed for finding communities in static networks; however, the application of such methods is limited due to the dynamic nature of gene regulation networks. In this paper, we first propose a statistical framework for detecting common modules in the Drosophila melanogaster time-varying gene regulation network. We then develop both a significance test and a robustness test for the identified modular structure. We apply an enrichment analysis to our community findings, which reveals interesting results. Moreover, we investigate the consistency property of our proposed method under a time-varying stochastic block model framework with a temporal correlation structure. Although we focus on gene regulation networks in our work, our method is general and can be applied to other time-varying networks.
Article
Community detection in networks is one of the most popular topics of modern network science. Communities, or clusters, are usually groups of vertices having higher probability of being connected to each other than to members of other groups, though other patterns are possible. Identifying communities is an ill-defined problem. There are no universal protocols on the fundamental ingredients, like the definition of community itself, nor on other crucial issues, like the validation of algorithms and the comparison of their performances. This has generated a number of confusions and misconceptions, which undermine the progress in the field. We offer a guided tour through the main aspects of the problem. We also point out strengths and weaknesses of popular methods, and give directions to their use.
Article
The transcriptional underpinnings of brain development remain poorly understood, particularly in humans and closely related non-human primates. We describe a high-resolution transcriptional atlas of rhesus monkey (Macaca mulatta) brain development that combines dense temporal sampling of prenatal and postnatal periods with fine anatomical division of cortical and subcortical regions associated with human neuropsychiatric disease. Gene expression changes more rapidly before birth, both in progenitor cells and maturing neurons. Cortical layers and areas acquire adult-like molecular profiles surprisingly late in postnatal development. Disparate cell populations exhibit distinct developmental timing of gene expression, but also unexpected synchrony of processes underlying neural circuit construction including cell projection and adhesion. Candidate risk genes for neurodevelopmental disorders including primary microcephaly, autism spectrum disorder, intellectual disability, and schizophrenia show disease-specific spatiotemporal enrichment within developing neocortex. Human developmental expression trajectories are more similar to monkey than rodent, although approximately 9% of genes show human-specific regulation with evidence for prolonged maturation or neoteny compared to monkey.
Article
Many complex systems can be represented as networks consisting of distinct types of interactions, which can be categorized as links belonging to different layers. For example, a good description of the full protein–protein interactome requires, for some organisms, up to seven distinct network layers, accounting for different genetic and physical interactions, each containing thousands of protein–protein relationships. A fundamental open question is then how many layers are indeed necessary to accurately represent the structure of a multilayered complex system. Here we introduce a method based on quantum theory to reduce the number of layers to a minimum while maximizing the distinguishability between the multilayer network and the corresponding aggregated graph. We validate our approach on synthetic benchmarks and we show that the number of informative layers in some real multilayer networks of protein–genetic interactions, social, economical and transportation systems can be reduced by up to 75%.
Article
Modern social networks frequently encompass multiple distinct types of connectivity information; for instance, explicitly acknowledged friend relationships might complement behavioral measures that link users according to their actions or interests. One way to represent these networks is as multi-layer graphs, where each layer contains a unique set of edges over the same underlying vertices (users). Edges in different layers typically have related but distinct semantics; depending on the application multiple layers might be used to reduce noise through averaging, to perform multifaceted analyses, or a combination of the two. However, it is not obvious how to extend standard graph analysis techniques to the multi-layer setting in a flexible way. In this paper we develop latent variable models and methods for mining multi-layer networks for connectivity patterns based on noisy data.
Article
Consider a network where the nodes split into K different communities. The community labels for the nodes are unknown and it is of major interest to estimate them (i.e., community detection). Degree Corrected Block Model (DCBM) is a popular network model. How to detect communities with the DCBM is an interesting problem, where the main challenge lies in the degree heterogeneity. We propose a new approach to community detection which we call the Spectral Clustering On Ratios-of-Eigenvectors (SCORE). Compared to classical spectral methods, the main innovation is to use the entry-wise ratios between the first leading eigenvector and each of the other leading eigenvectors for clustering. The central surprise is, the effect of degree heterogeneity is largely ancillary, and can be effectively removed by taking entry-wise ratios between the leading eigenvectors. The method is successfully applied to the web blogs data and the karate club data, with error rates of 58/1222 and 1/34, respectively. These results are much more satisfactory than those by the classical spectral methods. Also, compared to modularity methods, SCORE is computationally much faster and has smaller error rates. We develop a theoretic framework where we show that under mild conditions, the SCORE stably yields successful community detection. In the core of the analysis is the recent development on Random Matrix Theory (RMT), where the matrix-form Bernstein inequality is especially helpful.
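The ratio idea described above can be illustrated numerically. The sketch below is not the SCORE implementation (it omits, for instance, any thresholding of the ratios and the final clustering step); it only checks, on an assumed noiseless degree-corrected block model matrix Ω = Θ Z B Zᵀ Θ with made-up parameters, that the entry-wise ratio of the second to the first leading eigenvector is constant within each community regardless of the degree parameters θ. The function name `score_ratios` is hypothetical.

```python
import numpy as np

def score_ratios(M, k):
    """Entry-wise ratios of eigenvectors 2..k to the leading eigenvector of M."""
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(-np.abs(vals))[:k]       # k leading eigenvalues by magnitude
    lead, rest = vecs[:, top[0]], vecs[:, top[1:]]
    return rest / lead[:, None]

# Noiseless two-community example with heterogeneous degree parameters (assumed values).
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 20)
theta = rng.uniform(0.2, 1.0, size=labels.size)
B = np.array([[1.0, 0.3], [0.3, 1.0]])
Omega = theta[:, None] * B[np.ix_(labels, labels)] * theta[None, :]

R = score_ratios(Omega, k=2)                  # one ratio column when k = 2
```

Each leading eigenvector of Ω has entries of the form θ_i times a community-specific constant, so θ_i cancels in the ratio; on an observed adjacency matrix the ratios are only approximately constant and would then be clustered.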
Article
Firms are increasingly seeking to harness the potential of social networks for marketing purposes. Therefore, marketers are interested in understanding the antecedents and consequences of relationship formation within networks and in predicting interactivity among users. The authors develop an integrated statistical framework for simultaneously modeling the connectivity structure of multiple relationships of different types on a common set of actors. Their modeling approach incorporates several distinct facets to capture both the determinants of relationships and the structural characteristics of multiplex and sequential networks. They develop hierarchical Bayesian methods for estimation and illustrate their model with two applications: the first application uses a sequential network of communications among managers involved in new product development activities, and the second uses an online collaborative social network of musicians. The authors' applications demonstrate the benefits of modeling multiple relations jointly for both substantive and predictive purposes. They also illustrate how information in one relationship can be leveraged to predict connectivity in another relation.
Article
This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, we propose three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of our techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. We evaluate the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data-set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data-sets.
Article
The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e., the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a role similar to that of, e.g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks. Review article, 103 pages, 42 figures, 2 tables. Final version published in Physics Reports.
Article
Network Notation. Networks are often characterized by clusters of constituents that interact more closely with each other and have more connections to one another than they do with the rest of the components of the network. However, systematically identifying and studying such community structure in complicated networks is not easy, especially when the network interactions change over time or contain multiple types of connections, as seen in many biological regulatory networks or social networks. Mucha et al. (p. 876) developed a mathematical method to allow detection of communities that may be critical functional units of such networks. Application to real-world tasks, like making sense of the voting record in the U.S. Senate, demonstrated the promise of the method.
Article
We pursue the hypothesis that neuronal placement in animals minimizes wiring costs for given functional constraints, as specified by synaptic connectivity. Using a newly compiled version of the Caenorhabditis elegans wiring diagram, we solve for the optimal layout of 279 nonpharyngeal neurons. In the optimal layout, most neurons are located close to their actual positions, suggesting that wiring minimization is an important factor. Yet some neurons exhibit strong deviations from “optimal” position. We propose that biological factors relating to axonal guidance and command neuron functions contribute to these deviations. We capture these factors by proposing a modified wiring cost function.
Article
We consider the problem of fuzzy community detection in networks, which complements and expands the concept of overlapping community structure. Our approach allows each vertex of the graph to belong to multiple communities at the same time, determined by exact numerical membership degrees, even in the presence of uncertainty in the data being analyzed. We create an algorithm for determining the optimal membership degrees with respect to a given goal function. Based on the membership degrees, we introduce a measure that is able to identify outlier vertices that do not belong to any of the communities, bridge vertices that have significant membership in more than one single community, and regular vertices that fundamentally restrict their interactions within their own community, while also being able to quantify the centrality of a vertex with respect to its dominant community. The method can also be used for prediction in case of uncertainty in the data set analyzed. The number of communities can be given in advance, or determined by the algorithm itself, using a fuzzified variant of the modularity function. The technique is able to discover the fuzzy community structure of different real world networks including, but not limited to, social networks, scientific collaboration networks, and cortical networks, with high confidence.
Article
Spectral clustering is widely used for detecting clusters in networks for community detection, while a small change on the graph Laplacian matrix could bring a dramatic improvement. In this paper, we propose a dual regularized graph Laplacian matrix and then employ it to the classical spectral clustering approach under the degree-corrected stochastic block model. If the number of communities is known as K, we consider more than K leading eigenvectors and weight them by their corresponding eigenvalues in the spectral clustering procedure to improve the performance. The improved spectral clustering method is dual regularized spectral clustering (DRSC). Theoretical analysis of DRSC shows that under mild conditions it yields stable consistent community detection. Meanwhile, we develop a strategy by taking advantage of DRSC and Newman’s modularity to estimate the number of communities K. We compare the performance of DRSC with several spectral methods and investigate the behaviors of our strategy for estimating K by substantial simulated networks and real-world networks. Numerical results show that DRSC enjoys satisfactory performance and our strategy on estimating K performs accurately and consistently, even in cases where there is only one community in a network.
Article
We consider the problem of estimating common community structures in multi-layer stochastic block models, where each single layer may not have sufficient signal strength to recover the full community structure. In order to efficiently aggregate signal across different layers, we argue that the sum-of-squared adjacency matrices contain sufficient signal even when individual layers are very sparse. Our method uses a bias-removal step that is necessary when the squared noise matrices may overwhelm the signal in the very sparse regime. The analysis of our method relies on several novel tail probability bounds for matrix linear combinations with matrix-valued coefficients and matrix-valued quadratic forms, which may be of independent interest. The performance of our method and the necessity of bias removal is demonstrated in synthetic data and in microarray analysis about gene co-expression networks. Supplementary materials for this article are available online.
Article
Community detection, a fundamental task for network analysis, aims to partition a network into multiple sub-structures to help reveal their latent functions. Classical approaches to community detection typically utilize probabilistic graphical models and adopt a variety of prior knowledge to infer community structures. As the problems that network methods try to solve and the network data to be analyzed become increasingly more sophisticated, new approaches have also been proposed and developed, particularly those that utilize deep learning and convert networked data into low dimensional representation. Despite all the recent advancement, there is still a lack of insightful understanding of the theoretical and methodological underpinning of community detection, which will be critically important for future development of the area of network analysis. In this paper, we develop and present a unified architecture of network community-finding methods to characterize the state-of-the-art of the field of community detection. Specifically, we provide a comprehensive review of the existing community detection methods and introduce a new taxonomy that divides the existing methods into two categories, probabilistic graphical model and deep learning. We then discuss in detail the main idea behind each method in two categories. Furthermore, we highlight their applications to various network analysis tasks.
Article
We consider multi-layer network data where the relationships between pairs of elements are reflected in multiple modalities, and may be described by multivariate or even high-dimensional vectors. Under the multi-layer stochastic block model framework we derive consistency results for a least squares estimation of memberships. Our theorems show that, as compared to single-layer community detection, a multi-layer network provides much richer information that allows for consistent community detection from a much sparser network, with required edge density reduced by a factor of the square root of the number of layers. Moreover, the multi-layer framework can detect cohesive community structure across layers, which might be hard to detect by any single-layer or simple aggregation. Simulations and a data example are provided to support the theoretical results.
Article
In this article, we propose and study the performance of spectral community detection for a family of “α-normalized” adjacency matrices A, of the type D^{-α}AD^{-α} with D the degree matrix, in heterogeneous dense graph models. We show that the previously used normalization methods based on A or D^{-1}AD^{-1} are in general suboptimal in terms of correct recovery rates and, relying on advanced random matrix methods, we prove instead the existence of an optimal value α_opt of the parameter α in our generic model; we further provide an online estimation of α_opt only based on the node degrees in the graph. Numerical simulations show that the proposed method outperforms state-of-the-art spectral approaches on moderately dense to dense heterogeneous graphs.
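As a small illustration of the normalization family named above, the sketch below computes D^{-α}AD^{-α} for a given α. It covers only the matrix construction; the estimation of the optimal α described in the abstract is not attempted here. The function name `alpha_normalize` and the three-node example are hypothetical.

```python
import numpy as np

def alpha_normalize(A, alpha):
    """Return D^{-alpha} A D^{-alpha}, with D the diagonal matrix of node degrees."""
    deg = A.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every node needs positive degree")
    scale = deg ** (-alpha)
    return scale[:, None] * A * scale[None, :]

# Star graph on three nodes: node 0 is linked to nodes 1 and 2.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
M_half = alpha_normalize(A, 0.5)   # symmetric normalization D^{-1/2} A D^{-1/2}
M_zero = alpha_normalize(A, 0.0)   # alpha = 0 leaves A unchanged
```

Setting α = 0 recovers the raw adjacency matrix and α = 1 gives D^{-1}AD^{-1}, the two special cases the abstract mentions.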
Article
We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network or multiple snapshots of a time-varying network. Numerous methods have been proposed in the literature for the more general problem of multi-view clustering in the past decade based on the spectral clustering or a low-rank matrix factorization. As a general theme, these "intermediate fusion" methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored and most researchers have relied on the performance in synthetic and real data to assess the goodness of the procedures. In the absence of statistical guarantees on the objective functions, it is difficult to determine if the algorithms optimizing the objective will return a good community structure. We apply some of these methods for consensus community detection in multi-layer networks and investigate the consistency properties of the global optimizer of the objective functions under the multi-layer stochastic blockmodel. We derive several new asymptotic results showing consistency of the intermediate fusion techniques along with the spectral clustering of mean adjacency matrix under a high dimensional setup, where the number of nodes, the number of layers and the number of communities of the multi-layer graph grow. Our numerical study shows that in comparison to the intermediate fusion techniques, late fusion methods, namely spectral clustering on aggregate spectral kernel and module allegiance matrix, under-perform in sparse networks, while the spectral clustering of mean adjacency matrix under-performs in multi-layer networks that contain layers with both homophilic and heterophilic clusters.
Article
Multi-layer networks are networks on a set of entities (nodes) with multiple types of relations (edges) among them where each type of relation/interaction is represented as a network layer. As with single layer networks, community detection is an important task in multi-layer networks. A large group of popular community detection methods in networks are based on optimizing a quality function known as the modularity score, which is a measure of presence of modules or communities in networks. Hence a first step in community detection is defining a suitable modularity score that is appropriate for the network in question. Here we introduce several multi-layer network modularity measures under different null models of the network, motivated by empirical observations in networks from a diverse field of applications. In particular we define the multi-layer configuration model, the multi-layer expected degree model and their various modifications as null models for multi-layer networks to derive different modularities. The proposed modularities are grouped into two categories. The first category, which is based on degree corrected multi-layer stochastic block model, has the multi-layer expected degree model as their null model. The second category, which is based on multi-layer extensions of Newman-Girvan modularity, has the multi-layer configuration model as their null model. These measures are then optimized to detect the optimal community assignment of nodes. We compare the effectiveness of the measures in community detection in simulated networks and then apply them to four real networks.
Article
The performance of spectral clustering can be considerably improved via regularization, as demonstrated empirically in Amini et al. [Ann. Statist. 41 (2013) 2097–2122]. Here, we provide an attempt at quantifying this improvement through theoretical analysis. Under the stochastic block model (SBM), and its extensions, previous results on spectral clustering relied on the minimum degree of the graph being sufficiently large for its good performance. By examining the scenario where the regularization parameter τ is large, we show that the minimum degree assumption can potentially be removed. As a special case, for an SBM with two blocks, the results require the maximum degree to be large (grow faster than log n) as opposed to the minimum degree. More importantly, we show the usefulness of regularization in situations where not all nodes belong to well-defined clusters. Our results rely on a 'bias-variance'-like trade-off that arises from understanding the concentration of the sample Laplacian and the eigengap as a function of the regularization parameter. As a byproduct of our bounds, we propose a data-driven technique DKest (standing for estimated Davis–Kahan bounds) for choosing the regularization parameter. This technique is shown to work well through simulations and on a real data set.
Article
We analyze the performance of spectral clustering for community extraction in sparse stochastic block models. We show that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of magnitude of the maximum expected degree is as small as log n, with n the number of nodes. This result applies to some polynomial time spectral clustering algorithms and is further extended to degree corrected stochastic block models using a spherical k-median spectral clustering method. The key components of our analysis are a careful perturbation analysis of the principal subspaces of the adjacency matrix and a combinatorial bound on the spectrum of binary random matrices, which is sharper than the conventional matrix Bernstein inequality and may be of independent interest.
Article
Spectral clustering is a fast and popular algorithm for finding clusters in networks. Recently, Chaudhuri et al. (2012) and Amini et al. (2012) proposed inspired variations on the algorithm that artificially inflate the node degrees for improved statistical performance. The current paper extends the previous statistical estimation results to the more canonical spectral clustering algorithm in a way that removes any assumption on the minimum degree and provides guidance on the choice of the tuning parameter. Moreover, our results show how the "star shape" in the eigenvectors, a common feature of empirical networks, can be explained by the Degree-Corrected Stochastic Blockmodel and the Extended Planted Partition model, two statistical models that allow for highly heterogeneous degrees. Throughout, the paper characterizes and justifies several of the variations of the spectral clustering algorithm in terms of these models.
Article
A stochastic model is proposed for social networks in which the actors in a network are partitioned into subgroups called blocks. The model provides a stochastic generalization of the blockmodel. Estimation techniques are developed for the special case of a single relation social network, with blocks specified a priori. An extension of the model allows for tendencies toward reciprocation of ties beyond those explained by the partition. The extended model provides a one degree-of-freedom test of the model. A numerical example from the social network literature is used to illustrate the methods.
Article
Stochastic blockmodels have been proposed as a tool for detecting community structure in networks as well as for generating synthetic networks for use as benchmarks. Most blockmodels, however, ignore variation in vertex degree, making them unsuitable for applications to real-world networks, which typically display broad degree distributions that can significantly affect the results. Here we demonstrate how the generalization of blockmodels to incorporate this missing element leads to an improved objective function for community detection in complex networks. We also propose a heuristic algorithm for community detection using this objective function or its non-degree-corrected counterpart and show that the degree-corrected version dramatically outperforms the uncorrected one in both real-world and synthetic networks.
Article
This paper presents new probability inequalities for sums of independent, random, self-adjoint matrices. These results place simple and easily verifiable hypotheses on the summands, and they deliver strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. Tail bounds for the norm of a sum of random rectangular matrices follow as an immediate corollary. The proof techniques also yield some information about matrix-valued martingales. In other words, this paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid. The matrix inequalities promise the same diversity of application, ease of use, and strength of conclusion that have made the scalar inequalities so valuable.
Article
The most promising class of statistical models for expressing structural properties of social networks observed at one moment in time is the class of exponential random graph models (ERGMs), also known as p* models. The strong point of these models is that they can represent a variety of structural tendencies, such as transitivity, that define complicated dependence patterns not easily modeled by more basic probability models. Recently, Markov chain Monte Carlo (MCMC) algorithms have been developed that produce approximate maximum likelihood estimators. Applying these models in their traditional specification to observed network data often has led to problems, however, which can be traced back to the fact that important parts of the parameter space correspond to nearly degenerate distributions, which may lead to convergence problems of estimation algorithms, and a poor fit to empirical data. This paper proposes new specifications of exponential random graph models. These specifications represent structural properties such as transitivity and heterogeneity of degrees by more complicated graph statistics than the traditional star and triangle counts. Three kinds of statistics are proposed: geometrically weighted degree distributions, alternating k-triangles, and alternating independent two-paths. Examples are presented both of modeling graphs and digraphs, in which the new specifications lead to much better results than the earlier existing specifications of the ERGM. It is concluded that the new specifications increase the range and applicability of the ERGM as a tool for the statistical analysis of social networks.
Article
The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Milligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of comparing partitions is approached indirectly by assessing the congruence of two proximity matrices using a simple cross-product measure. They are generated from corresponding partitions using various scoring rules. Special cases derivable include traditionally familiar statistics and/or ones tailored to weight certain object pairs differentially. Finally, we propose a measure based on the comparison of object triples having the advantage of a probabilistic interpretation in addition to being corrected for chance (i.e., assuming a constant value under a reasonable null hypothesis) and bounded between ±1.
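The chance-corrected Rand index discussed in this abstract can be sketched as follows. This is a minimal illustration of the commonly used pair-counting form, (index - expected index) / (maximum index - expected index); it does not cover the triple-based measure the abstract also proposes, and the function name `adjusted_rand_index` is a hypothetical choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Rand index between two partitions, corrected for chance.

    labels_a[i] and labels_b[i] are the cluster labels of object i
    in the two partitions being compared.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    # Pairs of objects placed together in both partitions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = together_a * together_b / total_pairs if total_pairs else 0.0
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Happens only when both partitions are all singletons or both
        # are a single cluster, so the partitions coincide.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index is invariant to relabeling of clusters, which is why it is a common choice for scoring estimated communities against ground truth.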
Article
A network is said to show assortative mixing if the nodes in the network that have many connections tend to be connected to other nodes with many connections. Here we measure mixing patterns in a variety of networks and find that social networks are mostly assortatively mixed, but that technological and biological networks tend to be disassortative. We propose a model of an assortatively mixed network, which we study both analytically and numerically. Within this model we find that networks percolate more easily if they are assortative and that they are also more robust to vertex removal.
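One common way to quantify the mixing pattern described here is the Pearson correlation between the degrees found at the two ends of each edge. The sketch below assumes an undirected simple graph given as an edge list; the name `degree_assortativity` is hypothetical, and this is an illustration rather than the exact estimator used in the cited article.

```python
from collections import Counter
from math import sqrt


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    edges is a list of (u, v) pairs of an undirected simple graph.
    Positive values suggest assortative mixing, negative values
    disassortative mixing. Returns NaN when all degrees are equal.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    m = len(xs)
    mean = sum(xs) / m  # xs and ys share the same mean and variance
    var = sum((x - mean) ** 2 for x in xs) / m
    if var == 0:
        return float("nan")
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    return cov / sqrt(var * var)


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]
    print(degree_assortativity(star))
```

A star graph is maximally disassortative under this measure, since every edge joins the high-degree hub to a degree-one leaf.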
Article
We propose and study a set of algorithms for discovering community structure in networks, that is, natural divisions of network nodes into densely connected subgroups. Our algorithms all share two definitive features: first, they involve iterative removal of edges from the network to split it into communities, the edges removed being identified using any one of a number of possible "betweenness" measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.
Boccaletti, S., Bianconi, G., Criado, R., Del Genio, C. I., Gómez-Gardeñes, J., Romance, M., Sendiña-Nadal, I., Wang, Z., & Zanin, M. (2014). The structure and dynamics of multilayer networks. Physics Reports, 544, 1-122.
Cucuringu, M., Singh, A. V., Sulem, D., & Tyagi, H. (2021). Regularized spectral methods for clustering signed networks. Journal of Machine Learning Research, 22, 1-79.
Deng, S., Ling, S., & Strohmer, T. (2021). Strong consistency, graph Laplacians, and the stochastic block model. Journal of Machine Learning Research, 22, 1-44.
Han, Q., Xu, K., & Airoldi, E. (2015). Consistent estimation of dynamic and multi-layer block models. In International Conference on Machine Learning (pp. 1511-1520). PMLR.
Magnani, M., Micenkova, B., & Rossi, L. (2013). Combinatorial analysis of multiple networks. arXiv preprint arXiv:1303.4986.
Qing, H. (2024). Community detection by spectral methods in multi-layer networks. arXiv preprint arXiv:2403.12540.