Article

Clustering and Community Detection in Directed Networks: A Survey

Authors:

Abstract

Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of these cases the graphs are directed, in the sense that the edges carry a direction, which makes their semantics non-symmetric. An interesting feature of real networks is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence here is that nodes of the same community are highly similar, while nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of applications. Naturally, this has led to a recent wealth of research in the area of mining directed graphs, with clustering being the primary method and tool for community detection and evaluation. The goal of this paper is to offer an in-depth review of the methods presented so far for clustering directed networks, along with the necessary methodological background and related applications. The survey commences with a concise review of the fundamental concepts and methodological base on which graph clustering algorithms capitalize. Then we present the relevant work along two orthogonal classifications. The first one is mostly concerned with the methodological principles of the clustering algorithms, while the second one approaches the methods from the viewpoint of the properties of a good cluster in a directed network. Further, we present methods and metrics for evaluating graph clustering results, demonstrate interesting application domains and provide promising future research directions.


... Most of these existing methods discover higher-order communities by leveraging motif connection density, that is, dense motif connections within the same community and sparse motif connections between different communities. However, another important property of directed networks, namely edge directionality [25], is largely ignored. ...
... The edge directionality determines the direction of information or influence transfer from one node to another, thus shaping the dynamic information flow within the network [28], [29]. This information flow can lead to the emergence of specific movement and propagation patterns, which can be identified as pattern-based communities [25]. For example, a circular link pattern of flow (named flow-cycle) forms a typical pattern-based community, where the majority of flow through the links stays within the community instead of flowing out [25]. ...
... Such special and complex pattern-based communities usually cannot be fully captured by density-based characteristics alone. ...
Article
Higher-order community detection in real-life networks has recently gained significant attention, because motif-based communities reflect not only higher-order mesoscale structures but also functional characteristics. However, motif-based communities detected by existing methods for directed networks often disregard edge directionality (non-reciprocal directional arcs), so they typically fail to comprehensively reveal intrinsic characteristics of higher-order topology and information flow. To address this issue, firstly, we model higher-order directed community detection as a bi-objective optimization problem, aiming to provide high-quality and diverse compromise partitions that capture both characteristics. Secondly, we introduce a Multi-Objective Genetic Algorithm based on Motif density and Information flow (MOGA-MI) to approximate the Pareto optimal higher-order directed community partitions. On the one hand, an Arc-and-Motif Neighbor based Genetic generator (AMNGA) is developed to generate high-quality and diverse offspring individuals; on the other hand, a Higher-order Directed Neighbor Community Modification (HD-NCM) operation is designed to further improve generated partitions by modifying easily-confused nodes into more appropriate motif-neighbor communities. Finally, experimental results demonstrate that the proposed MOGA-MI outperforms state-of-the-art algorithms in terms of higher-order topology and information flow indicators, while providing more diverse community information.
... Contrary to clustering, which classifies data sets based on similarities and dissimilarities between their data, community detection algorithms are used to identify subcommunities within a graph of links. Thus, clustering algorithms, such as K-means or hierarchical clustering [15], or cluster analysis performed by programs such as the SPSS statistical package, attempt to group together objects that share the same characteristics, while community detection tries to find communities of nodes that are closely connected to each other and less densely connected to the rest of the graph [46]. In this way, community detection is more suitable for understanding and analyzing the structure of large and complex networks, which depend on a single attribute type, namely edges. ...
... This worrying perception is also seen in the belief that men and women are equally paid in STEM, with a high number of students (around 40%) agreeing with this statement. The latest figures say otherwise: men in the European technology sector earn 19% more than women [27], and in engineering sectors the gender pay gap ranges from 21% to 36% [46,67]. Indeed, Sterling et al. [77] argue that the pay gap is a reality and is due to cultural beliefs about the worth of women in STEM professions or their motivation and self-esteem regarding their self-efficacy. ...
Article
Digital societies require professionals in the Technology and Engineering sectors, but the shortage of such professionals, particularly women, calls for a thorough understanding of this gender gap. This research analyzes the beliefs and opinions of university engineering students about the gender gap in their professional fields by means of a community detection algorithm that identifies groups of students with similar belief patterns. The study covers 590 engineering students and combines the community detection algorithm with a correlational and explanatory design using a quantitative paradigm. A validated questionnaire focusing on the professional dimension was used. The algorithm identified three student communities, two gender-sensitive and one gender-insensitive. The study uncovered a concerning lack of awareness regarding the gender gap among engineering students. Many participants did not recognize the importance of increasing the representation of professional women, maintained the belief that the gender gap affects only women, and assumed that men and women are equally paid. However, women show a higher level of awareness, while men perceive the gender gap as a passing trend, which is worrying. Students recognize the importance of integrating a gender perspective into university and engineering curricula. It is worrying that many students doubt the existence of the gender gap and that both genders lack knowledge about gender gap issues. Finally, community detection algorithms could efficiently and automatically analyze gender gap issues or other unrelated topics.
... Degree heterogeneity refers to the variation in the number of connections each node has (Estrada 2019). Clustering measures the tendency of nodes to form clusters or tightly connected groups (Malliaros and Vazirgiannis 2013). Transitivity quantifies the likelihood that a connection between nodes is transitive, that is, that two nodes linked to a common node are also linked to each other (Burda et al. 2004). ...
... However, in social network analysis, network clustering refers to grouping nodes into clusters according to their characteristics and attributes. The simplest and most common definition is that a cluster corresponds to a set of nodes with more connections within the cluster and fewer connections between the cluster and the rest of the graph (Malliaros and Vazirgiannis 2013). ...
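The cluster definition quoted above can be made concrete with a minimal Python sketch. It counts the edges that lie inside a candidate cluster and those that cross its boundary; the function name `internal_external_edges`, the toy edge list, and the node labels are illustrative assumptions, not taken from any of the cited works.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def internal_external_edges(edges: Iterable[Edge], cluster: Set[str]) -> Tuple[int, int]:
    """Count edges with both endpoints in `cluster` and edges with exactly one."""
    internal = 0
    external = 0
    for u, v in edges:
        u_in, v_in = u in cluster, v in cluster
        if u_in and v_in:
            internal += 1      # edge stays inside the cluster
        elif u_in or v_in:
            external += 1      # edge crosses the cluster boundary
    return internal, external


# Hypothetical toy graph: two triangles joined by the single edge (c, d).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]

inside, crossing = internal_external_edges(toy_edges, {"a", "b", "c"})
print(inside, crossing)
```

Under this informal definition, a node set is a plausible cluster when the first count clearly exceeds the second; where to draw that line is left open here, as it is in the quoted text.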
Article
Network reduction and clustering are important techniques for analyzing large-scale networks. This study proposes an analytic framework that considers degree distribution, clustering coefficient distribution, K-core, KS-statistic, and normalized adjusted ratio sampling (NARS) to measure and compare a social network dataset before and after reduction. The proposed NARS ensures that the comparison metric obtains a fair share of nodes based on cluster size. To evaluate the framework, 20 datasets of undirected networks were examined. Results show that the proposed framework can provide multiple aspects of measurement to evaluate the reduced network against the original network. The study also found that random walk, one of the network reduction methods, and its improved version, induced subgraph random walk, seem to perform equivalently when multiple metrics are considered, although random walk has a faster computation time.
... The impact of graph partition is deep across many domains. In social networks, it enables the identification of groups with shared interests or behaviors, which can enhance recommendation systems and targeted marketing strategies [10], [11]. In biological networks, graph partition reveals functional modules or protein complexes, and contributes to a better understanding of cellular processes and disease mechanisms [12], [13]. ...
Preprint
We introduce Quantum Hamiltonian Descent as a novel approach to solve the graph partition problem. By reformulating graph partition as a Quadratic Unconstrained Binary Optimization (QUBO) problem, we leverage QHD's quantum-inspired dynamics to identify optimal community structures. Our method implements a multi-level refinement strategy that alternates between QUBO formulation and QHD optimization to iteratively improve partition quality. Experimental results demonstrate that our QHD-based approach achieves superior modularity scores (up to 5.49% improvement) with reduced computational overhead compared to traditional optimization methods. This work establishes QHD as an effective quantum-inspired framework for tackling graph partition challenges in large-scale networks.
... An interesting feature that real networks present is the clustering or community structure property, under which a graph topology is organized into modules commonly called communities or clusters [24]. These communities are subgraphs that have more connections internally than externally [23] [25] [26]. ...
Article
An interesting feature that networks present is the community structure property, under which a graph topology is organized into modules called communities. This paper proposes a co-authorship graph-based methodology for studying thematic communities. The graph is divided into thematic communities to identify their basic characteristics, such as research thematic direction, researcher count, community count and relationship between researchers. This information could be used to design incentive policies that would support pertinent research areas. The proposed methodology was tested using data retrieved from the mathematical portal Math-Net.Ru. The findings of the tests indicate that more research in robots and robotic systems, combustion/explosion, and data protection techniques/systems needs to be promoted. It was demonstrated that the mathematical models employed were adequate and applicable to different research domains. However, this would need complete and reliable basic bibliographic data on research co-authorship within the relevant discipline over a long enough period.
... The connections between the nodes of a network represent the type and strength of their relationship. Clusters of nodes that have a larger number of inner connections between them and fewer connections to other clusters form communities within the graph (Malliaros and Vazirgiannis 2013). Community detection refers to finding these clusters by detecting some underlying labeling function for those nodes based on their similarity that encodes them into ...
Article
In this paper, we propose VGASOM, a neural network approach for community detection. Community detection refers to discovering similar nodes in a graph that form a community having similar features or attributes, as opposed to nodes from other communities. The proposed approach combines the capabilities of auto-encoder neural networks, specifically a variational graph auto-encoder (VGAE), with self-organizing map (SOM) clustering. VGAEs have achieved great success in learning the latent representation of graphs and therefore encoding them into lower-dimensional embeddings. The self-organizing map, based on competitive learning, is used to find communities in the graph embeddings obtained by the VGAE model; it further reduces their dimensionality and divides the input space into clusters that correspond to the communities in the graph. We conducted experiments to evaluate our model against several baseline models, and our model shows promising results for the community detection task. It outperforms the state-of-the-art methods by 3.29% in terms of accuracy and 9% in terms of the F1 metric.
... In their study, Malliaros and Vazirgiannis describe community detection in directed graphs or networks. They describe the various types of communities and their structure in directed graphs, as well as the relevant graph theory [11]. However, the research done by Bedi and Sharma gives more detailed insight into clusters in social networks. ...
Thesis
Gender inequality has been one of the world's alarming and rising issues in recent decades. It is a widespread problem that affects people all around the world, although its manifestations and severity vary depending on society and culture. This research gives a thorough investigation of gender bias in online social networks using community clustering and graph data mining approaches. The research methodology includes gathering Twitter data about gender bias using specific keywords and using NetworkX to build a graph representation. To divide the graph into different communities, three well-known community detection algorithms (Louvain, Girvan-Newman, and Walktrap) are used. The effectiveness of these algorithms is assessed using extrinsic metrics like V-measure and normalized mutual information (NMI), as well as intrinsic metrics like F1 score, recall, and precision. The characteristics of the selected communities are also studied using descriptive statistics and visualization methods. Four communities with respect to gender bias are presented: Male Biased, Female Biased, Feminism, and Neutral. The research advances knowledge of gender biases in online social networks and can guide initiatives to advance equality and inclusivity. The goal of this study is to create a solid framework for identifying and examining communities that show neutrality, feminism, and male and female prejudice.
... In our scenario, as demonstrated above, we were analysing a directed weighted graph. If we were to have performed a naive directed-to-undirected graph transformation, disregarding edge directionality and treating the graph as undirected, two main biases could have arisen (Malliaros & Vazirgiannis, 2013): data ambiguity (i.e., deviations in the clustering results) and biases in the clustering results (by discarding edge directionality, valuable information would not be used in the clustering process, generating deviations in the results). To explain these biases, consider a network in which the connections represent universities. ...
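The information loss described in this excerpt can be illustrated with a minimal Python sketch: two different directed graphs collapse to the same undirected graph once edge direction is discarded, so any undirected clustering method sees them as identical. The helper name `symmetrize` and the node labels `A`, `B`, `C` are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Arc = Tuple[str, str]


def symmetrize(arcs: Iterable[Arc]) -> Set[FrozenSet[str]]:
    """Naive directed-to-undirected transformation: keep only the unordered pair."""
    return {frozenset((u, v)) for u, v in arcs if u != v}


# Graph 1: a one-way chain A -> B -> C.
one_way = [("A", "B"), ("B", "C")]

# Graph 2: A and B are reciprocally linked, and C points to B instead.
reciprocal = [("A", "B"), ("B", "A"), ("C", "B")]

# After symmetrization the two graphs can no longer be told apart.
same_after_symmetrization = symmetrize(one_way) == symmetrize(reciprocal)
print(same_after_symmetrization)
```

The sketch only shows that distinct directed structures become indistinguishable; how much this affects a given clustering result depends on the data and the algorithm.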
Article
Over recent years, scholarly interest in universities’ allocation and effective utilisation of financial resources has been growing. When used efficiently, financial resources may improve universities’ quality of research and teaching, and therefore their positions in world university rankings. However, despite the relevance of financial efficiency to university placement in academic rankings, universities’ total available financial resources appear much more significant. In the present study, we propose an innovative methodology to determine realistic ranking targets for individual universities, based on their available financial resources. In particular, we combine data envelopment analysis, as developed by Banker et al. (Manag Sci 30(9):1078–1092, 1984), and a directed Louvain community detection algorithm to examine 318 universities across five countries, considering their ARWU scores alongside key financial indicators (i.e., long-term physical capital, total operating revenues). We identify clusters of universities with similar financial profiles and corresponding ARWU scores, as well as universities that have optimised their use of financial resources, representing benchmarks for similar universities to emulate. The approach is subsequently applied to Italian universities, as a specific national case. The findings may be useful for policy makers and university managers seeking reliable strategies for climbing academic rankings, particularly in countries with limited public investment in higher education.
... Semantics opens up new perspectives and allows the interpretation of high-order social relations. Malliaros et al. [11] addressed community detection in directed graphs and presented a methodology-based taxonomy of different approaches. They presented the relevant work along two orthogonal classifications: the first is concerned with the methodological principles of the clustering algorithms, and the second approaches the methods from the viewpoint of the properties of a good cluster in a directed network. ...
Article
One of the important issues in social networks is the social communities formed by interactions between their members. Three types of community, namely overlapping, non-overlapping, and hidden, are detected by different approaches. Given the importance of community detection in social networks, this paper provides a systematic mapping of machine learning-based community detection approaches. The study aimed to show the types of communities in social networks along with the machine learning algorithms that have been used for community detection. After carrying out the mapping steps and removing irrelevant references, 246 papers were selected to answer the research questions. The results indicate that unsupervised machine learning-based algorithms (such as k-means), at 41.46%, are the most used category for detecting communities in social networks due to their low processing overheads. On the other hand, there has been a significant increase since 2020 in the use of deep learning, which performs sufficiently well for community detection in large-volume data. Owing to its ability to measure the correlation or similarity between communities, NMI, at 53.25%, is the most frequently used metric for evaluating the performance of community identification. Furthermore, considering its availability, small size, and lack of multiple edges and loops, Zachary's Karate Club, at 26.42%, is the most used dataset for community detection research in social networks.
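To make the NMI metric mentioned above concrete, here is a minimal Python sketch that compares two community assignments of the same nodes. It assumes one common variant, NMI = 2·I(A;B) / (H(A) + H(B)), since the abstract does not say which normalization the surveyed papers use; the function name `nmi`, the toy labels, and the convention of returning 1.0 when both partitions are trivial are also assumptions.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def nmi(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Normalized mutual information, arithmetic-mean normalization (assumed variant)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Entropy of each partition.
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())

    # Mutual information between the two partitions.
    mi = sum((c / n) * log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
             for (a, b), c in joint.items())

    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in one community (assumed convention)
    return 2 * mi / (h_a + h_b)


# Hypothetical ground truth and detected communities for four nodes.
truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]     # same grouping, different label names
unrelated = [0, 1, 0, 1]      # grouping carries no information about `truth`

print(nmi(truth, relabelled), nmi(truth, unrelated))
```

The score depends only on how nodes are grouped, not on the label names, which is why it is convenient for comparing a detected partition with a reference one.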
... In the second, pattern-based clustering approach, communities are formed based on the similarity of nodes. This similarity can be defined by the network profile of industries (e.g., centrality measures) or external criteria (Malliaros and Vazirgiannis, 2013;Newman, 2018). ...
Preprint
The analysis of production networks plays a crucial role in understanding the economic landscape and addressing development issues. This paper applies network theory to analyze the Moroccan production network, examining its holistic structure, industries and clusters. It uses national accounting data and OECD datasets from 66 countries. Our investigation reveals a "butterfly" structure, with upstream and downstream sectors connected by a middle core of highly connected sectors. The network exhibits lower density compared to other countries, raising questions about growth and efficiency. Our analysis underscores the need for policies that foster domestic input production and reduce import dependence. Moreover, it emphasizes the importance of providing a more granular and frequent input-output table to support more nuanced insights.
... An undirected edge between two papers cannot convey the correct relationship as explained in [2]. Additionally, for directed graphs, graph properties like density (defined by the ratio of total number edges in a graph to the maximum possible edges) are either not defined or behave differently depending on the type of network, such as clustering coefficient [3]. As a result, when it comes to community detection, directed graphs should be addressed independently. ...
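As a small illustration of how a basic property changes with directionality, the sketch below computes density as the ratio of existing edges to the maximum possible number of edges, as defined in the excerpt. It assumes simple graphs (no self-loops, no multi-edges), so the maximum is n(n-1) for directed and n(n-1)/2 for undirected graphs; the function name `density` and the example numbers are illustrative.

```python
def density(num_nodes: int, num_edges: int, directed: bool) -> float:
    """Edges present divided by the maximum possible edges in a simple graph."""
    if num_nodes < 2:
        return 0.0  # no edge is possible without at least two nodes
    max_edges = num_nodes * (num_nodes - 1)   # ordered pairs (u, v) with u != v
    if not directed:
        max_edges //= 2                       # unordered pairs
    return num_edges / max_edges


# The same counts (4 nodes, 6 edges) give different densities:
print(density(4, 6, directed=False))  # 6 of 6 possible undirected edges
print(density(4, 6, directed=True))   # 6 of 12 possible directed arcs
```

The same edge count therefore describes a complete undirected graph but only a half-full directed one, which is one simple reason why directed graphs need separate treatment.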
Preprint
This paper presents a heterogeneous technique for detecting communities in social graphs, called DrSbChain. It operates in two phases. In the first phase, it gathers local information from the graph and processes one node at a time (i.e., seed nodes). It employs simple topological properties of the graph to determine the best neighbor nodes for the seed nodes and merges them to build a chain, referred to as a snowball-chain, inspired by the concept of snowball sampling. In each successive iteration, the chains identified in the previous iteration can be combined to form larger snowballs until all the nodes are covered. The algorithm's initial phase thus yields local communities. The second phase merges the local communities using a merge criterion, resulting in the final set of communities. The novelty of DrSbChain is that it can work on both directed and undirected social graphs, and can detect overlapping as well as disjoint communities by just setting a binary parameter value. The overlapping communities produced by DrSbChain are evaluated using link modularity and overlapping normalized mutual information (ONMI) and compared with several state-of-the-art methods. Based on the empirical results, DrSbChain is found to produce better results for link modularity and on-par ONMI values. Disjoint communities are additionally evaluated using Newman modularity along with the previous two measures. This paper also describes an application of the proposed DrSbChain technique to identify communities based on user check-ins from the location-based social networking site Brightkite, which can prove useful for influential node identification.
... These methods offer a probabilistic interpretation of community detection results, aiding in the assessment of result reliability and uncertainty; however, they involve complex mathematical models and computational processes, leading to high computational costs and unsuitability for ultra-large networks. Since community detection is a technique for revealing the clustering of network nodes into communities, some studies have also applied clustering algorithms to community detection [25], giving rise to clustering-based methods. Representative methods are the Girvan-Newman (GN) algorithm based on hierarchical clustering [26] and the FastQ algorithm based on cohesive clustering [27], which can be applied to a variety of network types and adapted to different community shapes and sizes; however, they rely on preset parameters (e.g., the number of clusters), the selection of which requires domain knowledge. ...
Article
Spatial community detection is a method that divides geographic spaces into several sub-regions based on spatial interactions, reflecting the regional spatial structure against the background of human mobility. In recent years, spatial community detection has attracted extensive research in the field of geographic information science. However, mining the community structures and their evolutionary patterns from spatial interaction data remains challenging. Most existing methods for spatial community detection rely on representing spatial interaction networks in Euclidean space, which results in significant distortion when modeling spatial interaction networks; since spatial community detection has no ground truth, this results in the detection and evaluation of communities being difficult. Furthermore, most methods usually ignore the dynamics of these spatial interaction networks, resulting in the dynamic evolution of spatial communities not being discussed in depth. Therefore, this study proposes a framework for community detection and evolutionary analysis for spatial interaction networks. Specifically, we construct a spatial interaction network based on network science theory, where geographic units serve as nodes and interaction relationships serve as edges. In order to fully learn the structural features of the spatial interaction network, we introduce a hyperbolic graph convolution module in the community detection phase to learn the spatial and non-spatial attributes of the spatial interaction network, obtain vector representations of the nodes, and optimize them based on a graph generation model to achieve the final community detection results. Considering the dynamics of spatial interactions, we analyze the evolution of the spatial community over time. Finally, using taxi trajectory data as an example, we conduct relevant experiments within the fifth ring road of Beijing. 
The empirical results validate the community detection capabilities of the proposed method, which can effectively describe the dynamic spatial structure of cities based on human mobility and provide an effective analytical method for urban spatial planning.
... For example, in a social network graph, nodes show people and edges show connections between them. All nodes connected to the vertex are called its neighborhood set and are denoted by ( ) [1]. ...
Article
Community detection is crucial for analyzing the structure of social networks and extracting hidden information from them. The goal is to find groups of nodes (communities) with high intra-group and low inter-group communications. This problem is NP-hard, and most existing algorithms are global with high computational complexity, especially for large networks. Recently, local methods with acceptable computational complexity have been developed, but many have low accuracy and are non-deterministic. This paper introduces a new local algorithm, LCD-SN, which identifies communities based on first- and second-degree neighbor nodes. Unlike other local algorithms, LCD-SN is highly accurate, definitive, and not dependent on initial seed nodes. Additionally, a new index is proposed to determine the importance of network nodes, using their local characteristics (first- and second-degree neighbors). Using this index, LCD-SN first identifies important nodes, forms initial communities with these nodes and their first-degree neighbors, and then obtains final communities through post-processing. Experiments show that LCD-SN is effective in identifying communities in social networks.
... In order to assess the quality of extracted track candidates, the p-value is computed from the χ²-based KF track fit quality and must be greater than 0.01. If a subgraph does not meet the criteria to qualify as a good track candidate, a Community Detection algorithm [11] is applied in order to further partition the set of nodes. Community detection is a generalisation of CCA and works by using a distance metric, typically modularity, in order to label nodes as 'closely connected'. ...
Preprint
Future upgrades to modern high-energy particle detectors will pose considerable challenges for traditional particle track reconstruction methods. Within the past two years, architectures that operate on Graph Neural Networks (GNN) have shown high degrees of promise. Presented here is a novel approach to GNN-based track finding which uses Gaussian mixture techniques to describe track parameter estimates. Unlike traditional methodologies whereby Multi-Layered Perceptrons (MLPs) are employed, the proposed GNN architecture leverages the Kalman filter as a mechanism for information aggregation, in order to iteratively improve the precision of track parameters, as well as for extraction of track candidates compatible with particle motion model. The excitation and inhibition rules of individual edge connections are designed to facilitate the “simple-to-complex” approach for “hits-to-tracks” association, such that the network starts with low hit density regions of an event and gradually progresses towards more complex areas. This paper focuses on the track finding algorithm development and its application on the publicly available dataset designed for the Kaggle TrackML challenge. The preliminary results related to track reconstruction efficiency and purity metrics are presented and discussed. The ultimate aim of this work is to develop a realistic GNN-based algorithm for fast track finding that can be deployed in future high-luminosity phases of particle detector experiments.
... As a form of symmetric interaction, the bidirectional interaction mode simplifies analysis and provides a straightforward understanding of how population structure allows for the spread of cooperation [26]. However, the bidirectional mode overlooks the widespread presence of asymmetric relationships in social interactions [28], which could be observed in various fields. For example, in cybersecurity [23], there exists an asymmetric situation between attackers and defenders. ...
Article
In this letter, we introduce payoff-based view radii into evolutionary prisoner's dilemma games performed in a two-dimensional plane and study how the adaptive view radii affect cooperation. Two types of feedback are considered: positive and negative. In the case of positive feedback, high-payoff (low-payoff) agents have large (small) view radii. In the case of negative feedback, the opposite holds. Meanwhile, three different interaction modes are considered: one-way visual interaction, proactive visual interaction, and two-way visual interaction. Our results show that payoff-based view radii can promote cooperation effectively in all cases. In particular, there exist optimal behaviors of cooperation for both positive and negative feedback. When agents are allowed to move, we find that the cooperation level can be further improved by slow migration. Our results shed light on the promotion of cooperation by adaptive view radii and suggest different ways to adjust view radii to achieve high cooperation levels in different interaction modes.
... Here, one of the most popular inference tasks is clustering [RM05, SPG+17], where one aims to partition a set of vertices into different groups such that vertices in the same group are similar. Often, one assumes the vertices have a ground truth partition into some z many true/underlying communities, and the goal is to recover a partition as close as possible to the hidden partition [BDG+07, Abb18, MV13]. ...
Preprint
Community and core-periphery are two widely studied graph structures, with their coexistence observed in real-world graphs (Rombach, Porter, Fowler & Mucha [SIAM J. App. Math. 2014, SIAM Review 2017]). However, the nature of this coexistence is not well understood and has been pointed out as an open problem (Yanchenko & Sengupta [Statistics Surveys, 2023]). Especially, the impact of inferring the core-periphery structure of a graph on understanding its community structure is not well utilized. In this direction, we introduce a novel quantification for graphs with ground truth communities, where each community has a densely connected part (the core), and the rest is more sparse (the periphery), with inter-community edges more frequent between the peripheries. Built on this structure, we propose a new algorithmic concept that we call relative centrality to detect the cores. We observe that core-detection algorithms based on popular centrality measures such as PageRank and degree centrality can show some bias in their outcome by selecting very few vertices from some cores. We show that relative centrality solves this bias issue and provide theoretical and simulation support, as well as experiments on real-world graphs. Core detection is known to have important applications with respect to core-periphery structures. In our model, we show a new application: relative-centrality-based algorithms can select a subset of the vertices such that it contains sufficient vertices from all communities, and points in this subset are better separable into their respective communities. We apply the methods to 11 biological datasets, with our methods resulting in a more balanced selection of vertices from all communities such that clustering algorithms have better performance on this set.
... For the exemplary network, the condensed version is shown on the right-hand side in Fig. 5, and the respective adjacency matrix cond ∈ R^{5×5} is presented in Eq. (12). This visualization scheme is related to community detection algorithms [57], and allows for a better overview of the connections and symmetries within the graph. ...
... The community structure with the higher modularity score was selected. Modularity is a widely used metric that expresses the quality of a network's division into communities [59]. It depends only on the choice of the communities (or clusters) in the network. ...
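As a rough illustration of the modularity score mentioned in the excerpt above, the following Python sketch evaluates the standard Newman-Girvan modularity of a given partition of an undirected, unweighted graph. The function name `modularity`, the toy graph, and the use of NumPy are assumptions made for illustration; the citing work does not describe its implementation.

```python
import numpy as np

def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    adj    : symmetric adjacency matrix (hypothetical input format)
    labels : community index of each node
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()                      # twice the number of edges
    same = labels[:, None] == labels[None, :]  # True where i and j share a community
    expected = np.outer(degrees, degrees) / two_m
    return float(((adj - expected) * same).sum() / two_m)

# Toy graph (an assumption): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

On this toy graph the two-triangle split scores 5/14 (about 0.357), while the single-community partition scores 0, so a procedure that keeps the higher-modularity structure would keep the split.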
Article
We aimed to (a) investigate the interplay between depression, symptoms and level of functioning, and (b) understand the paths through which they influence health related quality of life (QOL) during the first year of rehabilitation period of early breast cancer. A network analysis method was used. The population consisted of 487 women aged 35-68 years, who had recently completed adjuvant chemotherapy or started endocrine therapy for early breast cancer. At baseline and at the first year from randomization QOL, symptomatology and functioning by the EORTC QLQ-C30 and BR-23 questionnaires, and depression by the Finnish version of Beck's 13-item depression scale, were collected. The multivariate interplay between the related scales was analysed via regularized partial correlation networks (graphical LASSO). The median global quality of life (gQoL) at baseline was 69.9 ± 19.0 (16.7-100) and improved to 74.9 ± 19.0 (0-100) after 1 year. Scales related to mental health (emotional functioning, cognitive functioning, depression, insomnia, body image, future perspective) were clustered together at both time points. Fatigue was mediated through a different route, having the strongest connection with physical functioning and no direct connection with depression. Multiple paths existed connecting symptoms and functioning types with gQoL. Factors with the strongest connections to gQoL included: social functioning, depression and fatigue at baseline; emotional functioning and fatigue at month 12. Overall, the most important nodes were depression, gQoL and fatigue. The graphical LASSO network analysis revealed that scales related to fatigue and emotional health had the strongest associations to the EORTC QLQ-C30 gQoL score. When we plan interventions for patients with impaired QOL it is important to consider both psychological support and interventions that improve fatigue and physical function like exercise. Trial registration: http://www.clinicaltrials.gov/ (identifier number NCT00639210).
Abbreviations: BDI, Finnish modified version of Beck's 13-item depression scale; BREX, Breast cancer and exercise trial; CS-coefficient, Correlation stability coefficient; gQoL, EORTC C30 global health status / quality of life scale; QOL, Health related quality of life (in general).
... The main reason that research to date has tended to focus on undirected unweighted graphs rather than other types of graphs is that many community detection problems in those other graph types can be translated into community detection problems in unweighted undirected graphs. Indeed, a simple and common way to deal with a directed graph when searching for communities is to ignore directionality and assume edge symmetry (Malliaros and Vazirgiannis 2013). In the context of community detection, a bipartite graph can be transformed into a non-bipartite graph by multiplying the corresponding adjacency matrix by its transpose. ...
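The two reductions described in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `symmetrize` and `project_bipartite`, the toy matrices, and the choice to binarize the symmetrized graph and to zero the diagonal of the projection are illustrative choices, not details taken from the cited works. The bipartite input is assumed to be a biadjacency matrix (rows index one node set, columns the other).

```python
import numpy as np

def symmetrize(adj):
    """Ignore direction: keep an undirected edge wherever either direction exists."""
    adj = np.asarray(adj)
    return ((adj + adj.T) > 0).astype(int)

def project_bipartite(biadj):
    """One-mode projection: entry (i, j) of B @ B.T counts neighbours shared by i and j."""
    biadj = np.asarray(biadj)
    proj = biadj @ biadj.T
    np.fill_diagonal(proj, 0)  # the diagonal only holds node degrees, so drop it
    return proj

# Hypothetical directed graph with edges 0 -> 1, 1 -> 2, 2 -> 1.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])
U = symmetrize(A)

# Hypothetical bipartite graph: 3 row nodes, 2 column nodes.
B = np.array([[1, 0],
              [1, 1],
              [0, 1]])
P = project_bipartite(B)
```

Both reductions discard information: `U` no longer distinguishes the reciprocated pair 1 and 2 from the one-way edge 0 -> 1, which is the loss of directionality that motivates methods designed for directed graphs.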
Article
This paper extensively reviews the literature of community detection in complex networks and proposes a general classification describing the main models used for this purpose. Besides, a statistical study of the distribution of the recent relevant literature has been realized to picture the tendency of the models used by the main works published in the context of community detection. This mainly helped the understanding of the suitable community model to be used in each real-world network application. Furthermore, we establish a critical study of the state-of-the-art approaches according to the proposed classification. Moreover, we investigate the relevant applications of communities in networks and we establish a statistical study to illustrate the distribution of research works in the field of community detection. Finally, we discuss several open issues and future research directions of approaches and applications that would be worth investigating in the area of community detection.
... Modularity ranges between −1 and 1 and measures the propensity of a network to split into sub-networks, which can be interpreted as communities (Clauset et al. 2004). The computation of Louvain's algorithm is only available for undirected networks in 'igraph' due to its complexity in the case of directed networks (Blondel et al. 2008; Malliaros and Vazirgiannis 2013). Thus, we converted the trust network from directed to undirected to estimate the Louvain communities. ...
... Among the most fundamental open problems in the area is the detection of communities in graphs [5] [20]. Since there is no single definition of what constitutes a community in a graph [36][1], various methodologies have been developed, for example using genetic algorithms [17], progressive bottom-up construction of communities [41], which can also be made distributed [21], spectral factorization of the graph's adjacency matrix [42], graph embeddings with different feature vectors [52][61], and machine learning combined with tensor-based vertex distance metrics [13]. Depending on the definition of communities, there may be overlap between communities [57]. ...
Thesis
The thesis focuses on the analysis of social networks, emphasizing their role and importance both in everyday life and in the way they affect society. At the same time, it examines the widespread use of graph-structured data, revealing how much other real-world data takes a similar form. In this context, the different aspects of the presence of networks in our society are analyzed. Graph Neural Networks (GNNs) are then extensively discussed, highlighting their potential and their applications in network analysis. A comparison is made with other graph analysis techniques, and their operation is discussed. Some of their most basic architectures are also highlighted, showing how these models exploit the structure of graphs for higher performance. In general, it is described how these models can perform deep learning tasks for social data analysis, enhancing the understanding of structural features and relationships in social networks. Finally, in the context of the research, Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs) are used to classify communities on the social networking platform 'Reddit', taking into account functional and psychological characteristics. Furthermore, these models are extensively tested on two additional networks with different characteristics, determining their performance and recording the significant results obtained from their practical application.
... Chung [48] defined a symmetric normalized Laplace operator for strongly connected directed graphs. Notably, the normalized directed graph Laplacian, which has been used in spectral clustering [81], graph embedding [82], and classification applications [83], may represent the graph's directionality and edge density by leveraging the random walk operator. ...
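For readers unfamiliar with the operator attributed to Chung in the excerpt above, its commonly cited form is L = I − (Φ^{1/2} P Φ^{−1/2} + Φ^{−1/2} Pᵀ Φ^{1/2}) / 2, where P is the random-walk transition matrix of the directed graph and Φ is the diagonal matrix of the walk's stationary distribution. The Python sketch below is a minimal illustration of that formula, assuming a strongly connected graph given as a dense adjacency matrix; the function name `chung_laplacian` and the toy graph are hypothetical, and the stationary distribution is obtained with a plain dense eigendecomposition rather than a method suited to large graphs.

```python
import numpy as np

def chung_laplacian(adj):
    """Symmetric normalized Laplacian of a strongly connected directed graph."""
    adj = np.asarray(adj, dtype=float)
    out_deg = adj.sum(axis=1)
    P = adj / out_deg[:, None]             # row-stochastic random-walk matrix

    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))
    phi = np.real(vecs[:, k])
    phi = phi / phi.sum()                  # normalize to a probability vector

    sq = np.sqrt(phi)
    M = (sq[:, None] * P) / sq[None, :]    # Phi^{1/2} P Phi^{-1/2}
    return np.eye(len(phi)) - (M + M.T) / 2

# Hypothetical strongly connected graph: 0 -> 1, 1 -> 2, 2 -> 0, 0 -> 2.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
L = chung_laplacian(A)
eigenvalues = np.linalg.eigvalsh(L)        # real, because L is symmetric
```

Because `L` is symmetric and positive semidefinite with smallest eigenvalue 0, it can be passed to the same eigenvector-based pipelines used for undirected spectral clustering, which is the practical appeal of this construction.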
Article
Deciphering the complex relationship between neuroanatomical connections and functional activity in primate brains remains a daunting task, especially regarding the influence of monosynaptic connectivity on cortical activity. Here, we investigate the anatomical-functional relationship and decompose the neuronal-tracing connectome of marmoset brains into a series of eigenmodes using graph signal processing. These cellular connectome eigenmodes effectively constrain the cortical activity derived from resting-state functional MRI, and uncover a patterned cellular-functional decoupling. This pattern reveals a spatial gradient from coupled dorsal-posterior to decoupled ventral-anterior cortices, and recapitulates micro-structural profiles and macro-scale hierarchical cortical organization. Notably, these marmoset-derived eigenmodes may facilitate the inference of spontaneous cortical activity and functional connectivity of homologous areas in humans, highlighting the potential generalizing of the connectomic constraints across species. Collectively, our findings illuminate how neuronal-tracing connectome eigenmodes constrain cortical activity and improve our understanding of the brain’s anatomical-functional relationship.
... In contrast to hierarchical clustering, 'graph partitioning' may be used to determine sub-network properties but is feasible only when community structure is essentially known ex ante [60]. 29 Also note that other divisive techniques for identifying communities in networks have been proposed [85, 86], but we adopt Newman & Girvan's [60] approach, because of its simplicity and computational efficiency. See also Jackson [87] and Jackson & Rogers [88]. 30 ...
Article
From at least the early twentieth century, legal scholars have recognized that rights and other legal relations inhere between individual legal actors, forming a vast and complex social network. Yet, no legal scholar has used the mathematical machinery of network theory to formalize these relationships. Here, we propose the first such approach by modelling a rudimentary, static set of real property relations using network theory. Then, we apply our toy model to measure the level of modularity—essentially, the community structure—among aggregations of these real property relations and associated actors. In so doing, we show that even for a very basic set of relations and actors, law may employ modular structures to manage complexity. Property, torts, contracts, intellectual property, and other areas of the law arguably reduce information costs in similar, quantifiable ways by chopping up the world of interactions between parties into manageable modules that are semi-autonomous. We also posit that our network science approach to jurisprudential issues can be adapted to quantify many other important aspects of legal systems. This article is part of the theme issue 'A complexity science approach to law and governance'.
... Due to their ease of implementation, computational speed, and ability to handle non-linear cluster shapes, spectral clustering and NCut are well-suited for a wide range of graph networks (Nascimento & de Carvalho, 2011; van den Heuvel et al., 2008; Van Lierde et al., 2020). In addition, spectral clustering and normalized cut (NCut) methods have been extended for directed networks (Malliaros & Vazirgiannis, 2013). ...
Article
Directed networks appear in an expanding array of applications, for example, the world wide web, social networks, transaction networks, and citation networks. A critical task in analyzing directed networks is clustering, where the goal is partitioning the network's nodes based on their similarities while accounting for the direction of relationships between nodes. Non-negative matrix factorization (NMF) and its variations have been used to cluster the nodes in directed networks by approximating their adjacency matrices efficaciously. The differences between the corresponding entries of the actual and approximate adjacency matrices are considered as errors, which are assumed to follow Gaussian distributions. However, these errors could deviate from Gaussian distributions in various real-world networks. In this work, we propose a robust asymmetric non-negative matrix factorization method to cluster the nodes in directed networks. Recognizing that the errors do not follow Gaussian distributions in real-world networks, the proposed method assumes that the errors follow a Cauchy distribution, which resembles the Gaussian distribution but has heavier tails. Experiments using real-world as well as artificial networks show that the proposed method outperforms existing NMF methods and other representative work in clustering in various settings.
... Alternatively, a community can be determined based on the similarity of its branches (pattern-based clusters). This resemblance is based on their profile in the network and/or on external criteria (Malliaros and Vazirgiannis, 2013; Newman, 2018). ...
Article
In this report, we propose a new approach to analyzing productive structures in Morocco. We apply the tools of network theory to the inter-industry exchange tables of the Moroccan economy. Our results show that the national production network exhibits intuitive characteristics, in the sense that they are tied to production processes. The structure of this network can be represented in the shape of a butterfly, with upstream branches on one side and downstream branches on the other, connected by a node made up of highly connected branches that supply basic products. Within this structure, we highlight two clearly distinct clusters: the first concerns agricultural and food products, while the second is associated with the construction ecosystem. Compared with a broad sample of countries, the production network in Morocco turns out to be sparse and weakly polarized. Its branches of activity are indeed less asymmetric in terms of centrality, particularly in the recent period.
Article
Existing electroencephalography (EEG) studies predominantly involve participants in stationary positions, which presents challenges in accurately capturing EEG data during physical activities due to motion-induced noise and artifacts. This study aims to assess and validate the efficacy of the Soft Dynamic Time Warping (Soft-DTW) clustering method for analyzing EEG data collected during physical activity, focusing on an oddball auditory task performed while walking. Employing a mobile active bio-amplifier, the study captures brain activity and assesses auditory event-related potentials (ERPs) under dynamic conditions. The comparative performance of five clustering techniques, k-shape, kernels, k-means, Dynamic Time Warping, and Soft-DTW, in terms of their effectiveness in artifact reduction, was analyzed. Results indicated a significant difference between target and non-target auditory stimuli, with the target stimuli exhibiting a positive (positive) potential, although of smaller magnitude. This outcome suggests that, despite significant artifact interference from walking, Soft-DTW facilitates extracting differences in cognitive processes for the oddball task from the EEG data.
Article
The links of a physical network cannot cross, which often forces the network layout into nonoptimal entangled states. Here we define a network fabric as a two-dimensional projection of a network and propose the average crossing number as a measure of network entanglement. We analytically derive the dependence of the average crossing number on network density, average link length, degree heterogeneity, and community structure and show that the predictions accurately estimate the entanglement of both network models and of real physical networks.
Chapter
A very large number of community detection techniques have been developed. We survey some of these, focusing on those that use both structure (the explicit similarities from relationships) and attributes (the similarities based on common interests). Most techniques look for disjoint communities, but it is also plausible to allow communities to overlap, that is to allow objects to belong to more than one community. Community detection techniques range from those that define a community quality measure and then find communities that optimise this measure, to hybrid techniques that use a wide variety of algorithmic pieces including, recently, deep learning. The literature is large and disparate, so we concentrate on significant examples.
Article
In recent years, designing fairness-aware methods has received much attention in various domains, including machine learning, natural language processing, and information retrieval. However, in social network analysis (SNA), designing fairness-aware methods for various research problems by considering structural bias and inequalities of large-scale social networks has not received much attention. In this work, we highlight how the structural bias of social networks impacts the fairness of different SNA methods. We further discuss fairness aspects that should be considered while proposing network structure-based solutions for different SNA problems, such as link prediction, influence maximization, centrality ranking, and community detection. This survey-cum-vision clearly highlights that very few works have considered fairness and bias while proposing solutions; even these works are mainly focused on some research topics, such as link prediction, influence maximization, and PageRank. However, fairness has not yet been addressed for other research topics, such as influence blocking and community detection. We review state-of-the-art for different research topics in SNA, including the considered fairness constraints, their limitations, and our vision. This survey also covers evaluation metrics, available datasets, and synthetic network generating models used in such studies. Finally, we highlight various open research directions that require researchers’ attention to bridge the gap between fairness and SNA.
Article
Graph clustering is a fundamental technique in data analysis with applications in many different fields. While there is a large body of work on clustering undirected graphs, the problem of clustering directed graphs is much less understood. The analysis is more complex in the directed graph case for two reasons: the clustering must preserve directional information in the relationships between clusters, and directed graphs have non-Hermitian adjacency matrices whose properties are less conducive to traditional spectral methods. Here, we consider the problem of partitioning the vertex set of a directed graph into k≥2 clusters so that edges between different clusters tend to follow the same direction. We present an iterative algorithm based on spectral methods applied to new Hermitian representations of directed graphs. Our algorithm performs favourably against the state-of-the-art, both on synthetic and real-world data sets. Additionally, it can identify a ‘meta-graph’ of k vertices that represents the higher-order relations between clusters in a directed graph. We showcase this capability on data sets about food webs, biological neural networks, and the online card game Hearthstone.
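The abstract above does not spell out its Hermitian representations, so the following Python sketch uses a stand-in: the simple and widely used encoding H = i(A − Aᵀ), which is not necessarily the representation proposed in that paper. It only illustrates why a Hermitian matrix is convenient for directed graphs: unlike a non-symmetric adjacency matrix, it has real eigenvalues and an orthonormal eigenbasis while still recording edge direction in the sign of its imaginary entries. The function name `hermitian_adjacency` and the toy graph are assumptions.

```python
import numpy as np

def hermitian_adjacency(adj):
    """Stand-in Hermitian encoding H = i (A - A^T) of a directed graph."""
    adj = np.asarray(adj, dtype=float)
    return 1j * (adj - adj.T)

# Hypothetical directed graph: 0 -> 1, 0 -> 2, 1 -> 2.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
H = hermitian_adjacency(A)

# eigh is valid because H equals its own conjugate transpose;
# it returns real eigenvalues and complex orthonormal eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(H)
```

In a spectral pipeline, the real and imaginary parts of a few leading eigenvectors would typically be used as node features for a standard clustering step such as k-means; that step is omitted here.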
Article
The Densest Subgraph Problem requires to find, in a given graph, a subset of vertices whose induced subgraph maximizes a measure of density. The problem has received a great deal of attention in the algorithmic literature over the last five decades, with many variants proposed and many applications built on top of this basic definition. Recent years have witnessed a revival of research interest in this problem with several important contributions, including some groundbreaking results, published in 2022 and 2023. This survey provides a deep overview of the fundamental results and an exhaustive coverage of the many variants proposed in the literature, with a special attention to the most recent results. The survey also presents a comprehensive overview of applications and discusses some interesting open problems for this evergreen research topic.
Chapter
Communities are often defined as sets of nodes that are more densely connected to each other than to those outside the community, i.e., high-modularity partitions. It seems obvious that isolating high-modularity communities is a good way to prevent the spread of cascading failures. Here we develop a heuristic approach informed by Moore-Shannon network reliability that focuses on dynamics rather than topology. It defines communities directly in terms of the size of cascades they allow. We demonstrate that isolating communities defined this way may control cascading failure better. Moreover, this approach is sensitive to the values of dynamical parameters and allows for problem-specific constraints such as cost.
Article
Despite its increasing role in communication, the world wide web remains the least controlled medium: any individual or institution can create websites with an unrestricted number of documents and links. While great efforts are made to map and characterize the Internet's infrastructure, little is known about the topology of the web. Here we take a first step to fill this gap: we use local connectivity measurements to construct a topological model of the world wide web, allowing us to explore and characterize its large scale properties.
Article
Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph (Bui and Jones, Proc. of the 6th SIAM Conference on Parallel Processing for Scientific Computing, 1993, 445-452; Hendrickson and Leland, A Multilevel Algorithm for Partitioning Graphs, Tech. report SAND 93-1301, Sandia National Laboratories, Albuquerque, NM, 1993). From the early work it was clear that multilevel techniques held great promise; however, it was not known if they can be made to consistently produce high quality partitions for graphs arising in a wide range of application domains. We investigate the effectiveness of many different choices for all three phases: coarsening, partition of the coarsest graph, and refinement. In particular, we present a new coarsening heuristic (called heavy-edge heuristic) for which the size of the partition of the coarse graph is within a small factor of the size of the final partition obtained after multilevel refinement. We also present a much faster variation of the Kernighan-Lin (KL) algorithm for refining during uncoarsening. We test our scheme on a large number of graphs arising in various domains including finite element methods, linear programming, VLSI, and transportation. Our experiments show that our scheme produces partitions that are consistently better than those produced by spectral partitioning schemes in substantially smaller time. Also, when our scheme is used to compute fill-reducing orderings for sparse matrices, it produces orderings that have substantially smaller fill than the widely used multiple minimum degree algorithm.
Article
The proposed survey discusses the topic of community detection in the context of Social Media. Community detection constitutes a significant tool for the analysis of complex networks by enabling the study of mesoscopic structures that are often associated with organizational and functional characteristics of the underlying networks. Community detection has proven to be valuable in a series of domains, e.g. biology, social sciences, bibliometrics. However, despite the unprecedented scale, complexity and the dynamic nature of the networks derived from Social Media data, there has only been limited discussion of community detection in this context. More specifically, there is hardly any discussion on the performance characteristics of community detection methods as well as the exploitation of their results in the context of real-world web mining and information retrieval scenarios. To this end, this survey first frames the concept of community and the problem of community detection in the context of Social Media, and provides a compact classification of existing algorithms based on their methodological principles. The survey places special emphasis on the performance of existing methods in terms of computational complexity and memory requirements. It presents both a theoretical and an experimental comparative discussion of several popular methods. In addition, it discusses the possibility for incremental application of the methods and proposes five strategies for scaling community detection to real-world networks of huge scales. Finally, the survey deals with the interpretation and exploitation of community detection results in the context of intelligent web applications and services.
Conference Paper
We shall start with a presentation of some typical and well-known real-life networks. After introducing the fundamental concepts of network analysis, the following topics will be presented: 2-mode to 1-mode networks; clustering and blockmodeling. In the examples we shall use the program Pajek.
Chapter
In this chapter we describe patterns that occur in the structure of social networks, represented as graphs. We describe two main classes of properties, static properties, or properties describing the structure of snapshots of graphs; and dynamic properties, properties describing how the structure evolves over time. These properties may be for unweighted or weighted graphs, where weights may represent multi-edges (e.g. multiple phone calls from one person to another), or edge weights (e.g. monetary amounts between a donor and a recipient in a political donation network). Keywords: power laws, network structure, weighted graphs
Article
http://deepblue.lib.umich.edu/bitstream/2027.42/117072/1/ecy201091102941.pdf
Article
Networks are widely used in the biological, physical, and social sciences as a concise mathematical representation of the topology of systems of interacting components. Understanding the structure of these networks is one of the outstanding challenges in the study of complex systems. Here we describe a general technique for detecting structural features in large-scale network data that works by dividing the nodes of a network into classes such that the members of each class have similar patterns of connection to other nodes. Using the machinery of probabilistic mixture models and the expectation–maximization algorithm, we show that it is possible to detect, without prior knowledge of what we are looking for, a very broad range of types of structure in networks. We give a number of examples demonstrating how the method can be used to shed light on the properties of real-world networks, including social and information networks. Keywords: clustering, graph, likelihood
Article
In this survey we overview the definitions and methods for graph clustering, that is, finding sets of "related" vertices in graphs. We review the many definitions for what is a cluster in a graph and measures of cluster quality. Then we present global algorithms for producing a clustering for the entire vertex set of an input graph, after which we discuss the task of identifying a cluster for a specific seed vertex by local computation. Some ideas on the application areas of graph clustering algorithms are given. We also address the problematics of evaluating clusterings and benchmarking cluster algorithms.
Article
Consider data consisting of pairwise measurements, such as presence or absence of links between pairs of objects. These data arise, for instance, in the analysis of protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing pairwise measurements with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. Here we introduce a class of variance allocation models for pairwise measurements: mixed membership stochastic blockmodels. These models combine global parameters that instantiate dense patches of connectivity (blockmodel) with local parameters that instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks.
Article
Real-world physical and abstract data objects are interconnected, forming gigantic, interconnected networks. By structuring these data objects and interactions between these objects into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, scientific, engineering, or medical information systems, online e-commerce systems, and most database systems, can be structured into heterogeneous information networks. Therefore, effective analysis of large-scale heterogeneous information networks poses an interesting but critical challenge. In this book, we investigate the principles and methodologies of mining heterogeneous information networks. Departing from many existing network models that view interconnected data as homogeneous graphs or networks, our semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and uncovers surprisingly rich knowledge from the network. This semi-structured heterogeneous network modeling leads to a series of new principles and powerful methodologies for mining interconnected data, including: (1) rank-based clustering and classification; (2) meta-path-based similarity search and mining; (3) relation strength-aware mining, and many other potential developments. This book introduces this new research frontier and points out some promising research directions. Table of Contents: Introduction / Ranking-Based Clustering / Classification of Heterogeneous Information Networks / Meta-Path-Based Similarity Search / Meta-Path-Based Relationship Prediction / Relation Strength-Aware Clustering with Incomplete Attributes / User-Guided Clustering via Meta-Path Selection / Research Frontiers
Chapter
Bibliographic databases contain a huge amount of information on the dissemination of scientific knowledge and the relationships between papers, authors, and scientific work. Large-scale citation networks can be generated from these databases in order to provide a systems-level perspective on the processes at the root of the spreading of ideas, theories, and results in science. Citation networks are therefore one of the main proxies for our understanding of knowledge dynamics as well as invaluable systems for the quantitative analysis of the impact of specific scientific contributions, the emergence of technical and scientific areas, and the ranking of journals, institutions, and scientists. This chapter reviews recent developments made in the study of citation networks, ranging from empirical analyses of real systems and mathematical models of them, to the study of dynamic processes taking place in them and their potential applications. Furthermore, studying citation datasets with the tools of network theory opens new avenues towards a quantitative understanding of the dynamics of popularity with respect to papers, journals, and scientists, possibly leading to novel measures of impact and ranking.
Article
In this report, we examine the generalization of the Laplacian of a graph due to Fan Chung. We show that Fan Chung's generalization reduces to examining one particular symmetrization of the adjacency matrix for a directed graph. From this result, the directed Cheeger bounds trivially follow. Additionally, we implement and examine the benefits of directed hierarchical spectral clustering empirically on a dataset from Wikipedia. Finally, we examine a set of competing heuristic methods on the same dataset.
Article
Given a large social graph, like a scientific collaboration network, what can we say about its robustness? Can we estimate a robustness index for a graph quickly? If the graph evolves over time, how do these properties change? In this work, we try to answer these questions by studying the expansion properties of large social graphs. First, we present a measure which characterizes the robustness properties of a graph, and serves as a global measure of the community structure (or lack thereof). We study how these properties change over time and we show how to spot outliers and anomalies over time. We apply our method to several diverse real networks with millions of nodes. We also show how to compute our measure efficiently by exploiting the special spectral properties of real-world networks.
Article
An important step in unveiling the relation between network structure and dynamics defined on networks is to detect communities, and numerous methods have been developed separately to identify community structure in different classes of networks, such as unipartite networks, bipartite networks, and directed networks. We show that both unipartite and directed networks can be represented as bipartite networks, and their modularity is completely consistent with that for bipartite networks, the detection of modular structure on which can be reformulated as modularity maximization. To optimize the bipartite modularity, we develop a modified adaptive genetic algorithm (MAGA), which is shown to be especially efficient for community structure detection. The high efficiency of the MAGA is based on the following three improvements we make. First, we introduce a different measure for the informativeness of a locus instead of the standard deviation, which can exactly determine which loci mutate. This measure is the bias between the distribution of a locus over the current population and the uniform distribution of the locus, i.e., the Kullback-Leibler divergence between them. Second, we develop a reassignment technique for differentiating the informative state a locus has attained from the random state in the initial phase. Third, we present a modified mutation rule which by incorporating related operation can guarantee the convergence of the MAGA to the global optimum and can speed up the convergence process. Experimental results show that the MAGA outperforms existing methods in terms of modularity for both bipartite and unipartite networks.
Article
Many networked systems, including physical, biological, social, and technological networks, appear to contain "communities": groups of nodes within which connections are dense, but between which they are sparser. The ability to find such communities in an automated fashion could be of considerable use. Communities in a web graph for instance might correspond to sets of web sites dealing with related topics, while communities in a biochemical network or an electronic circuit might correspond to functional units of some kind. We present a number of new methods for community discovery, including methods based on "betweenness" measures and methods based on modularity optimization. We also give examples of applications of these methods to both computer-generated and real-world network data, and show how our techniques can be used to shed light on the sometimes dauntingly complex structure of networked systems.
Article
Network models can be used to represent interacting subsystems in the brain or other biological systems. These subsystems can be identified by partitioning a graph representation of the network into highly connected modules. In this paper we describe a modularity-based partitioning method based on a Gaussian model of a directed graph. Using the degrees of each node, we first compute the conditional expected value of the connection weights. The resulting adjacency matrix forms a null model for the network which does not favor any particular partition. By comparing this null model to the true adjacency graph, we can perform a statistically optimal partitioning that maximizes modularity. Similarly to other modularity-based partitioning methods, the solution is found using spectral matrix decomposition. The process can be iterated to find multiple subgraphs. We demonstrate this approach through simulations and application to standard biological and other network data.
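To make the idea of comparing a network against a degree-based null model concrete, here is a minimal sketch. It does not implement the Gaussian null model or the spectral optimization described above. Instead it evaluates the widely used degree-based directed modularity, Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j), for a given partition. The function name `directed_modularity`, the toy edge list, and the assumption of a simple unweighted digraph without repeated edges are all illustrative choices.

```python
def directed_modularity(edges, community):
    """Degree-based directed modularity of a partition.

    edges:     list of (source, target) pairs, assumed unweighted and unique
    community: dict mapping every node to a community label
    """
    m = len(edges)
    out_c = {}     # total out-degree per community
    in_c = {}      # total in-degree per community
    inside = 0     # number of edges with both endpoints in one community
    for u, v in edges:
        out_c[community[u]] = out_c.get(community[u], 0) + 1
        in_c[community[v]] = in_c.get(community[v], 0) + 1
        if community[u] == community[v]:
            inside += 1
    # Null model: an edge i -> j is expected with weight k_i^out * k_j^in / m,
    # so the expected within-community fraction is sum_c out_c * in_c / m^2.
    labels = set(community.values())
    expected = sum(out_c.get(c, 0) * in_c.get(c, 0) for c in labels) / m ** 2
    return inside / m - expected


# Toy graph (an assumption for illustration): two directed 3-cycles
# joined by the single edge 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_two = directed_modularity(edges, two_groups)
q_one = directed_modularity(edges, one_group)
```

Placing every node in one community gives a modularity of zero by construction, so a partition is only interesting when it scores above that baseline.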
Article
How do real graphs evolve over time? What are normal growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time with the number of edges growing superlinearly in the number of nodes. Second, the average distance between nodes often shrinks over time in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior even at a qualitative level. We provide a new graph generator, based on a forest fire spreading process that has a simple, intuitive justification, requires very few parameters (like the flammability of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study. We also notice that the forest fire model exhibits a sharp transition between sparse graphs and graphs that are densifying. Graphs with decreasing distance between the nodes are generated around this transition point. Last, we analyze the connection between the temporal evolution of the degree distribution and densification of a graph. We find that the two are fundamentally related. We also observe that real networks exhibit this type of relation between densification and the degree distribution.
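The claim that edges grow superlinearly in the number of nodes can be checked on a sequence of snapshots by fitting e(t) ∝ n(t)^a on a log-log scale and testing whether a > 1. The sketch below is one simple way to do this with ordinary least squares. The function name `densification_exponent` and the snapshot data are synthetic assumptions for illustration, not measurements from the study.

```python
from math import log


def densification_exponent(snapshots):
    """Least-squares slope of log(edges) against log(nodes).

    snapshots: list of (num_nodes, num_edges) pairs, one per time step.
    A slope above 1 indicates that the graph densifies over time.
    """
    xs = [log(n) for n, _ in snapshots]
    ys = [log(e) for _, e in snapshots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic snapshots (an assumption): edges generated as 2 * n^1.5.
snapshots = [(n, round(2 * n ** 1.5)) for n in (100, 400, 1600, 6400)]
exponent = densification_exponent(snapshots)
```

On real data the fit is noisy, so the exponent is best read together with the log-log plot rather than as a single number.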
Article
Holland and Leinhardt (1981) proposed the p1 model for the analysis of binary directed graph data in network studies. Such a model provides information about the “attractiveness” and “expansiveness” of the individual nodes in the network, as well as the tendency of a pair of nodes to reciprocate relational ties. When the nodes are a priori partitioned into subgroups based on attributes such as race and sex, the density of ties from one subgroup to another can differ considerably from that relating another pair of subgroups, thus creating a situation called blocking in social networks. The p1 model completely ignores this extra piece of information and is, therefore, unable to explain the block structure. Blockmodels that are simple extensions of the p1 model are proposed specifically for such data. An iterative scaling algorithm is presented for fitting the model parameters by maximum likelihood. The methodology is illustrated in detail on two empirical examples.
Article
A stochastic model is proposed for social networks in which the actors in a network are partitioned into subgroups called blocks. The model provides a stochastic generalization of the blockmodel. Estimation techniques are developed for the special case of a single relation social network, with blocks specified a priori. An extension of the model allows for tendencies toward reciprocation of ties beyond those explained by the partition. The extended model provides a one degree-of-freedom test of the model. A numerical example from the social network literature is used to illustrate the methods.
Article
The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
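The mutually reinforcing relationship between hubs and authorities can be sketched as a power iteration: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The code below is a minimal illustration under assumptions. The function name `hits`, the fixed iteration count, and the toy link graph are hypothetical and are not taken from the paper.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: add up hub scores of pages that link in.
        auth = {v: 0.0 for v in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        # Hub update: add up authority scores of pages that are linked to.
        new_hub = {v: 0.0 for v in nodes}
        for u, v in edges:
            new_hub[u] += auth[v]
        hub = new_hub
        # Rescale both vectors to unit Euclidean length.
        for scores in (hub, auth):
            norm = sqrt(sum(x * x for x in scores.values())) or 1.0
            for v in scores:
                scores[v] /= norm
    return hub, auth


# Toy link graph (an assumption): three hub-like pages pointing at two targets.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a2")]
hub, auth = hits(edges)
```

The iteration converges to the principal eigenvectors of AAᵀ (hubs) and AᵀA (authorities), where A is the adjacency matrix. This is the eigenvector connection the abstract refers to.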
Article
Community structure has been found to exist ubiquitously in many different kinds of real world complex networks. Most of the previous literature ignores edge directions and applies methods designed for community finding in undirected networks to find communities. Here, we address the problem of finding communities in directed networks. Our proposed method uses PageRank random walk induced network embedding to transform a directed network into an undirected one, where the information on edge directions is effectively incorporated into the edge weights. Starting from this new undirected weighted network, previously developed methods for undirected network community finding can be used without any modification. Moreover, our method improves on recent work in terms of community definition and meaning. We provide two simulated examples, a real social network and different sets of power law benchmark networks, to illustrate how our method can correctly detect communities in directed networks.
Article
The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
Article
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
Article
The community structure of two real-world financial networks, namely the board network and the ownership network of the firms of the Italian Stock Exchange, is analyzed by means of the maximum modularity approach. The main result is that both networks exhibit a strong community structure and, moreover, that the two structures overlap significantly. This is due to a number of reasons, including the existence of pyramidal groups and directors serving in several boards. Overall, this means that the "small world" of listed companies is actually split into well identifiable "continents" (i.e., the communities).
Book
This concise and self-contained introduction builds up the spectral theory of graphs from scratch, with linear algebra and the theory of polynomials developed in the later parts. The book focuses on properties and bounds for the eigenvalues of the adjacency, Laplacian and effective resistance matrices of a graph. The goal of the book is to collect spectral properties that may help to understand the behavior or main characteristics of real-world networks. The chapter on spectra of complex networks illustrates how the theory may be applied to deduce insights into real-world networks. The second edition contains new chapters on topics in linear algebra and on the effective resistance matrix, and treats the pseudoinverse of the Laplacian. The latter two matrices and the Laplacian describe linear processes, such as the flow of current, on a graph. The concepts of spectral sparsification and graph neural networks are included.
Article
Community structures are found to exist ubiquitously in real-world complex networks. We address here the problem of community detection in directed networks. Most of the previous literature ignores edge directions and applies methods designed for community detection in undirected networks, which discards valuable information and often fails when different communities are defined on the basis of incoming and outgoing edges. We suggest extracting information about edge directions using a PageRank random walk and translating such information into edge weights. After extraction we obtain a new weighted directed network in which edge directions can then be safely ignored. We thus transform community detection in directed networks into community detection in reweighted undirected networks. Such an approach can benefit directly from the large volume of algorithms for the detection of communities in undirected networks already developed, since it is not obvious how to extend these algorithms to account for directed networks and the procedure is often difficult. Validations on synthetic and real-world networks demonstrate that the proposed framework can effectively detect communities in directed networks.
Article
We study the problem of clustering probabilistic graphs. Similar to the problem of clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction networks and discovering groups of users in affiliation networks. We extend the edit-distance based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free. Therefore, the number of clusters is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we show the practicality of our techniques using a large social network of Yahoo! users consisting of one billion edges.
Article
In this paper we extend and generalize the standard spectral graph theory (or random-walk theory) on undirected graphs to digraphs. In particular, we introduce and define a normalized digraph Laplacian (Diplacian for short) Γ for digraphs, and prove that (1) its Moore–Penrose pseudoinverse is the discrete Green's function of the Diplacian matrix as an operator on digraphs, and (2) it is the normalized fundamental matrix of the Markov chain governing random walks on digraphs. Using these results, we derive a new formula for computing hitting and commute times in terms of the Moore–Penrose pseudoinverse of the Diplacian, or equivalently, the singular values and vectors of the Diplacian. Furthermore, we show that the Cheeger constant defined in [Chung 05] is intrinsically a quantity associated with undirected graphs. This motivates us to introduce a metric, the largest singular value of the skewed Laplacian ∇ = (Γ − Γ^T)/2, to quantify and measure the degree of asymmetry in a digraph. Using this measure, we establish several new results, such as a tighter bound than that in [Chung 05] on the Markov chain mixing rate, and a bound on the second-smallest singular value of Γ.
Article
This paper exploits recent contributions to the notions of modularity and autocatalytic sets to identify the functional and structural units that define the strongest systematic and self-sustaining channels of knowledge transfer and accumulation within the network of knowledge flows between technology fields. Our analysis reconstructs the architecture of the empirical knowledge pattern based on the United States Patent and Trademark Office (USPTO) patent citation data at the level of resolution of three-digit technology classes, for the period 1975-99.
Chapter
In this chapter, we will provide a survey of clustering algorithms for graph data. We will discuss the different categories of clustering algorithms and recent efforts to design clustering methods for various kinds of graphical data. Clustering algorithms are typically of two types. The first type consists of node clustering algorithms in which we attempt to determine dense regions of the graph based on edge behavior. The second type consists of structural clustering algorithms, in which we attempt to cluster the different graphs based on overall structural behavior. We will also discuss the applicability of the approach to other kinds of data such as semi-structured data, and the utility of graph mining algorithms to such representations.
Article
We consider Laplacians for directed graphs and examine their eigenvalues. We introduce a notion of a circulation in a directed graph and its connection with the Rayleigh quotient. We then define a Cheeger constant and establish the Cheeger inequality for directed graphs. These relations can be used to deal with various problems that often arise in the study of non-reversible Markov chains including bounding the rate of convergence and deriving comparison theorems.
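As a small illustration of what such a Laplacian looks like in practice, the sketch below builds the commonly stated form L = I − (Φ^{1/2} P Φ^{−1/2} + Φ^{−1/2} Pᵀ Φ^{1/2}) / 2, where P is the random-walk transition matrix and Φ is the diagonal matrix of its stationary distribution φ. The helper names, the power-iteration estimate of φ, and the toy graph are assumptions made for this example. The code assumes a strongly connected, aperiodic graph in which every node has at least one outgoing edge.

```python
from math import sqrt


def transition_matrix(adj):
    """Row-normalize an adjacency matrix (every row must have an out-edge)."""
    return [[w / sum(row) for w in row] for row in adj]


def stationary_distribution(P, iterations=2000):
    """Estimate phi with phi P = phi by power iteration from the uniform vector."""
    n = len(P)
    phi = [1.0 / n] * n
    for _ in range(iterations):
        phi = [sum(phi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return phi


def directed_laplacian(adj):
    """Symmetric Laplacian of a strongly connected, aperiodic digraph."""
    P = transition_matrix(adj)
    phi = stationary_distribution(P)
    n = len(P)
    r = [sqrt(p) for p in phi]
    L = [
        [
            (1.0 if i == j else 0.0)
            - 0.5 * (r[i] * P[i][j] / r[j] + r[j] * P[j][i] / r[i])
            for j in range(n)
        ]
        for i in range(n)
    ]
    return L, phi


# Toy digraph (an assumption): the cycle 0 -> 1 -> 2 -> 0 plus the chord 0 -> 2.
adj = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
L, phi = directed_laplacian(adj)
```

The resulting matrix is symmetric even though the graph is not, and the vector of square roots of φ lies in its null space. This is why standard spectral tools for undirected graphs can be reused on it.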
Article
In the last few years many real-world networks have been found to show a so-called community structure organization. Much effort has been devoted in the literature to develop methods and algorithms that can efficiently highlight this hidden structure of the network, traditionally by partitioning the graph. Since network representation can be very complex and can contain different variants in the traditional graph model, each algorithm in the literature focuses on some of these properties and establishes, explicitly or implicitly, its own definition of community. According to this definition it then extracts the communities that are able to reflect only some of the features of real communities. The aim of this survey is to provide a manual for the community discovery problem. Given a meta definition of what a community in a social network is, our aim is to organize the main categories of community discovery based on their own definition of community. Given a desired definition of community and the features of a problem (size of network, direction of edges, multidimensionality, and so on) this review paper is designed to provide a set of approaches that researchers could focus on.
Conference Paper
A local partitioning algorithm finds a set with small conductance near a specified seed vertex. In this paper, we present a generalization of a local partitioning algorithm for undirected graphs to strongly connected directed graphs. In particular, we prove that by computing a personalized PageRank vector in a directed graph, starting from a single seed vertex within a set S that has conductance at most α, and by performing a sweep over that vector, we can obtain a set of vertices S′ with conductance Φ_M(S′) = O(√(α log |S|)). Here, the conductance function Φ_M is defined in terms of the stationary distribution of a random walk in the directed graph. In addition, we describe how this algorithm may be applied to the PageRank Markov chain of an arbitrary directed graph, which provides a way to partition directed graphs that are not strongly connected.