Article

Abstract

Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society. One crucial step when studying the structure and dynamics of networks is to identify communities: groups of related nodes that correspond to functional subunits such as protein complexes or social spheres. Communities in networks often overlap such that nodes simultaneously belong to several groups. Meanwhile, many networks are known to possess hierarchical organization, where communities are recursively grouped into a hierarchical structure. However, the fact that many real networks have communities with pervasive overlap, where each and every node belongs to more than one group, has the consequence that a global hierarchy of nodes cannot capture the relationships between overlapping groups. Here we reinvent communities as groups of links rather than nodes and show that this unorthodox approach successfully reconciles the antagonistic organizing principles of overlapping communities and hierarchy. In contrast to the existing literature, which has entirely focused on grouping nodes, link communities naturally incorporate overlap while revealing hierarchical organization. We find relevant link communities in many networks, including major biological networks such as protein-protein interaction and metabolic networks, and show that a large social network contains hierarchically organized community structures spanning inner-city to regional scales while maintaining pervasive overlap. Our results imply that link communities are fundamental building blocks that reveal overlap and hierarchical organization in networks to be two aspects of the same phenomenon.


... The objective function directs the random search of OCDPSO-Net. In overlapping protein complexes, one protein is allowed to belong to more than one complex, which makes the conventional complex (or community) definitions unreasonable [36,43]. As a result, a new definition of community has been adopted and employed in many nature-inspired algorithms to detect overlapping communities. ...
... As a result, a new definition of community has been adopted and employed in many nature-inspired algorithms to detect overlapping communities. Ahn et al. [43] proposed the partition density for overlapping communities, which evaluates the edge density within communities. For a network with M links, suppose P = {P_1, … ...
... Note that ∑_c m_c = M and ∑_c n_c ≥ N (assuming no unconnected nodes). Accordingly, D_c refers to the link density of subset P_c; Equation (1) shows its definition [36,43]: ...
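The partition density mentioned in the excerpts above can be sketched in a few lines. This is a minimal illustration, assuming the definition from Ahn et al.: for a link subset with m_c links touching n_c nodes, the link density is D_c = (m_c - (n_c - 1)) / (n_c(n_c - 1)/2 - (n_c - 1)), taken as 0 when n_c = 2, and the partition density is the average of D_c weighted by m_c/M. The function names and the edge-tuple representation are illustrative choices, not taken from the citing paper.

```python
from typing import Iterable, Sequence, Tuple

Edge = Tuple[str, str]


def link_density(links: Sequence[Edge]) -> float:
    """Link density D_c of one link community (assumed Ahn et al. definition)."""
    nodes = {v for edge in links for v in edge}
    m, n = len(links), len(nodes)
    if n <= 2:
        # A single link (two nodes) is defined to have density 0.
        return 0.0
    # Links beyond a spanning tree, normalised by the maximum possible excess.
    return (m - (n - 1)) / (n * (n - 1) / 2 - (n - 1))


def partition_density(partition: Iterable[Sequence[Edge]]) -> float:
    """Average of D_c over link communities, weighted by community size m_c."""
    communities = [list(c) for c in partition]
    total_links = sum(len(c) for c in communities)
    if total_links == 0:
        return 0.0
    return sum(len(c) * link_density(c) for c in communities) / total_links


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
single = [("d", "e")]
print(partition_density([triangle, single]))  # 0.75
```

A triangle is a clique (D_c = 1) and a tree-like subset has D_c = 0, so the measure rewards link communities that are denser than a spanning tree.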
Article
Full-text available
In today's world, the science of bioinformatics is developing rapidly, especially with regard to the analysis and study of biological networks. Scientists have used various nature-inspired algorithms to find protein complexes in protein-protein interaction (PPI) networks. These networks help scientists guess the molecular function of unknown proteins and show how cells work regularly. It is very common in PPI networks for a protein to participate in multiple functions and belong to many complexes, and as a result, complexes may overlap in the PPI networks. However, developing an efficient and reliable method to address the problem of detecting overlapping protein complexes remains a challenge since it is considered a complex and hard optimization problem. One of the main difficulties in identifying overlapping protein complexes is the accuracy of the partitioning results. In order to accurately identify the overlapping structure of protein complexes, this paper proposes an overlapping complex detection algorithm termed OCDPSO-Net, which is based on PSO-Net (a well-known modified version of the particle swarm optimization algorithm). The framework of the OCDPSO-Net method consists of three main steps: an initialization strategy, a movement strategy for each particle, and an enhancement of search ability in order to expand the solution space. The proposed algorithm employs the partition density concept to measure the partitioning quality in PPI network complexes and tries to optimize the value of this quantity by applying the line graph concept to the original graph representing the protein interaction network. The OCDPSO-Net algorithm is applied to a Collins PPI network and the obtained results are compared with different state-of-the-art algorithms in terms of precision, recall, and F-measure. Experimental results confirm that the proposed algorithm has good clustering performance and has outperformed most of the existing recent overlapping algorithms.
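The abstract above reports precision, recall, and F-measure but does not say how a predicted complex is matched to a reference complex. The sketch below assumes one common convention: a predicted and a reference complex match when their neighborhood affinity |A ∩ B|² / (|A| · |B|) reaches a threshold. The threshold value of 0.2 and all function names are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Set, Tuple


def neighborhood_affinity(a: Set[str], b: Set[str]) -> float:
    """Overlap score |A ∩ B|^2 / (|A| * |B|) between two protein sets."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def evaluate(predicted: List[Set[str]], reference: List[Set[str]],
             threshold: float = 0.2) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) under the assumed matching rule."""
    # Predicted complexes that match at least one reference complex.
    hit_pred = sum(
        1 for p in predicted
        if any(neighborhood_affinity(p, r) >= threshold for r in reference)
    )
    # Reference complexes recovered by at least one predicted complex.
    hit_ref = sum(
        1 for r in reference
        if any(neighborhood_affinity(p, r) >= threshold for p in predicted)
    )
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_ref / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


pred = [{"a", "b", "c"}, {"x", "y"}]
ref = [{"a", "b", "c", "d"}, {"p", "q"}]
print(evaluate(pred, ref))  # (0.5, 0.5, 0.5)
```

Because complexes may overlap, a single protein can contribute to several matches; the set-based scoring above handles that without any special casing.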
... Despite this substantial progress, previous connectome development studies have focused primarily on modular structure without spatial overlap (i.e., hard assignment), implicitly assuming that each brain node belongs to 1 single functional module. This assumption may be problematic because the modular structures of real-world networks, such as cooperation networks in social systems and protein networks in nature, generally show overlapping properties [29,30]. This overlapping modular framework in complex networks provides important insights into the potential diverse functional roles of nodes in the network. ...
... All MR images used here underwent strict quality control (see Materials and methods). We identified the overlapping modules in individual- and group-level functional networks using an edge-centric module detection algorithm [30,44] (Fig 1B). Briefly, we first constructed a traditional functional network comprising nodal regions and interregional connections (i.e., edges). ...
... (ii) Edge graph corresponding to a given functional network. In this graph, each node denotes an edge in the functional network, and each link is defined as the similarity between edges in the connectivity profiles using the Tanimoto coefficient [30,46]. For 2 given edges e_ik and e_jk that share a common node k, the interedge similarity was estimated as the similarity of connectivity profiles between node i and node j, wherein a_i represents the modified connectivity profile of node i, a_i · a_j represents the dot product of 2 vectors a_i and a_j, and |a_i|^2 denotes the sum of the squared weights of all connections of node i. (iii) Edge-centric module detection. ...
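The edge similarity described in the excerpt above can be sketched as follows. The Tanimoto coefficient is taken in its usual form, S = a_i · a_j / (|a_i|^2 + |a_j|^2 - a_i · a_j). How the "modified" connectivity profile is built is not spelled out in the excerpt; this sketch assumes the convention of Ahn et al., where a node's own entry is set to the mean weight of its connections. The nested-dictionary graph representation and the function names are illustrative.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, Dict[str, float]]  # symmetric: weights[u][v] == weights[v][u]


def modified_profile(weights: Weights, node: str, nodes: Sequence[str]) -> List[float]:
    """Connectivity profile of `node`, with the self entry set to its mean link weight."""
    neighbours = weights.get(node, {})
    self_weight = sum(neighbours.values()) / len(neighbours) if neighbours else 0.0
    return [self_weight if v == node else neighbours.get(v, 0.0) for v in nodes]


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto coefficient between two real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def edge_similarity(weights: Weights, i: str, j: str, nodes: Sequence[str]) -> float:
    """Similarity of edges e_ik and e_jk, computed from the profiles of i and j."""
    return tanimoto(modified_profile(weights, i, nodes),
                    modified_profile(weights, j, nodes))


# Path i - k - j with unit weights: edges (i, k) and (j, k) share node k.
w = {"i": {"k": 1.0}, "j": {"k": 1.0}, "k": {"i": 1.0, "j": 1.0}}
print(edge_similarity(w, "i", "j", ["i", "j", "k"]))
```

For unweighted networks the same expression reduces to the Jaccard index of the two nodes' neighborhoods with each node included in its own neighborhood, which is one reason the self entry is filled in.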
Article
Full-text available
The modular structure of functional connectomes in the human brain undergoes substantial reorganization during development. However, previous studies have implicitly assumed that each region participates in one single module, ignoring the potential spatial overlap between modules. How the overlapping functional modules develop and whether this development is related to gray and white matter features remain unknown. Using longitudinal multimodal structural, functional, and diffusion MRI data from 305 children (aged 6 to 14 years), we investigated the maturation of overlapping modules of functional networks and further revealed their structural associations. An edge-centric network model was used to identify the overlapping modules, and the nodal overlap in module affiliations was quantified using the entropy measure. We showed a regionally heterogeneous spatial topography of the overlapping extent of brain nodes in module affiliations in children, with higher entropy (i.e., more module involvement) in the ventral attention, somatomotor, and subcortical regions and lower entropy (i.e., less module involvement) in the visual and default-mode regions. The overlapping modules developed in a linear, spatially dissociable manner, with decreased entropy (i.e., decreased module involvement) in the dorsomedial prefrontal cortex, ventral prefrontal cortex, and putamen and increased entropy (i.e., increased module involvement) in the parietal lobules and lateral prefrontal cortex. The overlapping modular patterns captured individual brain maturity as characterized by chronological age and were predicted by integrating gray matter morphology and white matter microstructural properties. Our findings highlight the maturation of overlapping functional modules and their structural substrates, thereby advancing our understanding of the principles of connectome development.
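The abstract above quantifies nodal overlap with an entropy measure but does not give the formula. A plausible minimal sketch, assuming the entropy is the Shannon entropy of how a node's edges are distributed across modules, is shown below; the base-2 logarithm and the function name are assumptions, and the published measure may be normalised differently.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def module_entropy(edge_modules: Iterable[Hashable]) -> float:
    """Shannon entropy (bits) of the module labels of one node's edges.

    `edge_modules` holds one module label per edge attached to the node.
    A value of 0 means all edges fall in one module; larger values mean
    the node's edges are spread over more modules.
    """
    counts = Counter(edge_modules)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


print(module_entropy(["m1", "m1", "m2", "m2"]))  # 1.0
print(module_entropy(["m1", "m1", "m1"]))        # 0.0
```

Under this reading, "higher entropy" corresponds to a node whose edges are shared more evenly among several modules, which matches the abstract's gloss of "more module involvement".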
... Networked systems tend to organize nodes into cohesive modules or communities, but identifying these communities is a challenging task in network research with broad applications in biological networks, social network modeling, and communication pattern analysis [1][2][3][4][5][6][7]. Protein-protein molecular interactions (PPIs) in every organism are regularly organized as networks, noted as protein-protein interaction networks (PPINs). ...
... The literature encompasses different complex detection methods based on meta-heuristic algorithms, mainly evolutionary algorithms (EAs). The EA-based complex detection methods are proved to be more reliable than their counterpart local-based complex detection methods such as Molecular Complex Detection (MCODE) [3], Purification of the bait proteins [4], Dense-neighborhood Extraction using Connectivity and conFidence Features (DECAFF) [5], Repeated Random Walk (RRW) [6], Clustering-based on maximal cliques (CMC) [7], and Hierarchical Link Clustering [7,12]. ...
Preprint
Full-text available
Evolutionary algorithms are better than heuristic algorithms at finding protein complexes in protein-protein interaction networks (PPINs). Many of these algorithms depend on their standard frameworks, which are based on topology. Further, many of these algorithms have been exclusively examined on networks with only reliable interaction data. The main objective of this paper is to extend the design of the canonical and topological-based evolutionary algorithms suggested in the literature to cope with noisy PPINs. The design of the evolutionary algorithm is extended based on the functional domain of the proteins rather than on the topological domain of the PPIN. The gene ontology annotation in each molecular function, biological process, and cellular component is used to get the functional domain. The reliability of the proposed algorithm is examined against the algorithms proposed in the literature. To this end, a yeast protein-protein interaction dataset is used in the assessment of the final quality of the algorithms. To make fake negative controls of PPIs that are wrongly informed and are linked to the high-throughput interaction data, different noisy PPINs are created. The noisy PPINs are synthesized with a different and increasing percentage of misinformed PPIs. The results confirm the effectiveness of the extended evolutionary algorithm design to utilize the biological knowledge of the gene ontology. Feeding EA design with GO annotation data improves reliability and produces more accurate detection results than the counterpart algorithms.
... In network science, community detection is used to find groups of densely connected nodes and is widely exploited in topics like selective exposure, echo chambers, and polarization. To capture the multi-scale and hierarchical organization of real-world social networks [42,43], we conduct multi-scale community detection [44,45] on the influencer network (see details in the Materials and Method and section 4 in SI). We detect five significant levels (or scales). ...
... However, most studies detect communities using single-scale approaches, such as modularity optimization [53]. But real-world social networks are usually multi-scale and hierarchical [42,43]. Multi-scale community detection approaches allow for the discovery of communities at different resolution scales. ...
Preprint
Full-text available
Selective exposure, individuals' inclination to seek out information that supports their beliefs while avoiding information that contradicts them, plays an important role in the emergence of polarization. In the political domain, selective exposure is usually measured on a left-right ideology scale, ignoring finer details. Here, we combine survey and Twitter data collected during the 2022 Brazilian Presidential Election and investigate selective exposure patterns between the survey respondents and political influencers. We analyze the followship network between survey respondents and political influencers and find a multilevel community structure that reveals a hierarchical organization more complex than a simple split between left and right. Moreover, depending on the level we consider, we find different associations between network indices of exposure patterns and 189 individual attributes of the survey respondents. For example, at finer levels, the number of influencer communities a survey respondent follows is associated with several factors, such as demographics, news consumption frequency, and incivility perception. In comparison, only their political ideology is a significant factor at coarser levels. Our work demonstrates that measuring selective exposure at a single level, such as left and right, misses important information necessary to capture this phenomenon correctly.
... Complex urban systems achieve a shift from geographically multiscale to functionally multilevel characteristics as demographic, social, and economic linkages become more nested in cities (Bai et al., 2017;Tang et al., 2021). This perspective implies that each community does not serve a single function in isolation but shares multiple functions with others (Ahn, Bagrow, & Lehmann, 2010). Although cities often fulfill multiple global roles, such as cultural centers, economic hubs, or transportation nodes, urban studies frequently approach these functions in isolation, with limited exploration of their multilevel overlapping dynamics. ...
... Protein can be associated with complexes or functions, and in this way, protein belonging to divergent communities establishes large and complicated networks. It is evident that these overlapping communities exhibit a more functionally multi-level network structure than their non-overlapping counterparts, which facilitates the investigation of the nested patterns of complex networks (Ahn et al., 2010). In light of this, the exploration of overlapping communities emerges as a pivotal refinement to conventional methodologies, aiming to capture the nuanced interplay of multiple identities and roles within urban networks. ...
Article
Full-text available
Overlapping structures, often overlooked, are crucial in shaping comprehensive urban development and broader megaregional strategies. To address the gap, this study conducts the overlapping communities analysis in the Pearl River Delta (PRD), a megaregion in South China, using big geospatial data from 2018. A novel Overlapping Community Detection based on Density Peaks (OCDDP) is employed to generate multiple communities with diverse functions for different nodes in the commuting network of 60 sub-city divisions. We identify eight overlapping communities in PRD characterized by two categories of communities predominantly centered around Shenzhen and Guangzhou, revealing a bicentric spatial structure. Notably, central sub-cities are characterized by a low-overlap attribute, while peripheral sub-cities manifest a high-overlap tendency. Furthermore, the study investigates the driving forces behind these communities through ridge regression to analyze the impacts of various spatial flows, including policies, investment amount and times, branch funding and number, travel cost, and travel distance, co-patenting, and search index. This part found that four Shenzhen-centric communities are primarily driven by travel cost, co-patenting, branch funding, and number, while the four Guangzhou-centric communities are influenced by co-patenting, investment amount, and times. This study emphasizes differentiated functional linkages and the need for precise policy positioning and resource allocation, paving the way for a coordinated and holistic approach to megaregional development.
... If the links that connect to a node here are part of more than one cluster, that node is said to be overlapping. Ahn et al. (2010) introduced the notion of link communities, which are sets of links that share similar connectivity patterns. They developed a method to identify link communities and demonstrated its applicability in capturing multiscale organizations in networks. ...
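The overlap rule stated in the excerpt above (a node is overlapping when its links fall in more than one link cluster) can be sketched directly. The input format, a mapping from a community label to its links, and the function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[str, str]


def node_memberships(link_communities: Dict[Hashable, Iterable[Edge]]) -> Dict[str, Set[Hashable]]:
    """Map each node to the set of link communities that its links belong to."""
    memberships: Dict[str, Set[Hashable]] = defaultdict(set)
    for community_id, links in link_communities.items():
        for u, v in links:
            # A node inherits the community of every link attached to it.
            memberships[u].add(community_id)
            memberships[v].add(community_id)
    return dict(memberships)


def overlapping_nodes(link_communities: Dict[Hashable, Iterable[Edge]]) -> Set[str]:
    """Nodes whose links are spread over more than one link community."""
    return {
        node for node, communities in node_memberships(link_communities).items()
        if len(communities) > 1
    }


# Two triangles sharing node "c".
communities = {
    "A": [("a", "b"), ("b", "c"), ("a", "c")],
    "B": [("c", "d"), ("d", "e"), ("c", "e")],
}
print(overlapping_nodes(communities))  # {'c'}
```

Each link belongs to exactly one community, yet node "c" ends up in both, which is how a partition of links yields overlapping groups of nodes.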
... Community detection techniques are essential for comprehending the complex dynamics and organizational structures of online social networks because they provide information about user interactions and behavior. One use is to identify unique social cliques or groupings inside these networks, which reveals hidden connections and common interests (Leskovec et al. 2008). The ability to identify powerful users who may drive trends and mold ideas inside communities is also made possible by these technologies, which help with influencer identification (Cha et al. 2009). ...
Table 9. Illustrating how the computational cost of distinct methodologies correlates with parameters such as n (node count), m (link count), c (community count), s (community size), and k (average degree) in LFR benchmark graphs.
Algorithm (citation) | Computational cost
Girvan and Newman (2002) | O(m^2 n)
Fortunato (2010) | O(m^3 n)
Newman and Girvan (2004) | O(m^2 n)
(name not recoverable) | O(n log^2 n)
Radicchi et al. (2004) | O(m^4 / n^2)
Clauset et al. (2004) | O(n log^2 N)
Blondel et al. (2008) | O(m)
Pons and Latapy (2005) | O(n^2 m)
Pons-Latapy (Sparse), Pons and Latapy (2005) | O(n^2 log N)
Rosvall and Bergstrom (2008) | O(m)
Ahn et al. (2010) | O(n k_max^2)
Guimera and Nunes Amaral (2005) | parameter dependent
Baumes ( ...
Article
Full-text available
Network science has made tremendous advances, allowing the modeling of complex real-world systems. Although networks include sophisticated community structures by definition, discovering and understanding these communities remains a challenging endeavor, compounded by the need to navigate a multidisciplinary terrain. This comprehensive survey serves as a contemporary guide and systematically presents an up-to-date survey of community detection methods, meticulously categorizing them for a comprehensive understanding. Through critical analysis, we assess the strengths, weaknesses, and performance metrics of various algorithms, ranging from classical techniques to cutting-edge methods designed to address the complexities of overlapping community detection. Additionally, we explore emerging trends in dynamic community detection, including techniques like Temporal Motif Analysis and Continuous-Time Models. We additionally present an in-depth evaluation of these methods, examining their performance on both artificial benchmarks and real-world networks. This review also sheds light on the diverse applications of community detection across domains such as sociology, biology, education, and technology. By presenting a holistic view of community detection, we aim to facilitate researchers’ and practitioners’ access to this crucial field, addressing the theoretical and practical application gaps, and contributing to the continued evolution of network science.
... Furthermore, the statistical significance of the predicted complexes is as- in (Ahn et al., 2010) and (Solava et al., 2012). These hCD efforts have been adopted in many research works. ...
... The first level of comparison is to make a competition between the proposed ECDs and some of the well-known local or hCD algorithms. The names and references of these hCDs are MCODE (Bader & Hogue, 2003), RNSC (King et al., 2004), CPM (Palla et al., 2005), LC (Ahn et al., 2010), MCL (Ray et al., 2016), OCG (Becker et al., 2012), ELC (Huang et al., 2013), and NDOCD (Ding et al., 2016). Performance evaluation in terms of complex-level detection ability is reported in this section. ...
... The second step, which was the adjustment to the original workflow and will be explained in more detail below, corresponds to the Differential Co-expression Network (DCN) construction (denoted as A) using Lasso regression (Tibshirani, 1996) with the LFC expression data. The third step was to identify the overlapping gene modules (denoted as F) using Hierarchical Link Clustering (HLC) (Ahn et al., 2010). The fourth and final step was to identify the modules closely related to the LFC in phenotypic traits using Lasso regression. ...
... Two approaches were tried to define the Pearson threshold (Aoki et al., 2007), but both resulted in particularly high threshold values. These high thresholds led to the formation of very dense networks, which, in turn, significantly increased the computational complexity of the subsequent workflow step: overlapping clustering using HLC (Ahn et al., 2010). ...
Article
Full-text available
Sugarcane, a prominent global crop utilized for sugar, bioethanol, and renewable bioenergy production, holds significant importance in Colombia. In 2022, it contributed to the production of 2.1 million tons of sugar, 347 million liters of bioethanol, and 1745 GWh of electrical energy. The cultivation of sugarcane worldwide faces vulnerability to drought stress induced by climate change, which significantly affects yields. Understanding how plants respond to drought involves complex interactions among genes, morphology, physiology, and biochemistry. However, these factors are often analyzed separately using methods such as Differentially Expressed Genes, comparative physiology, or metabolomics, thereby restricting the potential for broader insights that could arise from their further integration. This paper uses an improved version of Control-Stress data Integration with Overlapping Clustering (CSI-OC), a methodology that provides a comprehensive perspective by integrating diverse data types, in which a Lasso-based network helps in pinpointing stress-responsive genes. The objective of this study is to utilize CSI-OC to identify genes relevant to drought stress in Colombian sugarcane cultivars. To accomplish this goal, the study analyzes both leaf and root expression data, alongside four physiological parameters associated with leaf responses across different levels of drought stress. Computational evaluation indicates that the datasets are effectively processed using the CSI-OC workflow. This methodology demonstrates good performance in identifying genes strongly correlated with both the stress condition and the considered phenotypic traits. As stress levels increase, the number of genes selected by CSI-OC displays a contrasting pattern between leaves and roots. This observation implies a coordinated cascade of gene responses from leaves to roots with escalating stress levels, indicating a holistic adaptation strategy within the plant. 
Overall, the findings of this study underscore the effectiveness of CSI-OC as a comprehensive approach for identifying pertinent genes linked to drought stress in sugarcane.
... Despite this substantial progress, previous connectome development studies have focused primarily on modular structure without spatial overlap (i.e., hard assignment), implicitly assuming that each brain node belongs to one and only one functional module. This hypothesis might be problematic because the modular structures of real-world networks, such as cooperation networks in social systems and protein networks in nature, generally show overlapping properties [29,30]. This modular overlapping framework in complex networks provides important insight into the potential diverse functional roles of nodes in the network. ...
... For simplicity, we considered only directly connected edges that shared at least one common node, and the similarity between edges without common nodes was assumed to be zero. The Tanimoto coefficient [30,83] was used to incorporate the edge weight information. For a pair of edges eik and ejk that share a common node k, their similarity was defined based on the similarity in the connection profiles between nodes i and j: ...
Preprint
Full-text available
Developmental connectomic studies have shown that the modular organization of functional networks in the human brain undergoes substantial reorganization with age to support cognitive growth. However, these studies implicitly assume that each brain region belongs to one and only one specific network module, ignoring the potential spatial overlap between functional modules. How the overlapping functional modular architecture develops and whether this development is related to structural signatures remain unknown. Using longitudinal multimodal structural, functional, and diffusion MRI data from 305 children (aged 6–14 years), we investigated the development of the overlapping modular architecture of functional networks, and further explored their structural associations. Specifically, an edge-centric network model was used to identify the overlapping functional modules, and the nodal overlap in module affiliations was quantified using the entropy measure. We showed a remarkable regional inhomogeneity in module overlap in children, with higher entropy in the ventral attention, somatomotor, and subcortical networks and lower entropy in the visual and default-mode networks. Furthermore, the overlapping modules developed in a linear, spatially dissociable manner from childhood to adolescence, with significantly reduced entropy in the prefrontal cortex and putamen and increased entropy in the parietal lobules. Personalized overlapping modular patterns capture individual brain maturity as characterized by brain age. Finally, the overlapping functional modules can be significantly predicted by integrating gray matter morphology and white matter network properties. Our findings highlight the maturation of overlapping network modules and their structural substrates, thereby advancing our understanding of the principles of connectome development.
... Several approaches to community detection were proposed, which can be organized into two main classes: overlapping and non-overlapping methods. Overlapping techniques, such as those suggested by Shen et al. [22], Rees et al. [19], and Palla et al. [1], focused on recognizing communities where nodes may belong to multiple groups. For instance, the EAGLE algorithm by Shen et al. [22] identified maximal cliques as initial communities and then merged the most similar ones. ...
Article
Community detection in social networks is a significant area of research within Artificial Intelligence and social network analysis. The agglomerative method is a well-known approach to detecting communities. It relies on local similarities between pairs of nodes in the graph to form clusters. The process merges the pair of nodes with the highest similarity and then computes a quality function for the current clustering; each such merge, together with the quality computation, constitutes one step, or iteration. The choice of similarity function can strongly influence the number of iterations, which in turn affects the running time of an agglomerative method. This raises a fundamental question when employing an agglomerative approach: which similarity function should be chosen, and why? To address this question, our paper compares two well-known similarity functions: structural similarity and hub-promoted similarity. We conducted a comprehensive theoretical analysis followed by extensive experimentation on several datasets, focusing on the computational aspects of these functions. Notably, our findings highlight that in specific cases within the graph, the hub-promoted similarity function is faster than structural similarity.
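The two similarity functions compared in the abstract above can be sketched under their commonly used definitions, which are an assumption here since the abstract does not state them: structural similarity as |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| · |Γ(v)|), with Γ the neighborhood including the node itself, and hub-promoted similarity as |N(u) ∩ N(v)| / min(k_u, k_v), with N the open neighborhood and k the degree. Function names and the adjacency-set representation are illustrative.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Adjacency:
    """Undirected adjacency sets from an edge list."""
    adj: Adjacency = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def structural_similarity(adj: Adjacency, u: str, v: str) -> float:
    """Shared closed neighbourhood, normalised by the geometric mean of sizes."""
    gu, gv = adj[u] | {u}, adj[v] | {v}
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))


def hub_promoted_similarity(adj: Adjacency, u: str, v: str) -> float:
    """Shared neighbours, normalised by the smaller of the two degrees."""
    smaller_degree = min(len(adj[u]), len(adj[v]))
    return len(adj[u] & adj[v]) / smaller_degree if smaller_degree else 0.0


adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
print(structural_similarity(adj, "a", "b"))    # 1.0
print(hub_promoted_similarity(adj, "a", "b"))  # 0.5
```

The hub-promoted form needs only a minimum of two degrees rather than a square root of a product, which hints at why its per-pair cost can be lower; the abstract's timing claim itself rests on the paper's own analysis.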
... Our model is benchmarked against seven widely used clustering algorithms, each differing in analytical focus: 1) K-means Clustering; 2) Hierarchical Clustering [21]; 3) Link Clustering [26]; 4) Clique Percolation [25]; 5) Low-Rank Embedding [47]; 6) Multi-Assignment Clustering [48]; 7) Spectral Clustering; 8) Density-Based Clustering; and 9) Our model. ...
Article
Full-text available
In the realm of higher education, accurately assessing and ranking the quality of educational offerings is crucial. The paper introduces “Education Quality Ranker,” an innovative framework designed to rank colleges and universities on an Internet-scale by leveraging graph-learning techniques and a multi-view quality model. This model effectively categorizes and evaluates teaching approaches and preferences by identifying peer circles of instructors with similar attributes within educational contexts. The primary challenges addressed include the need for a model flexible enough to adapt to various instructional features across different educational environments and the significant variability in data availability concerning each instructor’s documented teaching practices. To overcome these challenges, the framework incorporates a geometry-based feature selector that identifies high-quality features indicative of each instructor’s teaching genre. Utilizing a sophisticated probabilistic model, it represents each instructor’s attributes as a distribution within a latent space, enabling a nuanced understanding of instructional styles. Furthermore, the framework constructs a graph that mirrors the instructional similarities among educators, facilitating the identification of densely connected subgraphs or “circles” of instructors with shared teaching attributes. By mining these instructor circles, the Education Quality Ranker can not only score each university’s education performance accurately but also optimize educational quality holistically. The efficacy of this approach is underscored by experiments conducted on a dataset encompassing a vast number of education instructors from 33 well-known colleges/universities, demonstrating the model’s capability to delineate distinct instructional genres accurately and enhance university rankings.
... To validate our approach, we compare our model against seven renowned clustering algorithms, each emphasizing different analytical dimensions: 1) K-means Clustering; 2) Hierarchical Clustering [45]; 3) Link Clustering [51]; 4) Clique Percolation [50]; 5) Low-Rank Embedding [52]; 6) Multi-Assignment Clustering [53], with each configured to identify 20 clusters using identical features to those employed in our ... Table 2, reveal: 1) Our model outperforms the other clustering methods in 18 of the 20 genres, evidenced by superior BER scores. ... (Excerpt from the author's version accepted for publication in IEEE Access, vol. 11, 2023, not fully edited.)
Article
Full-text available
We have developed a classification method that segments a large group of Amazon or TikTok users into distinct categories, or “genres,” based on their shared purchasing behaviors, such as preferences for consumer electronics or household items. Our approach uses a geometry-based feature selection strategy to accurately capture each user's buying patterns, which are characterized by a range of features, including those learned under weak supervision. These features are then refined through two feature selection processes tailored to different application needs. We also use a probabilistic model to represent each user's buying preferences as a distribution within a hidden feature space. To map the purchasing connections between users, we construct a graph and apply a specialized algorithm to identify tightly connected subgroups. These subgroups reflect shared purchasing habits, allowing us to categorize users into specific genres. Finally, a ranking model is used to recommend products to users based on these genres. We validated the effectiveness of our recommendation system using a dataset of over one million Amazon users, showing that it accurately identifies and classifies distinct purchasing genres.
... In this analysis, our approach is compared with seven established clustering techniques, each varying in their emphasis on network structure, doctor profile data, or a combination of both: 1) K-means Clustering; 2) Hierarchical Clustering [45]; 3) Link Clustering [46]; 4) Clique Percolation [47]; 5) Low-Rank Embedding [48]; 6) Multi-Assignment Clustering [49]. All algorithms were tested with the number of clusters set to 20 and using the same low and high-level visual features as our model. ...
Article
The study of genre preferences of different doctors, based on the computed tomography (CT) images they handle across multiple hospital databases, is a compelling area of AI-based medical treatment, especially for deciphering and categorizing different diseases and the diseased regions. This research introduces a technique to categorize a large number of medical images into defined groups or "circles" based on common genre preferences, such as “heart diseases” or “osteoarthritis”. Our findings underscore two primary factors: 1) the necessity for an adaptable genre model that can adjust to varied visual characteristics depending on the database, and 2) the inconsistency in the volume of CT images per hospital, with some possessing very few images for some particular diseases (for example, a heart hospital has very few joint fracture CT images). To tackle these issues, we propose a regularized latent probabilistic model to depict each doctor’s genre-relevant features as a distribution within a latent manifold space. We subsequently develop a graph that reflects the genre similarities among different doctors. Employing an advanced technique for densely connected graph discovery, we are able to cluster doctors with similar genre preferences into circles. Our experimental outcomes, based on a review of CT image datasets from over 6000 hospitals, validate the effectiveness of our method in precisely distinguishing doctors/users between different genre types. Based on this, various applications can be enhanced.
... To compare these measurements, the study uses synthetic networks with regulated parameters and many real-world networks. Ahn et al. (2010) proposed a method where links are partitioned via hierarchical clustering of edge similarity. Lancichinetti et al. ...
Article
Influential node detection is crucial for understanding and managing real-world networks, such as social, biological, and information networks, as it helps identify key participants and network dynamics. This paper introduces a novel approach, the Internal-External Overlapping Community Detection method, which aims to uncover overlapping communities within networks to identify influential nodes. We propose a new metric, the Influence Detection Factor, designed to pinpoint nodes that significantly impact network behavior and evolution. By examining state transitions in time-varying networks, our approach provides valuable insights into how these influential nodes drive changes over time, contributing to a deeper understanding of network resilience and adaptation. As a case study, we analyze the spread of COVID-19 in India, where provinces are represented as nodes and the number of cases as edge weights. We compare our method against traditional centrality measures, such as degree, closeness, and betweenness centrality, demonstrating that our approach aligns more closely with real-world epidemiological data and offers superior modularity and higher F1 scores. Our experimental results underscore the efficacy of the proposed method in capturing and forecasting network behavior, making it a powerful tool for dynamic social network analysis and providing actionable insights for public health interventions and epidemic management strategies.
... Therefore, accurate quantification of node complexity can reveal the role of nodes within a network well and help to predict network emergent phenomena [6]. In addition, the complexity of a node can be defined as the extent to which it participates in organized, structured interactions [7]. ...
Article
Accurately quantifying the complexity of nodes in a network is crucial for revealing their roles and network complexity, as well as predicting network emergent phenomena. In this paper, we propose three novel complexity metrics for nodes to reflect the extent to which they participate in organized, structured interactions in higher-order networks. Our higher-order network is built using the BuildHON+ model, where communities are detected using the Infomap algorithm. Since a physical node may contain one or more higher-order nodes in higher-order networks, it may simultaneously exist in one or more communities. The complexity of a physical node is defined by the number and size of the communities to which it belongs, as well as the number of higher-order nodes it contains within the same community. Empirical flow datasets are used to evaluate the effectiveness of the proposed metrics, and the results demonstrate their efficacy in characterizing node complexity in higher-order networks.
... To ensure the identified research topics are meaningful 13,52,60, we include only topic communities with at least 10 nodes in this study. Following this procedure, the co-citation network also retains the majority of papers with citations from both the mentor and mentee (Supplementary Fig. S2). ...
Preprint
In science, mentees often follow their mentors' career paths, but exceptional mentees frequently break from this routine, sometimes even outperforming their mentors. However, the pathways to independence for these excellent mentees and their interactions with mentors remain unclear. We analyzed the careers of over 500,000 mentees in Chemistry, Neuroscience, and Physics over the past 60 years to examine the strategies mentees employ in selecting research topics relative to their mentors, how these strategies evolve, and their resulting impact. Utilizing co-citation network analysis and a topic-specific impact allocation algorithm, we mapped the topic territory for each mentor-mentee pair and quantified their academic impact accrued within the topic. Our findings reveal mentees tend to engage with their mentors' less-dominated topics and explore new topics at the same time, and through this exaptive process, they begin to progressively establish their own research territories. This trend is particularly pronounced among those who outperform their mentors. Moreover, we identified an inverted U-shaped curve between the extent of topic divergence and the mentees' long-term impact, suggesting a moderate divergence from the mentors' research focus optimizes the mentees' academic impact. Finally, along the path to independence, increased coauthorship with mentors impedes the mentees' impact, whereas extending their collaboration networks with the mentors' former collaborators proves beneficial. These findings fill a crucial gap in understanding how mentees' research topic selection strategies affect academic success and offer valuable guidance for early-career researchers on pursuing independent research paths.
... Putting emphasis on the edge scores is favorable, because then common network-based algorithms for community detection can be used. Within the PICASO framework several community detection methods are already implemented, such as connected components or greedy modularity communities, as well as methods like Louvain (for positive and positive+negative edges), Link clustering [35], Markov clustering [36] or infomap [37,38]. The PICASO network being implemented as a networkx graph, any community detection method able to operate on networkx graphs can be used for this purpose. ...
Preprint
Various single-cell modalities covering transcriptomics, epigenetic and spatio-temporal changes in health and disease phenotypes are used in an exploratory way to understand biological systems at single-cell resolution. However, the vast amount of such single-cell data is not systematically linked to existing biomedical data. Networks have previously been used to represent harmonized biomedical data. Integrating various resources of biomedical data in networks has recently received increasing attention. These aggregated networks can provide additional insight into the biology of complex human diseases at cell-type level, however, lack inclusion of single cell expression data. Here, we present the PICASO framework, which incorporates single-cell gene expression data as an additional layer to represent associations between cell types, disease phenotypes, drugs and genes. The PICASO network includes several standardized biomedical databases such as STRING, Uniprot, GeneOntology, Reactome, OmniPath and OpenTargets. Using multiple cell type-specific instances of the framework, each annotated and scored with their respective expression data, comparisons between disease states can be made by computing respective sub-networks and comparing the expression scores between conditions. Ultimately, these group-specific networks will allow the identification of relevant genes, processes and potentially druggable targets, as well as the comparison of different measured groups and thus the identification of group-specific communities and interactions.
... It involves the identification of densely connected groups of nodes, often called communities or clusters, within a larger network. These communities often represent functional units, such as groups of friends in social networks, related proteins in biological networks, or coherently connected web pages in the World Wide Web [9,1,18]. Uncovering communities is crucial for understanding the structure and function of networks, for instance to infer hidden patterns in social and biological systems but also to identify structural patterns hindering diffusion on networks [16]. ...
Preprint
In this article, we consider the problem of community detection in signed networks. We propose SignedLouvain, an adaptation of the Louvain method to maximise signed modularity, efficiently taking advantage of the structure induced by signed relations. We begin by identifying the inherent limitations of applying the standard Louvain algorithm to signed networks, before introducing a novel variant specifically engineered to overcome these challenges. Through extensive experiments on real-world datasets, we demonstrate that the proposed method not only maintains the speed and scalability of its predecessor but also significantly enhances accuracy in detecting communities within signed networks.
... Overlapping muscle clusters (e.g. edge colours in Fig. 2.2(A) represent different functional muscle groups and illustrate muscles affiliated with more than one group) were then identified for each layer in the multiplex networks using a link-based community detection protocol that assigns each network connection to a cluster with the aim of maximising the number of possible links within each cluster (Fig. 2.2(B)) (Ahn et al., 2010). A consensus partition was found across kinematics and participants by aggregating the clusterings from each layer into a single adjacency matrix and applying a conventional network community detection approach based on the Louvain algorithm (Blondel et al., 2008; Rubinov & Sporns, 2010). ...
Preprint
Current clinical assessment tools don’t fully capture the genuine neural deficits experienced by chronic stroke survivors and, consequently, they don’t fully explain motor function throughout everyday life. Towards addressing this problem, here we aimed to characterise post-stroke alterations in upper-limb control from a novel perspective on the muscle synergy by applying, for the first time, a computational approach that quantifies diverse types of functional muscle interactions (i.e. functionally-similar (redundant), -complementary (synergistic) and -independent (unique)). From single trials of a simple forward pointing movement, we extracted networks of functionally diverse muscle interactions from chronic stroke survivors and unimpaired controls, identifying shared and group-specific modules across each interaction type (i.e. redundant, synergistic and unique). Reconciling previous studies, we found evidence for both the concurrent preservation of healthy functional modules post-stroke and muscle network structure alterations underpinned by systemic muscle interaction reweighting and functional reorganisation. Cluster analysis of stroke survivors revealed two distinct patient subgroups from each interaction type that all distinguished less impaired individuals who were able to adopt novel motor patterns different to unimpaired controls from more severely impaired individuals who did not. Our work here provides a nuanced account of post-stroke functional impairment and, in doing so, paves new avenues towards progressing the clinical use case of muscle synergy analysis.
... In the context of networks, the notion of structural synergy is something of an inversion of the usual approach to analysing complex networks. Typically, analyses focus on individual nodes (first-order structures), or pairwise edge-centric perspectives 50,51 (second order structures). In contrast, the structural synergy takes a top-down approach, describing the irreducible structure in the whole as a function of the joint-state of all of the parts. ...
Article
In the last decade, there has been an explosion of interest in the field of multivariate information theory and the study of emergent, higher-order interactions. These “synergistic” dependencies reflect information that is in the “whole” but not any of the “parts.” Arguably the most successful framework for exploring synergies is the partial information decomposition (PID). Despite its considerable power, the PID has a number of limitations that restrict its general applicability. Subsequently, other heuristic measures, such as the O-information, have been introduced, although these measures typically only provide a summary statistic of redundancy/synergy dominance, rather than direct insight into the synergy itself. To address this issue, we present an alternative decomposition that is synergy-first, scales much more gracefully than the PID, and has a straightforward interpretation. We define synergy as that information encoded in the joint state of a set of elements that would be lost following the minimally invasive perturbation on any single element. By generalizing this idea to sets of elements, we construct a totally ordered “backbone” of partial synergy atoms that sweeps the system’s scale. This approach applies to the entropy, the Kullback-Leibler divergence, and by extension, to the total correlation and the single-target mutual information (thus recovering a “backbone” PID). Finally, we show that this approach can be used to decompose higher-order interactions beyond information theory by showing how synergistic combinations of edges in a graph support global integration via communicability. We conclude by discussing how this perspective on synergistic structure can deepen our understanding of part-whole relationships in complex systems.
... Link partitioning algorithms try to partition links to find community structure. Ahn et al. [27] proposed a method where links are partitioned via hierarchical clustering of edge similarity. Lancichinetti et al. ...
Preprint
Influential node detection is a crucial task in real-world networks, such as social networks, biological networks, or information networks. It helps us to understand network dynamics as well as identify important participants. A commonly used method for detecting important nodes involves community detection, which seeks to divide the network into clusters of densely connected nodes. We propose an overlapping community detection approach, the Internal-External Overlapping Community Detection method, to find different communities. We introduce a new factor, the Influence Detection Factor, which is used to determine the influential nodes of a network. Every node undergoes state transitions in a time-varying network, and state transitions based on the influential nodes can be useful for tracking each influential node across time frames. Dynamic social network analysis of disease spreading is an emerging field of research, and a disease network can be considered a real-world application in various contexts. We consider COVID-19 data from India as a case study. Each node represents a province/district, and each edge reflects the number of COVID-19 cases; edge weights are calculated using our proposed metric. We use different centrality measures, such as degree centrality, closeness centrality, and betweenness centrality, to compare our metrics over real-world datasets. The observations obtained through our proposed method correlate significantly with real-time occurrences. Experimental results demonstrate that our methodology performs better than other existing approaches.
... • Linkcomm (Ahn, Bagrow and Lehmann, 2010) -A link partitioning algorithm, which discovers community structures by partitioning links in the network. The Linkcomm algorithm partitions links based on their edge similarities and then uses hierarchical clustering method to yield links for different communities. ...
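The link-partitioning idea described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the Linkcomm implementation: it assumes an unweighted, undirected graph, scores two links that share a node by the Jaccard overlap of the inclusive neighbourhoods of their unshared endpoints, and replaces the full dendrogram with a single-linkage cut at a fixed, user-chosen threshold. The function name `link_communities` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def link_communities(edges, threshold):
    """Group links by single-linkage on a Jaccard link similarity (sketch)."""
    edges = sorted({tuple(sorted(e)) for e in edges})

    # Inclusive neighbourhood of each node: the node itself plus its neighbours.
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)

    # Union-find over links; merging at a fixed threshold is equivalent
    # to cutting a single-linkage dendrogram at that similarity.
    parent = {e: e for e in edges}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for e1, e2 in combinations(edges, 2):
        shared = set(e1) & set(e2)
        if len(shared) != 1:
            continue  # only links that share exactly one node are compared
        (k,) = shared
        i = e1[0] if e1[1] == k else e1[1]
        j = e2[0] if e2[1] == k else e2[1]
        sim = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])
        if sim >= threshold:
            parent[find(e1)] = find(e2)

    groups = {}
    for e in edges:
        groups.setdefault(find(e), set()).add(e)
    return list(groups.values())


# Two triangles joined at node 2: the links split into two groups,
# so node 2 ends up in both link communities.
bowtie = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
communities = link_communities(bowtie, threshold=0.5)
```

Because communities are sets of links, a node belongs to every community that contains one of its links, which is how overlap arises without any extra machinery.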
Preprint
Scientific collaboration is a significant behavior in knowledge creation and idea exchange. To tackle large and complex research questions, a trend of team formation has been observed in recent decades. In this study, we focus on recognizing collaborative teams and exploring inner patterns using scholarly big graph data. We propose a collaborative team recognition (CORE) model with a "core + extension" team structure to recognize collaborative teams in large academic networks. In CORE, we combine an effective evaluation index called the collaboration intensity index with a series of structural features to recognize collaborative teams in which members are in close collaboration relationships. Then, CORE is used to guide the core team members to their extension members. CORE can also serve as the foundation for team-based research. The simulation results indicate that CORE reveals inner patterns of scientific collaboration: senior scholars have broad collaborative relationships and fixed collaboration patterns, which are the underlying mechanisms of team assembly. The experimental results demonstrate that CORE is promising compared with state-of-the-art methods.
... The metrics of the first group analyze the quality of disjoint community structures. The considered metrics of this group are modularity [39], ExtD [58], coverage [59], AVI [60], and density [61]. On the other hand, generalized softmodularity [41] (the only metric of the second group) measures the quality of the overlapping community structures. ...
Article
Fuzzy community detection (FCD) aims to reveal the community structure by allocating quantitative values to nodes across different communities. This article proposes a fast FCD approach called the Expandable Local Community based Fuzzy Community (XLoCoFC) detection method based on max-membership degree propagation (max-MDP) and normalized peripheral similarity index (nPSI). Initially, nodes having comparatively higher nPSI values are considered as topologically dominating nodes and selected as seeds. For an initial community, called local community, seed’s nPSI values from the respective neighbors’ peripheries are utilized as the neighbors’ membership degrees. Then an iterative process propagates max-membership degrees from nodes to nodes, and nPSI values are used as factors in the propagation. In this propagation, local communities having more dominating nodes expand and others contract. The propagation process converges very quickly. Such simplicity in its design makes our proposed XLoCoFC approach very fast in finding community structures on large networks. Time complexity of the proposed approach is O ( nd 2 xlog 2 d +klq) which is significantly less than the majority of the FCD algorithms, for which it is either O(n^2) or more. Moreover, XLoCoFC has no dependence on any network feature. It does not require tuning of any parameter which may impact its output. To demonstrate the working of the proposed XLoCoFC approach, we conduct extensive performance analysis comparatively by executing a set of existing approaches on several popular real-life and synthetic networks with number of nodes ranging from 24 to 1,134,890. Evaluation of the results considering the accuracy and quality metrics as well as a group MCDM technique clearly establishes the superiority of our approach over others.
... A key feature of a networked system is the general tendency toward organizing nodes hierarchically into multiple cohesive modules or communities. However, identifying such communities is a challenging problem in network research, with applications in biological networks, social network modeling, and communication pattern analysis [1][2][3][4][5][6][7]. Proteins that control and mediate many biological activities by regulating and supporting one another through their interactions form biological networks [1,8]. ...
Article
One of the recent significant but challenging research studies in computational biology and bioinformatics is to unveil protein complexes from protein-protein interaction networks (PPINs). However, the development of a reliable algorithm to detect more complexes with high quality is still ongoing in many studies. The main contribution of this paper is to improve the effectiveness of the well-known modularity density ( ) model when used as a single objective optimization function in the framework of the canonical evolutionary algorithm (EA). To this end, the design of the EA is modified with a gene ontology-based mutation operator, where the aim is to make a positive collaboration between the modularity density model and the proposed gene ontology-based mutation operator. The performance of the proposed EA to have a high quantity and quality of the detected complexes is assessed on two yeast PPINs and compared with two benchmarking gold complex sets. The reported results reveal the ability of modularity density to be more productive in detecting more complexes with high quality when teamed up with a gene ontology-based mutation operator.
... Another area for future work is to adapt LS to find halo nodes residing at the boundary of two or more communities (e.g., node d in Fig. 1), detect overlapping communities 13 potentially by producing line graphs [86][87][88] or clique graphs 58 , and identify critical link responsible for the merging or splitting dynamics of communities 61 . Another point that could be improved is when two or more local leaders are equivalent on both degree and distance to a node. ...
Article
Clusters or communities can provide a coarse-grained description of complex systems at multiple scales, but their detection remains challenging in practice. Community detection methods often define communities as dense subgraphs, or subgraphs with few connections in-between, via concepts such as the cut, conductance, or modularity. Here we consider another perspective built on the notion of local dominance, where low-degree nodes are assigned to the basin of influence of high-degree nodes, and design an efficient algorithm based on local information. Local dominance gives rise to community centers, and uncovers local hierarchies in the network. Community centers have a larger degree than their neighbors and are sufficiently distant from other centers. The strength of our framework is demonstrated on synthesized and empirical networks with ground-truth community labels. The notion of local dominance and the associated asymmetric relations between nodes are not restricted to community detection, and can be utilised in clustering problems, as we illustrate on networks derived from vector data.
... 4) Hierarchy Clustering (H-C) [28], which constructs nest clustering points through combining/dividing samples successively. Link Clustering (L-C) [29], focusing on the clustering of links rather than nodes in a network. 5) Clique Percolation (C-P) [30], a technique identifying clusters based on overlapping cliques. ...
Article
Investigating credit risk in contemporary entrepreneurial debt analysis is key to maintaining financial stability. Entrepreneurial debt management, facilitated by innovative financing models, is significant in fostering economic as well as social development. However, due to certain imperfections in these financial systems, this model is exposed to considerable risks. To precisely predict the risk level of each entrepreneurial debt, this paper presents a novel methodology combining massive-scale affinity graph construction and efficient subgraph mining for categorizing each entrepreneurial debt into one of multiple debt-level subgraphs. This approach marks a substantial contribution to the advancement and generalization of entrepreneurial debt management systems. To test our approach extensively, we collected a huge data set containing entrepreneurial debt information from nearly 1.41 million companies of different sizes throughout the world. Our experimental results showed that our method achieves an impressive 97.93% accuracy in predicting different risk levels in entrepreneurial debt management, highlighting its efficacy and potential for broad implementation.
... Different variants of perturbation operators and objective functions are suggested in the literature. Examples of some well-known heuristic-based complex detection algorithms are: Molecular Complex Detection (MCODE) [4], purification of the bait proteins [5], dense-neighborhood extraction using connectivity and confidence features (DECAFF) [6], repeated random walks (RRW) [7], clustering-based on maximal cliques (CMC) [8], and hierarchical link clustering [9,10]. Although these approaches have been widely adopted, they have limited accuracy and inferior performance as compared with evolutionary-based complex detection algorithms. ...
Article
Binary relations or interactions among bio-entities, such as proteins, set up the essential part of any living biological system. Protein-protein interactions are usually structured in a graph data structure called "protein-protein interaction networks" (PPINs). Analysis of PPINs into complexes tries to lay out the significant knowledge needed to answer many unresolved questions, including how cells are organized and how proteins work. However, complex detection problems fall under the category of non-deterministic polynomial-time hard (NP-Hard) problems due to their computational complexity. To accommodate such combinatorial explosions, evolutionary algorithms (EAs) are proven effective alternatives to heuristics in solving NP-hard problems. The main aim of this study is to make a close examination of the performance of the EAs where modularity and modularity density are selected as two different objective functions. Topology-based modularity and topology-based modularity density are designed to examine the detection ability of the EAs and to compare their performance. To conduct the experiments, two yeast Saccharomyces cerevisiae PPINs are used and evaluated under nine evaluation metrics. The results reveal the potential impact of the topology-based modularity density to outperform the counterpart modularity functions in almost all evaluation metrics.
... The connectivity matrix of the mouse brain atlas (213 × 213 size) is clustered into sub-clusters for an easier simulation. The Tanimoto clustering algorithm is selected as the main method to group all connections (Ahn et al., 2010;Kalinka and Tomancak, 2011), which could be concluded as the following Equation 1, where S(e i,k , e j,k ) represents the similarity between links e i,k and e j,k that share a node k: ...
Article
The brain topology strongly reflects the complex cognitive functions of the biological brain after millions of years of evolution. Learning from these biological topologies is a smarter and easier way to achieve brain-like intelligence with features of efficiency, robustness, and flexibility. Here we propose a brain topology-improved spiking neural network (BT-SNN) for efficient reinforcement learning. First, hundreds of biological topologies are generated and selected as subsets of the Allen mouse brain topology with the help of the Tanimoto hierarchical clustering algorithm, which has been widely used in analyzing key features of the brain connectome. Second, a few biological constraints are used to filter out three key topology candidates, including but not limited to the proportion of node functions (e.g., sensation, memory, and motor types) and network sparsity. Third, the network topology is integrated with the hybrid numerical solver-improved leaky integrate-and-fire neurons. Fourth, the algorithm is then tuned with an evolutionary algorithm named adaptive random search instead of backpropagation to guide synaptic modifications without affecting raw key features of the topology. Fifth, under the test of four animal-survival-like RL tasks (i.e., dynamic controlling in Mujoco), the BT-SNN can achieve higher scores than not only counterpart SNN using random topology but also some classical ANNs (i.e., long short-term memory and multi-layer perceptron). This result indicates that the research effort of incorporating biological topology and evolutionary learning rules has much in store for the future.
... To cater to practical demands, numerous methodologies have been developed to effectively uncover the underlying community structure. However, in the real world, communities often exhibit overlap [2], where nodes can belong to multiple groups simultaneously. This overlapping nature introduces complexity in analyzing the edge structure of nodes, rendering the detection of overlapping communities a formidable challenge. ...
Article
Over the past two decades, community detection has been extensively explored. Yet, the problem of identifying overlapping communities has not been fully solved. In this paper, we introduce a novel approach, called the generalized stochastic block model, to address this issue by allowing nodes to belong to multiple communities. This approach extends the traditional representation of nodal community assignment from a single community label to a label vector, with each element indicating the membership of a node in a specific community. We develop a Markov chain Monte Carlo algorithm to tackle the model. Through numerical experiments conducted on synthetic and empirical networks, we demonstrate the efficacy of the proposed framework in accurately detecting overlapping communities.
... These functions are used to iteratively evaluate how the generated solution should improve. For example, the cost function is based on the numbers of intra-cluster and inter-cluster connections in [2], local neighborhood density, and partition density in [3], [4], [5], and [6]. In these methods, scholars focused on a narrow aspect of the heuristic framework to only underscore the merits of topological characteristics to generate interconnected sub-graphs. ...
Article
By definition, the detection of protein complexes that form protein-protein interaction networks (PPINs) is an NP-hard problem. Evolutionary algorithms (EAs), as global search methods, are proven in the literature to be more successful than greedy methods in detecting protein complexes. However, the design of most of these EA-based approaches relies on the topological information of the proteins in the PPIN. Biological information, a key resource for molecular profiles, has by contrast attracted little interest in the design of the components of these EA-based methods. The main aim of this paper is to redesign two operators in the EA based on the functional domain rather than the graph topological domain. The perturbation mechanism of both crossover and mutation operators is designed based on the direct gene ontology annotations and Jaccard similarity coefficients for the proteins. The results on the yeast Saccharomyces cerevisiae PPIN provide a useful perspective: the functional domain of the proteins, as compared with the topological domain, is more consistent with the true information reported in the Munich Information Center for Protein Sequence (MIPS) catalog. The evaluation at both complex and protein levels reveals that feeding the components of the EA with biological information yields more accurate complex structures, whereas topological information may mislead the algorithm towards a faulty structure.
... A recent contribution uses multi-objective Genetic Algorithm and Fuzzy theory [31]. A link clustering algorithm for overlapping communities has also been proposed by Ahn [3,71]. Evans has suggested that any algorithm that is capable of producing a partition of nodes, may be used for producing a partition of links [15]. ...
Article
Given the nature of time series and their vast applications, it is essential to find clustering algorithms that depict their real-life properties. Among the features that can greatly affect the options available for time series are overlapping and hierarchical properties. In this paper a novel approach to analyzing time series with such features is introduced. Using the two concepts of network construction and link community detection, we have attempted to analyze and identify the mentioned properties of time series using data that is often gathered first hand. The proposed algorithm has been applied using both recent and common similarity measures on ten synthetic time series with hierarchical and overlapping features, alongside various distance measures. When testing the proposed approach, the element-centric measure of similarity indicated a clear increase in accuracy for this algorithm, showing the highest accuracy when used alongside the Dynamic Time Warping distance measure. Moreover, the proposed algorithm has been very successful in identifying and forming communities for both large and small time series, thus solving another one of the main issues previous algorithms tended to have.
... In complex network theory, there are overlaps among communities in many networks, and some vertices belong to more than one community (Ahn et al. 2010; Palla et al. ( ...
Article
The rationality of product module partition is crucial to the success of modular design. The correlations between components of complex products are complex, increasing the difficulty of module partition. Thus, many existing methods of module partition have difficulty realizing this process effectively for complex products with a large number of components. This paper proposes a module partition method for complex products based on stable overlapping community detection and overlapping component allocation. The correlations between components are analyzed to obtain a comprehensive correlation strength matrix. The undirected weighted network is used to represent components and the correlations between them. A stable overlapping community detection algorithm based on the improved judgement of within-community Shapley values is proposed to generate multiple preliminary schemes of module partition. Overlapping components among modules are allocated to the most suitable modules by adopting a genetic algorithm (GA). The scheme with the largest modularity measure Q is selected as the final scheme of module partition. The proposed method is applied to a computer numerical control (CNC) grinding machine. The proposed module partition method for complex products is demonstrated to be superior to other effective methods.
Article
Much of the complexity of social, biological, and engineering systems arises from the complicated interactions among the entities in the corresponding networks. A number of network analysis tools have been successfully used to discover latent structures termed communities in such networks. However, some communities with relatively weak structures can be difficult to uncover because they are obscured by other stronger connections. To cope with this situation, our previous work proposed an algorithm called HICODE to detect and amplify the dominant and hidden community structures. In this work, we conduct a comprehensive and systematic theoretical analysis of the impact of hidden community structure and the efficacy of the HICODE algorithm, and provide illustrations of the detection process and results. Specifically, we define a multi-layer stochastic block model, and use this model to explain why the existence of hidden structure makes the detection of dominant structure harder than equivalent random noise, which can also explain why many community detection algorithms that focus only on the dominant structure do not work as well as expected. We then provide theoretical analysis showing that the iterative reducing methods can help to enhance the discovery of hidden structure as well as the dominant structure in the multi-layer stochastic block model for the two cases of accurate and inaccurate detection. Finally, visual simulations and experimental results are presented to show the process of the HICODE algorithm and the impact of different numbers of layers on the detection quality.
Article
Community detection is a data analysis method used to reveal the aggregation behavior of a network. This paper improves the COPRA algorithm and proposes the PECOPRA algorithm, which performs better on this problem. In the pre-processing step, the R-mcl similarity coefficient matrix is calculated, the Pearson correlation matrix representing the node relationships is computed, and the Pearson correlation matrix is filtered to obtain the result matrix. On this basis, the COPRA algorithm is used to calculate and map the communities, and the extended modularity is used to reassign the boundary nodes to improve the accuracy of the community partition. PECOPRA improves the accuracy of community detection and has better performance.
Article
In recent years, brain signal complexity has gained attention as an indicator of brain well-being and a predictor of disease and dysfunction. Brain entropy quantifies this complexity. Assessment of functional network centrality and connectivity reveals that information communication induces neural signal oscillations in certain brain regions. However, their relationship is uncertain. This work studied brain signal complexity, network centrality, and connectivity in both healthy and depressed individuals. The current work comprised a sample of 124 first-episode drug-naïve patients with major depressive disorder (MDD) and 105 healthy controls (HC). Six functional networks were created for each person using resting-state functional magnetic resonance imaging. For each network, entropy, centrality, and connectivity were computed. Using structural equation modeling, this study examined the associations between brain network entropy, centrality, and connectivity. The findings demonstrated substantial correlations of entropy with both centrality and connectivity in HC and these correlation patterns were disrupted in MDD. Compared to HC, MDD exhibited higher entropy in four networks and demonstrated changes in centralities across all networks. The structural equation modeling showed that network centralities, connectivity, and depression severity had impacts on brain entropy. Nevertheless, no impacts were observed in the opposite directions. This study indicated that the complexity of brain signals was influenced not only by the interactions among different areas of the brain but also by the severity level of depression. These findings enhanced our comprehension of the associations of brain entropy with its influential factors.
Preprint
As networks grow in size and complexity, backbones become an essential network representation. Indeed, they provide a simplified yet informative overview of the underlying organization by retaining the most significant and structurally influential connections within a network. Network heterogeneity often results in complex and intricate structures, making it challenging to identify the backbone. In response, we introduce the Multilevel Backbone Extraction Framework, a novel approach that diverges from conventional backbone methodologies. This generic approach prioritizes the mesoscopic organization of networks. First, it splits the network into homogeneous-density components. Second, it extracts independent backbones for each component using any classical Backbone technique. Finally, the various backbones are combined. This strategy effectively addresses the heterogeneity observed in network groupings. Empirical investigations on real-world networks underscore the efficacy of the Multilevel Backbone approach in preserving essential network structures and properties. Experiments demonstrate its superiority over classical methods in handling network heterogeneity and enhancing network integrity. The framework is adaptable to various types of networks and backbone extraction techniques, making it a versatile tool for network analysis and backbone extraction across diverse network applications.
Article
Graph representation learning methods, such as node embeddings, are powerful approaches to map nodes into a latent vector space, allowing their use for various graph learning tasks. Despite their success, these techniques are inherently black-boxes and few studies have focused on investigating local explanations of node embeddings for specific instances. Moreover, explaining the overall behavior of unsupervised embedding models remains an unexplored problem, limiting global interpretability and debugging potentials. We address this gap by developing human-understandable explanations for latent space dimensions in node embeddings. Towards that, we first develop new metrics that measure the global interpretability of embeddings based on the marginal contribution of the latent dimensions to predicting graph structure. We say an embedding dimension is more interpretable if it can faithfully map to an understandable sub-structure in the input graph - like community structure. Having observed that standard node embeddings have low interpretability, we then introduce Dine (Dimension-based Interpretable Node Embedding). This novel approach can retrofit existing node embeddings by making them more interpretable without sacrificing their task performance. We conduct extensive experiments on synthetic and real-world graphs and show that we can simultaneously learn highly interpretable node embeddings with effective performance in link prediction and node classification.
Article
In the digital era, social media platforms have become the focal point for public discourse, with a significant impact on shaping societal narratives. However, they are also rife with mis- and disinformation, which can rapidly disseminate and influence public opinion. This paper investigates the propagation of mis- and disinformation on X, a social media platform formerly known as Twitter. We employ a multidimensional analytical approach, integrating sentiment analysis, wavelet analysis, and network analysis to discern the patterns and intensity of misleading information waves. Sentiment analysis elucidates the emotional tone and subjective context within which information is framed. Wavelet analysis reveals the temporal dynamics and persistence of disinformation trends over time. Network analysis maps the intricate web of information flow, identifying key nodes and vectors of virality. The results offer a granular understanding of how false narratives are constructed and sustained within the digital ecosystem. This study contributes to the broader field of digital media literacy by highlighting the urgent need for robust analytical tools to navigate and neutralize the infodemic in the age of social media.
Article
A primary goal of neuroscience is to understand the relationship between the brain and behavior. While magnetic resonance imaging (MRI) examines brain structure and function under controlled conditions, digital phenotyping via portable automatic devices (PAD) quantifies behavior in real‐world settings. Combining these two technologies may bridge the gap between brain imaging, physiology, and real‐time behavior, enhancing the generalizability of laboratory and clinical findings. However, the use of MRI and data from PADs outside the MRI scanner remains underexplored. Herein, we present a Preferred Reporting Items for Systematic Reviews and Meta‐Analysis systematic literature review that identifies and analyzes the current state of research on the integration of brain MRI and PADs. PubMed and Scopus were automatically searched using keywords covering various MRI techniques and PADs. Abstracts were screened to only include articles that collected MRI brain data and PAD data outside the laboratory environment. Full‐text screening was then conducted to ensure included articles combined quantitative data from MRI with data from PADs, yielding 94 selected papers for a total of N = 14,778 subjects. Results were reported as cross‐frequency tables between brain imaging and behavior sampling methods and patterns were identified through network analysis. Furthermore, brain maps reported in the studies were synthesized according to the measurement modalities that were used. Results demonstrate the feasibility of integrating MRI and PADs across various study designs, patient and control populations, and age groups. The majority of published literature combines functional, T1‐weighted, and diffusion weighted MRI with physical activity sensors, ecological momentary assessment via PADs, and sleep. The literature further highlights specific brain regions frequently correlated with distinct MRI‐PAD combinations. 
These combinations enable in‐depth studies on how physiology, brain function and behavior influence each other. Our review highlights the potential for constructing brain–behavior models that extend beyond the scanner and into real‐world contexts.
Book
This book presents a simple geometric model of voting as a tool to analyze parliamentary roll call data. Each legislator is represented by one point and each roll call is represented by two points that correspond to the policy consequences of voting Yea or Nay. On every roll call each legislator votes for the closer outcome point, at least probabilistically. These points form a spatial map that summarizes the roll calls. In this sense a spatial map is much like a road map because it visually depicts the political world of a legislature. The closeness of two legislators on the map shows how similar their voting records are, and the distribution of legislators shows what the dimensions are. These maps can be used to study a wide variety of topics including how political parties evolve over time, the existence of sophisticated voting and how an executive influences legislative outcomes.
Article
We construct a connected network of 3.9 million nodes from mobile phone call records, which can be regarded as a proxy for the underlying human communication network at the societal level. We assign two weights on each edge to reflect the strength of social interaction, which are the aggregate call duration and the cumulative number of calls placed between the individuals over a period of 18 weeks. We present a detailed analysis of this weighted network by examining its degree, strength, and weight distributions, as well as its topological assortativity and weighted assortativity, clustering and weighted clustering, together with correlations between these quantities. We give an account of motif intensity and coherence distributions and compare them to a randomized reference system. We also use the concept of link overlap to measure the number of common neighbours any two adjacent nodes have, which serves as a useful local measure for identifying the interconnectedness of communities. We report a positive correlation between the overlap and weight of a link, thus providing strong quantitative evidence for the weak ties hypothesis, a central concept in social network analysis. The percolation properties of the network are found to depend on the type and order of removed links, and they can help understand how the local structure of the network manifests itself at the global level. We hope that our results will contribute to modelling weighted large-scale social networks, and believe that the systematic approach followed here can be adopted to study other weighted networks.
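The link overlap idea mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the abstract only says that overlap measures the common neighbours of two adjacent nodes, so the normalization used here, O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), is an assumption, and the toy edge list and function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def link_overlap(adjacency, u, v):
    """Overlap of link (u, v): shared neighbours over possible shared neighbours."""
    if v not in adjacency[u]:
        raise ValueError("link overlap is defined for adjacent nodes only")
    common = len(adjacency[u] & adjacency[v])
    # Each endpoint has (degree - 1) neighbours besides the other endpoint.
    possible = (len(adjacency[u]) - 1) + (len(adjacency[v]) - 1) - common
    return common / possible if possible > 0 else 0.0


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adjacency = build_adjacency(edges)
overlaps = {(u, v): link_overlap(adjacency, u, v) for u, v in edges}
```

In this toy network the link inside the triangle that touches no outside node gets the highest overlap, while the link to the pendant node gets zero, which matches the intuition that low-overlap links tend to sit between tightly knit groups.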
Article
In this paper, we develop the idea to partition the edges of a weighted graph in order to uncover overlapping communities of its nodes. Our approach is based on the construction of different types of weighted line graphs, i.e. graphs whose nodes are the links of the original graph, that encapsulate differently the relations between the edges. Weighted line graphs are argued to provide an alternative, valuable representation of the system's topology, and are shown to have important applications in community detection, as the usual node partition of a line graph naturally leads to an edge partition of the original graph. This identification allows us to use traditional partitioning methods in order to address the long-standing problem of the detection of overlapping communities. We apply it to the analysis of different social and geographical networks. Comment: 8 Pages. New title and text revisions to emphasise differences from earlier papers
Article
The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e. the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a role similar to that of, e.g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks. Comment: Review article. 103 pages, 42 figures, 2 tables. Two sections expanded + minor modifications. Three figures + one table + references added. Final version published in Physics Reports
Article
The Gene Ontology (GO) project (http://www.geneontology.org/) provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://www.sequenceontology.org/). The ontologies have been extended and refined for several biological areas, and improvements to the structure of the ontologies have been implemented. To improve the quantity and quality of gene product annotations available from its public repository, the GO Consortium has launched a focused effort to provide comprehensive and detailed annotation of orthologous genes across a number of ‘reference’ genomes, including human and several key model organisms. Software developments include two releases of the ontology-editing tool OBO-Edit, and improvements to the AmiGO browser interface.
Article
In this paper, we use a partition of the links of a network in order to uncover its community structure. This approach allows for communities to overlap at nodes so that nodes may be in more than one community. We do this by making a node partition of the line graph of the original network. In this way we show that any algorithm that produces a partition of nodes can be used to produce a partition of links. We discuss the role of the degree heterogeneity and propose a weighted version of the line graph in order to account for this.
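The construction described in the abstract above can be sketched as follows. This is a minimal, unweighted illustration under stated assumptions: `line_graph` and `node_communities` are hypothetical helper names, the weighted line graph proposed in the abstract is not shown, and the link labels are assigned by hand as a stand-in for the output of whatever node-partitioning algorithm would be run on the line graph.

```python
from collections import defaultdict
from itertools import combinations


def line_graph(edges):
    """Adjacency of the line graph: two links are adjacent if they share a node."""
    incident = defaultdict(list)
    for index, (u, v) in enumerate(edges):
        incident[u].append(index)
        incident[v].append(index)
    adjacency = {index: set() for index in range(len(edges))}
    for links in incident.values():
        for i, j in combinations(links, 2):
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def node_communities(edges, link_labels):
    """Map a partition of links to (possibly overlapping) node communities."""
    communities = defaultdict(set)
    for (u, v), label in zip(edges, link_labels):
        communities[label].update((u, v))
    return dict(communities)


# Hypothetical toy network: two triangles that share node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
lg = line_graph(edges)  # the graph a node-partitioning algorithm would receive

# Hand-assigned labels standing in for a partition of the line-graph nodes.
link_labels = [0, 0, 0, 1, 1, 1]
communities = node_communities(edges, link_labels)
```

Each link carries exactly one label, yet node 2 ends up in both communities because its links are split between them. That is how a strict partition of links yields overlapping groups of nodes.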
Article
We show that a complex network of phase oscillators may display interfaces between domains (clusters) of synchronized oscillations. The emergence and dynamics of these interfaces are studied for graphs composed of either dynamical domains (influenced by different forcing processes), or structural domains (modular networks). The obtained results allow us to give a functional definition of overlapping structures in modular networks, and suggest a practical method able to give information on overlapping clusters in both artificially constructed and real world modular networks.
Article
In complex network research clique percolation, introduced by Palla, Derényi, and Vicsek [Nature (London) 435, 814 (2005)], is a deterministic community detection method which allows for overlapping communities and is purely based on local topological properties of a network. Here we present a sequential clique percolation algorithm (SCP) for fast community detection in weighted and unweighted networks, for cliques of a chosen size. This method is based on sequentially inserting the constituent links to the network and simultaneously keeping track of the emerging community structure. Unlike existing algorithms, the SCP method allows for detecting k-clique communities at multiple weight thresholds in a single run, and can simultaneously produce a dendrogram representation of hierarchical community structure. In sparse weighted networks, the SCP algorithm can also be used for implementing the weighted clique percolation method recently introduced by Farkas [New J. Phys. 9, 180 (2007)]. The computational time of the SCP algorithm scales linearly with the number of k-cliques in the network. As an example, the method is applied to a product association network, revealing its nested community structure.
Article
Current yeast interactome network maps contain several hundred molecular complexes with limited and somewhat controversial representation of direct binary interactions. We carried out a comparative quality assessment of current yeast interactome data sets, demonstrating that high-throughput yeast two-hybrid (Y2H) screening provides high-quality binary interaction information. Because a large fraction of the yeast binary interactome remains to be mapped, we developed an empirically controlled mapping framework to produce a “second-generation” high-quality, high-throughput Y2H data set covering ∼20% of all yeast binary interactions. Both Y2H and affinity purification followed by mass spectrometry (AP/MS) data are of equally high quality but of a fundamentally different and complementary nature, resulting in networks with different topological and biological properties. Compared to co-complex interactome models, this binary map is enriched for transient signaling interactions and intercomplex connections with a highly significant clustering between essential proteins. Rather than correlating with essentiality, protein connectivity correlates with genetic pleiotropy.
Article
Spatially or chemically isolated functional modules composed of several cellular components and carrying discrete functions are considered fundamental building blocks of cellular organization, but their presence in highly integrated biochemical networks lacks quantitative support. Here, we show that the metabolic networks of 43 distinct organisms are organized into many small, highly connected topologic modules that combine in a hierarchical manner into larger, less cohesive units, with their number and degree of clustering following a power law. Within Escherichia coli, the uncovered hierarchical modularity closely overlaps with known metabolic functions. The identified network architecture may be generic to system-level cellular organization.
Article
A fast community detection algorithm based on a q-state Potts model is presented. Communities (groups of densely interconnected nodes that are only loosely connected to the rest of the network) are found to coincide with the domains of equal spin value in the minima of a modified Potts spin glass Hamiltonian. Comparing global and local minima of the Hamiltonian allows for the detection of overlapping ("fuzzy") communities and quantifying the association of nodes with multiple communities as well as the robustness of a community. No prior knowledge of the number of communities has to be assumed.
Article
Preexisting word knowledge is accessed in many cognitive tasks, and this article offers a means for indexing this knowledge so that it can be manipulated or controlled. We offer free association data for 72,000 word pairs, along with over a million entries of related data, such as forward and backward strength, number of competing associates, and printed frequency. A separate file contains the 5,019 normed words, their statistics, and thousands of independently normed rhyme, stem, and fragment cues. Other files provide n x n associative networks for more than 4,000 words and a list of idiosyncratic responses for each normed word. The database will be useful for investigators interested in cuing, priming, recognition, network theory, linguistics, and implicit testing applications. They also will be useful for evaluating the predictive value of free association probabilities as compared with other measures, such as similarity ratings and co-occurrence norms. Of several procedures for measuring preexisting strength between two words, the best remains to be determined. The norms may be downloaded from www.psychonomic.org/archive/.
Article
Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. Here we report the first genome-wide screen for complexes in an organism, budding yeast, using affinity purification and mass spectrometry. Through systematic tagging of open reading frames (ORFs), the majority of complexes were purified several times, suggesting screen saturation. The richness of the data set enabled a de novo characterization of the composition and organization of the cellular machinery. The ensemble of cellular proteins partitions into 491 complexes, of which 257 are novel, that differentially combine with additional attachment proteins or protein modules to enable a diversification of potential functions. Support for this modular organization of the proteome comes from integration with available data on expression, localization, function, evolutionary conservation, protein structure and binary interactions. This study provides the largest collection of physically determined eukaryotic cellular machines so far and a platform for biological data integration and modelling.
Article
Identification of protein-protein interactions often provides insight into protein function, and many cellular processes are performed by stable protein complexes. We used tandem affinity purification to process 4,562 different tagged proteins of the yeast Saccharomyces cerevisiae. Each preparation was analysed by both matrix-assisted laser desorption/ionization-time of flight mass spectrometry and liquid chromatography tandem mass spectrometry to increase coverage and accuracy. Machine learning was used to integrate the mass spectrometry scores and assign probabilities to the protein-protein interactions. Among 4,087 different proteins identified with high confidence by mass spectrometry from 2,357 successful purifications, our core data set (median precision of 0.69) comprises 7,123 protein-protein interactions involving 2,708 proteins. A Markov clustering algorithm organized these interactions into 547 protein complexes averaging 4.9 subunits per complex, about half of them absent from the MIPS database, as well as 429 additional interactions between pairs of complexes. The data (all of which are available online) will help future studies on individual proteins as well as functional genomics and systems biology.
Article
Detecting community structure is fundamental for uncovering the links between structure and function in complex networks and for practical applications in many disciplines such as biology and sociology. A popular method now widely used relies on the optimization of a quantity called modularity, which is a quality index for a partition of a network into communities. We find that modularity optimization may fail to identify modules smaller than a scale which depends on the total size of the network and on the degree of interconnectedness of the modules, even in cases where modules are unambiguously defined. This finding is confirmed through several examples, both in artificial and in real social, biological, and technological networks, where we show that modularity optimization indeed does not resolve a large number of modules. A check of the modules obtained through modularity optimization is thus necessary, and we provide here key elements for the assessment of the reliability of this community detection method. Keywords: complex networks, modular structure, metabolic networks, social networks
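The scale dependence described in the abstract above can be illustrated with a small Python sketch. The abstract does not specify a test network, so the ring of cliques used here (30 cliques of 5 nodes, neighbouring cliques joined by a single link) is an assumption, as are the function names. Modularity is computed with the standard formula Q = Σ_c [ l_c / m − (d_c / 2m)² ], where l_c is the number of links inside community c, d_c is the total degree of its nodes, and m is the number of links in the network.

```python
from itertools import combinations


def ring_of_cliques(n_cliques, clique_size):
    """Build cliques arranged in a ring, neighbouring cliques joined by one link."""
    edges = []
    for c in range(n_cliques):
        nodes = range(c * clique_size, (c + 1) * clique_size)
        edges.extend(combinations(nodes, 2))
        # Link the last node of this clique to the first node of the next one.
        next_first = ((c + 1) % n_cliques) * clique_size
        edges.append((nodes[-1], next_first))
    return edges


def modularity(edges, community_of):
    """Q = sum over communities of l_c / m - (d_c / 2m)^2."""
    m = len(edges)
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


n_cliques, clique_size = 30, 5
edges = ring_of_cliques(n_cliques, clique_size)
n_nodes = n_cliques * clique_size

# Partition A: every clique is its own community (the intuitive answer).
one_per_clique = {node: node // clique_size for node in range(n_nodes)}
# Partition B: neighbouring cliques are merged in pairs.
merged_pairs = {node: node // (2 * clique_size) for node in range(n_nodes)}

q_single = modularity(edges, one_per_clique)
q_pairs = modularity(edges, merged_pairs)
```

In this setup the merged partition scores a higher Q than the one-clique-per-community partition, even though each clique is an unambiguous module, so a modularity optimizer would prefer to merge them.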
Article
The rich set of interactions between individuals in society results in complex community structure, capturing highly connected circles of friends, families or professional cliques in a social network. Thanks to frequent changes in the activity and communication patterns of individuals, the associated social and communication network is subject to constant evolution. Our knowledge of the mechanisms governing the underlying community dynamics is limited, but is essential for a deeper understanding of the development and self-optimization of society as a whole. We have developed an algorithm based on clique percolation that allows us to investigate the time dependence of overlapping communities on a large scale, and thus uncover basic relationships characterizing community evolution. Our focus is on networks capturing the collaboration between scientists and the calls between mobile phone users. We find that large groups persist for longer if they are capable of dynamically altering their membership, suggesting that an ability to change the group composition results in better adaptability. The behaviour of small groups displays the opposite tendency: the condition for stability is that their composition remains unchanged. We also show that knowledge of the time commitment of members to a given community can be used for estimating the community's lifetime. These findings offer insight into the fundamental differences between the dynamics of small groups and large institutions.
Article
Electronic databases, from phone to e-mail logs, currently provide detailed records of human communication patterns, offering novel avenues to map and explore the structure of social and communication networks. Here we examine the communication patterns of millions of mobile phone users, allowing us to simultaneously study the local and the global structure of a society-wide communication network. We observe a coupling between interaction strengths and the network's local structure, with the counterintuitive consequence that social networks are robust to the removal of the strong ties but fall apart after a phase transition if the weak ties are removed. We show that this coupling significantly slows the diffusion process, resulting in dynamic trapping of information in communities and find that, when it comes to information diffusion, weak and strong ties are both simultaneously ineffective. Keywords: complex systems, complex networks, diffusion and spreading, phase transition, social systems.
Article
An updated genome-scale reconstruction of the metabolic network in Escherichia coli K-12 MG1655 is presented. This updated metabolic reconstruction includes: (1) an alignment with the latest genome annotation and the metabolic content of EcoCyc leading to the inclusion of the activities of 1260 ORFs, (2) characterization and quantification of the biomass components and maintenance requirements associated with growth of E. coli and (3) thermodynamic information for the included chemical reactions. The conversion of this metabolic network reconstruction into an in silico model is detailed. A new step in the metabolic reconstruction process, termed thermodynamic consistency analysis, is introduced, in which reactions were checked for consistency with thermodynamic reversibility estimates. Applications demonstrating the capabilities of the genome-scale metabolic model to predict high-throughput experimental growth and gene deletion phenotypic screens are presented. The increased scope and computational capability using this new reconstruction is expected to broaden the spectrum of both basic biology and applied systems biology studies of E. coli metabolism.
Article
The investigation of community structures in networks is an important issue in many domains and disciplines. This problem is relevant for social tasks (objective analysis of relationships on the web), biological inquiries (functional studies in metabolic and protein networks), or technological problems (optimization of large infrastructures). Several types of algorithms exist for revealing the community structure in networks, but a general and quantitative definition of community is not implemented in the algorithms, leading to an intrinsic difficulty in the interpretation of the results without any additional nontopological information. In this article we deal with this problem by showing how quantitative definitions of community are implemented in practice in the existing algorithms. In this way the algorithms for the identification of the community structure become fully self-contained. Furthermore, we propose a local algorithm to detect communities which outperforms the existing algorithms with respect to computational cost, keeping the same level of reliability. The algorithm is tested on artificial and real-world graphs. In particular, we show how the algorithm applies to a network of scientific collaborations, which, for its size, cannot be attacked with the usual methods. This type of local algorithm could open the way to applications to large-scale technological and biological systems.
Article
This paper develops a procedure for estimating the basic dimensions underlying a set of issue or attribute scales. A simple Hinich-Ordeshook spatial theory of voting is used to model Converse's fundamental insight that individuals' positions on issues are bundled together, and the knowledge of one or two issue positions makes the remaining positions very predictable. The model assumes that individuals' positions on a set of issue or attribute dimensions are determined by the individuals' positions on a small number of underlying evaluative or basic dimensions. The procedure developed in this paper for estimating these basic dimensions is, in effect, a method of performing singular value decomposition of a matrix with missing elements. Monte Carlo testing shows that the procedure reliably reproduces the missing elements. Because of this reliability, the estimation procedure can be used to produce Eckart-Young matrix lower rank approximations. A number of applications to political data are...
Article
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks. We explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. We employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the "best" possible community, according to the conductance measure, over a wide range of size scales. We study over 100 large real-world social and information networks. Our results suggest a significantly more refined picture of community structure in large networks than has been appreciated previously. In particular, we observe tight communities that are barely connected to the rest of the network at very small size scales; and communities of larger size scales gradually "blend into" the expander-like core of the network and thus become less "community-like." This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, it is exactly the opposite of what one would expect based on intuition from expander graphs, low-dimensional or manifold-like graphs, and from small social networks that have served as testbeds of community detection algorithms. We have found that a generative graph model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community profile plot similar to what we observe in our network datasets.
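The conductance measure mentioned in this abstract can be made concrete with a short sketch. It assumes the common definition, the number of edges leaving a node set divided by the smaller of the two total degrees (volumes) on either side of the cut; the toy graph and the function name `conductance` are illustrative assumptions, not material from the cited paper.

```python
def conductance(adj, subset):
    """Cut size divided by the smaller volume (sum of degrees) of the two sides."""
    subset = set(subset)
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    vol_in = sum(len(adj[u]) for u in subset)
    vol_out = sum(len(adj[u]) for u in adj if u not in subset)
    return cut / min(vol_in, vol_out)

# Assumed toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in toy_edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

phi_triangle = conductance(adj, {0, 1, 2})  # one edge leaves, volume 7 on each side
phi_pair = conductance(adj, {0, 1})         # two edges leave, volume 4 inside
print(phi_triangle, phi_pair)
```

Lower values indicate a more "community-like" set: the triangle scores 1/7, while the pair {0, 1} scores 1/2 because half of its edge endpoints point outside the set.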
Article
The combination of the compactness of networks, featuring small diameters, and their complex architectures results in a variety of critical effects dramatically different from those in cooperative systems on lattices. In the last few years, researchers have made important steps toward understanding the qualitatively new critical phenomena in complex networks. We review the results, concepts, and methods of this rapidly developing field. Here we mostly consider two closely related classes of these critical phenomena, namely structural phase transitions in the network architectures and transitions in cooperative models on networks as substrates. We also discuss systems where a network and interacting agents on it influence each other. We overview a wide range of critical phenomena in equilibrium and growing networks including the birth of the giant connected component, percolation, k-core percolation, phenomena near epidemic thresholds, condensation transitions, critical phenomena in spin models placed on networks, synchronization, and self-organized criticality effects in interacting systems on networks. We also discuss strong finite size effects in these systems and highlight open problems and perspectives.
Article
A search technique locating network modules, i.e., internally densely connected groups of nodes in directed networks, is introduced by extending the Clique Percolation Method originally proposed for undirected networks. After giving a suitable definition for directed modules we investigate their percolation transition in the Erdős-Rényi graph both analytically and numerically. We also analyse four real-world directed networks, including Google's own webpages, an email network, a word association graph and the transcriptional regulatory network of the yeast Saccharomyces cerevisiae. The obtained directed modules are validated by additional information available for the nodes. We find that directed modules of real-world graphs inherently overlap and the investigated networks can be classified into two major groups in terms of the overlaps between the modules. Accordingly, in the word-association network and among Google's webpages the overlaps are likely to contain in-hubs, whereas the modules in the email and transcriptional regulatory networks tend to overlap via out-hubs.
Article
Many networks in nature, society and technology are characterized by a mesoscopic level of organization, with groups of nodes forming tightly connected units, called communities or modules, that are only weakly linked to each other. Uncovering this community structure is one of the most important problems in the field of complex networks. Networks often show a hierarchical organization, with communities embedded within other communities; moreover, nodes can be shared between different communities. Here we present the first algorithm that finds both overlapping communities and the hierarchical structure. The method is based on the local optimization of a fitness function. Community structure is revealed by peaks in the fitness histogram. The resolution can be tuned by a parameter, which makes it possible to investigate different hierarchical levels of organization. Tests on real and artificial networks give excellent results.
Article
Graph vertices are often organized into groups that seem to live fairly independently of the rest of the graph, with which they share but a few edges, whereas the relationships between group members are stronger, as shown by the large number of mutual connections. Such groups of vertices, or communities, can be considered as independent compartments of a graph. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. The task is very hard, though, both conceptually, due to the ambiguity in the definition of community and in the discrimination of different partitions, and practically, because algorithms must find "good" partitions among an exponentially large number of them. Other complications are represented by the possible occurrence of hierarchies, i.e. communities which are nested inside larger communities, and by the existence of overlaps between communities, due to the presence of nodes belonging to more groups. All these aspects are dealt with in some detail and many methods are described, from traditional approaches used in computer science and sociology to recent techniques developed mostly within statistical physics.
Article
Networks are widely used in the biological, physical, and social sciences as a concise mathematical representation of the topology of systems of interacting components. Understanding the structure of these networks is one of the outstanding challenges in the study of complex systems. Here we describe a general technique for detecting structural features in large-scale network data that works by dividing the nodes of a network into classes such that the members of each class have similar patterns of connection to other nodes. Using the machinery of probabilistic mixture models and the expectation–maximization algorithm, we show that it is possible to detect, without prior knowledge of what we are looking for, a very broad range of types of structure in networks. We give a number of examples demonstrating how the method can be used to shed light on the properties of real-world networks, including social and information networks. Keywords: clustering, graph, likelihood.
Article
Many complex systems in nature and society can be described in terms of networks capturing the intricate web of connections among the units they are made of [1-4]. A key question is how to interpret the global organization of such networks as the coexistence of their structural subunits (communities) associated with more highly interconnected parts. Identifying these a priori unknown building blocks (such as functionally related proteins [5, 6], industrial sectors [7] and groups of people [8, 9]) is crucial to the understanding of the structural and functional properties of networks. The existing deterministic methods used for large networks find separated communities, whereas most of the actual networks are made of highly overlapping cohesive groups of nodes. Here we introduce an approach to analysing the main statistical features of the interwoven sets of overlapping communities that makes a step towards uncovering the modular structure of complex systems. After defining a set of new characteristic quantities for the statistics of communities, we apply an efficient technique for exploring overlapping communities on a large scale. We find that overlaps are significant, and the distributions we introduce reveal universal features of networks. Our studies of collaboration, word-association and protein interaction graphs show that the web of communities has non-trivial correlations and specific scaling properties.
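The idea of overlapping communities built from adjacent cliques can be sketched in a few lines. The code below assumes the usual k-clique percolation rule (two k-cliques are adjacent when they share k-1 nodes, and a community is the union of a connected chain of adjacent cliques). It is a brute-force illustration for tiny graphs, not the efficient large-scale technique the abstract refers to, and the function name and toy graph are hypothetical.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Union of k-cliques that are chained together by sharing k-1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Brute-force enumeration of all k-cliques (only sensible for small graphs).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]

    # Union-find over cliques: merge two cliques when they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)

# Assumed toy graph: triangles {0,1,2} and {1,2,3} share an edge,
# while triangle {3,4,5} touches them only through node 3.
toy_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
communities = k_clique_communities(toy_edges, k=3)
print(communities)
```

On this toy graph the result is two communities, {0, 1, 2, 3} and {3, 4, 5}, with node 3 belonging to both, which is the kind of overlap a node partition cannot express.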
Article
In the US House and Senate, each piece of legislation is sponsored by a unique legislator. In addition, legislators can publicly express support for a piece of legislation by cosponsoring it. The network of sponsors and cosponsors provides information about the underlying social networks among legislators. I use a number of statistics to describe the cosponsorship network in order to show that it behaves much differently than other large social networks that have been recently studied. In particular, the cosponsorship network is much denser than other networks and aggregate features of the network appear to be influenced by institutional arrangements and strategic incentives. I also demonstrate that a weighted closeness centrality measure that I call ‘connectedness’ can be used to identify influential legislators.
Article
Part I. Introduction: Networks, Relations, and Structure: 1. Relations and networks in the social and behavioral sciences 2. Social network data: collection and application Part II. Mathematical Representations of Social Networks: 3. Notation 4. Graphs and matrixes Part III. Structural and Locational Properties: 5. Centrality, prestige, and related actor and group measures 6. Structural balance, clusterability, and transitivity 7. Cohesive subgroups 8. Affiliations, co-memberships, and overlapping subgroups Part IV. Roles and Positions: 9. Structural equivalence 10. Blockmodels 11. Relational algebras 12. Network positions and roles Part V. Dyadic and Triadic Methods: 13. Dyads 14. Triads Part VI. Statistical Dyadic Interaction Models: 15. Statistical analysis of single relational networks 16. Stochastic blockmodels and goodness-of-fit indices Part VII. Epilogue: 17. Future directions.
Book
A variety of different social, natural and technological systems can be described by the same mathematical framework. This holds from the Internet to food webs and to boards of company directors. In all these situations a graph of the elements of the system and their interconnections displays a universal feature. There are only few elements with many connections, and many elements with few connections. This book presents the experimental evidence of these "Scale-free networks" and provides students and researchers with a corpus of theoretical results and algorithms to analyse and understand these features. The content of this book and the exposition makes it a clear textbook for beginners, and a reference book for the experts. Available in OSO: http://www.oxfordscholarship.com/oso/public/content/physics/9780199211517/toc.html
Article
Uncovering the community structure exhibited by real networks is a crucial step toward an understanding of complex systems that goes beyond the local organization of their constituents. Many algorithms have been proposed so far, but none of them has been subjected to strict tests to evaluate their performance. Most of the sporadic tests performed so far involved small networks with known community structure and/or artificial graphs with a simplified structure, which is very uncommon in real systems. Here we test several methods against a recently introduced class of benchmark graphs, with heterogeneous distributions of degree and community size. The methods are also tested against the benchmark by Girvan and Newman [Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002)] and on random graphs. As a result of our analysis, three recent algorithms introduced by Rosvall and Bergstrom [Proc. Natl. Acad. Sci. U.S.A. 104, 7327 (2007); Proc. Natl. Acad. Sci. U.S.A. 105, 1118 (2008)], Blondel [J. Stat. Mech.: Theory Exp. (2008), P10008], and Ronhovde and Nussinov [Phys. Rev. E 80, 016109 (2009)] have an excellent performance, with the additional advantage of low computational complexity, which enables one to analyze large systems.
Article
Using large-scale network analysis I map the cosponsorship networks of all 280,000 pieces of legislation proposed in the U.S. House and Senate from 1973 to 2004. In these networks, a directional link can be drawn from each cosponsor of a piece of legislation to its sponsor. I use a number of statistics to describe these networks such as the quantity of legislation sponsored and cosponsored by each legislator, the number of legislators cosponsoring each piece of legislation, the total number of legislators who have cosponsored bills written by a given legislator, and network measures of closeness, betweenness, and eigenvector centrality. I then introduce a new measure I call “connectedness” which uses information about the frequency of cosponsorship and the number of cosponsors on each bill to make inferences about the social distance between legislators. Connectedness predicts which members will pass more amendments on the floor, a measure that is commonly used as a proxy for legislative influence. It also predicts roll call vote choice even after controlling for ideology and partisanship.
Article
Networks of coupled dynamical systems have been used to model biological oscillators, Josephson junction arrays, excitable media, neural networks, spatial games, genetic control networks and many other self-organizing systems. Ordinarily, the connection topology is assumed to be either completely regular or completely random. But many biological, technological and social networks lie somewhere between these two extremes. Here we explore simple models of networks that can be tuned through this middle ground: regular networks 'rewired' to introduce increasing amounts of disorder. We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them 'small-world' networks, by analogy with the small-world phenomenon (popularly known as six degrees of separation). The neural network of the worm Caenorhabditis elegans, the power grid of the western United States, and the collaboration graph of film actors are shown to be small-world networks. Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. In particular, infectious diseases spread more easily in small-world networks than in regular lattices.
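The rewiring model in this abstract lends itself to a short numerical sketch. The code below assumes a ring lattice of N = 200 nodes with K = 4 neighbours each and a rewiring probability P = 0.1; these values, the random seed, and all function names are illustrative choices, and the rewiring rule is a simplified reading of the description rather than the authors' own code.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbours on a ring (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def rewire(adj, k, p, rng):
    """With probability p, move the far end of each lattice edge to a random node."""
    n = len(adj)
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p:
                # Avoid self-loops and duplicate edges.
                candidates = [w for w in range(n) if w != i and w not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj

def average_clustering(adj):
    """Mean fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (d * (d - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

N, K, P = 200, 4, 0.1        # assumed illustrative parameters
rng = random.Random(42)      # fixed seed so repeated runs agree
lattice = ring_lattice(N, K)
small_world = rewire(lattice, K, P, rng)

c0, l0 = average_clustering(lattice), average_path_length(lattice)
c1, l1 = average_clustering(small_world), average_path_length(small_world)
print(f"lattice:  clustering = {c0:.3f}, path length = {l0:.2f}")
print(f"rewired:  clustering = {c1:.3f}, path length = {l1:.2f}")
```

For the unrewired lattice the clustering is 0.5 and the mean path length is about 25.4. A small amount of rewiring is expected to cut the path length sharply while leaving much of the clustering in place, which is the small-world regime the abstract describes.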
Article
The discovery and analysis of community structure in networks is a topic of considerable recent interest within the physics community, but most methods proposed so far are unsuitable for very large networks because of their computational cost. Here we present a hierarchical agglomeration algorithm for detecting community structure which is faster than many competing algorithms: its running time on a network with n vertices and m edges is O(md log n) where d is the depth of the dendrogram describing the community structure. Many real-world networks are sparse and hierarchical, with m approximately n and d approximately log n, in which case our algorithm runs in essentially linear time, O(n log² n). As an example of the application of this algorithm we use it to analyze a network of items for sale on the web site of a large on-line retailer, items in the network being linked if they are frequently purchased by the same buyer. The network has more than 400 000 vertices and 2 × 10⁶ edges. We show that our algorithm can extract meaningful communities from this network, revealing large-scale patterns present in the purchasing habits of customers.
Article
High-throughput techniques are leading to an explosive growth in the size of biological databases and creating the opportunity to revolutionize our understanding of life and disease. Interpretation of these data remains, however, a major scientific challenge. Here, we propose a methodology that enables us to extract and display information contained in complex networks. Specifically, we demonstrate that we can find functional modules in complex networks, and classify nodes into universal roles according to their pattern of intra- and inter-module connections. The method thus yields a 'cartographic representation' of complex networks. Metabolic networks are among the most challenging biological networks and, arguably, the ones with most potential for immediate applicability. We use our method to analyse the metabolic networks of twelve organisms from three different superkingdoms. We find that, typically, 80% of the nodes are only connected to other nodes within their respective modules, and that nodes with different roles are affected by different evolutionary constraints and pressures. Remarkably, we find that metabolites that participate in only a few reactions but that connect different modules are more conserved than hubs whose links are mostly within a single module.
Article
We propose and study a set of algorithms for discovering community structure in networks, that is, natural divisions of network nodes into densely connected subgroups. Our algorithms all share two definitive features: first, they involve iterative removal of edges from the network to split it into communities, the edges removed being identified using any one of a number of possible "betweenness" measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.
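The two features named in this abstract, removing the edge with the highest betweenness and recalculating after every removal, can be sketched as follows. The code assumes shortest-path edge betweenness on an unweighted, undirected graph with at least one edge and performs a single split; the function names and the toy graph are hypothetical, and this is a minimal illustration rather than the authors' implementation.

```python
from collections import deque

def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_betweenness(adj):
    """Shortest-path edge betweenness; each node pair is counted in both directions."""
    bet = {}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:  # BFS that also counts shortest paths (sigma)
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {u: 0.0 for u in order}
        for v in reversed(order):  # accumulate dependencies from the leaves back
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1 + delta[v])
                edge = (min(u, v), max(u, v))
                bet[edge] = bet.get(edge, 0.0) + share
                delta[u] += share
    return bet

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        comps.append(comp)
    return comps

def split_once(edges):
    """Remove highest-betweenness edges, recomputing each time, until one split occurs."""
    adj = build_adj(edges)
    start = len(components(adj))
    removed = []
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)        # recalculated after every removal
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
    return removed, sorted(components(adj), key=sorted)

# Assumed toy graph: two triangles joined by the bridge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
removed, parts = split_once(toy_edges)
print(removed, parts)
```

On the toy graph the bridge carries every shortest path between the two triangles, so it is removed first and the graph separates into {0, 1, 2} and {3, 4, 5}. Repeating the split on each part would yield the full hierarchy.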
Article
Extracting understanding from the growing “sea” of biological and socioeconomic data is one of the most pressing scientific challenges facing us. Here, we introduce and validate an unsupervised method for extracting the hierarchical organization of complex biological, social, and technological networks. We define an ensemble of hierarchically nested random graphs, which we use to validate the method. We then apply our method to real-world networks, including the air-transportation network, an electronic circuit, an e-mail exchange network, and metabolic networks. Our analysis of model and real networks demonstrates that our method extracts an accurate multiscale representation of a complex system. Keywords: cellular metabolism, complex networks, multiscale representation.
Article
To comprehend the multipartite organization of large-scale biological and social systems, we introduce an information theoretic approach that reveals community structure in weighted and directed networks. We use the probability flow of random walks on a network as a proxy for information flows in the real system and decompose the network into modules by compressing a description of the probability flow. The result is a map that both simplifies and highlights the regularities in the structure and their relationships. We illustrate the method by making a map of scientific communication as captured in the citation patterns of >6,000 journals. We discover a multicentric organization with fields that vary dramatically in size and degree of integration into the network of science. Along the backbone of the network (including physics, chemistry, molecular biology, and medicine) information flows bidirectionally, but the map reveals a directional pattern of citation from the applied fields to the basic sciences.
Article
Networks have in recent years emerged as an invaluable tool for describing and quantifying complex systems in many branches of science. Recent studies suggest that networks often exhibit hierarchical organization, in which vertices divide into groups that further subdivide into groups of groups, and so forth over multiple scales. In many cases the groups are found to correspond to known functional units, such as ecological niches in food webs, modules in biochemical networks (protein interaction networks, metabolic networks or genetic regulatory networks) or communities in social networks. Here we present a general technique for inferring hierarchical structure from network data and show that the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks, such as right-skewed degree distributions, high clustering coefficients and short path lengths. We further show that knowledge of hierarchical structure can be used to predict missing connections in partly known networks with high accuracy, and for more general network structures than competing techniques. Taken together, our results suggest that hierarchy is a central organizing principle of complex networks, capable of offering insight into many network phenomena.
Article
Despite their importance for urban planning, traffic forecasting and the spread of biological and mobile viruses, our understanding of the basic laws governing human motion remains limited owing to the lack of tools to monitor the time-resolved location of individuals. Here we study the trajectory of 100,000 anonymized mobile phone users whose position is tracked for a six-month period. We find that, in contrast with the random trajectories predicted by the prevailing Lévy flight and random walk models, human trajectories show a high degree of temporal and spatial regularity, each individual being characterized by a time-independent characteristic travel distance and a significant probability to return to a few highly frequented locations. After correcting for differences in travel distances and the inherent anisotropy of each trajectory, the individual travel patterns collapse into a single spatial probability distribution, indicating that, despite the diversity of their travel history, humans follow simple reproducible patterns. This inherent similarity in travel patterns could impact all phenomena driven by human mobility, from epidemic prevention to emergency response, urban planning and agent-based modelling.