Article
Abstract

Soft overlapping clustering is one of the notable problems of community detection. Extensive research has been conducted to develop efficient methods for non-overlapping and crisp-overlapping community detection in large-scale networks. In this paper, Fast Fuzzy Modularity Maximization (FFMM) for soft overlapping community detection is proposed. FFMM exploits novel iterative equations to calculate the modularity gain associated with changing the fuzzy membership values of network vertices. The simplicity of the proposed scheme enables efficient modifications, reducing computational complexity to a linear function of the network size and the number of communities. Moreover, to further reduce the complexity of FFMM for very large networks, Multi-cycle FFMM (McFFMM) is proposed. McFFMM reduces complexity by breaking networks into multiple sub-networks and applying FFMM to detect their communities. The performance of the proposed techniques is demonstrated on real-world data and the Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks, and is evaluated against several state-of-the-art soft overlapping community detection approaches. Results show that McFFMM achieves remarkable performance in terms of overlapping modularity with fuzzy memberships, computational time, number of detected overlapping nodes, and Overlapping Normalized Mutual Information (ONMI).
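To make the quantity being maximized concrete, the following is a minimal sketch of one common fuzzy (generalized) modularity formulation, Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) Σ_c u_ic u_jc. It is an assumption that this matches the objective used by FFMM; the abstract does not state the formula. The function name `fuzzy_modularity` and the toy graph are hypothetical, and this direct evaluation costs O(n²c), so it is not the linear-time iterative update the abstract describes.

```python
def fuzzy_modularity(adj, membership):
    """Generalized modularity of a soft partition (illustrative formulation).

    adj: symmetric 0/1 adjacency matrix of an undirected graph (list of lists).
    membership: membership[i][c] is the degree to which vertex i belongs to
        community c; each row is assumed to sum to 1.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # twice the number of edges
    n_comm = len(membership[0])
    q = 0.0
    for i in range(n):
        for j in range(n):
            # s_ij measures how strongly i and j share communities.
            s_ij = sum(membership[i][c] * membership[j][c] for c in range(n_comm))
            q += (adj[i][j] - degrees[i] * degrees[j] / two_m) * s_ij
    return q / two_m


# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
crisp = [[1, 0]] * 3 + [[0, 1]] * 3
soft = [[1, 0], [1, 0], [0.7, 0.3], [0.3, 0.7], [0, 1], [0, 1]]
print(fuzzy_modularity(A, crisp), fuzzy_modularity(A, soft))
```

With crisp (0/1) memberships the function reduces to ordinary modularity, which is 5/14 for this toy graph; with every vertex split evenly across both communities it evaluates to 0.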

... More work on algorithm-level methods for imbalanced datasets can be found in [21][22][23]. Related studies have indicated that the misjudgment of classifiers often occurs in the boundary region [24,25], which is in fact the area of class overlapping. Although some of the above AUC methods have tried to disentangle the overlapping areas in imbalanced samples in a sense, they still have the following two serious bottlenecks. ...
... Let us keep in mind that in order to classify imbalanced data effectively, more avoidance of the overlapping areas can be viewed to a certain extent as more disentanglement of the modularity between different classes. In other words, the adversarial link information [25] is likely to play a key role in assisting imbalanced classification. Therefore, when designing an AUC maximization based adversarial de-overlapping learning machine, we may reasonably postulate that such a machine should be motivated from three aspects: (1) the values of the decision function for training samples should share the intrinsic property of AUC maximization based classification learning; (2) the modularity should be disentangled as soon as possible so as to guide adversarially de-overlapping the overlapping areas around the boundary between different classes; (3) in order to reduce tunable hyperparameters as much as possible, such a machine should be easily extended to semi-supervised scenarios in a unified way to avoid an explicitly additional regularization term with more adversarial information in the corresponding objective function. ...
... In order to do so, we propose the novel concept of class overlapping measure based on both AUC and the modularity in [25]. To the best of our knowledge, this is the first attempt to explore imbalanced classification learning from this novel perspective. ...
Article
Full-text available
While adversarial link information, such as the commonly used must-link and cannot-link constraints on training data, is available, the existing AUC maximization learning frameworks cannot explicitly incorporate it to better guide disentanglement of the overlapping areas. As a first attempt to fill this gap, this study first develops the coupling-based adversarial overlapping concept by coupling the classical AUC with the modularity caused by adversarial link information. Then the corresponding adversarial de-overlapping maximization learning machine, called De-OVL, is developed for supervised imbalanced data. Furthermore, by using the proposed two-channel based strategy, De-OVL is extended to its semi-supervised version SDe-OVL with only one tunable hyperparameter for semi-supervised imbalanced data. Based on random Fourier features (RFF), the fast training versions RFF-De-OVL and RFF-SDe-OVL are developed to scale up De-OVL and SDe-OVL, respectively. In contrast to existing imbalanced classification methods, De-OVL has a unified adversarial de-overlapping maximization framework for supervised and semi-supervised imbalanced data, with fewer hyperparameters to be tuned. Extensive experimental results on four groups of benchmark imbalanced datasets verify the effectiveness of the proposed machines.
... Fuzzy community detection (FCD) methods mainly include label propagation [17]-[20], nonnegative matrix factorization [21]-[24], fuzzy clustering [25]-[28], fuzzy agglomeration [29]-[31] and fuzzy modularity maximization (FMM) [32]-[37]. In these methods, FMM is becoming one of the most effective methods because of high modularity quality, few parameters, and strong portability [32]. Firstly, FMM can approximate the global optimal fuzzy partition by maximizing a specific fuzzy modularity function, such as the generalized modularity [16]. ...
... Popular FMM approaches include FFMM [32] based on fast deterministic optimization, FMM/H2 [34] based on heuristic optimization, and GAFCD [35] and SOSFCD [36] based on meta-heuristic optimization. These FMM algorithms are usually effective and stable in fuzzy community detection by maximizing a fuzzy modularity function (e.g., Q_g). ...
Article
Full-text available
Fuzzy community detection can not only reveal crisp community structures of real-life networks, but also provide fine-grained information on node fuzzy membership. However, existing methods for detecting fuzzy partitions often produce inaccurate fuzzy memberships for bridge nodes that do not align with the prior ground-truth crisp/hard partition, which diminishes the reliability and usefulness of fine-grained community information. To solve this issue, we propose a concept of constrained fuzzy partition and the corresponding constrained fuzzy community detection problem. This problem is formulated as a constrained fuzzy modularity maximization task to approximate the modularity-optimal constrained fuzzy partition, in which the fuzzy memberships can accurately quantify node membership to ground-truth communities and association strength between communities. To improve modularity quality and computational efficiency, a general framework of Constrained Fuzzy Modularity Maximization (CFMM) is proposed, which includes two key components: a Fuzzy Membership Constraint (FMC) strategy and a Node Dimension Reduction (NDR) operation. FMC constrains the fuzzy membership of each node to be consistent with its ground-truth hard community and improves the quality of fuzzy partitions, and NDR reduces the time complexity of modularity evaluation by removing converged fuzzy memberships of non-bridge nodes. Experimental results on synthetic and real-life networks demonstrate the effectiveness of CFMM over state-of-the-art algorithms in terms of accuracy, quality, and efficiency, providing more precise and high-quality fuzzy community information.
... Other community detection algorithms were later proposed, including random walk [5,6], spectral clustering [7,8,9], and statistical-inference [10] techniques. Recently, the performance of the modularity function was further improved with fuzzy maximization [11] and correlation clustering [12]. However, these algorithms require storing the entire graph in the form of an adjacency matrix or list, making them impractical for larger graphs. ...
... Before discussing the literature for these two categories, let us first discuss some of the quality metrics used in many community detection algorithms [18]. One of the popular metrics is modularity [4,11,12], which calculates the difference between the number of edges observed in each cluster and the number expected if edges were randomly distributed. Another well-known metric is conductance [19,20], the ratio between the edges leaving a community and the edges within it. ...
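The two metrics named in this excerpt can be illustrated with a small sketch. It assumes the standard textbook definitions, which may differ in detail from those used in the citing paper: modularity as Σ_c [L_c/m − (d_c/2m)²] and conductance as cut(S) / min(vol(S), vol(V∖S)). The function names and the edge list are hypothetical.

```python
def modularity(edges, community_of):
    """Crisp modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every vertex to its community label.
    """
    m = len(edges)
    internal = {}    # edges with both endpoints in community c
    degree_sum = {}  # total degree of the vertices in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Observed internal edge fraction minus the fraction expected at random.
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def conductance(edges, community):
    """Cut size divided by the smaller of the two side volumes."""
    members = set(community)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for x in (u, v):
            if x in members:
                vol_in += 1
            else:
                vol_out += 1
        if (u in members) != (v in members):
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, labels), conductance(edges, [0, 1, 2]))
```

On this toy graph the two-triangle partition has modularity 5/14 and the community {0, 1, 2} has conductance 1/7; higher modularity and lower conductance both indicate a better separated community.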
... The Louvain algorithm [23] improves modularity by running each iteration in two phases: first, each node is placed in its own community, and modularity gain is calculated; secondly, a new network is created using the communities discovered in the first phase. Correlation clustering [12] and fast fuzzy modularity maximization [11] have been used to enhance classical community detection algorithms using the modularity function. Random walk-based algorithms [5,24] involve using a random surfer to identify the neighbourhood and tend to become trapped in the densest section of the graph. ...
Preprint
Full-text available
Identifying and preserving community structures in a streaming graph is a very challenging task. However, many applications require the identification of these communities in very limited space and time. In this paper, we design Community Sketch, a small space data structure that efficiently preserves communities. On query, it provides communities in constant time. With the use of community sketch data structure, a linear streaming community detection algorithm is proposed. Experimental results on the large real-world networks show that our algorithm outperforms other state-of-the-art algorithms in terms of quality metrics (NMI, F1-score, and WCC). Further, we propose an algorithm to produce benchmark network, namely, Temporal Community Benchmark Dataset (TCBD) which contains both true community labels and temporal information of edges. These synthetic networks are used to validate the proposed algorithm.
... For one thing, to our knowledge, no methods can provide detailed higher-order fuzzy community information. While lower-order fuzzy memberships offer fine-grained membership grades for nodes valued continuously between [0, 1], they primarily focus on lower-order crisp communities formed at the edge level [24]-[26], which are unsuitable for higher-order community structures. Furthermore, for the crisp higher-order communities identified by previous HCD methods, each node (or motif) can only fully belong to one community, with a binary membership grade (0 or 1) [9], [12]. ...
... To our knowledge, no methods can provide detailed higher-order fuzzy memberships. Existing lower-order fuzzy memberships [24]-[26] are unsuitable for higher-order community structures. Specifically, lower-order fuzzy memberships quantify the partial belongingness of nodes to lower-order crisp communities formed at the edge level [32]. ...
Article
Full-text available
Higher-order community detection reveals both mesoscale structures and functional characteristics of real-world networks. Although many methods have been developed from diverse perspectives, to our knowledge, none can provide fine-grained higher-order fuzzy community information. This study introduces a novel concept of higher-order fuzzy memberships that quantify the membership grades of motifs to crisp higher-order communities, thereby revealing partial community affiliations. Furthermore, we utilize higher-order fuzzy memberships to enhance higher-order community detection via a general framework called fuzzy memberships-assisted motif-based evolutionary modularity. On the one hand, a fuzzy membership-based neighbor community modification strategy is designed to correct misassigned bridge nodes, thereby improving partition quality. On the other hand, a fuzzy membership-based local community merging strategy is proposed to combine excessively fragmented communities, enhancing local search ability. Experimental results indicate that the proposed framework outperforms state-of-the-art methods in both synthetic and real-world datasets, particularly in networks with ambiguous and complex structures.
... For one thing, to our knowledge, no methods can provide detailed higher-order fuzzy community information. Although lower-order fuzzy memberships have been developed to provide more fine-grained membership grades for nodes valued continuously between [0, 1], they primarily focus on lower-order crisp communities formed at the edge level [22]-[24], which are not suitable for higher-order community structures. Furthermore, the higher-order communities identified by previous HCD methods are generally characterized by coarse-grained crisp community structures, where each motif can only fully belong to one community, i.e., the membership grade is binary (0 or 1) [9]. ...
... To our knowledge, no methods can provide detailed higher-order fuzzy memberships. Existing lower-order fuzzy memberships [22]-[24] are not suitable for higher-order community structures. Specifically, lower-order fuzzy memberships quantify the partial belongingness of nodes to lower-order crisp communities formed at the edge level [27]. ...
Preprint
Full-text available
Higher-order community detection (HCD) reveals both mesoscale structures and functional characteristics of real-life networks. Although many methods have been developed from diverse perspectives, to our knowledge, none can provide fine-grained higher-order fuzzy community information. This study presents a novel concept of higher-order fuzzy memberships that quantify the membership grades of motifs to crisp higher-order communities, thereby revealing the partial community affiliations. Furthermore, we employ higher-order fuzzy memberships to enhance HCD via a general framework called fuzzy memberships assisted motif-based evolutionary modularity (FMMEM). In FMMEM, on the one hand, a fuzzy membership-based neighbor community modification (FM-NCM) strategy is designed to correct misassigned bridge nodes, thereby improving partition quality. On the other hand, a fuzzy membership-based local community merging (FM-LCM) strategy is also proposed to combine excessively fragmented communities, enhancing local search ability. Experimental results indicate that the FMMEM framework outperforms state-of-the-art methods on both synthetic and real-world datasets, particularly in networks with ambiguous and complex structures.
... However, determining the proper group for the nodes in the network is not an easy task. In this regard, researchers have developed many clustering algorithms and then applied them to various datasets [4][5][6][7][8][9]. Nevertheless, there is no consensus on what the best clustering algorithm means. ...
... Yazdanparast et al. [4] proposed a new clustering technique for overlapping clusters using a fuzzy system. They developed Fast Fuzzy Modularity Maximization (FFMM) for finding communities in overlapping networks. ...
Conference Paper
Discovering clusters in social networks is of fundamental and practical interest. This paper presents a novel clustering strategy for large-scale highly-connected social networks. We propose a new hybrid clustering technique based on non-negative matrix factorization and independent component analysis for finding complex relationships among users of a huge social network. We extract the important features of the network and then perform clustering on independent and important components of the network. Moreover, we introduce a new k-means centroid initialization method by which we achieve higher efficiency. We apply our approach on four well-known social networks: Facebook, Twitter, Academia and Youtube. We experimentally show that our approach achieves much better results in terms of the Silhouette coefficient compared to well-known counterparts such as Hierarchical Louvain, Multiple Local Community detection, and k-means++.
... Also, a fuzzy clustering algorithm based on a multi-kernel hybridization was proposed, and this model was evaluated using real-world data sets. In [18], a fast fuzzy modularity maximizer for overlapping community detection is presented, which uses new iterative equations to calculate undefined membership values of network vertices. Due to its simplicity, this model has led to efficient changes and reduces the computational complexity to a linear function of the size of the network and the number of communities. ...
Article
Full-text available
In recent years, extensive studies have been carried out in community detection for social network analysis because it plays a crucial role in social network systems in today's world. However, most social networks in the real world have complex overlapping social structures, which pose an NP-hard problem. This paper presents a new model for overlapping community detection that uses a multi-objective approach based on a hybrid optimization algorithm. In this model, the Modified Selection Function (MSF) hybridizes the algorithms and recovery mechanism, the Slime Mould Algorithm (SMA), the Sine Cosine Algorithm (SCA), and the association strategy. Also, considering that these algorithms have been presented to solve single-objective optimization problems, the Pareto dominance technique has been used to solve multi-objective problems. In addition to overlapping community detection and increasing detection accuracy, the fuzzy clustering technique has been used to select the heads of clusters. Sixteen synthetic and real-world data sets were utilized to assess the suggested model, and the outcomes were contrasted with those of existing optimization techniques. In our own tests, the proposed model performed better than the other tested algorithms on all 16 data sets; in comparisons with algorithms proposed in other works, it performed better than its competitors on 11 of 14 data sets. In conclusion, the findings show that this model performs better than other methods.
... The spectral algorithm [18,19,20,21] transforms the community detection problem into a simple quadratic optimization problem and obtains the approximate optimal network partition by solving the eigenvector of the Laplace matrix. The modular maximization method [22,23] finds the maximum value of the modularity function in the network, whose representative algorithms include the Louvain method and the simulated annealing algorithm. The Louvain algorithm [24,25] has demonstrated good results in efficiency and effectiveness and can find hierarchical communities. ...
Article
With the increasing diversity and complexity of online social networks, effectively dividing communities presents a growing challenge. These networks are characterized by their large scale, sparse structure, and numerous isolated points. Traditional community detection methods lack consideration of node attribute information, thereby negatively impacting the accuracy of community detection. To address these challenges, this paper presents a novel Louvain-FTAS community detection algorithm that integrates topology and attribute structure. The proposed algorithm first selects attributes with positive effects to account for attribute heterogeneity. Subsequently, it utilizes a semi-local strategy to calculate topology similarity and information entropy to calculate attribute similarity. These values are combined to obtain the final node similarity matrix, which is then fed into the Louvain algorithm to maximize modularity and incorporate multi-dimensional attribute features to enhance community detection accuracy. The proposed model is evaluated through comparative experiments on two real datasets and artificial synthetic networks, demonstrating its rationality and effectiveness.
... It is commonly acknowledged that there is no unique community detection algorithm that can universally accommodate all kinds of social networks with high accuracy because of the discrepancy in network types and purposes. For example, in one type of network, algorithmic biases can increase the efficiency, while in another type, they may decrease the efficiency [65][66][67][68]. A distinguished class of traditional community detection methods only considers the structure of the network (i.e., relationships between nodes) and ignores node attributes, if any. ...
Article
Full-text available
Over the past few years, the number and volume of data sources in healthcare databases have grown exponentially. Analyzing these voluminous medical data is both an opportunity and a challenge for knowledge discovery in health informatics. In the last decade, social network analysis techniques and community detection algorithms have been used more and more in scientific fields, including healthcare and medicine. While community detection algorithms have been widely used for social network analysis, a comprehensive review of their applications in healthcare, written to benefit both health practitioners and the health informatics community, is still missing. This paper helps fill this gap and provides a comprehensive and up-to-date literature review. In particular, categorizations of existing community detection algorithms are presented and discussed. Moreover, most applications of social network analysis and community detection algorithms in healthcare are reviewed and categorized. Finally, publicly available healthcare datasets, key challenges, and knowledge gaps in the field are studied and reviewed.
... But, the majority of these algorithms require prior knowledge, such as community size, and community number. Su et al. [20] and Yazdanparast et al. [21] have applied the fuzzy method for community detection by modularity maximization. The concept of self-membership has been introduced in paper [22]. ...
Article
Full-text available
Crime reports clustering is crucial for identifying and preventing criminal activities that frequently happened in society. In the proposed work, named entities in a report are recognized to extract the crime-related phrases and subsequently, the phrases are preprocessed by applying stopword removal and lemmatization operations. Next, the module of the universal encoder model, called the transformer, is applied to extract phrases of the report to get a sentence embedding for each associated sentence, aggregation of which finally provides the vector representation of that report. An innovative and efficient graph-based clustering algorithm consisting of splitting and merging operations has been proposed to get the cluster of crime reports. The proposed clustering algorithm generates overlapping clusters, which indicates the existence of reports of multiple crime types. The fuzzy theory has been used to provide a score to the report for expressing its membership into different clusters, and accordingly, the reports are labelled by multiple categories. The efficiency of the proposed method has been assessed by taking into account different datasets and comparing them with other state-of-the-art approaches with the help of various performance measure metrics.
... Out of these, generalized modularity is quite popular in the field of FCD. A new approach for soft overlapping community detection via fuzzy modularity maximization, along with its multi-cycle based faster version, was proposed by Yazdanparast et al. in [23]. Luo et al. [24] used dynamic membership function to propose two local community detection algorithms. ...
... Link strength is defined based on community structure, i.e., edges between vertices of different communities are regarded as weak links and edges between vertices of one community are regarded as strong links. Considering the fact that many networks do not contain ground-truth community information, we employ modularity maximization [32][33][34], which is a commonly used method for detecting community structure, to learn the link strength. ...
Article
Full-text available
Predictive graph learning approaches have been bringing significant advantages in many real-life applications, such as social networks, recommender systems, and other social-related downstream tasks. For those applications, learning models should be able to produce a great prediction result to maximize the usability of their application. However, the paradigm of current graph learning methods generally neglects the differences in link strength, leading to discriminative predictive results, resulting in different performance between tasks. Based on that problem, a fairness-aware predictive learning model is needed to balance the link strength differences and not only consider how to formulate it. To address this problem, we first formally define two biases (i.e., Preference and Favoritism) that widely exist in previous representation learning models. Then, we employ modularity maximization to distinguish strong and weak links from the quantitative perspective. Eventually, we propose a novel predictive learning framework entitled ACE that first implements the link strength differentiated learning process and then integrates it with a dual propagation process. The effectiveness and fairness of our proposed ACE have been verified on four real-world social networks. Compared to nine different state-of-the-art methods, ACE and its variants show better performance. The ACE framework can better reconstruct networks, thus also providing a high possibility of resolving misinformation in graph-structured data.
... This section tests the performance of OCDIF on synthetic networks and real networks. We select the overlapping community detection algorithms based on the global structure and local structure of the static networks as the comparison objects of OCDIF (CLPA [29], GREESE [30], ILPA [31], LMD [32], McFFMM [33], MCMOEA [34], MPEA [35], and SSLPA [36]). We use the following two common indicators to evaluate the quality of the community detection: (1) F1-Score (average F1 value) and (2) NMI (normalized mutual information). ...
Article
Full-text available
Community structure is one of the most important characteristics of complex networks, which has important applications in sociology, biology, and computer science. The community detection method based on local expansion is one of the most adaptable overlapping community detection algorithms. However, due to the lack of effective seed selection and community optimization methods, the algorithm often gets community results with lower accuracy. In order to solve these problems, we propose a seed selection algorithm of fusion degree and clustering coefficient. The method calculates the weight value corresponding to degree and clustering coefficient by entropy weight method and then calculates the weight factor of nodes as the seed node selection order. Based on the seed selection algorithm, we design a local expansion strategy, which uses the strategy of optimizing adaptive function to expand the community. Finally, community merging and isolated node adjustment strategies are adopted to obtain the final community. Experimental results show that the proposed algorithm can achieve better community partitioning results than other state-of-the-art algorithms.
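The seed-ordering step described in this abstract can be sketched with the standard entropy weight method. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the normalization or the scoring formula, so dividing degree by its maximum and using a weighted sum are assumptions, and the names `entropy_weights` and `seed_order` are hypothetical.

```python
import math


def entropy_weights(columns):
    """Entropy weight method: indicators that vary more across nodes get more weight.

    columns: list of indicator columns, each a list of non-negative values
        with one entry per node.
    """
    n = len(columns[0])
    divergences = []
    for col in columns:
        total = sum(col)
        if total == 0 or n < 2:
            divergences.append(0.0)  # a constant-zero indicator carries no information
            continue
        entropy = -sum((x / total) * math.log(x / total) for x in col if x > 0)
        entropy /= math.log(n)  # scale so that a uniform column has entropy 1
        divergences.append(max(0.0, 1.0 - entropy))
    spread = sum(divergences)
    if spread == 0:
        return [1.0 / len(columns)] * len(columns)
    return [d / spread for d in divergences]


def seed_order(degrees, clustering):
    """Rank nodes by a weighted sum of normalized degree and clustering coefficient."""
    max_deg = max(degrees) or 1
    deg_norm = [d / max_deg for d in degrees]  # assumed normalization
    w_deg, w_cc = entropy_weights([deg_norm, clustering])
    scores = [w_deg * d + w_cc * c for d, c in zip(deg_norm, clustering)]
    # Highest score first; ties keep their original node order.
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical indicators for four nodes.
print(seed_order([1, 1, 1, 5], [0.5, 0.5, 0.5, 0.5]))
```

In this example the clustering coefficient is identical for every node, so it receives almost no weight and the high-degree node 3 is ranked first as the seed candidate.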
... Clustering analysis of networks is sensitive to network noise, and biochemical experimental challenges as well as cell-to-cell variation can contribute to this issue. One way this issue may be addressed is through fuzzy or overlapping communities, where nodes can belong to more than one (Yazdanparast et al., 2020); in fact, multiple containment of an entity within several larger ones is already a property of many community detection algorithms, as noted above. ...
Article
Biological systems are by nature multiscale, consisting of subsystems that factor into progressively smaller units in a deeply hierarchical structure. At any level of the hierarchy, an ever-increasing diversity of technologies can be applied to characterize the corresponding biological units and their relations, resulting in large networks of physical or functional proximities—e.g., proximities of amino acids within a protein, of proteins within a complex, or of cell types within a tissue. Here, we review general concepts and progress in using network proximity measures as a basis for creation of multiscale hierarchical maps of biological systems. We discuss the functionalization of these maps to create predictive models, including those useful in translation of genotype to phenotype, along with strategies for model visualization and challenges faced by multiscale modeling in the near future. Collectively, these approaches enable a unified hierarchical approach to biological data, with application from the molecular to the macroscopic.
Article
Full-text available
Our ability to observe the mesoscale topology of complex networks through community detection has significantly advanced in the past decades. This progress has opened up new frontiers in discovering more sophisticated and meaningful community structures that possess fuzzy and higher-order characteristics. This review provides an overview of two emerging research directions, which are fuzzy and higher-order community detection. It includes related concepts and practical scenarios, mathematical descriptions and latest advancements, as well as current challenges and future directions. Therefore, it will facilitate researchers in swiftly grasping the two emerging fields, offering valuable insights for future development of community detection studies.
Article
Community structure is a main structural feature of networks, which can model many complex systems. It reveals much information about networks, such as their internal organisation and the degree of similarity between network nodes. Many methods have been proposed to find community structure in networks. However, new methods are still needed for many reasons, especially accuracy. In this paper, we present a new method to find community structure in networks. Our method is a hierarchical algorithm based on the Tabu Search metaheuristic. The method builds a dendrogram by dividing a network several times. The community structure is then selected from the dendrogram using a quality function called modularity. Our method has been tested on artificial benchmark and real networks. Results demonstrate the superiority of our method compared with several state-of-the-art methods.
Article
The exploration of community features is a key issue in network science and data mining. As a vital structural characteristic, community vulnerability has received a great deal of attention. Recent works underline that many internal and external parameters to quantify community vulnerability necessarily improve conformity with topology, but suffer from a lack of comprehensiveness. In this paper, we propose a novel metric, namely the communication and structural heterogeneity method (CSH), designed to characterize topological information by communicability and structural dissimilarity. CSH is a global path-related strategy which is based on community dissimilarity. Furthermore, intra-link number, average communicability, topological heterogeneity in communities, as well as inter-link number and structural dissimilarity between communities are employed. Thus, a more detailed evaluation of community vulnerability is suggested. The effectiveness and accuracy of CSH are verified by empirical results in real-world networks. Moreover, the SIR propagation dynamics model and simulations of random and deliberate attacks are utilized to validate rationality. Meanwhile, the correlation between node importance (vulnerability) and community vulnerability is explored through experiments. The proposed method (CSH) shows its superiority when compared with some state-of-the-art methods.
Article
The explosion of research in the field of network science has seen enormous growth in the detection of communities. Since communities are not always disjoint and might overlap, fuzzy clustering is the most feasible approach to detect overlapping communities. However, one of the major disadvantages of this algorithm lies in the prior specification of the number of communities. In this study, a method known as LapEFCM is proposed that adopts a modularity strategy to obtain the number of overlapping communities, which overcomes this disadvantage of fuzzy clustering. The method comprises two major phases. In the first phase, using a local random walk, the method computes the similarity matrix to explore features of nodes. In the second phase, a fuzzy clustering method based on non-linear dimension reduction is applied to detect overlapping communities. The method uses low-dimensional data as input to the fuzzy clustering algorithm. Experiments conducted on both real and artificial networks demonstrate the effectiveness of this method.
Article
In modern Online Social Networks (OSNs), the need to detect users’ communities based on their interests and social connections has become an increasingly important challenge in the literature. Community detection supports several Social Network Analysis (SNA) applications and makes them more effective and efficient: the diffusion of a new idea or technology can be maximized by identifying groups of people interested in a given topic; recommendations can be improved by also taking into account how social ties influence user choices and the behavior of people in the same communities; expert finding tasks can be more accurate if users are first subdivided into thematic groups; and so on. This paper presents a survey that provides a comprehensive and comparative study of the different community detection techniques applicable to the various models proposed for OSNs. In particular, the most widespread approaches based on game theory, artificial intelligence and fuzzy strategies are detailed and compared, highlighting the related pros and cons. In addition, the problem of their applicability to the different OSN models is discussed, focusing on complex networks. Finally, the main open issues and challenges for the community detection problem are reported to guide future work on this topic.
Chapter
Full-text available
A new fuzzy genetic algorithm is proposed for community identification in social networks. In this paper, we use matrix encoding, which enables traditional crossover between individuals, while mutation takes place in some of the individuals. Matrix encoding determines which node belongs to which community. Using these concepts enhances the overall performance of evolutionary algorithms. In this experiment, we used the genetic algorithm with the fuzzy concept and compared it with other existing methods, such as the crisp genetic algorithm and the vertex similarity based genetic algorithm. We employed three real-world datasets (Strike, Karate Club, Dolphin) in this work. The usefulness and efficiency of the proposed algorithm are verified through accuracy and quality metrics, and a ranking of the proposed algorithm is provided using a multiple criteria decision-making method.
Article
Full-text available
One of the most important elements of social network analysis is community detection, i.e., finding groups of similar people based on their traits. In this paper, we present the fuzzy modularity maximization (FMM) approach for community detection, which finds overlapping (i.e., fuzzy) communities, where appropriate, by maximizing a generalized form of Newman's modularity. The first proposed FMM solution uses a tree-based structure to find a globally optimal solution, while the second proposed solution uses alternating optimization to efficiently search for a locally optimal solution. Both of these approaches are based on a proposed algorithm called one-step modularity maximization (OSMM), which computes the optimal cluster memberships for one person in the social network. We prove that OSMM can be formulated as a simplified quadratic knapsack optimization problem, which is O(n) time complexity. We then propose a tree-based algorithm, called FMM/Find Best Leaf Node (FMM/FBLN), which represents sequences of OSMM steps in a tree-based structure. It is proved that FMM/FBLN finds globally optimal solutions for FMM; however, the time complexity of FMM/FBLN is O(n^d), d ≥ 2; thus, it is impractical for most real-world networks. To combat this inefficiency, we propose five heuristic-based alternating optimization schemes, i.e., FMM/H1–H5, which are all shown to be O(n^2) time complexity. We compare the results of the FMM/H solutions with those of state-of-the-art community detection algorithms, MULTICUT spectral FCM (MSFCM) and GALS, and with those of two fuzzy community detection algorithms called GA and vertex-similarity based gradient-descent method (VSGD) on ten real-world datasets. We conclude that one of the five heuristic algorithms (FMM/H2) is very competitive with GALS and much more effective than MSFCM, GA, and VSGD. Furthermore, all of the FMM/H schemes are at least two orders of magnitude faster than GALS in run time.
Finally, FMM/H, unlike GALS (which only produces crisp partitions) and MSFCM (which always finds fuzzy partitions), is the only fuzzy community detection algorithm to date that can find the max-modularity partition, fuzzy or crisp.
Article
Full-text available
Community detection has been extensively studied in the past decades, largely because communities exist in various networks such as technological, social and biological networks. Most of the available algorithms, however, only focus on the properties of the vertices, ignoring the roles of the edges. To explore the roles of the edges in the networks for community discovery, the authors introduce a novel edge centrality based on its antitriangle property. To investigate how the edge centrality characterises the community structure, they develop an approach based on the edge antitriangle centrality with the isolated vertex handling strategy (EACH) for community detection. EACH first calculates the edge antitriangle centrality scores for all the edges of a given network and removes the edge with the highest score per iteration until the scores of the remaining edges are all zero. Furthermore, EACH is characterised by being free of parameters and independent of any additional measures to determine the community structure. To demonstrate the effectiveness of EACH, they compare it with state-of-the-art algorithms on both synthetic networks and real-world networks. The experimental results show that EACH is more accurate and has lower complexity in terms of community discovery and, in particular, that it can obtain quite inherent and consistent communities with a maximal diameter of four jumps.
Article
Full-text available
Identification and classification of overlapping nodes in networks are important topics in data mining. In this paper, a network-based (graph-based) semi-supervised learning method is proposed. It is based on competition and cooperation among walking particles in a network to uncover overlapping nodes by generating continuous-valued outputs (soft labels), corresponding to the levels of membership from the nodes to each of the communities. Moreover, the proposed method can be applied to detect overlapping data items in a data set of general form, such as a vector-based data set, once it is transformed to a network. Usually, label propagation involves risks of error amplification. In order to avoid this problem, the proposed method offers a mechanism to identify outliers among the labeled data items, and consequently prevents error propagation from such outliers. Computer simulations carried out for synthetic and real-world data sets provide a numeric quantification of the performance of the method.
Article
Full-text available
The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, fast analytics algorithms and software tools are necessary. One common graph analytics kernel is disjoint community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only few parallel codes exist, although parallelism will be necessary to scale to the data volume of real-world applications. We address the deficit in computing capability by a flexible and extensible community detection framework with shared-memory parallelism. Within this framework we design and implement efficient parallel community detection heuristics: A parallel label propagation scheme; the first large-scale parallelization of the well-known Louvain method, as well as an extension of the method adding refinement; and an ensemble scheme combining the above. In extensive experiments driven by the algorithm engineering paradigm, we identify the most successful parameters and combinations of these algorithms. We also compare our implementations with state of the art competitors. The processing rate of our fastest algorithm often reaches 50M edges/second, making it suitable for massive data sets with billions of edges. We recommend the parallel Louvain method and our variant with refinement as both qualitatively strong and fast.
Article
Full-text available
We discuss a new formulation of a fuzzy validity index that generalizes the Newman-Girvan (NG) modularity function. The NG function serves as a cluster validity functional in community detection studies. The input data is an undirected weighted graph that represents, e.g., a social network. Clusters correspond to socially similar substructures in the network. We compare our fuzzy modularity with two existing modularity functions using the well-studied Karate Club and American College Football datasets.
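To make the idea of a fuzzy generalization of the Newman-Girvan modularity concrete, the sketch below uses one common fuzzification, in which the crisp indicator "i and j are in the same community" is replaced by the product of membership degrees. This is an assumed form chosen for illustration and may differ from the index defined in the paper; the function name `fuzzy_modularity` is hypothetical.

```python
def fuzzy_modularity(adj, memberships):
    """Fuzzy modularity under an assumed product-of-memberships form.

    adj: symmetric weight matrix of an undirected graph (list of lists).
    memberships[i][c]: degree to which node i belongs to community c;
    each row is expected to sum to 1.

    Q = (1 / 2m) * sum_c sum_ij (A_ij - k_i * k_j / 2m) * u_ic * u_jc
    """
    n = len(adj)
    strength = [sum(row) for row in adj]  # weighted degree k_i
    two_m = sum(strength)                 # total edge weight, counted twice
    n_comm = len(memberships[0])
    q = 0.0
    for c in range(n_comm):
        for i in range(n):
            for j in range(n):
                expected = strength[i] * strength[j] / two_m
                q += (adj[i][j] - expected) * memberships[i][c] * memberships[j][c]
    return q / two_m


# Two disjoint edges: 0-1 and 2-3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
crisp = [[1, 0], [1, 0], [0, 1], [0, 1]]
uniform = [[0.5, 0.5]] * 4
print(fuzzy_modularity(adj, crisp))    # reduces to the crisp NG value
print(fuzzy_modularity(adj, uniform))  # maximally fuzzy memberships
```

With 0/1 memberships the expression reduces to the crisp NG modularity, which is the sense in which such an index generalizes it.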
Article
Full-text available
More than 12 studies of different bottlenose dolphin populations, spanning from tropical to cold temperate waters, have shown that the species typically lives in societies in which relationships among individuals are predominantly fluid. In all cases dolphins lived in small groups characterised by fluid and dynamic interactions and some degree of dispersal from the natal group by both sexes. We describe a small, closed population of bottlenose dolphins living at the southern extreme of the species' range. Individuals live in large, mixed-sex groups in which no permanent emigration/immigration has been observed over the past 7 years. All members within the community are relatively closely associated (average half-weight index > 0.4). Both male–male and female–female networks of preferred associates are present, as are long-lasting associations across sexes. The community structure is temporally stable, compared to other bottlenose dolphin populations, and constant companionship seems to be prevalent in the temporal association pattern. Such high degrees of stability are unprecedented in studies of bottlenose dolphins and may be related to the ecological constraints of Doubtful Sound. Fjords are low-productivity systems in which survival may easily require a greater level of co-operation, and hence group stability. These conditions are also present in other cetacean populations forming stable groups. We therefore hypothesise that ecological constraints are important factors shaping social interactions within cetacean societies.
Conference Paper
Full-text available
Detecting communities in complex networks has attracted considerable attention in several application domains. Targeting this problem, a local search based genetic algorithm (GALS) which employs a graph-based representation (LAR) is proposed in this work. The core of GALS is a local search based mutation technique. To overcome the drawbacks of the existing mutation methods, a concept called marginal gene is proposed, and then, by analyzing the modularity function, an effective and efficient mutation method is proposed that is combined with a local search strategy based on the concept of marginal gene. Moreover, in this paper the percolation theory on ER random graphs is employed to further clarify the effectiveness of the LAR representation; a Markov random walk based method is adopted to produce an accurate and diverse initial population; and the solution space of GALS is significantly reduced by using a graph based mechanism. The proposed GALS has been tested on both computer-generated and real-world networks, and compared with some competitive community mining algorithms. Experimental results show that GALS is highly effective and efficient for discovering community structure.
Conference Paper
Full-text available
Community detection is an important task for mining the structure and function of complex networks. Generally, there are several different kinds of nodes in a network: cluster nodes densely connected within communities, as well as some special nodes like hubs bridging multiple communities and outliers marginally connected with a community. In addition, it has been shown that there is a hierarchical structure in complex networks with communities embedded within other communities. Therefore, a good algorithm should be able not only to detect hierarchical communities, but also to identify hubs and outliers. In this paper, we propose a parameter-free hierarchical network clustering algorithm SHRINK by combining the advantages of density-based clustering and modularity optimization methods. Based on the structural connectivity information, the proposed algorithm can effectively reveal the embedded hierarchical community structure with multiresolution in large-scale weighted undirected networks, and identify hubs and outliers as well. Moreover, it overcomes the sensitive threshold problem of density-based clustering algorithms and the resolution limit possessed by other modularity-based methods. To illustrate our methodology, we conduct experiments with both real-world and synthetic datasets for community detection, and compare with many other baseline methods. Experimental results demonstrate that SHRINK achieves the best performance with consistent improvements.
Conference Paper
Full-text available
Community detection is an important task for mining the structure and function of complex networks. Many previous approaches have difficulty detecting communities of arbitrary size and shape, and are unable to identify hubs and outliers. A recently proposed network clustering algorithm, SCAN, is effective and can overcome this difficulty. However, it depends on a sensitive parameter, the minimum similarity threshold ε, but provides no automated way to find it. In this paper, we propose a novel density-based network clustering algorithm, called gSkeletonClu (graph-skeleton based clustering). By projecting a network to its Core-Connected Maximal Spanning Tree (CCMST), the network clustering problem is converted to finding core-connected components in the CCMST. We discover that all possible values of the parameter ε lie in the edge weights of the corresponding CCMST. By means of tree divisive or agglomerative clustering, our algorithm can find the optimal parameter ε and detect communities, hubs and outliers in large-scale undirected networks automatically without any user interaction. Extensive experiments on both real-world and synthetic networks demonstrate the superior performance of gSkeletonClu over the baseline methods.
Article
Full-text available
Given the increasing popularity of algorithms for overlapping clustering, in particular in social network analysis, quantitative measures are needed to measure the accuracy of a method. Given a set of true clusters, and the set of clusters found by an algorithm, these sets of clusters must be compared to see how similar or different the sets are. A normalized measure is desirable in many contexts, for example assigning a value of 0 where the two sets are totally dissimilar, and 1 where they are identical. A measure based on normalized mutual information, [1], has recently become popular. We demonstrate unintuitive behaviour of this measure, and show how this can be corrected by using a more conventional normalization. We compare the results to that of other measures, such as the Omega index [2].
Article
Full-text available
In this paper we present a novel strategy to discover the community structure of (possibly large) networks. This approach is based on the well-known concept of network modularity optimization. To do so, our algorithm exploits a novel measure of edge centrality, based on k-paths. This technique makes it possible to efficiently compute an edge ranking in large networks in near linear time. Once the centrality ranking is calculated, the algorithm computes the pairwise proximity between nodes of the network. Finally, it discovers the community structure adopting a strategy inspired by the well-known state-of-the-art Louvain method (henceforth, LM), efficiently maximizing the network modularity. The experiments we carried out show that our algorithm outperforms other techniques and slightly improves results of the original LM, providing reliable results. Another advantage is that, unlike the LM, its adoption extends naturally even to unweighted networks.
Article
Full-text available
Identifying overlapping communities in networks is a challenging task. In this work we present a probabilistic approach to community detection that utilizes a Bayesian non-negative matrix factorization model to extract overlapping modules from a network. The scheme has the advantage of soft-partitioning solutions, assignment of node participation scores to modules, and an intuitive foundation. We present the performance of the method against a variety of benchmark problems and compare and contrast it to several other algorithms for community detection.
Article
Full-text available
The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e., the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a role similar to that of, e.g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks. Comment: Review article. 103 pages, 42 figures, 2 tables. Two sections expanded + minor modifications. Three figures + one table + references added. Final version published in Physics Reports
Article
Full-text available
We investigate the recently proposed label-propagation algorithm (LPA) for identifying network communities. We reformulate the LPA as an equivalent optimization problem, giving an objective function whose maxima correspond to community solutions. By considering properties of the objective function, we identify conceptual and practical drawbacks of the label-propagation approach, most importantly the disparity between increasing the value of the objective function and improving the quality of communities found. To address the drawbacks, we modify the objective function in the optimization problem, producing a variety of algorithms that propagate labels subject to constraints; of particular interest is a variant that maximizes the modularity measure of community quality. Performance properties and implementation details of the proposed algorithms are discussed. Bipartite as well as unipartite networks are considered.
Article
Full-text available
In this paper, we use a partition of the links of a network in order to uncover its community structure. This approach allows for communities to overlap at nodes so that nodes may be in more than one community. We do this by making a node partition of the line graph of the original network. In this way we show that any algorithm that produces a partition of nodes can be used to produce a partition of links. We discuss the role of the degree heterogeneity and propose a weighted version of the line graph in order to account for this.
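The link-partition idea above can be sketched in a few lines: build the line graph, partition its nodes with any node-partitioning algorithm, and read the result back as overlapping node communities. The sketch below is illustrative only. It builds the unweighted line graph (the paper also proposes a weighted version, which is not reproduced here), and the helper names `line_graph` and `node_communities` are hypothetical.

```python
from itertools import combinations


def line_graph(edges):
    """Return the line graph as a dict: edge -> set of adjacent edges.

    Two edges of the original network are adjacent in the line graph
    when they share an endpoint. Assumes an undirected simple graph
    whose node labels can be sorted.
    """
    edges = [tuple(sorted(e)) for e in edges]
    incident = {}
    for e in edges:
        for v in e:
            incident.setdefault(v, []).append(e)
    lg = {e: set() for e in edges}
    for incident_edges in incident.values():
        for e1, e2 in combinations(incident_edges, 2):
            lg[e1].add(e2)
            lg[e2].add(e1)
    return lg


def node_communities(link_partition):
    """Turn a partition of links into (possibly overlapping) node sets.

    link_partition: dict mapping each edge (u, v) to a community label.
    """
    communities = {}
    for (u, v), label in link_partition.items():
        communities.setdefault(label, set()).update((u, v))
    return communities


# Bow-tie network: two triangles sharing node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
lg = line_graph(edges)
# A hand-written link partition, standing in for the output of any
# node-partitioning algorithm run on the line graph.
partition = {(0, 1): "a", (0, 2): "a", (1, 2): "a",
             (2, 3): "b", (2, 4): "b", (3, 4): "b"}
print(node_communities(partition))
```

Node 2 appears in both communities even though every link belongs to exactly one, which is how a partition of links yields overlap at nodes.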
Article
Full-text available
Using a database of jazz recordings we study the collaboration network of jazz musicians. We define the network at two different levels. First we study the collaboration network between individuals, where two musicians are connected if they have played in the same band. Then we consider the collaboration between bands, where two bands are connected if they have a musician in common. The community structure analysis reveals that these constructions capture essential ingredients of the social interactions between jazz musicians. We observe correlations between recording locations, racial segregation and the community structure. A quantitative analysis of the community size distribution reveals a surprising similarity with an e-mail based social network recently studied.
Article
Full-text available
Community structure is one of the most important features of real networks and reveals the internal organization of the nodes. Many algorithms have been proposed but the crucial issue of testing, i.e., the question of how good an algorithm is, with respect to others, is still open. Standard tests include the analysis of simple artificial graphs with a built-in community structure, that the algorithm has to recover. However, the special graphs adopted in actual tests have a structure that does not reflect the real properties of nodes and communities found in real networks. Here we introduce a class of benchmark graphs, that account for the heterogeneity in the distributions of node degrees and of community sizes. We use this benchmark to test two popular methods of community detection, modularity optimization, and Potts model clustering. The results show that the benchmark poses a much more severe test to algorithms than standard benchmarks, revealing limits that may not be apparent at a first analysis.
Article
Full-text available
We propose a procedure for analyzing and characterizing complex networks. We apply this to the social network as constructed from email communications within a medium sized university with about 1700 employees. Email networks provide an accurate and nonintrusive description of the flow of information within human organizations. Our results reveal the self-organization of the network into a state where the distribution of community sizes is self-similar. This suggests that a universal mechanism, responsible for emergence of scaling in other self-organized complex systems, as, for instance, river networks, could also be the underlying driving force in the formation and evolution of social networks.
Article
Full-text available
We propose a method to find the community structure in complex networks based on an extremal optimization of the value of modularity. The method outperforms the optimal modularity found by the existing algorithms in the literature giving a better understanding of the community structure. We present the results of the algorithm for computer-simulated and real networks and compare them with other approaches. The efficiency and accuracy of the method make it feasible to be used for the accurate identification of community structure in large complex networks.
Article
Full-text available
Community detection and analysis is an important methodology for understanding the organization of various real-world networks and has applications in problems as diverse as consensus formation in social communities or the identification of functional modules in biochemical networks. Currently used algorithms that identify the community structures in large-scale real-world networks require a priori information such as the number and sizes of communities or are computationally expensive. In this paper we investigate a simple label propagation algorithm that uses the network structure alone as its guide and requires neither optimization of a predefined objective function nor prior information about the communities. In our algorithm every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have. In this iterative process densely connected groups of nodes form a consensus on a unique label to form communities. We validate the algorithm by applying it to networks whose community structures are known. We also demonstrate that the algorithm takes an almost linear time and hence it is computationally less expensive than what was possible so far.
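The label propagation procedure described above is simple enough to sketch directly. The version below is a minimal illustration rather than the authors' code: the asynchronous random update order, the random tie-breaking, the rule that a node keeps its label when it is already among the most frequent ones, and the `max_sweeps` cap are assumptions made here.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_sweeps=100):
    """Asynchronous label propagation on an undirected graph.

    adj: dict mapping each node to an iterable of its neighbours.
    Returns a dict node -> label; nodes sharing a label form a community.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)        # visit nodes in random order
        changed = False
        for v in nodes:
            neighbours = list(adj[v])
            if not neighbours:
                continue          # isolated nodes keep their own label
            counts = Counter(labels[u] for u in neighbours)
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue          # already holds a most frequent label
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break                 # consensus reached
    return labels


# Two disjoint triangles: each should converge to a single label.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],
    3: [4, 5], 4: [3, 5], 5: [3, 4],
}
labels = label_propagation(adj)
print(labels)
```

Each sweep touches every edge a constant number of times, which is consistent with the near-linear running time reported in the abstract.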
Article
Full-text available
We consider the problem of fuzzy community detection in networks, which complements and expands the concept of overlapping community structure. Our approach allows each vertex of the graph to belong to multiple communities at the same time, determined by exact numerical membership degrees, even in the presence of uncertainty in the data being analyzed. We create an algorithm for determining the optimal membership degrees with respect to a given goal function. Based on the membership degrees, we introduce a measure that is able to identify outlier vertices that do not belong to any of the communities, bridge vertices that have significant membership in more than one single community, and regular vertices that fundamentally restrict their interactions within their own community, while also being able to quantify the centrality of a vertex with respect to its dominant community. The method can also be used for prediction in case of uncertainty in the data set analyzed. The number of communities can be given in advance, or determined by the algorithm itself, using a fuzzified variant of the modularity function. The technique is able to discover the fuzzy community structure of different real world networks including, but not limited to, social networks, scientific collaboration networks, and cortical networks, with high confidence.
Article
Full-text available
We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.
Article
In this paper, a fuzzy agglomerative (FuzAg) approach is proposed for community detection that iteratively updates the membership degree of nodes. Earlier approaches assign membership degree to nodes based on communities only. We introduce the notion of self-membership in addition to the membership of different communities. The essence of self-membership is to give all nodes the opportunity to grow their own community. Nodes having a higher self-membership degree are referred to as anchors, and they get a chance to expand their associated community. Meanwhile, some new anchors may emerge in successive iterations while false or redundant anchors get removed. The time complexity of the proposed algorithm is shown to be O(n^2). We compare the results of the proposed FuzAg algorithm with those of state-of-the-art fuzzy community detection algorithms on ten real-world datasets as well as on synthetic networks. Results indicated by various quality and accuracy metrics show impressive performance of FuzAg in identifying both disjoint communities and fuzzy communities.
Article
The increasing demand for knowledge from network data poses significant challenges in many tasks. Discovering community structure from a network is one of the classic and significant problems faced in network analysis. In this paper, we study the network structure from the perspective of the composition of fuzzy relations, and a novel algorithm based on fuzzy relations, i.e., CDFR (Community Detection by Fuzzy Relations), is proposed for non-overlapping community detection. The key idea of CDFR is to find the NGC node (Nearest node with Greater Centrality) for each node and compute the fuzzy relation between them. Then, the community to which a node belongs depends on its NGC node. In addition, the decision graph will be constructed to guide community detection. Experimental results on artificial and real-world networks verify the effectiveness and superiority of our CDFR algorithm.
Article
Clustering or community detection is one of the most important problems in social network analysis, and because of the existence of overlapping clusters, fuzzy clustering is a suitable way to cluster these networks. In fuzzy clustering, in addition to the correctness of the clusters assigned to each node, the produced membership of each node in each cluster is also important. In this paper, we introduce a new fuzzy clustering algorithm based on the nonnegative matrix factorization (NMF) method. Unlike well-known fuzzy clustering techniques such as FCM, the proposed method does not depend on any parameter. It can also produce appropriate memberships based on the network structure and so distinguish overlapping nodes from non-overlapping nodes well. In addition, to evaluate the validity of such fuzzy clustering algorithms, we propose two new evaluation criteria (SFEC and UFEC), which are constructed based on the neighborhood structure of nodes and can evaluate the memberships. Experimental results on some real-world networks and many artificial networks show the effectiveness and reliability of our proposed criteria.
Article
Community detection is a task of fundamental importance in social network analysis. Community structures enable us to discover the hidden interactions among the network members that can be used in many knowledge-based domains such as bioinformatics, computer networks, e-commerce and forensic science. While there exist many works on community detection based on connectivity structure, they are limited to considering either overlapping or non-overlapping community structures. In this work, we propose a novel approach for general community detection through an integrated framework to extract the overlapping and non-overlapping community structures without assuming prior structural connectivity on the networks. Our general framework is based on a primary node-based criterion that combines the internal association degree with the external association degree. The proposed method is evaluated through extensive simulation experiments and on several real-world benchmark network data sets. The experimental results show that the proposed method outperforms the earlier state-of-the-art algorithms based on the well-known evaluation criteria.
Article
Community detection is one of the most prominent problems of social network analysis. In this paper, a novel method for Modularity Maximization (MM) for community detection is presented which exploits the Alternating Direction Augmented Lagrangian (ADAL) method for maximizing a generalized form of Newman’s modularity function. We first transform Newman’s modularity function into a quadratic program and then use Completely Positive Programming (CPP) to map the quadratic program to a linear program, which provides the globally optimal maximum modularity partition. In order to solve the proposed CPP problem, a closed form solution using the ADAL merged with a rank minimization approach is proposed. The performance of the proposed method is evaluated on several real-world data sets used for benchmarking community detection. Simulation results show that the proposed technique provides outstanding results in terms of modularity value for crisp partitions.
Article
Community detection is a fundamental component of large network analysis. In both academia and industry, progressive research has been made on problems related to community network analysis. Community detection is gaining significant attention and importance in the area of network science. Regular and synthetic complex networks have motivated intense interest in studying the fundamental unifying principles of various complex networks. This paper presents a new game-theoretic approach towards community detection in large-scale complex networks based on modified modularity; this method was developed based on modified adjacency, modified Laplacian matrices and neighborhood similarity. This approach was used to partition a given network into dense communities. It is based on determining a Nash stable partition, which is a pure strategy Nash equilibrium of an appropriately defined strategic game in which the nodes of the network were the players and the strategy of a node was to decide to which community it ought to belong. Players chose to belong to a community according to a maximized fitness/payoff. Quality of the community networks was assessed using modified modularity along with a new fitness function. Community partitioning was performed using Normalized Mutual Information and a ‘modularity measure’, which involved comparing the new game-theoretic community detection algorithm (NGTCDA) with well-studied and well-known algorithms, such as Fast Newman, Fast Modularity Detection, and Louvain Community. The quality of a network partition in communities was evaluated by looking at the contribution of each node and its neighbors against the strength of its community.
Article
One of the main challenges of fuzzy community detection problems is to be able to measure the quality of a fuzzy partition. In this paper, we present an alternative way of measuring the quality of a fuzzy community detection output based on n-dimensional grouping and overlap functions. Moreover, the proposed modularity measure generalizes the classical Girvan–Newman (GN) modularity for crisp community detection problems and also for crisp overlapping community detection problems. Therefore, it can be used to compare partitions of different nature (i.e. those composed of classical, overlapping and fuzzy communities). Particularly, as is usually done with the GN modularity, the proposed measure may be used to identify the optimal number of communities to be obtained by any network clustering algorithm in a given network. We illustrate this usage by adapting in this way a well-known algorithm for fuzzy community detection problems, extending it to also deal with overlapping community detection problems and produce a ranking of the overlapping nodes. Some computational experiments show the feasibility of the proposed approach to modularity measures through n-dimensional overlap and grouping functions.
Article
In complex network analysis, fuzzy community detection is a challenging task that aims to reveal the network structure by assigning each vertex quantitative membership-degrees to various communities. In this paper, we propose a fuzzy community detection method that iteratively propagates membership-degrees of all vertices. In each iteration, a candidate seed vertex of a potential community is first selected according to the topological characteristics. After that, the membership-degrees are propagated among adjacent vertices so that a number of communities can be obtained with respect to all selected seeds. To ensure that the modularity keeps improving, in each iteration we discard the selected seeds that decrease the modularity of the community decomposition. In this manner, the topological information about the network can be fully utilized, and communities gradually emerge along with the acceptance of new seeds. Experimental results on real-world and synthetic networks demonstrate that our approach has impressive performance and is robust on both disjoint and fuzzy community detection. Moreover, the proposed approach offers high flexibility in trading off computational complexity against overall performance.
Article
Our personal social networks are big and cluttered, and currently there is no good way to organize them. Social networking sites allow users to manually categorize their friends into social circles (e.g. 'circles' on Google+, and 'lists' on Facebook and Twitter), however they are laborious to construct and must be updated whenever a user's network grows. We define a novel machine learning task of identifying users' social circles. We pose the problem as a node clustering problem on a user's ego-network, a network of connections between her friends. We develop a model for detecting circles that combines network structure as well as user profile information. For each circle we learn its members and the circle-specific user profile similarity metric. Modeling node membership to multiple circles allows us to detect overlapping as well as hierarchically nested circles. Experiments show that our model accurately identifies circles on a diverse set of data from Facebook, Google+, and Twitter for all of which we obtain hand-labeled ground-truth.
Article
Consider data consisting of pairwise measurements, such as presence or absence of links between pairs of objects. These data arise, for instance, in the analysis of protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing pairwise measurements with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. Here we introduce a class of variance allocation models for pairwise measurements: mixed membership stochastic blockmodels. These models combine global parameters that instantiate dense patches of connectivity (blockmodel) with local parameters that instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks.
Chapter
Fuzzy community detection in social networks has caught researchers’ attention because, in most real world networks, the vertices (i.e., people) do not belong to only one community. Our recent work on generalized modularity motivated us to introduce a generalized fuzzy t-norm formulation of fuzzy modularity. We investigated four fuzzy t-norm operators, Product, Drastic, Lukasiewicz and Minimum, and the generalized Yager operator, with five well-known social network data sets. The experiments show that the Yager operator with a proper parameter value performs better than the product operator in revealing community structure: (1) the Yager operator can provide a more certain visualization of the number of communities for simple networks; (2) it can find a relatively small-sized community in a flat network; (3) it can detect communities in networks with hierarchical structures; and (4) it can uncover several reasonable covers in a complicated network. These findings lead us to believe that the Yager operator can play a big role in fuzzy community detection. Our future work is to build a theoretical relation between the Yager operator and different types of networks.
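The idea of a t-norm-based fuzzy modularity can be illustrated with a small sketch. The abstract does not state the exact formula, so the code below assumes one common generalization, Q = (1/2m) Σ_c Σ_ij (A_ij − k_i k_j / 2m) · T(u_ic, u_jc), where T is the t-norm applied to the membership degrees of nodes i and j in community c. The function name `fuzzy_modularity`, its arguments, and the toy graph are hypothetical and chosen only for illustration; the graph is assumed undirected, unweighted, and free of self-loops.

```python
def fuzzy_modularity(edges, membership, t_norm=lambda a, b: a * b):
    """Fuzzy modularity under an assumed t-norm generalization.

    edges: list of (u, v) pairs, each undirected edge listed once.
    membership: dict node -> list of membership degrees, one per community.
    t_norm: function combining two membership degrees (default: product).
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    nodes = list(membership)
    degree = {v: 0 for v in nodes}
    adjacency = {}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacency[(u, v)] = adjacency.get((u, v), 0) + 1
        adjacency[(v, u)] = adjacency.get((v, u), 0) + 1
    n_comm = len(next(iter(membership.values())))
    q = 0.0
    for i in nodes:
        for j in nodes:
            # Observed minus expected number of edges between i and j.
            b_ij = adjacency.get((i, j), 0) - degree[i] * degree[j] / (2 * m)
            for c in range(n_comm):
                q += b_ij * t_norm(membership[i][c], membership[j][c])
    return q / (2 * m)


# Toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
CRISP = {0: [1, 0], 1: [1, 0], 2: [1, 0], 3: [0, 1], 4: [0, 1], 5: [0, 1]}
UNIFORM = {v: [0.5, 0.5] for v in range(6)}

q_product = fuzzy_modularity(EDGES, CRISP)                # product t-norm
q_minimum = fuzzy_modularity(EDGES, CRISP, t_norm=min)    # minimum t-norm
q_uniform = fuzzy_modularity(EDGES, UNIFORM)
```

With crisp (0/1) memberships the product and minimum t-norms coincide and the value reduces to the classical modularity of the partition; the two operators only differ once memberships become fractional. The double loop over node pairs is quadratic in the number of nodes, which is acceptable for a small illustration but not for large networks.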
Conference Paper
A great deal of research has been conducted on modeling and discovering communities in complex networks. In most real life networks, an object often participates in multiple overlapping communities. In view of this, recent research has focused on mining overlapping communities in complex networks. The algorithms essentially materialize a snapshot of the overlapping communities in the network. This approach has three drawbacks, however. First, the mining algorithm uses the same global criterion to decide whether a subgraph qualifies as a community. In other words, the criterion is fixed and predetermined. But in reality, communities for different vertices may have very different characteristics. Second, it is costly, time consuming, and often unnecessary to find communities for an entire network. Third, the approach does not support dynamically evolving networks. In this paper, we focus on online search of overlapping communities, that is, given a query vertex, we find meaningful overlapping communities the vertex belongs to in an online manner. In doing so, each search can use community criterion tailored for the vertex in the search. To support this approach, we introduce a novel model for overlapping communities, and we provide theoretical guidelines for tuning the model. We present several algorithms for online overlapping community search and we conduct comprehensive experiments to demonstrate the effectiveness of the model and the algorithms. We also suggest many potential applications of our model and algorithms.
Article
Community detection, which aims to cluster N nodes in a given graph into r distinct groups based on the observed undirected edges, is an important problem in network data analysis. In this paper, the popular stochastic block model (SBM) is extended to the generalized stochastic block model (GSBM) that allows for adversarial outlier nodes, which are connected with other nodes in the graph in an arbitrary way. Under this model, we introduce a procedure using convex optimization followed by k-means algorithm with k = r. Both theoretical and numerical properties of the method are analyzed. A theoretical guarantee is given for our methodology to accurately detect the communities with small misclassification rate under the setting where the number of clusters can grow with N. This theoretical result admits to the best known result in the literature of computationally feasible community detection. Numerical results show that our method is both computationally fast and robust to different kinds of outliers, while some popular computationally fast community detection algorithms, such as spectral clustering applied to adjacency matrices or graph Laplacians, may fail due to a very small portion of outliers. We apply a slight modification of our method to a political blogs data set, showing that our method is competent in practice, and comparable to existing computationally feasible methods in the literature. To the best of the authors' knowledge, our result is the first in the literature in terms of clustering communities with fast growing numbers under the generalized stochastic block model where a portion of arbitrary outlier nodes exist.
Article
Community detection is a very important problem in social network analysis. The classical clustering approach, K-means, has been shown to be very efficient at detecting communities in networks. However, K-means is quite sensitive to the initial centroids or seeds, especially when it is used to detect communities. To solve this problem, in this study, we propose an efficient algorithm, K-rank, which selects the top-K nodes with the highest rank centrality as the initial seeds, and updates these seeds by using an iterative technique like K-means. Then we extend K-rank to partition directed, weighted networks, and to detect overlapping communities. The empirical study on synthetic and real networks shows that K-rank is robust and better than the state-of-the-art algorithms including K-means, BGLL, LPA, infomap and OSLOM.
Article
To find the fuzzy community structure in a complex network, in which each node has a certain probability of belonging to a certain community, is a hard problem and not yet satisfactorily solved over the past years. In this paper, an extension of modularity, the fuzzy modularity is proposed, which can provide a measure of goodness for the fuzzy community structure in networks. The simulated annealing strategy is used to maximize the fuzzy modularity function, associating with an alternating iteration based on our previous work. The proposed algorithm can efficiently identify the probabilities of each node belonging to different communities with random initial fuzzy partition during the cooling process. An appropriate number of communities can be automatically determined without any prior knowledge about the community structure. The computational results on several artificial and real-world networks confirm the capability of the algorithm.
Article
Identification of (overlapping) communities/clusters in a complex network is a general problem in data mining of network data sets. In this paper, we devise a novel algorithm to identify overlapping communities in complex networks by the combination of a new modularity function based on generalizing NG's Q function, an approximation mapping of network nodes into Euclidean space and fuzzy c-means clustering. Experimental results indicate that the new algorithm is efficient at detecting both good clusterings and the appropriate number of clusters.
Conference Paper
Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilistic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks.
Conference Paper
Recent years have seen the development of many graph clustering algorithms, which can identify community structure in networks. The vast majority of these only find disjoint communities, but in many real-world networks communities overlap to some extent. We present a new algorithm for discovering overlapping communities in networks, by extending Girvan and Newman's well-known algorithm based on the betweenness centrality measure. Like the original algorithm, ours performs hierarchical clustering--partitioning a network into any desired number of clusters--but allows them to overlap. Experiments confirm good performance on randomly generated networks based on a known overlapping community structure, and interesting results have also been obtained on a range of real-world networks.
Article
Many complex networks display a mesoscopic structure with groups of nodes sharing many links with the other nodes in their group and comparatively few with nodes of different groups. This feature is known as community structure and encodes precious information about the organization and the function of the nodes. Many algorithms have been proposed but it is not yet clear how they should be tested. Recently we have proposed a general class of undirected and unweighted benchmark graphs, with heterogeneous distributions of node degree and community size. An increasing attention has been recently devoted to develop algorithms able to consider the direction and the weight of the links, which require suitable benchmark graphs for testing. In this paper we extend the basic ideas behind our previous benchmark to generate directed and weighted networks with built-in community structure. We also consider the possibility that nodes belong to more communities, a feature occurring in real systems, such as social networks. As a practical application, we show how modularity optimization performs on our benchmark.
Article
A number of recent studies have focused on the statistical properties of networked systems such as social networks and the Worldwide Web. Researchers have concentrated particularly on a few properties that seem to be common to many networks: the small-world property, power-law degree distributions, and network transitivity. In this article, we highlight another property that is found in many networks, the property of community structure, in which network nodes are joined together in tightly knit groups, between which there are only looser connections. We propose a method for detecting such communities, built around the idea of using centrality indices to find community boundaries. We test our method on computer-generated and real-world graphs whose community structure is already known and find that the method detects this known structure with high sensitivity and reliability. We also apply the method to two networks whose community structure is not well known--a collaboration network and a food web--and find that it detects significant and informative community divisions in both cases.
Article
Many networks display community structure--groups of vertices within which connections are dense but between which they are sparser--and sensitive computer algorithms have in recent years been developed for detecting this structure. These algorithms, however, are computationally demanding, which limits their application to small networks. Here we describe an algorithm which gives excellent results when tested on both computer-generated and real-world networks and is much faster, typically thousands of times faster, than previous algorithms. We give several example applications, including one to a collaboration network of more than 50,000 physicists.
Article
We propose and study a set of algorithms for discovering community structure in networks--natural divisions of network nodes into densely connected subgroups. Our algorithms all share two definitive features: first, they involve iterative removal of edges from the network to split it into communities, the edges removed being identified using any one of a number of possible "betweenness" measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.
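The "measure for the strength of the community structure" mentioned in this abstract is what is commonly called modularity. As a minimal sketch, assuming the standard per-community form Q = Σ_c [ L_c / m − (d_c / 2m)² ] for an undirected, unweighted graph (L_c is the number of edges inside community c, d_c the total degree of its nodes, m the number of edges), it can be computed as follows. The function name `modularity` and the toy graph are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a crisp partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping node -> community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    intra = defaultdict(int)       # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}

q_split = modularity(EDGES, two_groups)   # 2 * (3/7 - (7/14)**2) = 5/14
q_single = modularity(EDGES, one_group)   # 7/7 - (14/14)**2 = 0
```

Comparing `q_split` with `q_single` shows how the measure can guide the choice of the number of communities: splitting the toy graph into its two triangles scores higher than leaving it as a single group.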
Article
In many networks, it is of great interest to identify "communities", unusually densely knit groups of individuals. Such communities often shed light on the function of the networks or underlying properties of the individuals. Recently, Newman suggested "modularity" as a natural measure of the quality of a network partitioning into communities. Since then, various algorithms have been proposed for (approximately) maximizing the modularity of the partitioning determined. In this paper, we introduce the technique of rounding mathematical programs to the problem of modularity maximization, presenting two novel algorithms. More specifically, the algorithms round solutions to linear and vector programs. Importantly, the linear programming algorithm comes with an a posteriori approximation guarantee: by comparing the solution quality to the fractional solution of the linear program, a bound on the available "room for improvement" can be obtained. The vector programming algorithm provides a similar bound for the best partition into two communities. We evaluate both algorithms using experiments on several standard test cases for network partitioning algorithms, and find that they perform comparably to or better than past algorithms. Comment: Submitted to EPJB. 9 pages, 3 EPS figures