Figure - available via license: Creative Commons Attribution 4.0 International
Source publication
With the recent prevalence of information networks, the topic of community detection has gained much interest among researchers. In real-world networks, node attribute (content information) is also available in addition to topology information. However, the collected topology information for networks is usually noisy when there are missing edges. F...
Contexts in source publication
Context 1
... further assess the parameters analysis phase, the number of initial clusters identified at local clustering stage along with the value of α against the per cent of removed edges, for the four datasets, is reported in Table 1. ...
Context 2
... results in Table 1 indicate that the noise has no significant influence on the value of α. In other words, the method used to define α value (see Eq. 12) is somewhat stable. ...
Similar publications
Recent studies show that graph convolutional network (GCN) often performs worse for low-degree nodes, exhibiting the so-called structural unfairness for graphs with long-tailed degree distributions prevalent in the real world. Graph contrastive learning (GCL), which marries the power of GCN and contrastive learning, has emerged as a promising self-...
Citations
... The algorithm dynamically optimizes the similarity matrix for each iteration by examining the implicit contents of the topological structure and the content similarity matrix, deepening the relationship between network topology and node contents. (Bhih et al. 2020) provided an optimization tool as a preprocessing step for constructing a hybrid similarity matrix for existing community detection algorithms. Their approach assigns weight values to each content during the matrix construction process and determines the similarity between information-links through the Jaccard similarity coefficient. ...
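As a rough illustration of the Jaccard-based content similarity mentioned in this context, the sketch below compares the content sets of two nodes. The function name `jaccard` and the toy node contents are hypothetical and do not come from the cited work, which also assigns per-content weights that are not modeled here.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b|; taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical node contents (e.g. keywords attached to each node).
content = {
    "u": {"graph", "cluster", "noise"},
    "v": {"graph", "cluster", "spam"},
    "w": {"cattle", "trade"},
}

sim_uv = jaccard(content["u"], content["v"])  # 2 shared items out of 4 distinct
sim_uw = jaccard(content["u"], content["w"])  # no shared items
```

Nodes whose content overlaps strongly receive a score close to 1, which is the signal a hybrid similarity matrix can use when the link between them is missing.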
The realm of complex network analysis is witnessing a surge in research focus on community detection. Numerous algorithms have been put forth, each harboring distinct advantages and drawbacks. Predominantly, these algorithms rely solely on network topologies for community detection. Yet, many real-world networks harbor valuable node content that intricately mirrors the fabric of their communities. Recognizing this, leveraging node contents stands as a potential avenue to augment the quality of community detection. This study introduces an innovative evolutionary algorithm rooted in the fuzzy analytical hierarchy process (FAHP) to propel community detection in complex networks by intertwining content and structural information. Noteworthy is its departure from the conventional multi-chromosome evolutionary algorithms, opting for a single-chromosome design that substantially curtails computational complexity. The algorithm employs a distinctive FAHP-based local operator, termed the community topological modifier, to refine community structures and elevate the quality of community detection within the current generation. A novel criterion for gauging content similarity among nodes is integrated into the algorithm. Additionally, an early fusion approach is suggested, creating a hybrid graph that amalgamates structural and content information between nodes. Rigorous evaluation in diverse real networks ensued, with comparative analyses against state-of-the-art and traditional methods. Notably, the proposed algorithm emerged as the frontrunner, securing top rankings across all evaluation criteria—such as normalized mutual information (NMI) and adjusted Rand index (ARI)—based on the results of the Friedman test.
... Disadvantages Content and topology information A pre-processing approach that considers both the attribute information and connectivity information aspects of the network for community detection is presented in this work [13] Considering more than one source of information for community detection could produce meaningful clusters and improve the robustness of the network The main disadvantage of these categories is that it is difficult to determine the ideal network topology for an issue. When the topology is decided, it acts as a lower bound for classification error A multi-view clustering method based on robust nonnegative matrix factorization (MVCRNMF) [14] MVCRNMF can learn the contribution weights from the link and content information adaptively A novel spammer classification method based on LDA (Latent Dirichlet Allocation) [15] Retrieves global and local data to capture the nature of spam-sending in subject distribution patterns ...
Networks in the real world are dynamic and evolving. The most critical process in networks is to determine the structure of the community, based on which we can detect hidden communities in a complex network. The design of strong network structures is of great importance, meaning that a system must maintain its function in the face of attacks and failures and have a strong community structure. In this paper, we proposed the robust memetic algorithm and used the idea to optimize the detection of dynamic communities in complex networks called RDMA_NET (Robust Dynamic Memetic Algorithm). In this method, we work on dynamic data that affects the two main parts of the initial population value and the calculation of the evaluation function of each population, and there is no need to determine the number of communities in advance. We used two sets of real-world networks and the LFR dataset. The results show that our proposed method, RDMA_Net, can find a better solution than modern approaches and provide near-optimal performance in search of network topologies with a strong community structure.
... These, in conjunction with the global transitivity coefficient, the identification of giant components, and the detection of within-network communities, were employed to describe how connected and clustered the entire cattle movement system in Uganda is. The Walktrap algorithm, based on the principle of random walks [41,42], was used to identify the communities. In this study, ...
Animal movements are a major driver for the spread of Transboundary Animal Diseases (TADs). These movements link populations that would otherwise be isolated and hence create opportunities for susceptible and infected individuals to meet. We used social network analysis to describe the seasonal network structure of cattle movements in Uganda and unravel critical network features that identify districts or sub-regions for targeted risk-based surveillance and intervention. We constructed weighted, directed networks based on 2019 between-district cattle movements using official livestock mobility data; the purpose of the movement (‘slaughter’ vs. ‘live trade’) was used to subset the network and capture the risks more reliably. Our results show that cattle trade can result in local and long-distance disease spread in Uganda. Seasonal variability appears to impact the structure of the network, with high heterogeneity of node and edge activity identified throughout the seasons. These observations mean that the structure of the live trade network can be exploited to target influential district hubs within the cattle corridor and peripheral areas in the south and west, which would result in rapid network fragmentation, reducing the contact structure-related trade risks. Similar exploitable features were observed for the slaughter network, where cattle traffic serves mainly slaughter hubs close to urban centres along the cattle corridor. Critically, analyses that target the complex livestock supply value chain offer a unique framework for understanding and quantifying risks for TADs such as Foot-and-Mouth disease in a land-locked country like Uganda. These findings can be used to inform the development of risk-based surveillance strategies and decision making on resource allocation. 
Examples include vaccine deployment, biosecurity enforcement and capacity building for stakeholders in local communities and across animal health services, all of which have the potential to limit the socio-economic impact of outbreaks or reduce their frequency.
... In [51], the authors propose an optimization tool that exploits both the content and the topology of social networks. The authors show that the information conveyed by the topology of the network is usually noisy, and aim to support such a dimension of analysis with the content associated with the nodes. ...
The massive adoption of social networks increased the need to analyze users’ data and interactions to detect and block the spread of propaganda and harassment behaviors, as well as to prevent actions influencing people towards illegal or immoral activities. In this paper, we propose HURI, a method for social network analysis that accurately classifies users as safe or risky, according to their behavior in the social network. Specifically, the proposed hybrid approach leverages both the topology of the network of interactions and the semantics of the content shared by users, leading to an accurate classification also in the presence of noisy data, such as users who may appear to be risky due to the topic of their posts, but are actually safe according to their relationships. The strength of the proposed approach relies on the full and simultaneous exploitation of both aspects, giving each of them equal consideration during the combination phase. This characteristic makes HURI different from other approaches that fully consider only a single aspect and graft partial or superficial elements of the other into the first. The achieved performance in the analysis of a real-world Twitter dataset shows that the proposed method offers competitive performance with respect to eight state-of-the-art approaches.
... A pre-processing approach that considers both the attribute information and connectivity information aspects of the network for community detection is presented in this work. [13] Considering more than one source of information for community detection could produce meaningful clusters and improve the robustness of the network. ...
... The proposed method identifies the community clusters from an entire network without the global knowledge of the network topology due to the use of the Parallel Decentralized Iterative Community Clustering Approach (PDICCA), a pipelined parallel implementation that transforms the serial process of the DICCA into a parallelized approach. Recent research [13][14][15][16][17][18][19][20] has shown that using content data also helps measure the similarity between nodes. Nodes with similar content are highly likely to belong to the same community. ...
... They have also adopted the block coordinate descent method to optimize the model parameters. Bhih et al. [20] proposed a new method for community detection by considering both topology and content sources of information. The proposed algorithm tightly integrates the network's attribute information, shared neighbors, and connectivity information aspects to build a hybrid similarity matrix. ...
Community detection is essential in P2P network analysis as it helps identify connectivity structure, undesired centralization, and influential nodes. Existing methods primarily utilize topological data and neglect the rich content data. This paper proposes a technique combining topological and content data to detect communities inside the Bitcoin network using a deep feature representation algorithm and Deep Feedforward Autoencoders. Our results show that the Bitcoin network has a higher clustering coefficient, assortativity coefficient, and community structure than expected from a random P2P network. In the Bitcoin network, nodes prefer to connect to other nodes that share the same characteristics.
... After each iteration, similar clusters merge until k clusters are produced. Bhih et al. [23] proposed a topological-and content-based community detection algorithm. In their work, they proposed a new hybrid similarity matrix representing a weighted contribution of attribute information, information shared by the neighbors of a node, and the connectivity information among the nodes. ...
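One way to picture a hybrid similarity matrix of this kind is the sketch below, which blends three components: attribute similarity, shared neighbors, and direct connectivity. It is only a sketch under stated assumptions: the function name `hybrid_similarity`, the use of neighbor-set overlap for the shared-neighbor term, and the weights `(0.4, 0.3, 0.3)` are hypothetical, since the cited work's actual formulas and weights are not given in this excerpt.

```python
def hybrid_similarity(adj, attr_sim, weights=(0.4, 0.3, 0.3)):
    """Blend attribute, shared-neighbor and connectivity similarity.

    adj      : symmetric 0/1 adjacency matrix (list of lists)
    attr_sim : precomputed attribute similarity matrix, values in [0, 1]
    weights  : (w_attr, w_nbr, w_conn); hypothetical values for illustration
    """
    n = len(adj)
    w_attr, w_nbr, w_conn = weights
    neighbors = [{j for j in range(n) if adj[i][j]} for i in range(n)]
    hybrid = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                hybrid[i][j] = 1.0  # a node is fully similar to itself
                continue
            union = neighbors[i] | neighbors[j]
            # Overlap of the two neighbor sets (assumed shared-neighbor measure).
            shared = len(neighbors[i] & neighbors[j]) / len(union) if union else 0.0
            hybrid[i][j] = (w_attr * attr_sim[i][j]
                            + w_nbr * shared
                            + w_conn * adj[i][j])
    return hybrid


# Path graph 0-1-2: the edge 0-2 is "missing", but nodes 0 and 2 have similar content.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
attr_sim = [[1.0, 0.2, 0.9],
            [0.2, 1.0, 0.2],
            [0.9, 0.2, 1.0]]

H = hybrid_similarity(adj, attr_sim)
```

In this toy example, nodes 0 and 2 end up more similar to each other than the directly linked pair 0 and 1, because content and shared neighbors compensate for the absent edge.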
The presence of community structures in complex networks reveals meaningful insights about such networks and their constituent entities. Finding groups of related nodes based on mutual interests, common features, objectives, or interactions in a network is known as community detection. In this paper, we propose a novel Stacked Autoencoder-based deep learning approach augmented by the Crow Search Algorithm (CSA)-based k-means clustering algorithm to uncover community structure in complex networks. As per our approach, firstly, we generate a modularity matrix for the input graph. The modularity matrix is then passed through a series of stacked autoencoders to reduce the dimensionality of the matrix while preserving the topology of the network and improving the computational time of the proposed algorithm. The obtained matrix is then provided as an input to a modified k-means clustering algorithm augmented with the crow search optimization to detect the communities. We use Crow Search Algorithm-based optimization to generate the initial centroids for the k-means algorithm instead of generating them randomly. We perform extensive experimental analysis on several real-world and synthetic datasets and evaluate various performance metrics. We compare the results obtained by our algorithm with several traditional and contemporary community detection algorithms. The obtained results reveal that our proposed method achieves commendable results.
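The abstract does not define the modularity matrix it feeds to the autoencoders. Assuming the standard form B_ij = A_ij - k_i k_j / (2m) for an undirected, unweighted graph, where k_i is the degree of node i and m the number of edges, the first step could be sketched as follows. The function name `modularity_matrix` and the toy graph are illustrative only; the autoencoder and crow-search stages are not shown.

```python
def modularity_matrix(adj):
    """Return B with B[i][j] = A[i][j] - k_i * k_j / (2m) for a symmetric 0/1 matrix."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # sum of degrees equals twice the number of edges
    if two_m == 0:
        raise ValueError("graph has no edges")
    return [[adj[i][j] - degree[i] * degree[j] / two_m for j in range(n)]
            for i in range(n)]


# Toy path graph 0-1-2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]

B = modularity_matrix(adj)
```

Each row of B sums to zero, which is a quick sanity check on any implementation of this definition.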
... Two algorithms were applied to four real-world networks with various characteristics to prove the usefulness and universality of this method. Reference [19] proposed a scheme to improve the existing community detection algorithm by considering the information topology and content. It can detect a more meaningful community structure in the network with incomplete information, but the discrimination of invalid information is not high, resulting in the unexpected detection effect. ...
In view of the difficulty and low efficiency of most existing algorithms in detecting large-scale community networks, an unsupervised community detection algorithm based on graph convolution networks and social media is proposed. First, some positive and negative sample nodes are labeled according to the node similarity to complete the graph segmentation. Then, the improved graph convolution network model is used for training to obtain the local community where the given starting node is located. Finally, the local community is optimized by setting the threshold of membership degree, so as to further screen the nodes outside the community and obtain accurate community detection results. The experimental analysis of the proposed algorithm based on Flixster, Douban, and Yelp datasets shows that when the number of community divisions is 12, the modularity values on the three datasets are 0.59, 0.62, and 0.69, respectively, and the standard deviations of F1 are 0.044, 0.048, and 0.040, respectively. Overall, the proposed unsupervised community detection algorithm has better robustness.
... Wang et al. [63] proposed a network-relevant music recommendation system to discover user-track relationships that assume user's priority as a playlist of users, contextual factors, present time, place, etc., to describe the user's favored music topics. Bhih et al. [8] addressed the issues of existing community detection methods and presented a new method by combining both topological and content-based attributes of the given data. ...
Large-scale graph processing is one of the recently developed significant research areas relevant to big data analytics. Distributed graph analytics is useful to see the intuitive insights of node interactions from large-scale network data. Distributed graph computing is an upcoming area in graph data mining that explores crucial node relationships for a given graph dataset. In this paper, we propose a new method to discover top-k user–user communities for a weighted bipartite network by defining a weighted similarity measure. We extend the structural similarity metric, namely the Otsuka–Ochiai coefficient, by adding weights of nodes and quantify the similarity between distinct items of a user–item network. We propose a new method to mine top-k user–user communities based on the similarity of items using a weighted similarity measure. Further, two algorithms, namely TUCSGF and TUCFlink, are presented to mine top-k user–user communities in a distributed approach based on the strength of the item-to-item similarities. Moreover, we execute the TUCSGF algorithm using Apache Spark by utilizing the advantage of Spark GraphFrames to mine top-k user–user communities. Also, we implement the TUCFlink algorithm to mine top-k communities using Apache Flink by utilizing the functionalities of Flink Gelly. Further, we explore two real-world network applications, an online learning network and a chain-of-hospitals network, with various graph methods that are to be applied for both the applications. Furthermore, we systematically perform various experiments concerning execution time, memory consumption, and CPU usage of both TUCSGF and TUCFlink on three distinct datasets. The performance of TUCFlink is far better than TUCSGF concerning computing time. Applying distributed graph analytics for various complex networks using distributed graph processing tools GraphX, GraphFrames and Gelly provides more intuitive insights about distinct types of node interactions in graph data mining.
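For reference, the unweighted Otsuka–Ochiai coefficient of two sets A and B is |A ∩ B| / sqrt(|A| · |B|). The sketch below computes it for two items, each represented by the set of users linked to it. The abstract does not spell out how node weights enter the measure, so `weighted_ochiai` is an assumed cosine-style variant given purely for illustration, not the paper's formula; all names and data are hypothetical.

```python
import math


def ochiai(a: set, b: set) -> float:
    """Otsuka-Ochiai coefficient |a ∩ b| / sqrt(|a| * |b|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def weighted_ochiai(wa: dict, wb: dict) -> float:
    """Assumed weighted variant: cosine similarity over user -> weight maps."""
    dot = sum(wa[u] * wb[u] for u in wa.keys() & wb.keys())
    norm = math.sqrt(sum(v * v for v in wa.values())
                     * sum(v * v for v in wb.values()))
    return dot / norm if norm else 0.0


# Hypothetical items, each described by the users who interacted with it.
item_x = {"u1", "u2", "u3", "u4"}
item_y = {"u3", "u4", "u5", "u6"}

plain = ochiai(item_x, item_y)

# With all weights equal to 1 the assumed variant reduces to the plain coefficient.
unit_x = {u: 1.0 for u in item_x}
unit_y = {u: 1.0 for u in item_y}
weighted = weighted_ochiai(unit_x, unit_y)
```

Item pairs could then be ranked by this score to pick the strongest item-to-item similarities; that ranking step is not shown here.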
... In addition to this, we have picked some community detection algorithms to analyze the direct usage of some of the facets discussed in section II. For instance, algorithm by A. Bhih et al. [83] illustrates the direct use of facets and algorithms namely, community diffusion (CoDi) [20], COmmunity-preserving SocIal Network Embeddings (COSINE) [18], Overlapping Community Detection based on Information Dynamics (OCDID) [21] demonstrates the indirect usage of the facets. ...
... The usage of the facets by these algorithms in addition to their proposed strategies are discussed here. A. Bhih et al. [83] proposed an algorithm using a similarity matrix which is built using the influence of clustering coefficient, topic, common interests etc., on community structure and further detected communities using a novel clustering algorithm. However, the CoDi and COSINE algorithms are dependent on the exploitation of cascade based information and structural equivalence measure for community detection. ...
... Out of these algorithms, we shall discuss CoDi, COSINE and community detection algorithm by A. Bhih et al. [83] to illustrate combined algorithms. CoDi and COSINE are information diffusion based algorithms that have been picked to explain how communities are detected when only cascade information is provided. ...
The flow of information through active users in online social networks (OSNs) plays a major role in forming natural social groups, popularly known as communities. Although structural and topological aspects of the network had been central to most of the community detection approaches, incorporation of information flow for community detection has been an emerging topic in the recent past. Often, the flow of information is studied as a traceable process called information diffusion. The flow of information in the network affects various factors like temporal characteristics, network attributes, or social attributes. The information diffusion process helps to extract this information including where and when information is generated and in what fashion the dispersion occurs. Thus, it has the potential to aid the community detection process in social networks. In this article, the deployment of the information diffusion process for community detection has been studied extensively. The study is mainly focused on how information flow affects various network properties and social facets and explored the possibility of deployment for community detection. Various information diffusion models and community detection algorithms have been discussed in the context of network properties and social facets. Current challenges, future directions, and modalities for the deployment of information diffusion in community detection have been discussed. In addition, various widely used datasets, evaluation metrics as well as evaluation methods for evaluating community detection algorithms are also detailed.