Article

Finding community structure in very large networks

Authors:

... In this paper, we evaluate eight modularity-based heuristics [13,7,34,56,60,8,57,35], two variations of a graph neural network algorithm [55], and several variations of an approximate algorithm [4]. We quantify the extent to which these ten algorithms succeed in returning an optimal partition or a partition resembling an optimal partition. ...
... We evaluate ten modularity-based algorithms: Clauset-Newman-Moore (CNM) [13], Louvain [7], Reichardt-Bornholdt with the configuration model as the null model (LN) [46,34], Combo [56], Belief [60], Paris [8], Leiden [57], EdMot-Louvain [35], a recurrent graph neural network (GNN) [55], and Bayan [4]. Except for Bayan and GNN, we use the Python implementations of the remaining eight algorithms (collectively referred to as heuristics), which are available in the Community Discovery library (CDlib) version 0.2.6 [48]. ...
... CNM: The CNM algorithm initializes each node as a community of its own. It then greedily and repeatedly merges the pair of communities whose merger yields the largest positive increase in modularity [13]. ...
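The greedy merging scheme described above can be illustrated with a simplified sketch. It is not the CNM reference implementation: it rescans every adjacent pair of communities at each step instead of maintaining the sparse matrices and max-heaps that give CNM its O(md log n) running time, and it simply stops once no merge increases modularity. The function name `greedy_modularity` is hypothetical; NetworkX ships a production implementation as `greedy_modularity_communities`.

```python
from collections import defaultdict


def greedy_modularity(edges):
    """Simplified CNM-style agglomeration for a simple undirected graph.

    Assumes no self-loops or duplicate edges. e[i][j] is the fraction of
    edge ends joining communities i and j (i != j), a[i] is the fraction of
    edge ends attached to community i, and merging i and j changes
    modularity by dQ = 2 * (e[i][j] - a[i] * a[j]).
    """
    m = len(edges)
    e = defaultdict(lambda: defaultdict(float))
    a = defaultdict(float)
    for u, v in edges:
        e[u][v] += 1 / (2 * m)
        e[v][u] += 1 / (2 * m)
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)
    members = {node: {node} for node in a}

    while True:
        # Only communities joined by at least one edge can raise Q.
        best_dq, pair = 0.0, None
        for i, row in e.items():
            for j, eij in row.items():
                dq = 2 * (eij - a[i] * a[j])
                if i < j and dq > best_dq:
                    best_dq, pair = dq, (i, j)
        if pair is None:  # no merge increases modularity: stop
            break
        i, j = pair
        # Fold community i into community j.
        for k, eik in e.pop(i).items():
            del e[k][i]
            if k != j:  # edges between i and j become internal
                e[j][k] += eik
                e[k][j] += eik
        a[j] += a.pop(i)
        members[j] |= members.pop(i)

    return sorted(sorted(c) for c in members.values())
```

On two triangles joined by a single edge, `[(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]`, the sketch returns `[[0, 1, 2], [3, 4, 5]]`.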
Article
Community detection, a fundamental problem in the computational sciences, finds applications in various domains. Heuristics are often employed to detect communities by maximizing an objective function, modularity, over partitions of network nodes. Our research examines the performance of different modularity maximization algorithms in achieving optimal partitions. We use 104 networks, comprising real-world instances from diverse contexts and synthetic graphs with modular structures. We analyze ten inexact modularity-based algorithms against an exact baseline, an integer programming method that globally optimizes modularity. The ten algorithms analyzed include eight heuristics, two variations of a graph neural network algorithm, and several variations of the Bayan approximation algorithm. Our findings show that, on average, modularity-based heuristics produce optimal partitions for 43.9% of the 104 networks considered. Graph neural networks on average achieve optimality 68.7% of the time. The weakest variation of Bayan achieves optimality 75% of the time. This rate increases to 91.3% when a sufficiently small approximation threshold is chosen. Furthermore, our analysis uncovers substantial dissimilarities between the partitions obtained by the most commonly used modularity-based methods and any optimal partition of the networks, as indicated by both adjusted and reduced mutual information metrics. Importantly, our results show that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of the commonly used unguaranteed modularity-based methods for discovering communities: they rarely produce an optimal partition or a partition resembling an optimal partition, even on networks with modular structures.
If modularity is to be used for detecting communities, approximate optimization algorithms are recommended for a more methodologically sound usage of modularity within its applicability limits. This article is an extended version of the conference paper (Aref et al., 2023) published in the proceedings of ICCS 2023.
... NG partitioned by VOS (NG_vos) describes VOS membership cohesiveness [46] ... approach of iteratively sampling random links that would increase the modularity of an initial subnetwork linking highly connected nodes in the original network [63]. Remarkably, the NG_vos and FGC indicators measured along the timeline uncovered mirrored patterns of increase in modular cohesiveness and agglomerative structure for all growing networks. ...
... High values of both γ and R² indicate that scale-free behavior is strongly supported. Other power-law statistics included: (1) the KS fit statistic, which compares the fitted distribution with the input degree vector; (2) the KS p-value, with the null hypothesis that the data are drawn from the power-law distribution [62,63]; and (3) the exponent of the fitted power-law distribution (α), which assumes P(X = x) is proportional to x^(-α). A lower KS fit score, a larger KS p-value (≥ 0.05), and a higher α suggest a better fit to a power-law distribution. ...
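The exponent and KS statistic above can be illustrated with the continuous maximum-likelihood estimator α̂ = 1 + n / Σ ln(x_i / x_min) and the largest gap between the empirical and fitted CDFs. This is a sketch under stated assumptions: it treats degrees as continuous (the text's P(X = x) ∝ x^(-α) is discrete), takes x_min as given rather than estimating it, and does not compute the bootstrap KS p-value. The function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(data, xmin):
    """Continuous power-law fit for the tail x >= xmin.

    Returns (alpha, ks): the maximum-likelihood exponent and the
    Kolmogorov-Smirnov distance between the empirical CDF of the tail and
    the fitted CDF F(x) = 1 - (x / xmin) ** (1 - alpha).
    Assumes at least one tail value is strictly greater than xmin.
    """
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    alpha = 1 + n / sum(math.log(x / xmin) for x in tail)

    ks = 0.0
    for i, x in enumerate(tail):
        model = 1 - (x / xmin) ** (1 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        ks = max(ks, abs((i + 1) / n - model), abs(i / n - model))
    return alpha, ks
```

Feeding it evenly spaced quantiles of a power law with α = 2.5 and x_min = 1, i.e. `[(1 - (k + 0.5) / 1000) ** (-1 / 1.5) for k in range(1000)]`, recovers an exponent close to 2.5 and a small KS distance.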
... (2) The Clustering Ratio (C-ratio) is the ratio of the number of clusters to the size of an interconnected node set; (3) the average Clustering Coefficient (C) is the mean ratio of triangles to connected triads over all nodes in the simplified (undirected/unweighted) network [52][53][54]; it is meaningful only for unimodal graphs [62]. We also report coefficients of linear regression over C for loop and domain network projections; (4) the Fast Greedy Community (FGC) hierarchical agglomeration algorithm detects community structure in O(md log n) time for a network with m edges, n nodes, and dendrogram depth d; for sparse hierarchical networks (m ~ n, d ~ log n) this is essentially linear, O(n log² n) [63]; and (5 and 6) the Newman-Girvan algorithm index (NG), computed with partitions defined by age (NG_age) and by VOS clustering (NG_vos). NG calculates the modularity of a network with respect to a given classification (partition) to measure how well that classification divides the various node types, as indicated by assortative (positive) or disassortative (negative) mixing across modules [51]. ...
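Item (3) can be made concrete with a short sketch that simplifies the network (duplicate edges and self-loops dropped, direction and weights ignored) and averages each node's local coefficient: the number of links among its neighbours divided by the number of possible links. Counting nodes of degree below 2 with coefficient 0 is one common convention and an assumption here; the cited work may treat them differently. The function name is hypothetical.

```python
from itertools import combinations


def average_clustering(edges):
    """Mean local clustering coefficient of the simplified undirected graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # simplification: drop self-loops
        adj.setdefault(u, set()).add(v)  # sets also drop duplicate edges
        adj.setdefault(v, set()).add(u)

    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient taken as 0 (assumed convention)
        # Triangles through node = links among its neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)  # divided by triads centred on node
    return total / len(adj)
```

For two triangles joined by one edge, four nodes have coefficient 1 and the two bridge nodes have 1/3, so the average is 7/9.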
Article
The structures and functions of proteins are embedded into the loop scaffolds of structural domains. Their origin and evolution remain mysterious. Here, we use a novel graph-theoretical approach to describe how modular and non-modular loop prototypes combine to form folded structures in protein domain evolution. Phylogenomic data-driven chronologies reoriented a bipartite network of loops and domains (and its projections) into ‘waterfalls’ depicting an evolving ‘elementary functionome’ (EF). Two primordial waves of functional innovation involving founder ‘p-loop’ and ‘winged-helix’ domains were accompanied by an ongoing emergence and reuse of structural and functional novelty. Metabolic pathways expanded before translation functionalities. A dual hourglass recruitment pattern transferred scale-free properties from loop to domain components of the EF network in generative cycles of hierarchical modularity. Modeling the evolutionary emergence of the oldest P-loop and winged-helix domains with AlphaFold2 uncovered rapid convergence towards folded structure, suggesting that a folding vocabulary exists in loops for protein fold repurposing and design.
... In the Creation Mode, this is done by first randomly grouping the cell types and then applying Kruskal's algorithm [59] with random weights to sketch the underlying tree or forest. In the Emulation Mode, this is accomplished by the Clauset-Newman-Moore greedy modularity maximization algorithm [60,61] followed by Kruskal's algorithm with the L2 distances among observed average cell means as weights. In both cases, the root node represents the starting point of differentiation, and its child nodes inherit certain channels that define the entire lineage. ...
... Next, we carve out a spanning tree or a spanning forest via the Clauset-Newman-Moore greedy modularity maximization algorithm [60,61] followed by Kruskal's algorithm [59]. Finally, to convert the graph to a directed acyclic graph (DAG), a depth-first search with a randomly selected node as the root on each graph component is performed. ...
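The Kruskal and depth-first-search steps described above can be sketched as follows, assuming the clusters (for example, from CNM) are already known and each is summarized by its mean vector. The sketch builds a minimum spanning tree over the cluster means, weighted by Euclidean (L2) distance, using Kruskal's algorithm with a union-find structure, then orients every tree edge away from a chosen root to obtain a DAG. The function names and the dictionary input format are hypothetical and not taken from Cytomulate; for a forest, the orientation step would run once per component, as the text describes.

```python
import math
from itertools import combinations


def kruskal_mst(means):
    """Kruskal's algorithm on the complete graph of cluster means (L2 weights)."""
    parent = {c: c for c in means}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    candidate_edges = sorted(
        (math.dist(means[a], means[b]), a, b)
        for a, b in combinations(sorted(means), 2)
    )
    tree = []
    for _, a, b in candidate_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two components, so no cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


def orient_from_root(tree, root):
    """Direct every tree edge away from root (parent -> child), giving a DAG."""
    adj = {}
    for a, b in tree:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    arcs, seen, stack = [], {root}, [root]
    while stack:  # iterative depth-first traversal
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                arcs.append((u, v))
                stack.append(v)
    return arcs
```

For means `{"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}`, the tree is `[("A", "B"), ("B", "C")]`, and rooting it at `"A"` yields the arcs A → B → C. On a tree, every edge ends up pointing away from the root regardless of visiting order.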
Article
Recently, many analysis tools have been devised to offer insights into data generated via cytometry by time-of-flight (CyTOF). However, objective evaluations of these methods remain absent, as most evaluations are conducted against real data where the ground truth is generally unknown. In this paper, we develop Cytomulate, a reproducible and accurate simulation algorithm for CyTOF data, which could serve as a foundation for future method development and evaluation. We demonstrate that Cytomulate can capture various characteristics of CyTOF data and is better at learning overall data distributions than single-cell RNA-seq-oriented methods such as scDesign2 and Splatter, and generative models such as LAMBDA. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-023-03099-1.
... In this paper, we investigate community detection in complex networks. This problem has been receiving a lot of attention [4,13,18,22,27,31,33,38,39,40]. For more on the topic, we refer the reader to the survey by Fortunato [18]. ...
... Several authors [3,7,9,11,13,22,27,31,33,38,40,41] have proposed different kinds of algorithms for the community detection problem, each according to its own definition of community. Using Variable Neighborhood Search (VNS) [12,24,29], Aloise et al. [3] designed a successful algorithm for this problem. ...
... This work proposes using complex networks as a predictive instrument for election results, applying social network analysis techniques with emphasis on centrality measures and modularity (MD) [Clauset et al. 2004]. The next section presents related work; Section 3 describes the methodology applied in the analysis, and Section 4 presents the results obtained and discusses them. The limitations and weaknesses of the work are pointed out in Section 5; finally, Section 6 presents the conclusion and suggestions for future work. ...
... The centrality and modularity values were normalized and aggregated by candidate by computing their means (Table 2). The possibility of segmenting a complex network into groups of nodes that share common characteristics led to the study of modularity [Clauset et al. 2004]. ...
Article
Opinion polls have been used to predict trends since the 19th century. Election polls as we know them today began in the United States of America (USA) more than a hundred years ago, always with the aim of predicting the result of each election within an acceptable margin of error. However, conducting a poll entails significant operating costs, e.g., trained staff, travel, publicity, and software. This article explores the use of network analysis metrics, covering centrality measures and, in particular, modularity, to formulate a simple model for predicting election results. The study's dataset comprised more than three hundred thousand comments collected on Twitter on the eve of three elections, used as case studies. Using modularity achieved a mean absolute error of 1.59%, lower than that of the so-called exit polls (“boca de urna”), which obtained 2.48% combined.
... If a small change, such as randomly added or deleted edges, can completely change the community structures obtained from a network, then it can be argued that the communities found should not be considered trustworthy (Karrer et al. 2008). While several algorithms are used for community detection in network analyses, such as walktrap (Pons and Latapy 2005), leading eigenvector (Newman 2006), greedy optimization of modularity (Clauset et al. 2004), and label propagation (Raghavan et al. 2007), the effectiveness of the Louvain method (Blondel et al. 2008) has been shown in recent community detection studies in hydrology (e.g., Agarwal et al. 2018; Conticello et al. 2018; Joo et al. 2021; Ozturk et al. 2019). The Louvain method implements a multi-level modularity optimization algorithm, where a higher value of modularity indicates that nodes are densely linked within communities but sparsely linked to nodes in other communities. ...
... To obtain VI curves, a clustering algorithm and a perturbation strategy are required. Several community detection algorithms, such as the fast greedy algorithm (Clauset et al. 2004), the walktrap method (Pons and Latapy 2005), and label propagation methods (Raghavan et al. 2007), could be used, and they can produce different results, but the multilevel modularity optimization or Louvain algorithm (Blondel et al. 2008) has been successfully applied in hydrological studies (Agarwal et al. 2018; Conticello et al. 2018; Ozturk et al. 2019) owing to its accuracy, high computational speed, and effectiveness in determining community structures, as noted by Signorelli and Cutillo (2022). Please refer to Blondel et al. (2008) for details of the algorithm. ...
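The variation of information behind the VI curves compares two partitions of the same nodes: VI(X, Y) = H(X | Y) + H(Y | X). It is 0 when the partitions are identical up to relabeling and grows as they diverge. A minimal sketch, assuming each partition is given as an equal-length list of community labels and using natural logarithms (the log base, and any normalization, are assumptions that the cited work may choose differently); the function name is hypothetical:

```python
import math
from collections import Counter


def variation_of_information(x, y):
    """VI between two partitions given as equal-length label lists (nats)."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    joint = Counter(zip(x, y))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # -p(a,b) * [log p(a | b) + log p(b | a)], summed over occupied cells
        vi -= p_ab * (math.log(n_ab / count_y[b]) + math.log(n_ab / count_x[a]))
    return vi
```

In a VI-curve setting, `x` would hold the communities of the original network and `y` those found after a perturbation, with VI tracked as the perturbation strength grows.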
Article
Recently, complex network-based approaches have been shown to be efficient for the spatial analysis of rainfall variation. One of the most critical limitations of correlation-based networks is their reliance on assumed threshold levels to identify the existence of links, which lead to different topological and community structures. A hypothesis test is formulated for the robustness analysis of recovered community structures for monthly rainfall data of Tasmania, Australia. To this end, variation of information (VI) curves are constructed for the original and random networks. Then, the Gaussian process regression method is applied to these curves at different correlation threshold values (i.e., 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, and 0.90) to obtain maximum likelihoods in the form of Bayes factors. The networks are analyzed on a local and global scale using node strength, node efficiency, edge density, and global efficiency measures to reveal the features of the rainfall network in the basin. In particular, local strengths and efficiencies show that the networks are more efficient for threshold values higher than 0.70. Global measures (i.e., edge density and global efficiency) decrease as the threshold increases, except at the threshold of 0.80. In a rainfall forecasting exercise, using the robust network with the threshold value of 0.80 increases the coefficient of determination, Nash–Sutcliffe, and Kling-Gupta efficiencies from 0.29, 0.27, and 0.43 to 0.41, 0.36, and 0.61, respectively, leading to about 41%, 33%, and 41% improvement. Therefore, the results could be useful for determining robust network structures for various hydrological purposes such as filling missing values, regional flood analysis, and forecasting.
... An additionally computed cluster analysis (cf. Clauset et al., 2004) confirmed the existence of the two groupings. ...
... However, many countries reside in between (e.g., Germany, Canada, United States, Sweden, Austria), linking one or more competing multinationals from both groupings with each other and marking national markets that are contested between the two groupings. (Figure caption fragment: ... Fruchterman and Reingold (1991); the underlay highlights groupings calculated with the clustering algorithm by Clauset et al. (2004).) Considering these patterns, it is interesting to learn how each multinational corporation and the network of competition relationships relate to the diversity of contract types we found above. Figure 2 shows the contract type combinations across all subsidiaries for each multinational corporation, separated by groupings A and B. The case "Just Eat Takeaway" is a clear outlier, as it exhibits the highest share of the "only employed" category. ...
Preprint
This article challenges the idea of platform capitalism that digital platforms implement a uniform model based on a self-employed labor force. Expanding on empirical evidence of a diversity of platform models, we theorize expectations about platform diversity from competition and comparative capitalism research. Using a unique cross-national dataset of leading food delivery platforms in 32 countries across North America and Europe, we compare platform models and competitive relations across national institutional regimes. Our analyses uncover a considerable diversity of platform models across Europe, in contrast to a clear uniformity in North America. We also find that the use of self-employment varies across and within large multinational corporations and is most prevalent in countries of the lightly regulated regime type. Our results call for an economic sociology perspective on the platform economy that integrates a general concept of platforms but allows for diversity stemming from competition and different national regimes.
... Before we approach the extension of community detection to interval-weighted networks (IWN) in Sect. 4, it is important to note that our entire approach is based on the definition of modularity as a sum over pairs of vertices (Newman 2004b; Clauset et al. 2004; Arenas et al. 2007), ...
... where o_rs and e_rs are, respectively, the observed and expected weights of edges connecting vertices in community r to vertices in community s. This reduced formulation of the modularity gain (9) is computationally more efficient than its initial expression (8) because it acts locally rather than globally (Clauset et al. 2004). ...
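The locality argument can be checked numerically. The sketch below illustrates it under stated assumptions and does not reproduce the IWN formulation. It uses a plain unweighted graph and takes o_rs as the observed fraction of edge ends running between communities r and s. As a further assumption, it uses the configuration-model expectation e_rs = a_r a_s, where a_r is the fraction of edge ends attached to community r. With these choices, merging r and s changes modularity by ΔQ = 2(o_rs - e_rs), which involves only r and s, whereas recomputing Q touches every community. Exact fractions make the two computations directly comparable. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def observed_fractions(edges, membership):
    """o[(r, s)]: fraction of edge ends joining community r to community s."""
    m = len(edges)
    o = defaultdict(Fraction)
    for u, v in edges:
        r, s = membership[u], membership[v]
        o[(r, s)] += Fraction(1, 2 * m)
        o[(s, r)] += Fraction(1, 2 * m)
    return o


def degree_fraction(o, r, communities):
    """a_r: fraction of all edge ends attached to community r."""
    return sum(o.get((r, t), 0) for t in communities)


def modularity(o, communities):
    """Global form: Q = sum over r of (o_rr - a_r ** 2)."""
    return sum(o.get((r, r), 0) - degree_fraction(o, r, communities) ** 2
               for r in communities)


def merge_gain(o, r, s, communities):
    """Local form: dQ = 2 * (o_rs - e_rs), assuming e_rs = a_r * a_s."""
    e_rs = degree_fraction(o, r, communities) * degree_fraction(o, s, communities)
    return 2 * (o.get((r, s), 0) - e_rs)
```

For two triangles joined by one edge, merging nodes 0 and 1 from the all-singletons partition changes Q by exactly `merge_gain(...)` (20/196), and the two-triangle partition has Q = 5/14.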
Article
In this paper we introduce and develop the concept of interval-weighted networks (IWN), a novel approach in Social Network Analysis in which edge weights are represented by closed intervals built from precise information, capturing intrinsic variability. We extend Newman's modularity, the modularity gain, and the Louvain algorithm to IWN, using a tabular representation of networks as contingency tables. We apply our methodology to two real-world IWN. The first is a commuter network in mainland Portugal between the twenty-three NUTS 3 Regions (IWCN). The second focuses on annual merchandise trade between 28 European countries from 2003 to 2015 (IWTN). The optimal partition of geographic locations (regions or countries) is developed and compared using two new approaches, designated “Classic Louvain” and “Hybrid Louvain”, which take into account the variability observed in the original network, thereby minimizing the loss of information present in the raw data. Our findings suggest dividing the twenty-three Portuguese regions into three main communities for the IWCN and the countries into two to three communities for the IWTN. However, we find different geographical partitions depending on the community detection methodology used. This analysis can be useful in many real-world applications, since it takes into account that the weights may vary within ranges rather than being constant.
... Cluster analysis was used to examine the network characteristics (often referred to as the structural signature) relating RQ1 and RQ2 by grouping the vertices with the Clauset-Newman-Moore (CNM) algorithm available in NodeXL (Clauset et al., 2004). The CNM is a heuristic method suitable for quickly finding communities in large-scale networks (Clauset et al., 2004). The CNM starts with every vertex as a separate cluster, then compares clusters pair by pair to find which two can be merged (Yum, 2020); at each step it merges the pair that yields the largest increase in modularity. ...
Article
Purpose: The study aims to explore the digital fashion trend within the Metaverse, characterized by non-fungible tokens (NFTs), across Twitter networks. Integrating theories of diffusion of innovation, two-step flow of communication and self-efficacy, the authors aimed to uncover the diffusion structure and the influencers' social roles undertaken by social entities in fostering communication and collaboration for the advancement of Metaverse fashion. Design/methodology/approach: Social network analysis examined the critical graph metrics to profile, visualize, and cluster the unstructured network data. The authors used the NodeXL program to analyze two hashtag keyword networks, “#metaverse fashion” and “#metawear,” using Twitter API data. Cluster, semantic, and time series analyses were performed to visualize the contents and contexts of communication and collaboration in the diffusion of Metaverse fashion. Findings: The results unraveled the “broadcast network” structure and the influencers' social roles of opinion leaders and market mavens within Twitter's “#metaverse fashion” diffusion. The roles of innovators and early adopters among influencers were comparable in collaborating within the competition venues, promoting awareness and participation in digital fashion diffusion during specific “fad” periods, particularly when digital fashion NFTs and cryptocurrencies became intertwined with the competition in the Metaverse. Originality/value: The study contributed to theory building by integrating three theories, emphasizing effective communication and collaboration among influencers, organizations, and competition venues in broadcasting digital fashion within shared networks. The validation of multi-faceted Social Network Analysis was crucial for timely insights, highlighting the critical digital fashion equity in capturing consumers' attention and driving engagement and ownership of Metaverse fashion.
... Next, gene-set overlap is calculated between each pair of terms, using either the overlap coefficient (OC) or the Jaccard index (JI) (for two sets X and Y, OC(X, Y) = |X ∩ Y| / min(|X|, |Y|) and JI(X, Y) = |X ∩ Y| / |X ∪ Y|), and a network of terms is built with an edge between any pair of terms whose overlap exceeds a user-defined threshold (Merico et al., 2010). Within this network, communities of related terms are identified using greedy modularity maximisation (Clauset et al., 2004), attenuated by an adaptive algorithm that limits the maximum community size (see Supplementary Figure 2 and Supplementary Boxes 1 to 4 for details). Finally, communities are grouped into larger meta-communities when weaker, residual gene-set overlap remains between terms, or when strong gene-set overlap exists between terms from different databases but multi-database agglomeration is off (see Supplementary Box 1 for details). ...
... [6] GeneFEAST uses the Python package NetworkX (Aric A. Hagberg, 2008) implementation of greedy modularity maximisation (Clauset, et al., 2004). ...
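The overlap measures and term network described above can be sketched as follows. The function names, the dictionary input (term mapped to a non-empty gene set), and the strict `>` comparison (reading "exceeding" literally) are assumptions and not GeneFEAST's code; the adaptive community-size limit and the meta-community step are not shown. The resulting edge list could then be loaded into a NetworkX graph and passed to its greedy modularity routine, `greedy_modularity_communities`.

```python
from itertools import combinations


def overlap_coefficient(x, y):
    """OC(X, Y) = |X & Y| / min(|X|, |Y|); assumes non-empty sets."""
    return len(x & y) / min(len(x), len(y))


def jaccard_index(x, y):
    """JI(X, Y) = |X & Y| / |X | Y|; assumes at least one set is non-empty."""
    return len(x & y) / len(x | y)


def term_network(gene_sets, threshold, measure=jaccard_index):
    """Edges between terms whose gene-set overlap exceeds the threshold."""
    return [
        (s, t)
        for s, t in combinations(sorted(gene_sets), 2)
        if measure(gene_sets[s], gene_sets[t]) > threshold
    ]
```

For example, the sets {A, B, C} and {B, C, D, E} share two genes, giving OC = 2/3 and JI = 0.4, so they are linked at a Jaccard threshold of 0.3 but not at 0.5.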
Preprint
GeneFEAST, implemented in Python, is a gene-centric functional enrichment analysis summarisation and visualisation tool that can be applied to large functional enrichment analysis (FEA) results arising from upstream FEA pipelines. It produces a systematic, navigable HTML report, making it easy to identify sets of genes putatively driving multiple enrichments and to explore gene-level quantitative data first used to identify input genes. Further, GeneFEAST can compare FEA results from multiple studies, making it possible, for example, to highlight patterns of gene expression amongst genes commonly differentially expressed in two sets of conditions, and giving rise to shared enrichments under those conditions. GeneFEAST offers a novel, effective way to address the complexities of linking up many overlapping FEA results to their underlying genes and data, advancing gene-centric hypotheses, and providing pivotal information for downstream validation experiments. Availability: GeneFEAST is available at https://github.com/avigailtaylor/GeneFEAST Contact: avigail.taylor@well.ox.ac.uk
... A module refers to a group of nodes that exhibit strong connections among themselves, but relatively weaker connections with nodes outside the module. On the other hand, modularity is a measure that quantifies the extent to which a network is divided into distinct modules (Clauset et al. 2004). Edge density (ED) is employed to characterize the density of connected edges between nodes within the network. ...
Article
In shallow eutrophic lakes, submersed macrophytes are significantly influenced by two main factors: light availability and benthic fish disturbance. Plant foraging is one of the most crucial aspects of plant behavior. The present study was carried out to examine the effects of light regimes and fish disturbance on the foraging behavior of Vallisneria natans in heterogeneous sediments. V. natans was cultivated in heterogeneous sediments with four treatments: high-light regime (H), high-light regime with benthic fish (HF), low-light regime (L), and low-light regime with benthic fish (LF). We used plant trait network analysis to evaluate the relationships between traits in heterogeneous sediments. We found the plant foraging intensity was positively correlated with trait network modularity. The biomass of stem, maternal plant biomass ratio, and ramet number were the hub traits of plants growing in heterogeneous habitats. Although the plant relative growth rate (RGR) was positively correlated with foraging intensity, the hub traits had closer links with plant RGR than with foraging intensity. Light regime and benthic fish indirectly affected the plant foraging intensity by changing the chlorophyll a content and pH of the overlying water. Overall, our analysis provides valuable insights into plant foraging behavior in response to environmental changes.
... The Fast-Greedy algorithm partitions the graph through a topological analysis of modularity (Q), that is, through a measure that quantifies the connectivity of the network graph (Clauset et al., 2004; Newman and Girvan, 2004). Q is defined by the equation (Newman, 2004; Reichardt and Bornholdt, 2004): ...
Conference Paper
Sectorizing water distribution networks (WDNs) and controlling the flow in their pipes makes it possible to regulate pressures, improve water quality, reduce leakage, and obtain a more uniform distribution of demands. Sectorization is generally done in two stages: topological partitioning of the system, followed by placement of control devices (control and isolation valves) that physically enforce that partition. Nevertheless, studies in the literature show that in fixed sectors the resilience of the supply system can be reduced and water quality can worsen, because partitioning lengthens the flow paths in the pipes from the sources to the end consumers. One possible way to address this increase in water residence time is to sectorize WDNs dynamically. As a first step toward this, dynamic partitions can be developed, that is, the partition configurations and the placement of boundary pipes are varied, so that sectors with greater connectivity can later be formed. Accordingly, this research developed dynamic partitions by implementing, in the Python programming language, two well-established partitioning algorithms from the literature, Fast-Greedy and Louvain. Both algorithms were compared, and each showed advantages and disadvantages regarding its implementation and the partitioning process.
... The results are shown in Table 6. Another experiment we perform to recognise equivalent questions is applying a "graph community detection" method (Clauset et al., 2004;Hagberg et al., 2008) in order to group similar questions. A graph is built with question nodes to cluster the questions using the cosine similarity matrix. ...
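The graph construction described above could look like the following sketch. The embedding vectors stand in for QBERT outputs. The similarity threshold is a hypothetical parameter, since the text does not state how edges are selected from the cosine similarity matrix. The resulting weighted edges would then be handed to a community detection routine such as greedy modularity maximisation. Function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_graph(embeddings, threshold):
    """Weighted edges (i, j, sim) for question pairs with sim >= threshold."""
    edges = []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        if sim >= threshold:
            edges.append((i, j, sim))
    return edges
```

With toy embeddings `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]` and a threshold of 0.9, only the first two questions are linked.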
Article
Questions have a critical role in learning and teaching. People ask questions to obtain information and express interest in ideas. The Bristol science centre “We The Curious” launched “Project What If” in 2017 to inspire residents of Bristol to record their questions and pursue their curiosities. Researching these questions may help the museum better understand the curiosity of its audiences and create exhibitions or educational content that are more relevant to their interests and lives. The project managed to collect more than 10,000 questions on various topics, and more questions are being collected on a daily basis. With this large amount of data collected, it is time-consuming for humans to process and analyse all the questions. This research aims to apply artificial intelligence (AI) techniques and models to analyse these questions gathered by We The Curious. Meanwhile, AI lacks tools that focus on processing and analysing questions. Thus, we introduce a deep neural network called QBERT to process the questions for three tasks: question taxonomy, equivalent question detection, and question answering. We then apply QBERT to analyse the questions collected by We The Curious and to understand Bristolians’ curiosity. Using QBERT, we categorise the We The Curious questions into 90 themes and 5,930 communities. Moreover, 436 questions are answered by one-sentence answers extracted from Wikipedia.
... This methodological perspective was used to study edges and vertices in depth compared to global network properties which show an insight of the overall network structure. The community detection determines groups (also modules, clusters or communities) of nodes or sub-graphs of actors and motivations that are densely interconnected to one another and poorly connected to other parts of the network (Newman & Girvan, 2004;Clauset et al., 2004;Fortunato, 2010;Dormann, 2022). This strategy is useful to detect groups of nodes that share common properties and/or play similar roles within the network structure. ...
Article
The paper examines domestic ’femicide’ in Italy. Under an exploratory statistical approach, we investigated: (1) difficulties and strategies for reconstructing a historical dataset on family crimes for studies over time; (2) the main causes of family femicides; and (3) groups of actors driven by the same motivations interpreted as patterns of criminal behavior. First, we integrated and systematised data from official sources to guarantee comparison over time; second, we used Social Network Analysis properties to study the relationships between ’motivations’ and ’victim-perpetrator’; and third, we applied and compared community detection algorithms to the linkages between ’actors’ and ’motivations’ to detect groups of criminal behavior. From 2015 to 2020 in Italy, the cohabitant was the major family murderer, but in 2020, passion motivation also surfaced. Mental problems connected to parents-children and cohabitants, jealousy of ex-partners or rivals, and economic issues for blood relations were observed in 2015. Psychopathologies and money characterised parents-children in 2020, while passion and disagreements caused cohabitants or ex-partners.
... Community detection algorithms for unipartite graphs include the Louvain algorithm [24], Infomap [25], LPA [26], Walktrap [27], and Spinglass [28]. The greedy algorithm proposed by Clauset, Newman, and Moore [29] optimizes modularity to find the best partition of the network: it starts with each node in its own community and merges communities agglomeratively, as in hierarchical clustering. ...
... In particular, we propose two ways to perform such a task. First, we need an algorithm A_CD for community detection on graphs with weighted edges, such as Louvain [44], FastGreedy [45], the Label Propagation Algorithm [46], or another among those proposed in the literature [47]. Applying A_CD to G and G_e(t) returns two sets of communities, CS and CS_e(t). ...
Article
In this paper, we propose a framework that uses the theory and techniques of (Social) Network Analysis to investigate the learned representations of a Graph Neural Network (GNN, for short). Our framework receives a graph as input and passes it to the GNN under investigation, which returns suitable node embeddings. These are used to derive insights into the behavior of the GNN through the application of (Social) Network Analysis theory and techniques. The insights thus obtained are employed to define a new training loss function, which takes into account the differences between the graph received as input by the GNN and the one reconstructed from the node embeddings it returns. This measure is finally used to improve the performance of the GNN. In addition to describing the framework in detail and comparing it with related literature, we present an extensive experimental campaign that we conducted to validate the quality of the results obtained.
... Several approaches are provided in the literature to identify community structures in networks (Fortunato 2010). In this work, we consider the Fast Modularity algorithm proposed by Clauset et al. (2004) to identify community structures in a set of infrastructure networks that are physically interdependent. ...
... 1) FN: The most significant category of community identification algorithms is modularity optimization techniques, which search for an optimal community structure by maximizing modularity. FN [67] is an important method in this category; it maximizes modularity using global information and a max-heap structure. 2) LPA: LPA is a node-centric community detection algorithm that assigns a label (a community membership) to each node and then iteratively updates each node's label based on the labels of its neighbors. ...
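The label-propagation update described in this excerpt can be illustrated with a small asynchronous variant in Python. It is a sketch under assumptions: an undirected edge list, every node starting with its own label, nodes visited in a fresh random order each sweep, ties broken at random except that a node keeps its current label when that label is among the most frequent, and a stop once a full sweep changes nothing (or after `max_iter` sweeps). The function and parameter names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, max_iter=100, seed=0):
    """Asynchronous label propagation on an undirected edge list (sketch)."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    labels = {n: n for n in adj}  # each node starts with a unique label
    order = list(adj)
    for _ in range(max_iter):
        rng.shuffle(order)        # visit nodes in a fresh random order each sweep
        changed = False
        for n in order:
            counts = Counter(labels[nb] for nb in adj[n])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[n] in best:
                continue          # current label is already a most frequent one
            labels[n] = rng.choice(best)
            changed = True
        if not changed:           # every node holds a most frequent neighbour label
            break
    groups = defaultdict(set)
    for n, lab in labels.items():
        groups[lab].add(n)
    return list(groups.values())


# Two separate triangles: labels cannot cross between components.
print(sorted(sorted(g) for g in label_propagation(
    [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])))
# prints [[0, 1, 2], [3, 4, 5]]
```

Because ties are broken at random, different seeds can return different partitions on graphs with less clear-cut structure; this nondeterminism is a known property of LPA.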
Article
Network clustering is one of the fundamental unsupervised methods of knowledge discovery. Its goal is to group similar nodes together without supervision or prior knowledge of the nature of the clusters. Among various clustering methods, semi-supervised clustering is one of the most promising approaches for community detection because of its ability to employ side information to better understand network topology. However, most previous work faces two problems: the use of linear methods to reduce dimensionality and the random selection of side information; as a result of these two drawbacks, semi-supervised community detection methods are less efficient. To fill these gaps, we developed an end-to-end deep semi-supervised community detection (DSSC) method for complex networks. A new learning objective is designed that uses a semi-autoencoder (SeAE) with a pair-wise constraint matrix based on point-wise mutual information (PMI) in the representation layer to accurately learn distinctive features and, in the clustering layer, adds a pair-wise constraint term that minimizes distances within clusters while increasing distances between clusters. The results show that our method performs unexpectedly well in comparison to the existing state-of-the-art community detection methods in complex networks.
... With N the total number of regulations in the graph (V, E), the modularity under partition D is calculated as ... It measures the goodness of partition D in defining subnetworks of our constructed GRN by quantifying the within-subnetwork regulations. SIGNET maximizes this modularity to obtain the optimal partition using the fast greedy modularity optimization algorithm 12, which is implemented in the R package igraph 52. ...
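The excerpt omits the modularity formula itself. As a reference point only, the sketch below computes the standard Newman-Girvan modularity of an undirected, unweighted graph in plain Python, Q = Σ_c [L_c/m − (D_c/2m)²], where m is the number of edges, L_c the number of edges inside community c and D_c the total degree of its nodes. Whether SIGNET's formula (written with N regulations) takes exactly this form is an assumption; igraph provides its own modularity function, which is not reproduced here. The name `modularity` is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges: non-empty list of (u, v) pairs, each edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # D_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 4))               # prints 0.3571
print(modularity(edges, {n: "all" for n in range(6)}))  # prints 0.0
```

Putting every node in one community always gives Q = 0, which is why a partition is only considered modular when it scores clearly above zero.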
Article
Gene regulation plays an important role in understanding the mechanisms of human biology and diseases. However, inferring causal relationships between all genes is challenging due to the large number of genes in the transcriptome. Here, we present SIGNET (Statistical Inference on Gene Regulatory Networks), a flexible software package that reveals networks of causal regulation between genes built upon large-scale transcriptomic and genotypic data at the population level. Like Mendelian randomization, SIGNET uses genotypic variants as natural instrumental variables to establish such causal relationships but constructs a transcriptome-wide gene regulatory network with high confidence. SIGNET makes such a computationally heavy task feasible by deploying a well-designed statistical algorithm over a parallel computing environment. It also provides a user-friendly interface allowing for parameter tuning, efficient parallel computing scheduling, interactive network visualization, and confirmatory results retrieval. The open-source SIGNET software is freely available (https://www.zstats.org/signet/).
... Q is a metric that characterizes the degree of connection within communities, i.e., the strength of connectivity among the nodes of a community. In practice, values of Q above about 0.3 are commonly taken to indicate a strong community structure 45,46. NMI is a metric used to evaluate the similarity of a computed clustering solution to the actual community structure; it measures the similarity between two clustering solutions. ...
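NMI, as described in this excerpt, compares two clusterings of the same nodes. A small, dependency-free Python sketch follows. It assumes both clusterings are given as equal-length lists of labels and uses the arithmetic-mean normalisation 2I(A;B)/(H(A)+H(B)); other normalisations (geometric mean, maximum) appear in the literature and give different values. `nmi` is a hypothetical helper name; scikit-learn's `normalized_mutual_info_score` computes the same quantity and uses arithmetic-mean normalisation by default.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """NMI of two flat clusterings given as equal-length label sequences.

    Uses the arithmetic-mean normalisation 2 I(A;B) / (H(A) + H(B)).
    """
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    # I(A;B) = sum over label pairs of p(x, y) log(p(x, y) / (p(x) p(y)))
    mi = sum(c / n * math.log(c * n / (count_a[x] * count_b[y]))
             for (x, y), c in joint.items())
    if h_a + h_b == 0:
        return 1.0  # both clusterings put everything in one cluster
    return 2 * mi / (h_a + h_b)


truth = [0, 0, 1, 1]
print(round(nmi(truth, [1, 1, 0, 0]), 6))  # same grouping, relabelled: prints 1.0
print(round(nmi(truth, [0, 1, 0, 1]), 6))  # independent grouping: prints 0.0
```

NMI depends only on which nodes are grouped together, not on the label values, which is why a relabelled copy of the same partition scores 1.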
Article
Community partitioning is an effective technique for cyberspace mapping. However, existing community partitioning algorithms use only the topological structure of the network to divide communities and disregard factors such as real hierarchy, overlap, and directionality of information transmission between communities in cyberspace. Consequently, traditional community division algorithms are not suitable for dividing cyberspace resources effectively. Based on the structural characteristics of cyberspace communities, this study introduces PR-LFM, an algorithm that combines an improved local fitness maximization (LFM) algorithm with the PageRank (PR) algorithm for community partitioning of cyberspace resources. First, seed nodes are determined using degree centrality, followed by local community expansion. Nodes belonging to multiple communities undergo further partitioning so that each is retained in the community where it is most important, thus preserving the community’s original structure. The experimental results show good performance in the resource division of cyberspace.
... The input parameters include the training graph G T , community detection method CDMethod and the clustering number k. Two community detection methods, i.e., Spectral Clustering [45] and Greedy Modularity [46], are employed in this work. At line 1 of the Algorithm 2, community information of G T is obtained. ...
Article
Due to the evolving nature of complex networks, link prediction plays a crucial role in exploring the likelihood of potential relationships among nodes. There are many strategies for applying similarity-based metrics to estimate the proximity of nodes in complex networks. In this paper, we propose three new variants – CCPAL3, LPCPA, and GPCPA – for the well-known Common Neighbor and Centrality-based Parameterized Algorithm (CCPA) taking into account 3-hop path, quasi-local path, and global path, respectively. In addition, four novel link prediction strategies based on community detection information, CCPA_CD, CCPAL3_CD, LPCPA_CD and GPCPA_CD, are proposed. Meanwhile, the Jaccard index is extended to three new metrics, i.e., Jaccard_L3, Jaccard_QuasiLoc and Jaccard_Global. Extensive experiments are conducted on thirteen real-world networks. The experimental results indicate that the proposed metrics improve the prediction accuracy measured by AUC and are more competitive on Precision compared to the state-of-the-art link prediction methods.
... This approach acknowledges that the transmission of COVID-19 is not confined to neighboring ZIP codes and the patterns of connectivity between geographically distant communities may be persistent. We detected communities using a greedy search algorithm, which was chosen to optimize the modularity score [30,31]. Modularity measures the strength of dividing a network into clusters: a network with high modularity has dense connections among the nodes within clusters but sparse connections between nodes in different clusters. ...
Article
Background Understanding community transmission of SARS-CoV-2 variants of concern (VOCs) is critical for disease control in the post-pandemic era. The Delta variant (B.1.617.2) emerged in late 2020 and became the dominant VOC globally in the summer of 2021. While the epidemiological features of the Delta variant have been extensively studied, how those characteristics shaped community transmission in urban settings remains poorly understood. Methods Using high-resolution contact tracing data and testing records, we analyze the transmission of SARS-CoV-2 during the Delta wave within New York City (NYC) from May 2021 to October 2021. We reconstruct transmission networks at the individual level and across 177 ZIP code areas, examine network structure and spatial spread patterns, and use statistical analysis to estimate the effects of factors associated with COVID-19 spread. Results We find considerable individual variations in reported contacts and secondary infections, consistent with the pre-Delta period. Compared with earlier waves, the Delta period had more frequent long-range transmission events across ZIP codes. Using socioeconomic, mobility and COVID-19 surveillance data at the ZIP code level, we find that a larger number of cumulative cases in a ZIP code area is associated with reduced within- and cross-ZIP code transmission and that the number of visitors to each ZIP code is positively associated with the number of non-household infections identified through contact tracing and testing. Conclusions The Delta variant produced greater long-range spatial transmission across NYC ZIP code areas, likely caused by its increased transmissibility and elevated human mobility during the study period. Our findings highlight the potential role of population immunity in reducing transmission of VOCs. Quantifying variability of immunity is critical for identifying subpopulations susceptible to future VOCs. 
In addition, non-pharmaceutical interventions limiting human mobility likely reduced SARS-CoV-2 spread over successive pandemic waves and should be encouraged for reducing transmission of future VOCs.
... Given that SPIEC-EASI already included node selection strategies, further node filtering was unnecessary. We used a hierarchical agglomeration algorithm [29] to determine clusters of nodes that are highly connected but have a small number of connections to the nodes outside their module. ...
Article
Background The early life stage is critical for gut microbiota establishment and development. We aimed to investigate the lifelong impact of famine exposure during early life on the adult gut microbial ecosystem and examine the association of famine-induced disturbance in gut microbiota with type 2 diabetes. Methods We profiled the gut microbial composition among 11,513 adults (18–97 years) from three independent cohorts and examined the association of famine exposure during early life with alterations of adult gut microbial diversity and composition. We performed co-abundance network analyses to identify keystone taxa in the three cohorts and constructed an index with the shared keystone taxa across the three cohorts. Within each cohort, we used linear regression to examine the association of famine exposure during early life with the keystone taxa index and assessed the correlation between the keystone taxa index and type 2 diabetes using logistic regression adjusted for potential confounders. We combined the effect estimates from the three cohorts using random-effects meta-analysis. Results Compared with the unexposed control group (born during 1962–1964), participants who were exposed to the famine during the first 1000 days of life (born in 1959) had consistently lower gut microbial alpha diversity and alterations in the gut microbial community during adulthood across the three cohorts. Compared with the unexposed control group, participants who were exposed to famine during the first 1000 days of life had consistently lower keystone taxa index levels in the three cohorts (pooled beta −0.29, 95% CI −0.43, −0.15). Each 1-standard-deviation increment in the keystone taxa index was associated with a 13% lower risk of type 2 diabetes (pooled odds ratio 0.87, 95% CI 0.80, 0.93), with consistent results across the three individual cohorts. 
Conclusions These findings reveal a potential role of the gut microbiota in the developmental origins of health and disease (DOHaD) hypothesis, deepening our understanding about the etiology of type 2 diabetes.
... In a complex network, a community refers to a network cluster with closely connected internal nodes and a specific organizational relationship; these network clusters embody the functional attributes of the system, and their essential characteristics are "high cohesion" and "low coupling" [38]. A community is sometimes referred to as a "cluster" in sociology and computer science [39]. In urban geography, scholars use community to describe tightly connected urban groups [34,40]. ...
Article
The United Nations Sustainable Development Goals (SDGs) and the rise of global sustainability science have led to the increasing recognition of basins as the key natural geographical units for human–land system coupling and spatial coordinated development. The effective measurement of spatiotemporal patterns of urban connectivity within a basin has become a key issue in achieving basin-related SDGs. Meanwhile, China has been actively working toward co-ordinated regional development through in-depth implementation of the Yellow River Basin’s ecological protection and high-quality development. Urban connectivity has been trending in urban planning, and significant progress has been made on different scales according to the flow space theory. Nevertheless, few studies have been conducted on the multiscale spatiotemporal patterns of urban agglomeration connectivity. In this study, the urban network in the Yellow River Basin was constructed using Tencent population migration data from 2015 and 2019. It was then divided into seven distinct communities to enable analysis at both the basin and community scales. Centrality, symmetry, and polycentricity indices were employed, and the multiscale spatiotemporal patterns of urban agglomerations in the Yellow River Basin were identified using community detection, complex networks, and the migration kaleidoscope method. Community connectivity was notably concentrated at the basin scale with a centripetal pattern and spatial heterogeneity. Additionally, there was a symmetrical and co-ordinated relationship in population migration between the eastern and western regions of the basin, as well as between the internal and external parts of the basin. At the community scale, there was significant variation in the extent of central agglomeration among different communities, with few instances of similar-level, long-distance, and interregional bilateral links. 
The utilization of multiscale spatiotemporal patterns has the potential to enhance the comprehension of economic cooperation between various cities and urban agglomerations. This understanding can aid decision-makers in formulating sustainable development policies that foster the spatial integration of the basin.
... Community detection consists of identifying groups in the network based solely on the information present in the graph topology. Among the main community detection methods are Fastgreedy [Clauset et al. 2004] and Louvain [Blondel et al. 2008], which use modularity as a measure to evaluate the quality of a given division of the network. ...
Conference Paper
Data stream clustering is a crucial machine learning task for many systems that generate data continuously and need to analyze it without interruption. Using resources offered by the MOA (Massive Online Analysis) platform, this research proposes applying learning models based on complex networks in the offline phase of CluStream, in which micro-clusters are grouped using the k-Means algorithm. For the experiments in this project, three datasets and several performance measures specific to the problem were considered. The results showed that the use of complex networks achieves competitive performance, outperforming the traditional k-Means method in several scenarios.
... Graph clustering techniques use three clustering approaches for the grouping task: Connected Component (Clauset et al., 2004; Newman, 2004; Park & Glass, 2007), and Blondel et al. (2008). The proposed approach was compared with the graph clustering approach in terms of the NED, Coverage, Boundary, Type and Token performance metrics. ...
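Grouping by connected components, named in this excerpt as one of the graph clustering approaches, puts two items in the same group whenever a path of edges joins them. A minimal union-find sketch in Python follows. How the edges are created (for example, by thresholding a similarity score between spoken segments) is an assumption not taken from the excerpt, and `connected_components` is a hypothetical helper name.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components with union-find (sketch)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # union the two components
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


# Hypothetical "similar pair" edges between six segments 0..5.
print(sorted(sorted(g) for g in connected_components(range(6), [(0, 1), (1, 2), (4, 5)])))
# prints [[0, 1, 2], [3], [4, 5]]
```

Unlike modularity-based grouping, connected components never split a connected set, so a single spurious edge can merge two otherwise separate groups.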
Article
An unsupervised spoken term discovery task aims to capture the pattern similarities among spoken terms in the absence of annotation. Such an approach is useful for the untranscribed spoken content from low-resource or zero-resource languages. A challenge in the discovery task is to compute the similarities among spoken terms without annotation. Dynamic time warping (DTW) is one of the techniques that computes temporal alignment between two acoustic feature representations of the speech signal without annotation. However, the speech variabilities that arise in natural speech introduce a challenge to the DTW approach. As a result, the performance of the spoken term discovery task was degraded. In this study, we overcome the challenges and improve the performance of the discovery task in three stages. At first, the speaker-independent acoustic feature representation was obtained from the Self Organising Map (SOM) to reduce the variabilities. In the second stage, non-segmental pseudo-labels were generated for the spoken content using context-free grammar. Finally, the spoken term similarities were obtained by grouping the similar sequences using proposed Label Sequence Similarity Mapping and Language modelling algorithms. The performance of the proposed system was measured using the Zero-Speech challenge corpus in terms of matching, clustering and parsing qualities. The experimental results reveal that the proposed approach improves the performance by 34.2% and 22.4% in English and Xitsonga, respectively, across multiple speakers. In addition, the clustering performance of the spoken terms at the word level was improved by 4.2% in English.
... To explore if network communities existed among individual fish and regions (Griffin et al. 2018), we applied six community detection algorithms that generate groups of nodes, known as modules (see Finn et al. 2014). The six algorithms included: 'Leading-Eigenvector' (Newman 2006a, b), 'Walk-Trap' (Pons and Latapy 2005), 'Fast-Greedy' (Clauset et al. 2004; Newman and Girvan 2004; Reichardt and Bornholdt 2006), 'Spin-Glass' (Reichardt and Bornholdt 2006), 'Label-Propagation' (Raghavan et al. 2007; Blondel et al. 2008), and 'Multilevel' (Blondel et al. 2008). ...
Article
Individual fish movement patterns and behaviors influence population-level traits, and are important for understanding their ecology and evolution. Understanding these behaviors is key for managing and conserving migratory animal populations, including Atlantic tarpon (Megalops atlanticus), that support an economically important recreational fishery. Using acoustic telemetry, we tracked individual movement patterns of M. atlanticus inhabiting the eastern Gulf of Mexico and the southeast coast of the US over successive years. Net-squared displacement models revealed considerable individual-level variation in movement patterns with high individual-level repeatability in the timing of migrations and migratory pathways. Although distinct migratory subgroups existed, M. atlanticus generally migrate northward in the spring and summer to putative foraging grounds and remain in these areas for, on average, four months and then migrate southward in the fall. Subadult M. atlanticus exhibited similar migratory patterns as adults, while large juveniles exhibited either resident or nomadic behaviors. For migratory individuals, fish size did not influence movement patterns. Given that distinct migratory subgroups seasonally mixed in southern Florida for spawning activity, our study indicates that M. atlanticus along the eastern Gulf of Mexico and southeastern coast of the US should be considered a single interconnected stock. With that in mind, using M. atlanticus angler and guide knowledge, we assessed the vulnerability of M. atlanticus to potential threats across their range and along migratory pathways. Collectively, the far-ranging nature of M. atlanticus and their diversity in movement patterns highlights the need for more uniform and cohesive management and conservation efforts.
... Modularity is a measure for finding communities that compares the links within a community with the links between that community and the rest of the network (Clauset et al.; ...). The method tries to maximize the modularity value shown in equation 10 by grouping nodes that are more densely linked to each other than would be expected in a random network. ...
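The excerpt refers to its equation 10 without reproducing it. For reference, the standard Newman-Girvan definition of modularity for an undirected, unweighted graph is shown below, together with the merge gain that the Clauset et al. greedy method maximizes at each step; whether equation 10 in the cited preprint uses exactly this notation is an assumption. Here A_ij is the adjacency matrix, k_i the degree of node i, m the number of edges, c_i the community of node i, and δ the Kronecker delta; e_cc is the fraction of edges inside community c, e_cd (c ≠ d) is half the fraction of edges joining c and d, and a_c is the fraction of edge ends attached to c.

```latex
Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)
  = \sum_{c}\left(e_{cc} - a_c^{2}\right),
\qquad a_c = \frac{1}{2m}\sum_{i \in c} k_i

\Delta Q_{cd} = 2\left(e_{cd} - a_c\,a_d\right)
```

The second line shows why a greedy merge is cheap to evaluate: the gain from joining two communities depends only on the edges between them and on their total degrees.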
Preprint
This paper presents a novel method for analyzing the spatial diversity of innovation and technology developments within emerging socio-technical systems (STS). Socio-technical systems as complex and interconnected systems of actors, networks and technologies are characterized by their dynamic nature and the co-evolution of social and technical transformations. The proposed approach combines elements from the Technological Innovation Systems framework, institutional theory, and complex network theory. The paper employs an agent-based model to illustrate the applicability of the method in understanding the evolution of innovation systems across different geographical scales. The findings underscore the significance of comprehending the complexity of spatial diversity within collaboration networks. It is demonstrated that beyond examining the national and international dimensions of socio-technical transitions, understanding how national policies and institutions influence the distribution of innovative activities within diverse multinational projects is essential for effective policymaking and institutional design.
... BioNAR supports a non-exhaustive set of commonly used clustering algorithms. These are modularity-maximization-based algorithms, including the popular agglomerative 'Fast-Greedy Community' algorithm (fc) (Clauset et al. 2004), the process-driven agglomerative random walk algorithm 'Walktrap' (wt) (Pons and Latapy 2006), the coupled Potts/Simulated Annealing algorithm 'SpinGlass' (sg) (Reichardt and Bornholdt 2006, Traag and Bruggeman 2009), the divisive spectral-based 'Leading-Eigenvector' (lec) (Newman 2006) and fine-tuning (Spectral) (McLean et al. 2016) algorithms, and the hierarchical agglomerative 'Louvain' algorithm (louvain) (Blondel et al. 2008). We also include a non-modularity, information-theory-based algorithm, 'InfoMAP' (infomap) (Rosvall and Bergstrom 2008, Rosvall et al. 2009). ...
Article
Motivation Biological function in protein complexes emerges from more than just the sum of their parts: molecules interact in a range of different sub-complexes and transfer signals/information around internal pathways. Modern proteomic techniques are excellent at producing a parts-list for such complexes, but more detailed analysis demands a network approach linking the molecules together and analysing the emergent architectural properties. Methods developed for the analysis of networks in social sciences have proven very useful for splitting biological networks into communities leading to the discovery of sub-complexes enriched with molecules associated with specific diseases or molecular functions that are not apparent from the constituent components alone. Results Here, we present the Bioconductor package BioNAR, which supports step-by-step analysis of biological/biomedical networks with the aim of quantifying and ranking each of the network’s vertices based on network topology and clustering. Examples demonstrate that while BioNAR is not restricted to proteomic networks, it can predict a protein’s impact within multiple complexes, and enables estimation of the co-occurrence of metadata, i.e. diseases and functions across the network, identifying the clusters whose components are likely to share common function and mechanisms. Availability and implementation The package is available from Bioconductor release 3.17: https://bioconductor.org/packages/release/bioc/html/BioNAR.html.
... Clauset-Newman-Moore Algorithm: This algorithm operates by incrementally merging nodes into communities to optimize modularity, a measure of community quality 40 . It starts with each node as an individual community and iteratively combines them to form larger communities, effectively building the hierarchy of communities within a network. ...
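Rescanning every pair of communities before each merge would be slow on large graphs. As we understand the paper, Clauset et al. keep the gains ΔQ in sparse per-community structures with max-heaps, and when communities i and j merge they update only the gains of neighbouring communities k using three rules (k adjacent to both, to i only, or to j only). The Python sketch below applies those update rules but, as a simplification of our own, keeps a single global heap with lazy invalidation of stale entries instead of the paper's per-row heaps and balanced trees. It assumes an undirected, unweighted simple graph with orderable node labels (heap ties compare labels); `cnm_heap` is a hypothetical name.

```python
import heapq
from collections import defaultdict


def cnm_heap(edges):
    """Greedy modularity agglomeration driven by a max-heap of merge gains.

    Assumes an undirected, unweighted simple graph (non-empty edge list, each
    edge once, no self-loops) with orderable node labels.
    Returns (communities, Q) with communities as a list of node sets.
    """
    two_m = 2 * len(edges)
    a = defaultdict(float)  # a[c]: fraction of edge ends attached to community c
    for u, v in edges:
        a[u] += 1 / two_m
        a[v] += 1 / two_m
    dq = defaultdict(dict)  # dq[c][d]: gain in Q from merging adjacent c and d
    for u, v in edges:
        dq[u][v] = dq[v][u] = 2 * (1 / two_m - a[u] * a[v])
    # heapq is a min-heap, so gains are stored negated.
    heap = [(-g, c, d) for c in dq for d, g in dq[c].items()]
    heapq.heapify(heap)
    members = {n: {n} for n in a}
    q = -sum(x * x for x in a.values())  # Q of the all-singletons partition

    while heap:
        neg_gain, c, d = heapq.heappop(heap)
        gain = -neg_gain
        # Lazy deletion: skip entries for merged communities or outdated gains.
        if c not in members or d not in members or dq[c].get(d) != gain:
            continue
        if gain <= 0:
            break  # the best remaining merge would not increase Q
        q += gain
        # Merge c into d, updating only the gains of neighbouring communities k.
        for k in (dq[c].keys() | dq[d].keys()) - {c, d}:
            if k in dq[c] and k in dq[d]:
                new = dq[c][k] + dq[d][k]         # k adjacent to both
            elif k in dq[c]:
                new = dq[c][k] - 2 * a[d] * a[k]  # k adjacent to c only
            else:
                new = dq[d][k] - 2 * a[c] * a[k]  # k adjacent to d only
            dq[d][k] = dq[k][d] = new
            dq[k].pop(c, None)
            heapq.heappush(heap, (-new, d, k))
        dq[d].pop(c, None)
        del dq[c]
        a[d] += a.pop(c)
        members[d] |= members.pop(c)
    return list(members.values()), q


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities, q = cnm_heap(edges)
print(sorted(sorted(c) for c in communities), round(q, 4))
# prints [[0, 1, 2], [3, 4, 5]] 0.3571
```

Every current gain always has an entry in the heap, so the first non-stale entry popped is the true maximum; stale entries cost extra heap space but keep each merge local to the two communities' neighbourhoods.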
Preprint
This study presents a framework for detecting and mitigating fake and potentially attacking user communities within 5G social networks. This framework utilizes geo-location information, community trust within the network, and AI community detection algorithms to identify users that can cause harm. The framework incorporates an artificial control model to select appropriate community detection algorithms and employs a trust-based strategy to identify and filter out potential attackers. It adapts its approach by utilizing user and attack requirement data through the artificial conscience control model while considering the dynamics of community trust within the network. What sets this framework apart from other fake user detection mechanisms is its capacity to consider attributes challenging for malicious users to mimic. These attributes include the trust established within the community over time, the geographical location, and the framework’s adaptability to different attack scenarios. To validate its efficacy, we apply the framework to synthetic social network data, demonstrating its ability to distinguish potential malicious users from trustworthy ones.
... 2) Baselines: We compared RaftGP with five baselines, including MC-SBM [1], greedy modularity maximization (GMod) [26], spectral modularity maximization (SMod) [16], Par-SBM [27], and BP-Mod [28]. As stated in Definition 1, we consider the GP task where the number of blocks K is unknown and determined by the method to be evaluated. ...
Conference Paper
Graph partitioning (GP), a.k.a. community detection, is a classic problem that divides the node set of a graph into densely-connected blocks. Following prior work on the IEEE HPEC Graph Challenge benchmark and recent advances in graph machine learning, we propose a novel RAndom FasT Graph Partitioning (RaftGP) method based on an efficient graph embedding scheme. It uses the Gaussian random projection to extract community-preserving features from classic GP objectives. These features are fed into a graph neural network (GNN) to derive low-dimensional node embeddings. Surprisingly, our experiments demonstrate that a randomly initialized GNN even without training is enough for RaftGP to derive informative community-preserving embeddings and support high-quality GP. To enable the derived embeddings to tackle GP, we introduce a hierarchical model selection algorithm that simultaneously determines the number of blocks and the corresponding GP result. We evaluate RaftGP on the Graph Challenge benchmark and compare the performance with five baselines, where our method can achieve a better trade-off between quality and efficiency. In particular, compared to the baseline algorithm [1] of the IEEE HPEC Graph Challenge, our method is 6.68x-23.9x faster on graphs with 1E3-5E4 nodes and at least 64.5x faster on larger (1E5 node) graphs on which the baseline takes more than 1E4 seconds. Our method achieves better accuracy on all test cases. We also develop a new graph generator to address some limitations of the original generator in the benchmark.
... Furthermore, electrical distance combined with the K-means clustering algorithm is used to partition the power system while considering operating constraints such as energy policies and ownership [14]. As complex network theory has increasingly been applied to the analysis of power grids, a hierarchical agglomeration algorithm has been used for optimal partitioning of the power system [15]. For community detection, various techniques have been used, such as the divisive method [16] and edge anti-triangle centrality [17]. ...
... 7. Hub-neighbors: The individuals selected are those adjacent to hubs, but not themselves hubs. 8. Mod-community: Supporters are members of communities determined using the Clauset-Newman-Moore greedy modularity maximization algorithm (Clauset et al. 2004), designed to find communities in scale-free networks, among others. Communities are selected in the order of their size until the desired number of supporters is reached. ...
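The 'Mod-community' rule in this excerpt can be sketched as a short selection function, given communities already produced by a greedy modularity run. The excerpt does not say whether communities are taken from largest to smallest, how members are ordered within a community, or whether the last community may be split; the sketch assumes largest first, nodes in sorted order, and a partially selected final community. `mod_community_supporters` is a hypothetical name.

```python
def mod_community_supporters(communities, k):
    """Pick k supporters by walking communities from largest to smallest.

    Assumptions (not from the source): largest community first, nodes taken
    in sorted order, and the last community used may be only partly selected.
    """
    selected = []
    for community in sorted(communities, key=len, reverse=True):
        for node in sorted(community):
            if len(selected) == k:
                return selected
            selected.append(node)
    return selected


print(mod_community_supporters([{1, 2}, {3, 4, 5}, {6}], 4))  # prints [3, 4, 5, 1]
```

If k exceeds the number of nodes, every node is returned; a variant that only takes whole communities would change the final `if` check.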
Article
Incorporating social factors into disease prevention and control efforts is an important undertaking of behavioral epidemiology. The interplay between disease transmission and human health behaviors, such as vaccine uptake, results in complex dynamics of biological and social contagions. Maximizing intervention adoptions via network-based targeting algorithms by harnessing the power of social contagion for behavior and attitude changes largely remains a challenge. Here we address this issue by considering a multiplex network setting. Individuals are situated on two layers of networks: the disease transmission network layer and the peer influence network layer. The disease spreads through direct close contacts while vaccine views and uptake behaviors spread interpersonally within a potentially virtual network. The results of our comprehensive simulations show that network-based targeting with pro-vaccine supporters as initial seeds significantly influences vaccine adoption rates and reduces the extent of an epidemic outbreak. Network targeting interventions are much more effective by selecting individuals with a central position in the opinion network as compared to those grouped in a community or connected professionally. Our findings provide insight into network-based interventions to increase vaccine confidence and demand during an ongoing epidemic.
... Although no formal definition of a cluster is universally accepted [16], a widely adopted measure of cluster quality is the modularity function [17], whose maximization favors clusters in which nodes are more likely to be connected than expected under a random network null model [18]. Since the global optimization of modularity is a well-known NP-hard problem [19], various local heuristics are used [20], [21], [22], [23], [24], [25], [26]. When the clustering is hierarchical, the structure imposed by the hierarchy can be exploited both for the visualization of the clusters and for supporting navigation operations through the clusters [1]. ...
Article
Nowadays there is great interest in the visualization of graphs to make their navigation, inspection and visual analysis easier. However, graphs can be quite large, and rendering them in web browsers can produce a dark cloud of points that is difficult to explore visually. Although several approaches have been proposed that substitute clusters of related vertices with aggregated meta-nodes to reduce the size of the visualized graph, they usually keep the graph in main memory and do not adopt efficient data structures for extracting parts of it from disk. In this paper, we propose indexing structures and algorithms for representing the graph at different resolutions and for speeding up the passage from one representation to another. An extensive experimental analysis has been conducted to assess the quality of the proposed solution.
... Modularity measures the presence of strong connectivity within subsystems and weaker connections between subsystems in a network (39,40). Its values lie between −1/2 and 1, with higher values indicating a stronger modular structure. ...
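Modularity is not restricted to nonnegative values. Writing Q = Σ_c (e_cc − a_c²), with e_cc the fraction of edges inside community c and a_c the fraction of edge ends attached to c, a graph with a single edge (u, v) split into the singleton communities 1 = {u} and 2 = {v} gives:

```latex
m = 1,\quad e_{11} = e_{22} = 0,\quad a_1 = a_2 = \tfrac{1}{2}
\;\Longrightarrow\;
Q = \left(0 - \tfrac{1}{4}\right) + \left(0 - \tfrac{1}{4}\right) = -\tfrac{1}{2}.
```

In general Q lies in [−1/2, 1): this example attains the lower bound, and Q stays below 1 because Σ_c e_cc ≤ 1 while Σ_c a_c² > 0.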
Article
The rapid development of seafood trade networks alongside the decline in biomass of many marine populations raises important questions about the role of global trade in fisheries sustainability. Mounting empirical and theoretical evidence shows the importance of trade development on commercially exploited species. However, there is limited understanding of how the development of trade networks, such as differences in connectivity and duration, affect fisheries sustainability. In a global analysis of over 400,000 bilateral trade flows and stock status estimates for 876 exploited fish and marine invertebrates from 223 territories, we reveal patterns between seafood trade network indicators and fisheries sustainability using a dynamic panel regression analysis. We found that fragmented networks with strong connectivity within a group of countries and weaker links between those groups (modularity) are associated with higher relative biomass. From 1995 to 2015, modularity fluctuated, and the number of trade connections (degree) increased. Unlike previous studies, we found no relationship between the number or duration of trade connections and fisheries sustainability. Our results highlight the need to jointly investigate fisheries and trade. Improved coordination and partnerships between fisheries authorities and trade organizations present opportunities to foster more sustainable fisheries.
... This method allows for the analysis of very large networks, efficiently finding community structures [15]. ...
Preprint
Full-text available
The advent of social media has catalyzed a paradigm shift in the way information is disseminated and consumed, giving rise to novel phenomena such as viral trends and information diffusion. This review article provides an in-depth scholarly examination of network science as applied to social media analysis, focusing on the mathematical formulations, algorithmic techniques, and interdisciplinary methodologies that underpin the field. By exploring graph theory, community detection, scale-free networks, centrality measures, machine learning applications, and cultural influences, this study offers a comprehensive and nuanced understanding of network structures and dynamics. As a top-tier contribution to the field of computer science, this review serves as a nexus for the interdisciplinary study of network science, providing valuable insights and directions for future research in the analysis of information diffusion and viral trends within social media platforms.
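The citing snippet for this preprint credits the method with efficiently finding community structure in very large networks. Its core idea, as summarized in the CNM description earlier on this page, is greedy agglomeration: start with every node in its own community and repeatedly merge the pair of communities that most increases modularity. Merging communities i and j changes Q by ΔQ = l_ij/m − 2·(d_i/2m)·(d_j/2m), where l_ij counts edges between them and d_i, d_j are their total degrees. The sketch below is a deliberately naive Python version of that merge rule, assuming an unweighted, undirected edge list with comparable node labels and no self-loops. It rescans every connected pair at each step (roughly O(n·m) overall) and omits the max-heap and sparse-matrix bookkeeping the published algorithm relies on for speed; it also simply stops when no merge increases Q instead of recording the full merge sequence. The function name is hypothetical.

```python
from collections import defaultdict


def greedy_modularity(edges):
    """Naive greedy agglomerative modularity maximization (CNM-style merge rule).

    edges: undirected, unweighted (u, v) pairs, each listed once, no self-loops.
    Returns a list of communities (sets of nodes).
    """
    m = len(edges)
    members = {}                                   # community id -> set of nodes
    links = defaultdict(lambda: defaultdict(int))  # links[i][j] = edges between i and j
    degree = defaultdict(int)                      # community id -> total degree
    for u, v in edges:                             # every node starts as a singleton
        members.setdefault(u, {u})
        members.setdefault(v, {v})
        links[u][v] += 1
        links[v][u] += 1
        degree[u] += 1
        degree[v] += 1

    while True:
        # Find the connected pair with the largest positive Delta Q.
        best, best_dq = None, 0.0
        for i in links:
            for j, l_ij in links[i].items():
                if i < j:
                    dq = l_ij / m - 2 * (degree[i] / (2 * m)) * (degree[j] / (2 * m))
                    if dq > best_dq:
                        best, best_dq = (i, j), dq
        if best is None:  # no merge increases modularity
            break
        i, j = best
        # Merge community j into community i and re-point j's links to i.
        members[i] |= members.pop(j)
        degree[i] += degree.pop(j)
        for k, l_jk in links.pop(j).items():
            del links[k][j]
            if k != i:
                links[i][k] += l_jk
                links[k][i] += l_jk
    return list(members.values())
```

On two triangles joined by one edge, this returns the two triangles as the communities; merging them further would lower Q, so the loop stops there.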
Chapter
The detection of communities is an essential task in social network analysis. Many different community detection algorithms can be used to detect these communities in a network, but selecting a good one is challenging: no single technique provides the best result for every social network. The objective of this paper is to assess the effectiveness of different community detection algorithms based on quality measures. We use both real and synthetic networks for this comparison and evaluation. The synthetic network is generated by the LFR algorithm, whereas a well-known network of Last FM users from Asian countries serves as the real network. After evaluation, we found that the Spin glass and Louvain algorithms perform well in most cases, while the Leading Eigen algorithm performs worst in most instances.
Preprint
Full-text available
Rare diseases affect 1 in 10 people in the United States and, despite increased genetic testing, up to half never receive a diagnosis. Even when advanced genome sequencing platforms are used to discover variants, if there is no connection between the variants found in the patient's genome and their phenotypes in the literature, the patient will remain undiagnosed. When a direct variant-phenotype connection is not known, placing a patient's information in the larger context of phenotype relationships and protein-protein interactions may provide an opportunity to find an indirect explanation. Databases such as STRING contain millions of protein-protein interactions, and HPO contains the relations of thousands of phenotypes. By integrating these networks and clustering the entities within them, we can potentially discover latent gene-to-phenotype connections. The historical records for STRING and HPO provide a unique opportunity to create a network time series for evaluating cluster significance. Most excitingly, working with Children's Hospital Colorado, we provide promising hypotheses about latent gene-to-phenotype connections for 38 patients with undiagnosed diseases. We also provide potential answers for 14 patients listed on MyGene2. Clusters that our tool finds significant harbor 2.35 to 8.72 times as many gene-to-phenotype edges inferred from known drug interactions as clusters it finds insignificant. Our tool, BOCC, is available as a web app and command line tool here: https://github.com/MSBradshaw/BOCC.
Preprint
Full-text available
Carsharing has become increasingly popular in recent years as a sustainable transportation solution, offering individuals access to shared vehicles on a short-term basis. One-way carsharing, in particular, presents unique challenges due to its flexible nature, allowing users to pick up and drop off vehicles at different locations within a designated service area. This flexibility increases ridership but comes at the expense of vehicle imbalance among stations, as some stations may have excess vehicles while others face shortages. Therefore, carsharing companies need strategies to ensure a balanced distribution of vehicles among stations. This is essential because an unbalanced vehicle distribution can lead to vehicles being unavailable when needed or, conversely, to an increased number of unnecessary rebalancing trips, thereby exacerbating traffic congestion and environmental pollution. Such issues can potentially undermine the overall contribution of carsharing to urban sustainability. To this end, this paper reviews the vehicle imbalance problem that arises in one-way carsharing and the algorithms proposed to solve it.
Article
Full-text available
Decomposing a graph into groups of nodes that share similar connectivity properties is essential to understand the organization and function of complex networks. Previous works have focused on groups with specific relationships between group members, such as assortative communities or core-periphery structures, developing computational methods to find these mesoscale structures within a network. Here we go beyond these two traditional cases and introduce a methodology that is able to identify and systematically classify all possible community types in directed multigraphs, based on the pairwise relationship between groups. We apply our approach to 53 different networks and find that assortative communities are the most common structures, but that previously unexplored types appear in almost every network. A particularly prevalent new type of relationship, which we call a source-basin structure, has information flowing from a sparsely connected group of nodes (source) to a densely connected group (basin). We look in detail at two online social networks – a new network of Twitter users and a well-studied network of political blogs – and find that source-basin structures play an important role in both of them. This confirms not only the widespread appearance of non-assortative structures but also the potential of hitherto unidentified relationships to explain the organization of complex networks.
Chapter
Exploring label correlations is one of the main challenges in multi-label classification. The literature shows that predictive performance can be improved when classifiers learn these correlations. On the other hand, some works argue that multi-label classification methods cannot explore label correlations. The traditional multi-label local approach uses only information from individual labels, which makes it impractical to find relationships between them. In contrast, the multi-label global approach uses information from all labels simultaneously and may miss more specific relationships that are relevant. To overcome these limitations and to verify whether the predictive performance of multi-label classifiers can be improved, we propose using community detection methods to model label correlations and to partition the label space into partitions that lie between the local and global ones. These partitions, here named hybrid partitions, are formed of disjoint clusters of correlated labels, which are then used to build multi-label datasets and train multi-label classifiers. Since our proposal can generate several hybrid partitions, we validate all of them and choose the one considered best. We compared our hybrid partitions with the local and global approaches and with an approach that generates random partitions. Although our proposal improved the predictive performance of the classifier on some datasets compared with the other partitions, it also showed that, in general and independent of the approach used, the classifier still has difficulty learning several labels and predicting them correctly.
Article
Full-text available
From the perspective of human mobility, the COVID-19 pandemic constituted a natural experiment of enormous reach in space and time. Here, we analyse the inherent multiple scales of human mobility using Facebook Movement maps collected before and during the first UK lockdown. Firstly, we obtain the pre-lockdown UK mobility graph and employ multiscale community detection to extract, in an unsupervised manner, a set of robust partitions into flow communities at different levels of coarseness. The partitions so obtained capture intrinsic mobility scales with better coverage than the Nomenclature of Territorial Units for Statistics (NUTS) regions, which suffer from mismatches between human mobility and administrative divisions. Furthermore, the flow communities in the fine-scale partition not only match the UK travel-to-work areas well but also capture mobility patterns beyond commuting to work. We also examine the evolution of mobility under lockdown and show that mobility first reverted towards fine-scale flow communities already found in the pre-lockdown data, and then expanded back towards coarser flow communities as restrictions were lifted. The improved coverage induced by lockdown is well captured by a linear decay shock model, which allows us to quantify regional differences in both the strength of the effect and the recovery time from the lockdown shock.
Conference Paper
Full-text available
Simplification of combustion reaction mechanisms through graph network clustering
Preprint
Full-text available
Ecological restoration of degraded lands is essential to human sustainability. Yet an in-depth community, functional, and evolutionary microbial perspective on the long-term restoration of damaged ecosystems is lacking. Herein, we comprehensively assessed the impact of long-term (up to 17 years) restoration of the Tengger Desert, China, by multi-omic profiling of 1,910 topsoil samples. The soil biophysicochemical properties, especially soil hydraulics, microbiome stability, and functional diversity, significantly improved during restoration. The soil microbiome transitioned from an extremely oligotrophic and autotrophic community to a diverse copiotrophic ecosystem. The soil microbiota, including fungi, could mediate the soil physicochemical changes through metabolites. Importantly, the systematic rewiring of nutrient cycles featured a multi-domain preference for an efficient carbon fixation strategy in the extreme desert environment. Finally, the microbiome was evolving via positive selection of genes involved in biogeochemical cycles, resistance, and motility. In summary, we present a comprehensive community, functional, biogeochemical, and evolutionary landscape of the soil microbiome during the long-term restoration of desert environments. We highlight the crucial microbial role in restoration from soil hydraulic and biogeochemical perspectives, offering promising field applications.
Highlights:
The desert soil microbiome transformed from a simple oligotrophic community into a diverse, stable, and nutrient-rich ecosystem with expanded functional diversity.
Restoration led to systematically rewired biogeochemical cycles, which are highly efficient in carbon fixation in the desert environment.
The microbiome was evolving via positive selection of genes involved in biogeochemical cycles and environmental adaptations.
Microbes and metabolites could facilitate desert restoration from hydraulic and biogeochemical aspects, offering promising field applications.
Article
Novel combinations of technologies are usually the result of collaborative work that builds on existing knowledge. Although inventors and their respective communities tend to be specialized, collaborations across differently specialized peers have the potential to generate co-inventor networks that provide access to a diverse set of knowledge and facilitate the production of radical novelty. Previous research has demonstrated that short access in large co-inventor networks enables innovative outcomes in regional economies. However, how connections in the network across different technological knowledge domains matter, and what impact they might generate, is still unknown. The present investigation focuses on 'atypical' combinations of technologies as indicated in patent documents. In particular, it analyzes the role of technological specializations linked in co-inventor networks in producing radical innovation in European regions. Our results confirm that the share of atypical patents is growing in regions where bridging ties establish short access to and across cohesive co-inventor sub-networks. Furthermore, the evidence suggests that strong specialization of co-inventor communities in regions fosters atypical combinations because these communities manage to increase the scale and scope of novel combinations. Thus, bridges between communities that are specialized in different technologies favor atypical innovation outcomes. The work shows that it is not diversity per se but links across variously specialized inventor communities that can foster radical innovation.