Article

Finding community structure in very large networks

Authors: Aaron Clauset, M. E. J. Newman, Cristopher Moore
... At the country level, the influence of countries on the supply chain can be evaluated using centrality measures: in-, out-, and betweenness-degree and Laplacian energy. Next, changes to the structure of the global supply chain can be examined by identifying communities in the network, which are clusters of nodes that are densely connected internally, using an algorithm based on modularity, as in Gui et al. (2014), which improves upon that developed by Clauset, Newman and Moore (2004). This approach is based on the idea that there should be many edges between the nodes within a community and only a few between communities. ...
... The Clauset-Newman-Moore greedy modularity maximisation algorithm (Clauset, Newman and Moore, 2004; Gui et al., 2014; Monken et al., 2021) used in this study is an optimised version of the modularity maximisation described in Equations (2) and (3). It begins by assigning each node in the network to its own community, then joins pairs of communities so as to maximise modularity, concluding when no further increase in modularity is possible. ...
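The merging procedure described in this excerpt is available off the shelf; the following minimal Python sketch (assuming networkx is installed; the karate-club graph is a placeholder, not the trade networks of the cited studies) illustrates it:

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Placeholder graph; the citing studies use trade networks instead.
G = nx.karate_club_graph()

# Clauset-Newman-Moore: every node starts in its own community, then pairs of
# communities are merged greedily until no merge increases modularity.
communities = greedy_modularity_communities(G)

print(len(communities))            # number of communities found
print(modularity(G, communities))  # modularity of the final partition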
... In simple terms, see Qi et al. (2012: 242-243) for details. 6 A parameter that can be adjusted to limit the minimum size of the communities. 7 See Clauset, Newman and Moore (2004). ...
Article
In the last decade, climate change, Covid-19, and several international conflicts have created significant disruptions to global and regional supply chains, leading to a re-evaluation of the benefits of globalisation. Modelling food trade as network graphs, this study spotlights the effects of these shocks on the structure, flow, and evolution of food supply chains. Network centrality measures show substantial changes in the influence exerted by China, Russia, and the United States, among others. Using machine learning, community detection and global metrics such as clustering further detail the structural changes in the trade network. Differences between systemic and idiosyncratic shocks are also discussed.
... We next explored the modules, the highly interconnected sub-structures that represent ecological units within networks, 29 in two age groups. Two main modules determined by the Greedy modularity algorithm 30 were identified. The first module included the positive correlation between two potential pathogenic microbes: G. vaginalis and F. vaginae. ...
... The networks were constructed by including only significant correlations with P value < 0.01 and |cor| > 0.1. The modularity of the network was determined using the fast greedy clustering algorithm 30 and visualized by the R package igraph (version 1.2.6). 91 The centrality indices of nodes were computed and visualized using the R package qgraph (version 1.9.8). ...
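As an illustrative sketch only (in Python rather than the R used above, with random placeholder matrices standing in for the real correlation and P-value data), the same pipeline of significance filtering followed by fast greedy clustering could look like:

import numpy as np
import igraph as ig

rng = np.random.default_rng(0)
n = 20
cor = rng.uniform(-1, 1, size=(n, n))   # placeholder correlation matrix
pval = rng.uniform(0, 1, size=(n, n))   # placeholder P-value matrix

# Keep only significant correlations: P < 0.01 and |cor| > 0.1, as in the excerpt.
keep = (pval < 0.01) & (np.abs(cor) > 0.1)
np.fill_diagonal(keep, False)
keep = keep | keep.T                    # symmetrize for an undirected network

g = ig.Graph.Adjacency(keep.astype(int).tolist(), mode="undirected")
clusters = g.community_fastgreedy().as_clustering()
print(clusters.modularity)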
Article
Full-text available
The vaginal microbiome is critical for the reproductive health of women, yet the differential impacts exerted by the host and by ambient environmental variables on the vaginal microbiome remain largely unknown. Here, we conducted a comprehensive cross-sectional study of the relationships between the vaginal microbiome and 81 matched host and environmental variables across 6755 Chinese women. By 16S rRNA sequencing, we identified four core vaginal microbiota with a prevalence of over 90% and a total median abundance of 98.8%. Twenty-four variables, including physiology, lifestyle behaviors, gynecologic history, social and environmental information, were found associated with the microbiome composition, of which bacterial vaginosis (BV) showed the largest effect size. Age was among the strongest explanatory variables and the vaginal microbiome dynamically succeeded with increasing age, especially with a composition turning point at the age of 45. Our mediation analyses indicated that the effects of age on the microbiome could be mediated by variables such as parity number and lifestyles. We further classified the vaginal microbiomes of the population into 13 “Vagitypes”. Women with Lactobacillus iners - and Lactobacillus jensenii -dominated Vagitypes had significantly higher live birth rate than those with Vagitype dominated by Fannyhessea vaginae (53.40%, 59.09% vs 21.43%; OR [95% CI]: 3.62 [1.12–14.87], 5.39 [1.27–27.36]; P = 0.031, P = 0.021). This study provides a comprehensive overview of the associations between identified variables and the vaginal microbiome, representing an important step toward understanding of environment-microbe-host interactions.
... To select the most suitable community detection algorithm, we compared the performance and the quality of the communities generated by several algorithms discussed in Sect. 2. Some of these algorithms allow us to specify the desired number of communities, even if they do not yield the highest quality based on metrics such as modularity. Other algorithms, such as FastGreedy [35], only provide the number of communities that maximises the modularity metric. The quality of the communities obtained largely depends on the type of circuit, and for a given number of qubits, it is similar across most methods. ...
... In the case of these two types of circuits, with the GN algorithm we can choose the number of communities that allows us to complete the final contraction without exhausting the memory. For GHZ circuits, we use a greedy algorithm that tries to maximise the modularity of the decomposition and stops when it cannot improve this parameter [35]. Its cost is O(m log² n), which can be very small for sparse graphs with n nodes and m edges such as those associated with GHZ circuits. ...
Article
Full-text available
Quantum computing holds significant promise for solving complex problems, but simulating quantum circuits on classical computers remains essential due to the current limitations of quantum hardware. Efficient simulation is crucial for the development and validation of quantum algorithms and quantum computers. This paper explores and compares various strategies to leverage different levels of parallelism to accelerate the contraction of tensor networks representing large quantum circuits. We propose a new parallel multistage algorithm based on communities. The original tensor network is partitioned into several communities, which are then contracted in parallel. The pairs of tensors of the resulting network can be contracted in parallel using a GPU. We use the Girvan–Newman algorithm to obtain the communities and the contraction plans. We compare the new algorithm with two other parallelisation strategies: one based on contracting all the pairs of tensors in the GPU and another one that uses slicing to cut some indexes of the tensor network and then MPI processes to contract the resulting slices in parallel. The new parallel algorithm gets the best results with different well-known quantum circuits with a high degree of entanglement, including random quantum circuits. In conclusion, the results show that the main factor that limits the simulation is the space cost. However, the parallel multistage algorithm manages to reduce the cost of sequential simulation for circuits with a high number of qubits and allows simulating larger circuits.
... Another significant challenge in traditional methods for local detection is the effective expansion of the community. FLCS [23] introduces the concept of local modularity to expand the communities, and Shang et al. [24] improve local modularity to discover local communities. On the other hand, node similarity is also an effective indicator of the degree of connection between community members, so optimization methods [25] based on node similarity functions have also been used for local community expansion. ...
... After obtaining the community labels for both the nodes and seed nodes, our objective is to facilitate the gathering of nodes similar to the seed node while ensuring that dissimilar nodes remain distant from the seed node. To achieve this, we establish that the label distribution's target distribution is p_uv, as shown in Eq. (23). ...
Article
Full-text available
Unlike global community detection, local community detection identifies a cluster of nodes sharing similar feature information based on a given seed. The accuracy of many local community detection algorithms heavily relies on the quality of seed nodes: only high-quality seed nodes can accurately detect local communities. At the same time, the inability to effectively obtain node attributes and structural information also leads to an increase in subgraph clustering error rates. This paper proposes a local community detection method based on core nodes using deep feature fusion, named LCDCN. We find the nearest nodes to the seed node, then construct a k-subgraph through a specific subgraph extractor based on the core nodes. Subsequently, two deep encoders encode and fuse the attribute and structure information of the subgraph, respectively. Finally, the local community is discovered by optimizing the fused feature representation through a self-supervised optimization function. Extensive experiments on 10 real and 4 synthetic datasets demonstrate that LCDCN outperforms its competitors.
... These metrics focus on specific properties of the communities, such as their internal connectivity, isolation from the rest of the network, and overall density. Conductance (Kannan, Vempala, and Vetta 2004), Expansion (Leskovec et al. 2008), Normalized Cut (Shi and Malik 2000), Density (Radicchi et al. 2004), Internal Density (Fortunato 2010b) and Local Modularity (Clauset, Newman, and Moore 2004) were utilized to assess average community qualities. The appendices have detailed descriptions on these metrics and Girvan-Newman algorithm. ...
... where m_S is the number of edges within community S and |S| is the number of nodes in community S. 6. Local Modularity (Clauset, Newman, and Moore 2004): ...
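The equation truncated before this "where" clause is presumably the standard internal-density metric (item 5 in the list above, per Fortunato 2010); under that assumption it reads, in LaTeX:

f(S) = \frac{m_S}{|S|(|S|-1)/2}

i.e., the fraction of all possible edges among the |S| nodes of community S that are actually present.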
Preprint
Full-text available
Many Natural Language Processing (NLP) related applications involve topics and sentiments derived from short documents such as consumer reviews and social media posts. Topics and sentiments of short documents are highly sparse because a short document generally covers only a few topics among hundreds of candidates. Imputation of missing data is sometimes hard to justify and often impractical for highly sparse data. We developed a method for calculating a weighted similarity for highly sparse data without imputation. This weighted similarity consists of three components that capture similarities based on both the existence and the lack of common properties, as well as the pattern of missing values. As a case study, we used a community detection algorithm and this weighted similarity to group different shampoo brands based on sparse topic sentiments derived from short consumer reviews. Compared with traditional imputation and similarity measures, the weighted similarity shows better performance in both general community structures and average community qualities. The performance is consistent and robust across metrics and community complexities.
... This heuristic algorithm is widely used due to its speed and high-quality results. We also consider the Leiden algorithm (Traag, V.A. et al., 2019), a refinement of the Louvain method, as well as other classical methods such as Walktrap (Pons, P., & Latapy, M., 2005), Infomap (Rosvall, M. et al., 2009), the Fast Greedy (Clauset, A. et al., 2004) and the Surprise (Marchese, E. et al., 2022) algorithm. In general, classical methods are defined for non-directed networks. ...
... Different classic CDP algorithms have been applied with the R package igraph and the Python package surprisememore: Louvain (Blondel, V. et al., 2008), Fast Greedy (Clauset, A. et al., 2004), and Surprise (Marchese, E. et al., 2022). As these algorithms are defined for undirected graphs, the adjacency matrices to which they are applied are understood from an undirected perspective (Malliaros, F. & Vazirgiannis, M., 2020). ...
... (Csardi & Nepusz, 2006). Community detection was performed using the Clauset-Newman-Moore greedy modularity maximization algorithm (Clauset et al., 2004) and the Infomap algorithm (Rosvall & Bergstrom, 2008). The k-core algorithm, which recursively prunes nodes with degrees below a threshold k, was applied to identify tightly connected communities. ...
... This network consists of 18,513 nodes and 70,929 edges. Using the Clauset-Newman-Moore greedy modularity maximization algorithm (Clauset et al., 2004) and the Infomap algorithm, we identified 3341 and 3780 communities, respectively. Functional annotation of genes within these communities was performed with KofamScan and E2P2 (Table S6). ...
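The k-core pruning step mentioned in the excerpt before this one is a one-liner in common graph libraries; a small sketch (Python/networkx, with a random placeholder graph and an arbitrary k = 5 rather than the gene network and threshold used in the study):

import networkx as nx

G = nx.gnm_random_graph(200, 800, seed=1)  # placeholder for the 18,513-node network

# k-core: recursively remove nodes of degree < k until every remaining node
# has degree >= k, leaving the tightly connected core described above.
core = nx.k_core(G, k=5)
print(core.number_of_nodes(), core.number_of_edges())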
Article
Full-text available
The clustered distribution of genes involved in metabolic pathways within the plant genome has garnered significant attention from researchers. By comparing and analyzing changes in the flanking regions of metabolic genes across a diverse array of species, we can enhance our understanding of the formation and distribution of biosynthetic gene clusters (BGCs). In this study, we have designed a workflow that uncovers and assesses conserved positional relationships between genes in various species by using synteny neighborhood networks (SNN). This workflow is then applied to the analysis of flanking genes associated with oxidosqualene cyclases (OSCs). The method allows for the recognition and comparison of homologous blocks with unique flanking genes accompanying different subfamilies of OSCs. The examination of the flanking genes of OSCs in 122 plant species revealed multiple genes with conserved positional relationships with OSCs in angiosperms. Specifically, the earliest adjacency of OSC genes and CYP716 genes first appeared in basal eudicots, and the nonrandom occurrence of CYP716 genes in the flanking region of OSC persists across different lineages of eudicots. Our study showed the substitution of genes in the flanking region of the OSC varies across different plant lineages, and our approach facilitates the investigation of flanking gene rearrangements in the formation of OSC‐related BGCs.
... Existing heuristic approaches for maximizing the modularity function come from various fields, including computer science, physics and sociology (Clauset et al., 2004;Massen and Doye, 2005;Newman, 2006;Reichardt and Bornholdt, 2006;Agrawal and Kempe, 2008). In this paper, we adopt a Louvain type maximization method. ...
... Using the Louvain method for community detection in a typical network with 2 million nodes only takes several minutes on a standard PC (Blondel, 2011). Fortunato (2012) noted that the modularity maximum found by the Louvain method often compares favorably with those found by using the methods in Clauset et al. (2004) and Wakita and Tsurumi (2007). ...
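A hedged sketch of the comparison Fortunato (2012) alludes to, run on a small stand-in graph (Python, assuming networkx >= 3.0; the Les Misérables network replaces the 2-million-node networks discussed above):

import networkx as nx

G = nx.les_miserables_graph()  # small stand-in network

louvain = nx.community.louvain_communities(G, seed=42)
cnm = nx.community.greedy_modularity_communities(G)

# Louvain typically reaches modularity at least as high as the CNM greedy method.
print(nx.community.modularity(G, louvain))
print(nx.community.modularity(G, cnm))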
Preprint
Heterogeneous networks are networks consisting of different types of nodes and multiple types of edges linking such nodes. While community detection has been extensively developed as a useful technique for analyzing networks that contain only one type of nodes, very few community detection techniques have been developed for heterogeneous networks. In this paper, we propose a modularity based community detection framework for heterogeneous networks. Unlike existing methods, the proposed approach has the flexibility to treat the number of communities as an unknown quantity. We describe a Louvain type maximization method for finding the community structure that maximizes the modularity function. Our simulation results show the advantages of the proposed method over existing methods. Moreover, the proposed modularity function is shown to be consistent under a heterogeneous stochastic blockmodel framework. Analyses of the DBLP four-area dataset and a MovieLens dataset demonstrate the usefulness of the proposed method.
... Higher modularity values indicate stronger internal cluster connections. This work employs the community detection technique proposed in [11], which iteratively maximizes modularity by initially treating each node as an independent cluster and merging them iteratively based on modularity gain. ...
Preprint
Full-text available
The visualization of high-dimensional datasets has gained increasing attention in recent years, as identifying patterns and relationships within such data remains a significant challenge. A common approach involves applying dimensionality reduction techniques, such as PCA and t-SNE, to project the data into two or three dimensions for analysis. However, these visualizations may limit interpretability due to transformations applied to the original data, potentially obscuring critical information that could reveal new patterns. To address these limitations, this work presents an unsupervised visualization tool that allows users to interactively explore and analyze potential patterns without requiring prior data categorization. The tool provides two visualization options: (1) graph-based representations and (2) a similarity matrix, both of which effectively facilitated the discovery of new data patterns and clusters in the analyzed datasets.
... The modularity of a cluster model score can fall between −1 and 1, and a score closer to 1 means there are many connections within a cluster and few connections between clusters (i.e., clusters are dense and distinct), which is optimal for a clustering model [48]. Previous research has indicated that a modularity above 0.3 indicates that the structure of the network is not random [49]. The computed modularity of the 80-NN + AS model used in this study was 0.640, demonstrating that the clusters found were both densely connected within each cluster and sparsely connected between clusters, further supporting that the 80-NN + AS model successfully clustered our data. ...
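To make the "above 0.3 means non-random structure" rule of thumb concrete, here is a direct computation of Newman's Q from an adjacency matrix and cluster labels (a toy two-triangle graph, not the 80-NN + AS model of the study):

import numpy as np

def modularity_q(A, labels):
    m = A.sum() / 2.0                          # number of undirected edges
    k = A.sum(axis=1)                          # node degrees
    same = labels[:, None] == labels[None, :]  # same-cluster indicator
    return ((A - np.outer(k, k) / (2 * m)) * same).sum() / (2 * m)

# Two triangles joined by a single edge: a clearly modular toy graph.
A = np.array([[0,1,1,0,0,0],
              [1,0,1,0,0,0],
              [1,1,0,1,0,0],
              [0,0,1,0,1,1],
              [0,0,0,1,0,1],
              [0,0,0,1,1,0]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])

q = modularity_q(A, labels)
print(q, "non-random structure" if q > 0.3 else "weak structure")  # q ~ 0.357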
Article
Full-text available
Background The circadian clock is a central driver of many biological and behavioral processes, regulating the levels of many genes and proteins, termed clock controlled genes and proteins (CCGs/CCPs), to impart biological timing at the molecular level. While transcriptomic and proteomic data has been analyzed to find potential CCGs and CCPs, multi-omic modeling of circadian data, which has the potential to enhance the understanding of circadian control of biological timing, remains relatively rare due to several methodological hurdles. To address this gap, a dual-approach co-expression analysis framework (D-CAF) was created to perform co-expression analysis that is robust to Gaussian noise perturbations on time-series measurements of both transcripts and proteins. Results Applying this D-CAF framework to previously gathered transcriptomic and proteomic data from mouse macrophages gathered over circadian time, we identified small, highly significant clusters of oscillating transcripts and proteins in the unweighted similarity matrices and larger, less significant clusters of oscillating transcripts and proteins using the weighted similarity network. Functional enrichment analysis of these clusters identified novel immunological response pathways that appear to be under circadian control. Conclusions Overall, our findings suggest that D-CAF is a tool that can be used by the circadian community to integrate multi-omic circadian data to improve our understanding of the mechanisms of circadian regulation of molecular processes.
... node) is proportional to the number of connections. We identify four main clusters of genes using the Louvain method (Clauset et al. 2004). The first one (shown in green) contains genes associated with the immune system, such as LAPTM4b (Huygens et al. 2015), CD7 (Aandahl et al. 2003; Chin et al. 1996), and Zbtb20 (Nagao et al. 2016). ...
Preprint
Full-text available
This article focuses on covariance estimation for multi-study data. Popular approaches employ factor-analytic terms with shared and study-specific loadings that decompose the variance into (i) a shared low-rank component, (ii) study-specific low-rank components, and (iii) a diagonal term capturing idiosyncratic variability. Our proposed methodology estimates the latent factors via spectral decompositions and infers the factor loadings via surrogate regression tasks, avoiding identifiability and computational issues of existing alternatives. Reliably inferring shared vs study-specific components requires novel developments that are of independent interest. The approximation error decreases as the sample size and the data dimension diverge, formalizing a blessing of dimensionality. Conditionally on the factors, loadings and residual error variances are inferred via conjugate normal-inverse gamma priors. The conditional posterior distribution of factor loadings has a simple product form across outcomes, facilitating parallelization. We show favorable asymptotic properties, including central limit theorems for point estimators and posterior contraction, and excellent empirical performance in simulations. The methods are applied to integrate three studies on gene associations among immune cells.
... A commonly used generative model for network communities is the Stochastic Block Model (SBM). Clauset [12] and Lancichinetti and Fortunato [33] utilize modularity-based optimization techniques for the community detection task, whereas Reichardt and Bornholdt [48] and Rosvall and Bergstrom [50] utilize random-walk and diffusion-based algorithms to preserve the community structure. ...
Preprint
Full-text available
Deep neural networks have enabled researchers to create powerful generalized frameworks, such as transformers, that can be used to solve well-studied problems in various application domains, such as text and image. However, such generalized frameworks are not available for solving graph problems. Graph structures are ubiquitous in many applications around us and many graph problems have been widely studied over the years. In recent times, there has been a surge in deep neural network based approaches to solve graph problems, with growing availability of graph structured datasets across diverse domains. Nevertheless, existing methods are mostly tailored to solve a specific task and lack the capability to create a generalized model leading to solutions for different downstream tasks. In this work, we propose a novel, resource-efficient framework named Unified Graph Network (UGN) by leveraging the feature extraction capability of graph convolutional neural networks (GCN) and 2-dimensional convolutional neural networks (Conv2D). UGN unifies various graph learning tasks, such as link prediction, node classification, community detection, graph-to-graph translation, knowledge graph completion, and more, within a cohesive framework, while exercising minimal task-specific extensions (e.g., formation of supernodes for coarsening massive networks to increase scalability, use of mean target connectivity matrix (MTCM) representation for achieving scalability in the graph translation task, etc.) to enhance the generalization capability of graph learning and analysis. We test the novel UGN framework for six uncorrelated graph problems, using twelve different datasets. Experimental results show that UGN outperforms the state-of-the-art baselines by a significant margin on ten datasets, while producing comparable results on the remaining dataset.
... where c_i is the community of i, c_j that of j, the sum goes over all pairs of nodes i and j, and μ(c_i, c_j) is 1 if c_i = c_j and 0 otherwise. Various modularity algorithms have been proposed in the literature, and in our work, we use the Clauset-Newman-Moore algorithm (Clauset et al., 2004). Modularity maximization is an NP-complete problem (Brandes, 2006). ...
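The formula truncated at the start of this excerpt is presumably the standard Newman modularity that these symbols accompany; in LaTeX, with A the adjacency matrix, k_i the degree of node i, and m the number of edges:

Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \mu(c_i, c_j)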
Article
Full-text available
In this paper, we study urban road infrastructure in densely populated cities. As the subject of our study, we choose road networks from 35 populous cities worldwide, including China, India, Pakistan, Colombia, Brazil, Bangladesh, and Cote d’Ivoire. We abstract road networks as complex systems, represented by graphs consisting of nodes and links, and employ tools from network science to study their topological properties. Our multi-scale analysis includes macro-, meso-, and micro-scale perspectives, deriving insights into both common and unexpected patterns in these networks. At the macro-scale, we examine the global properties of these networks, summarizing the results in radar diagrams. This analysis reveals significant correlations among key metrics, indicating that more robust networks tend to be more efficient, while diameter and average path length show negative correlations with other properties. At the meso-scale, we explore the existence of sub-structures embedded within the road networks using two main concepts, namely, community and core-periphery structures. We find that while these densely populated city road networks show particularly strong community structures (high modularity values, close to 1.0) that are not typical to other networks, they exhibit a low level of presence of core-periphery structures, with an average coreness of 6.3%. This points to the cities being polycentric. At the micro-scale, we find nodal-level properties of the network. Specifically, we compute the various centrality measures and examine their distributions to capture the prevalent characteristics of these networks. We observe that the centrality measures present different distribution patterns. While the degree distribution demonstrates a limited range of degree values, the betweenness centrality distribution follows a power law, and the closeness centrality exhibits a binomial distribution—yet these patterns remain consistent across the studied cities. Overall, our multi-scale analysis provides valuable insights into the topological properties of urban road networks, informing city planning, traffic management, and infrastructure development in similar urban environments.
... It runs in time O(m). - Fast greedy modularity optimization (Fastgreedy) by Clauset, Newman and Moore [15] is a hierarchical agglomeration algorithm for detecting community structure based on modularity optimization. Starting from a set of isolated vertices, the edges of the original graph are iteratively added to produce the largest possible increase in modularity at each step. ...
Preprint
Full-text available
Anonymization of graph-based data is a problem which has been widely studied over recent years, and several anonymization methods have been developed. Information loss measures have been used to evaluate data utility and information loss in the anonymized graphs. However, there is no consensus about how to evaluate data utility and information loss in privacy-preserving and anonymization scenarios, where the anonymous datasets were perturbed to hinder re-identification processes. Authors use diverse metrics to evaluate data utility and, consequently, it is complex to compare different methods or algorithms in the literature. In this paper we propose a framework to evaluate and compare anonymous datasets in a common way, providing an objective score to clearly compare methods and algorithms. Our framework includes metrics based on generic information loss measures, such as average distance or betweenness centrality, and also task-specific information loss measures, such as community detection or information flow. Additionally, we provide some metrics to examine re-identification and risk assessment. We demonstrate that our framework could help researchers and practitioners to select the best parametrization and/or algorithm to reduce information loss and maximize data utility.
... The results of community detection for the constructed networks are shown in the right part of Table 1. In every generation, bimodularity exceeds 0.48, a strong indicator of community structure [61]. Further information on the primary communities in each generation is available in Additional file 1, Tables S2-S9. ...
... To determine communities within these networks, we aim to maximise the modularity quality function (Q), which measures the number of edges that fall within the groups (g_i denotes the group assignment of v_i), minus the expected number of edges that would exist if edges were placed at random [47,48]: ...
Preprint
Full-text available
Modularity is a well-established concept for assessing community structures in various single and multi-layer networks, including those in biological and social domains. Biological networks, such as the brain, are known to exhibit group structure at a variety of scales -- local, meso, and global. Modularity, while useful in describing mesoscale brain organization, is limited as a metric to a global scale describing the overall strength of community structure. This approach, while valuable, overlooks important localized variations in community structure at the node level. To address this limitation, we extended modularity to individual nodes. This novel measure of nodal modularity (nQ) captures both meso and local scale changes in modularity. We hypothesized that nQ illuminates granular changes in the brain due to diseases such as Alzheimer's disease (AD), which are known to disrupt the brain's modular structure. We explored nQ in multiplex networks of a visual short-term memory binding task in fMRI and DTI data in the early stages of AD. Observed changes in nQ in fMRI and DTI networks aligned with known trajectories of AD and were linked to common biomarkers of the disease, including amyloid-β and tau. Additionally, nQ clearly differentiated MCI from MCI converters, showing indications that nQ may be a useful diagnostic tool for characterizing disease stages. Our findings demonstrate the utility of nQ as a measure of localized group structure, providing novel insights into temporal and disease-related variability at the node level. Given the widespread application of modularity as a global measure, nQ represents a significant advancement, providing a granular measure of network organization applicable to a wide range of disciplines.
... For the random sample, we randomly select half of the population. For the cluster sampling strategy, we first detect the clusters using a greedy modularity optimization algorithm (Clauset et al., 2004), one that has been used in recent studies (Nadini et al., 2021;Hernandez et al., 2021). We then randomly select half the number of clusters and expose every respondent in the selected clusters to the policy. ...
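A minimal sketch of this cluster-sampling strategy (Python, assuming networkx >= 3.0, with a synthetic small-world graph standing in for the real social network):

import random
import networkx as nx

G = nx.connected_watts_strogatz_graph(500, 6, 0.1, seed=7)  # placeholder network

# Step 1: detect clusters with greedy modularity optimization (Clauset et al., 2004).
clusters = list(nx.community.greedy_modularity_communities(G))

# Step 2: randomly select half the clusters and expose every node in them.
random.seed(7)
chosen = random.sample(range(len(clusters)), k=len(clusters) // 2)
exposed = set().union(*(clusters[i] for i in chosen))
print(len(clusters), "clusters;", len(exposed), "respondents exposed")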
Preprint
Full-text available
In the process of enacting or introducing a new policy, policymakers frequently consider the population's responses. These considerations are critical for effective governance. There are numerous methods to gauge the ground sentiment from a subset of the population; examples include surveys or listening to various feedback channels. Many conventional approaches implicitly assume that opinions are static; however, in reality, the population will discuss and debate these new policies among themselves, and reform new opinions in the process. In this paper, we pose the following questions: Can we quantify the effect of these social dynamics on the broader opinion towards a new policy? Given some information about the relationship network that underlies the population, how does overall opinion change post-discussion? We investigate three different settings in which the policy is revealed: respondents who do not know each other, groups of respondents who all know each other, and respondents chosen randomly. By controlling who the policy is revealed to, we control the degree of discussion among the population. We quantify how these factors affect the changes in policy beliefs via the Wasserstein distance between the empirically observed data post-discussion and its distribution pre-discussion. We also provide several numerical analyses based on generated network and real-life network datasets. Our work aims to address the challenges associated with network topology and social interactions, and provide policymakers with a quantitative lens to assess policy effectiveness in the face of resource constraints and network complexities.
... Repertoire context specificity. To classify signals into modules, we used the fast-greedy clustering algorithm 79 . Signals clustered within the same module co-occur at a higher frequency than those across modules, giving us a bottom-up method of inferring biologically relevant communicative context based upon patterns of co-occurrence 43 . ...
Article
Full-text available
Juveniles occupy a different social niche than adults, engaging in a smaller diversity of social contexts and perceiving greater social risks. Either or both of these factors may influence the form communication takes in immaturity and its developmental trajectory. We investigated the relative influence of these social forces on the development of multimodal communication in plains zebras (Equus quagga). Juveniles possessed smaller repertoires than adults, with lower combinatorial flexibility and greater stereotypy, particularly for signals used in submission. When interacting with adults, juveniles used a larger fraction of their repertoire, but with reduced combinatorial flexibility. The usage of a contextually flexible signal, “snapping”, also shifted across development, beginning as a stereotyped, submissive signal before diversifying into the full range of adult usage. Taken together, the lower complexity of juvenile communication may reduce signal ambiguity and the risk of miscommunication when interacting with social partners perceived as higher risk, like adults.
... Community detection in network analysis identifies densely connected groups of nodes, or communities, that are more connected to each other than to nodes outside the group (Fortunato & Hric, 2016). Unlike latent variable clustering, which assumes underlying unobservable constructs that influence observed relationships, community detection directly examines structural patterns in the network without presupposing latent dimensions (Clauset et al., 2004). Community detection also differs from traditional clustering methods, such as k-means, by leveraging the topology of the network rather than relying on distance metrics or data point attributes (Lancichinetti & Fortunato, 2009). ...
Article
Full-text available
Background: Due to the nature of their work, Public Safety Personnel (PSP; e.g., firefighters, paramedics, police officers) are frequently exposed to potentially psychological traumatic events (PPTE) and are at increased risk of developing posttraumatic stress symptoms (PTSS) compared to the general population. To date, there are a limited number of published studies that have used the statistical tools of network analysis to examine PTSS in PSP, typically relying on small, homogenous samples. Basic procedures: The current study used a large (n=5,319) and diverse sample of PSP to estimate a network of PTSS and exploratory graph analysis to assess alternative structures of symptom clustering, compared to traditional latent models. Main findings: The results of the analyses estimated two symptom clusters which differed from most latent models of PTSS. Re-experiencing and avoidance symptoms clustered together, instead of in two clusters. Similarly, hyperarousal symptoms (hypervigilance, sleep disturbance, startle reflex, concentration difficulties) clustered in a single community instead of two or three clusters in many latent models of PTSS. The symptom of detachment played the most central role in the network and acted as a bridge symptom between numerous clusters of symptoms. The least central symptom was amnesia, which also had the most inconsistent pattern of clustering and bridging. Other bridge symptoms included negative emotions, difficulty concentrating, and reckless behaviour. Principal conclusions: The symptom of detachment played a pervasive role in centrality and bridging in a network of PTSS in PSP. Future research is necessary to identify whether central PTSS differ across populations based on their PPTE type (e.g., combat, assault, rape) or typical environmental factors (e.g., group cohesion in PSP and military).
... We computed the modularity score of each year to assess whether communication became less combinatorically flexible and more stereotyped as the drought intensified. The modularity score, which ranges from 0 to 1, is a measure of the interconnectivity of a network (Clauset, Newman, and Moore 2004). When applied to a communication network, the modularity score can be used as another way to characterize the flexibility of a signaling system. ...
Article
Full-text available
Anthropogenically induced climate change has significantly increased the frequency of acute weather events, such as drought. As human activities amplify environmental stresses, animals may be forced to prioritize survival over behaviors less crucial to immediate fitness, such as socializing. Yet, social bonds may also enable individuals to weather the deleterious effects of environmental conditions. We investigated how the highly social plains zebra (Equus quagga) modify their activity budgets, social networks, and multimodal communication during a drought. Although animals prioritized feeding and the number of social interactions dramatically decreased in the late drought period, social associations remained robust. We observed age/sex class‐specific changes in social behavior, reflecting the nutritional needs and social niche of each individual. Stallions devoted more time to greeting behaviors, which could mitigate harassment by bachelor males and facilitate grazing time for the females of the harem. Juveniles significantly increased time spent active socializing, despite mothers showing the greatest decrease in the number of social interactions. Instead, unrelated, nonlactating females served as social partners, accommodating both juveniles' social needs and lactating mothers' nutritive requirements. Using a network‐based representation of multimodal communication, we observed a decrease in the number of signals used during the drought. Individuals used less diverse multimodal combinations, particularly in the costly context of aggression. These findings illustrate how social roles and differential responses to acute environmental stress within stable social groups may contribute to species resilience, and how communication flexibly responds to facilitate both survival and sociality under harsh environmental conditions.
... The algorithm is famous for its simplicity and efficiency in spotting community structures in vast networks. However, the Walktrap algorithm (hierarchical clustering algorithm) (Pons and Latapy, 2005), the Spinglass algorithm (Reichardt and Bornholdt, 2006), and a Fast-Greedy algorithm (Clauset et al., 2004) can be implemented for unsupervised output. ...
Article
Full-text available
China’s Belt and Road Initiative (BRI) aims to revive ancient trade routes and boost international trade, enhancing regional integration, trade, and economic growth. The present study aimed to examine the representation of the BRI in Indo-Pakistani newspapers, specifically the Daily Dawn Pakistan and the Times of India, by employing computational framing analysis. A total of 2081 news reports from the Daily Dawn and 587 news reports from The Times of India were collected using the Lexis Advance database between March 23, 2013, and December 1, 2019. The findings revealed that the two newspapers utilized distinct framing strategies to represent the BRI. The Daily Dawn frames were related to development, security challenges, and political concerns, while India sees China as a strong opponent and the BRI as China’s geostrategic plan to extend its global power and military presence in the Indian Ocean, posing a challenge to India’s national interests. The results also indicated that bilateral talks, China-Pakistan Economic Corridor (CPEC) implications, and BRI concerns are the mainframes in the news coverage of The Times of India. This research provided qualitative evidence to study the media coverage of the BRI in Indo-Pakistani media. The results of this study can be a reference for policymakers and international players who seek to improve relations with both countries.
... These characteristics diverge from the assumptions underlying traditional deep learning models, which are typically designed for Euclidean data. To address these challenges, researchers have proposed a variety of graph neural networks (GNNs) [9][10][11][12], designed for various machine learning applications, including community mining 13,14 and graph embedding 15,16. In contrast to traditional deep learning techniques, GNNs are adept at representing complex structures and relationships in graph data while effectively incorporating node attributes into an end-to-end learning framework. ...
Article
Full-text available
Graph data is essential for modeling complex relationships among entities. Graph Neural Networks (GNNs) have demonstrated effectiveness in processing low-order undirected graph data; however, in complex directed graphs, relationships between nodes extend beyond first-order connections and encompass higher-order relationships. Additionally, the asymmetry introduced by edge directionality further complicates node interactions, presenting greater challenges for extracting node information. In this paper, we propose TWC-GNN, a novel graph neural network design, as a solution to this problem. TWC-GNN uses node degrees to define higher-order topological structures, assess node importance, and capture mutual interactions between central nodes and their adjacent counterparts. This approach improves our understanding of complex relationships within the network. Furthermore, by integrating self-attention mechanisms, TWC-GNN effectively gathers higher-order node information in addition to focusing on first-order node information. Experimental results demonstrate that the integration of topological structures and higher-order node information is crucial for the learning process of graph neural networks, particularly in directed graphs, leading to improved classification accuracy.
... Girvan and Newman [52] proposed an approach, known as GN method, to find community structure based on centrality indices. Clauset et al. [53] proposed a hierarchical agglomerative algorithm, known as CNM, to detect community structures in networks. Pons and Latapy [54] proposed a community detection method (PL) based on random walks. ...
Article
Full-text available
The detection of community structures in complex networks has garnered significant attention in recent years. Given its NP-hardness, numerous evolutionary optimization-based approaches have been proposed. However, there is still a need for improvement, since no previous method has been able to find the true community structure on all benchmark networks. In this paper, we propose a multi-objective approach based on the social-based algorithm (SBA). SBA is a hybrid of the imperialist competitive algorithm and the standard evolutionary algorithm. SBA has demonstrated remarkable performance in finding optimal solutions and can effectively identify correct community structures. In our method, instead of a single solution, a set of non-dominated solutions is produced, each representing a different compromise between the objective functions. To evaluate the performance of the proposed algorithm, various tests were conducted on synthetic and real-world datasets. The results of experiments on several social network benchmarks show up to a 57% improvement on benchmark networks compared to previous works. This indicates that the proposed algorithm performs well in community detection.
... becomes computationally infeasible for large networks, necessitating the use of approximate methods. Several heuristic approaches have been developed to address modularity maximization, including greedy algorithms [19,20], simulated annealing [21,22], extremal optimization [23], genetic algorithms [24], and the widely used Louvain method [25]. Despite these advances, no single algorithm performs optimally across all network types due to variations in network structures and purposes [26]. ...
Preprint
Community detection, also known as graph partitioning, is a well-known NP-hard combinatorial optimization problem with applications in diverse fields such as complex network theory, transportation, and smart power grids. The problem's solution space grows drastically with the number of vertices and subgroups, making efficient algorithms crucial. In recent years, quantum computing has emerged as a promising approach to tackling NP-hard problems. This study explores the use of a quantum-inspired algorithm, Simulated Bifurcation (SB), for community detection. Modularity is employed as both the objective function and a metric to evaluate the solutions. The community detection problem is formulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem, enabling seamless integration with the SB algorithm. Experimental results demonstrate that SB effectively identifies community structures in benchmark networks such as Zachary's Karate Club and the IEEE 33-bus system. Remarkably, SB achieved the highest modularity, matching the performance of Fujitsu's Digital Annealer, while surpassing results obtained from two quantum machines, D-Wave and IBM. These findings highlight the potential of Simulated Bifurcation as a powerful tool for solving community detection problems.
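For intuition on the QUBO/Ising formulation this abstract describes, here is a hedged numpy sketch of the underlying objective for a two-community split, where spins s in {-1, +1} encode membership and B is Newman's modularity matrix; a brute-force search stands in for the actual Simulated Bifurcation solver:

import numpy as np
from itertools import product

# Toy adjacency matrix; the real instances are Zachary's Karate Club, IEEE 33-bus, etc.
A = np.array([[0,1,1,0],
              [1,0,1,1],
              [1,1,0,0],
              [0,1,0,0]], dtype=float)
k = A.sum(axis=1)
m = A.sum() / 2.0
B = A - np.outer(k, k) / (2 * m)   # modularity matrix (Newman 2006)

def q_of(s):
    # For a bipartition encoded as spins, Q = s^T B s / (4m).
    return s @ B @ s / (4 * m)

# Enumerate all 2^4 spin assignments; a heuristic solver searches this same landscape.
best = max((np.array(s) for s in product([-1, 1], repeat=4)), key=q_of)
print(best, q_of(best))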
... In power systems with many distributed ES systems, the electrical network can be locally aggregated based on their uneven distribution, known as the community structure [35]. Frequency control performance can be affected by the community structure. ...
Article
In recent years, a significant number of distributed small-capacity energy storage (ES) systems have been integrated into power grids to support grid frequency regulation. However, the challenges associated with high-dimensional control and synergistic operation alongside conventional generators remain unsolved. In this paper, a partitioning-based control approach is developed for the participation of widespread distributed ES systems on frequency control in power systems. The approach comprises a network partitioning method and a two-layer frequency control scheme. The partitioning method utilizes a community detection algorithm in which the weights between the buses are calculated based on the electrical distances. After partitioning the buses into different groups, an optimization-based frequency control system with two layers is established to aggregate and dis-aggregate the inertia and droop coefficients so that frequency regulation and economical operation can be achieved. The effectiveness of the proposed method is demonstrated through numerical simulations on an IEEE 39-bus system. The results confirm the successful elimination of frequency deviations and low operating cost of the proposed approach.
... For network graphs of the full chatbot population, we used the label-propagation clustering algorithm (Raghavan et al., 2007) to detect communities since it efficiently finds simple community structures in very large graphs. For network graphs of the English-language chatbot population, we instead used the fast-greedy clustering algorithm (Clauset et al., 2004) since it is more sensitive and thus more suitable for detecting sub-communities. To visualize these large graphs efficiently, we set the graph layouts using the Fruchterman-Reingold force-directed layout algorithm (Fruchterman & Reingold, 1991). ...
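Both clustering algorithms and the layout mentioned above are also available in python-igraph; a minimal sketch on a stand-in graph (the Zachary karate club, not the chatbot networks):

import igraph as ig

g = ig.Graph.Famous("Zachary")

# Label propagation: fast, suited to very large graphs (full population).
lp = g.community_label_propagation()

# Fast greedy (Clauset et al., 2004): more sensitive, for sub-communities.
fg = g.community_fastgreedy().as_clustering()

# Fruchterman-Reingold force-directed layout for visualization.
layout = g.layout_fruchterman_reingold()
print(len(lp), len(fg))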
Article
Full-text available
Artificial Intelligence (AI) chatbots, such as ChatGPT, have been shown to mimic individual human behaviour in a wide range of psychological and economic tasks. Do groups of AI chatbots also mimic collective behaviour? If so, artificial societies of AI chatbots may aid social scientific research by simulating human collectives. To investigate this theoretical possibility, we focus on whether AI chatbots natively mimic one commonly observed collective behaviour: homophily , people's tendency to form communities with similar others. In a large simulated online society of AI chatbots powered by large language models ( N = 33,299), we find that communities form over time around bots using a common language. In addition, among chatbots that predominantly use English ( N = 17,746), communities emerge around bots that post similar content. These initial empirical findings suggest that AI chatbots mimic homophily, a key aspect of human collective behaviour. Thus, in addition to simulating individual human behaviour, AI‐powered artificial societies may advance social science research by allowing researchers to simulate nuanced aspects of collective behaviour.
... First, each node is assigned to a cluster by locally maximizing modularity. Then, the clusters are grouped into super-nodes to form a new network, and the process repeats until convergence (modularity 0.642, 5 communities). Girvan-Newman (Girvan and Newman 2002): this algorithm progressively removes the edges with the highest levels of betweenness centrality, which helps reveal the network's clusters. Leading Eigenvector (Newman 2006): it uses the eigenvalues of the network to divide the nodes into clusters, maximizing modularity through eigenvector analysis (modularity 0.642, 5 communities). Fast Greedy (Clauset et al. 2004): this algorithm iteratively merges pairs of clusters that increase modularity until no further improvement is possible (modularity 0.642, 5 communities). • Adaptability: Another advantage of Louvain is its ability to adapt to various networks. Unlike the Fast Greedy algorithm, which performs optimally on small to medium-sized networks, Louvain can efficiently handle more complex networks. ...
Article
Full-text available
This article explores trends in the empirical studies on the careers of visual artists over the past decade. Methodologically, we used the Louvain algorithm to perform a co-occurrence analysis, revealing thematic clusters. This approach enabled us to identify five distinct trends, each highlighting specific aspects and the geographical diversity of visual artists’ careers, which were then evaluated using a thematic map. The main findings revealed that “working conditions, income inequalities, and professional practices in the creative and cultural industries” stand out as a central and influential theme, with a significant impact and notable advances that have shaped recent discussions and practices. The themes “professional dynamics, cultural policies, and career strategies in contemporary art” and “dynamics of artistic careers, resilience, and support policies in contemporary art” are well-developed and have a solid research base, offering an in-depth understanding of the relevant topics. The theme “positioning and success strategies of artists in the contemporary art market” represents an emerging theme that is receiving increasing attention and is likely to become prominent, while the theme “gender inequalities, geographical mobility, and dynamics of artistic clusters” indicates a waning influence for various reasons. The results obtained aim to provide concrete guidance to researchers and the institutions that confer legitimacy in the visual arts world. By shedding light on contemporary issues related to the construction of visual artists’ careers from a global perspective, this study contributes to a deeper understanding of current dynamics.
... We computed a network cluster analysis (cf. Clauset et al., 2004) to determine groupings within cross-national competition. ...
Article
Full-text available
This article challenges the idea of platform capitalism, that digital platforms implement a uniform model based on a self-employed labor force. Expanding on empirical evidence of a diversity of platform models, we theorize expectations about platform diversity from competition and comparative capitalism research. Using a unique cross-national dataset of leading food delivery platforms in 32 countries across North America and Europe, we compare platform models and competitive relations across national institutional regimes. Our analyses uncover a considerable diversity of platform models across Europe, in contrast to a clear uniformity in North America. We also find that the use of self-employment varies across and within large multinational corporations and is most prevalent in countries of the lightly regulated regime type. Our results call for an economic sociology perspective on the platform economy that integrates a general concept of platforms but allows for diversity stemming from competition and different national regimes.
... Local community detection has received much attention for its capacity to rapidly identify communities containing the seed node [1], [25]. Scholars have proposed methods based on local modularity [2], [3], k-core [26], k-truss [27], k-clique [28], personalized PageRank [5], [29] and other methods. ...
Preprint
Real-world networks are often constructed from different sources or domains, including various types of entities and diverse relationships between networks, thus forming multi-domain networks. A single network typically fails to capture the complete graph structure and the diverse relationships among multiple networks. Consequently, leveraging multiple networks is crucial for a comprehensive detection of community structures. Most existing local community detection methods discover community structures by integrating information from different views on multi-view networks. However, methods designed for multi-view networks are not suitable for multi-domain networks. Therefore, to mine communities from multiple networks, we propose a Local Algorithm for Multiple networks with node Affiliation, called LAMA, which is suitable for both multi-view and multi-domain networks. The core idea of LAMA is to optimize node affiliations by maximizing the quality of communities within each network while ensuring consistency in community structures across multiple networks. The algorithm iteratively optimizes node affiliations and expands the community outward based on affiliations to detect the community containing the seed node. Experimental results show that LAMA outperforms comparison algorithms on two synthetic datasets and five real datasets.
... We use the modularity (M) metric (Clauset et al. 2004; Schuetz and Caflisch 2008; Rossi and Villa-Vialaneix 2011), which was recently shown to be effective for SAR (Sözer 2019). We adopt this metric to define a single objective. ...
Article
Full-text available
Software Architecture Recovery (SAR) techniques analyze dependencies between software modules and automatically cluster them to achieve high modularity. Many of these approaches employ Genetic Algorithms (GAs) for clustering software modules. A major drawback of these algorithms is their lack of scalability. In this paper, we address this drawback by introducing generic software components that can encapsulate subroutines (operators) of a GA to execute them in parallel. We use these components to implement a novel hybrid GA for SAR that exploits parallelism to find better solutions faster. We compare the effectiveness of parallel algorithms with respect to the sequential counterparts that are previously proposed for SAR. We observe that parallelization enables a greater number of iterations to be performed in the search for high-quality solutions. The increased efficiency achieved through parallel processing allows for faster convergence towards optimal solutions by harnessing the power of multiple processing units in a coordinated manner. The amount of improvement in modularity is above 50%, which particularly increases in the context of large-scale systems. Our algorithm can scale to recover the architecture of a large system, Chromium, which has more than 18,500 modules and 750,000 dependencies among these modules.
... Order and size of network instances, comparing modularity obtained by different methods: CNM (Fast-Greedy) [24], EIG [15], Louvain [12], SDPM (the semidefinite rounding in this paper), and the optimal modularity values OPT [22]. ...
Preprint
Many social networks and complex systems are found to be naturally divided into clusters of densely connected nodes, known as community structure (CS). Finding CS is one of the fundamental yet challenging topics in network science. One of the most popular classes of methods for this problem is to maximize Newman's modularity. However, little is understood about how well the maximum modularity can be approximated, or about the implications of finding community structure with provable guarantees. In this paper, we definitively settle the approximability of modularity clustering, proving that approximating the problem within any (multiplicative) positive factor is intractable, unless P = NP. Yet we propose the first additive approximation algorithm for modularity clustering with a constant factor. Moreover, we provide a rigorous proof that a CS with modularity arbitrarily close to the maximum modularity $Q_{OPT}$ might bear no similarity to the optimal CS of maximum modularity. Thus, even when a CS with near-optimal modularity is found, other verification methods are needed to confirm the significance of the structure.
... To provide a quick sense of any observed community structure, we use the cluster edge betweenness (Newman and Girvan, 2004) and modularity (Clauset, Newman and Moore, 2004) functions provided in the igraph (Csardi and Nepusz, 2006) package for R (R Core Team, 2016). This modularity statistic is included simply to provide an exploratory look at the extent to which community structure may be observed across the latent geometries, particularly for elliptic space. ...
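The two igraph functions mentioned in the snippet can be exercised in R as below; the network here is a built-in stand-in, since the snippet does not specify the data.

    library(igraph)

    g <- make_graph("Zachary")

    # Girvan-Newman: repeatedly remove the highest-betweenness edge
    ceb <- cluster_edge_betweenness(g)

    # Clauset-Newman-Moore greedy modularity maximisation, for comparison
    cfg <- cluster_fast_greedy(g)

    modularity(ceb)   # exploratory look at community structure
    modularity(cfg)
    sizes(cfg)        # community sizes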
Preprint
We review the class of continuous latent space (statistical) models for network data, paying particular attention to the role of the geometry of the latent space. In these models, the presence/absence of network dyadic ties are assumed to be conditionally independent given the dyads' unobserved positions in a latent space. In this way, these models provide a probabilistic framework for embedding network nodes in a continuous space equipped with a geometry that facilitates the description of dependence between random dyadic ties. Specifically, these models naturally capture homophilous tendencies and triadic clustering, among other common properties of observed networks. In addition to reviewing the literature on continuous latent space models from a geometric perspective, we highlight the important role the geometry of the latent space plays on properties of networks arising from these models via intuition and simulation. Finally, we discuss results from spectral graph theory that allow us to explore the role of the geometry of the latent space, independent of network size. We conclude with conjectures about how these results might be used to infer the appropriate latent space geometry from observed networks.
... Detecting communities in networks means identifying densely connected groups of vertices with sparse connections between groups. We adopt the modularity introduced by Newman as a quality function for communities and detect them by a fast greedy modularity-maximization algorithm, an effective approach to identifying communities [32]. If the network V is divided into L non-overlapping, non-empty subsets $\{V_1, V_2, \cdots, V_L\}$, the modularity Q is defined as ...
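The snippet truncates before the formula; the standard Newman-Girvan definition it refers to, written in the partition notation above, is

$$ Q = \sum_{l=1}^{L} \left[ \frac{|E(V_l)|}{|E|} - \left( \frac{\sum_{v \in V_l} \deg(v)}{2|E|} \right)^{2} \right], $$

where $|E(V_l)|$ is the number of edges with both endpoints in $V_l$ and $|E|$ is the total number of edges: the first term is the fraction of edges falling inside community $l$, and the second is the expected value of that fraction under degree-preserving random rewiring.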
Preprint
We investigate the structure of global inter-firm linkages using a dataset that contains information on business partners for about 400,000 firms worldwide, including all the firms listed on the major stock exchanges. Among these firms, we examine three networks, based on customer-supplier, licensee-licensor, and strategic alliance relationships. First, we show that these networks all have scale-free topology and that the degree distribution for each follows a power law with an exponent of 1.5. The shortest path length is around six for all three networks. Second, we show through community structure analysis that firms tend to form communities with firms that belong to the same industry but have different home countries, indicating the globalization of firms' production activities. Finally, we discuss what such production globalization implies for the proliferation of conflict minerals (i.e., minerals extracted from conflict zones and sold to firms in other countries to perpetuate fighting) through global buyer-supplier linkages. We show that a limited number of firms belonging to some specific industries and countries play an important role in the global proliferation of conflict minerals. Our numerical simulation shows that regulations on the purchases of conflict minerals by those firms would substantially reduce their worldwide use.
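The scale-free claim above rests on fitting a power law to the degree distribution. With igraph for R such a fit can be reproduced on any network; the sketch below uses a preferential-attachment stand-in, since the firm-level data are proprietary.

    library(igraph)

    g <- sample_pa(10000, m = 2)            # stand-in scale-free network
    fit <- fit_power_law(degree(g))
    fit$alpha                               # estimated power-law exponent
    mean_distance(g)                        # average shortest path length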
... However, the infected individuals are more strongly interconnected with each other than with the susceptibles (see Figure 6d), as indicated by a non-zero modularity coefficient [44,45] of Q = 0.2384. ...
Preprint
We study the standard SIS model of epidemic spreading on networks where individuals have a fluctuating number of connections around a preferred degree $\kappa$. Using very simple rules for forming such preferred degree networks, we find some unusual statistical properties not found in familiar Erd\H{o}s-R\'{e}nyi or scale free networks. By letting $\kappa$ depend on the fraction of infected individuals, we model the behavioral changes in response to how the extent of the epidemic is perceived. In our models, the behavioral adaptations can be either `blind' or `selective' -- depending on whether a node adapts by cutting or adding links to randomly chosen partners, or selectively, based on the state of the partner. For a frozen preferred network, we find that the infection threshold follows the heterogeneous mean field result $\lambda_{c}/\mu = \langle k \rangle / \langle k^{2} \rangle$ and the phase diagram matches the predictions of the annealed adjacency matrix (AAM) approach. With `blind' adaptations, although the epidemic threshold remains unchanged, the infection level is substantially affected, depending on the details of the adaptation. The `selective' adaptive SIS models are most interesting. Both the threshold and the level of infection change, controlled not only by how the adaptations are implemented but also by how often the nodes cut/add links (compared to the time scales of the epidemic spreading). A simple mean field theory is presented for the selective adaptations which captures the qualitative and some of the quantitative features of the infection phase diagram.
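For readers who want to experiment, a bare-bones discrete-time SIS simulation on a static network is sketched below in R; it deliberately omits the preferred-degree dynamics and adaptive rewiring that are the subject of the paper, and the network, rates, and horizon are arbitrary.

    library(igraph)
    set.seed(42)

    g <- sample_gnp(500, 0.02)               # static stand-in network
    lambda <- 0.15; mu <- 0.10               # infection / recovery rates
    state <- integer(vcount(g))              # 0 = susceptible, 1 = infected
    state[sample(vcount(g), 10)] <- 1

    for (t in 1:300) {
      infected <- which(state == 1)
      if (length(infected) == 0) break
      # every edge incident to an infected node transmits with prob lambda
      contacts <- unlist(adjacent_vertices(g, infected))
      hit <- contacts[state[contacts] == 0 & runif(length(contacts)) < lambda]
      recovered <- infected[runif(length(infected)) < mu]
      state[hit] <- 1
      state[recovered] <- 0
    }
    mean(state)                              # final infected fraction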
... A wide variety of applications (and this hardness result) have promoted the development of modularity maximization heuristics. In fact, there are numerous algorithms based on various techniques such as greedy procedures [5,11,29], simulated annealing [22,24], spectral optimization [28,30], extremal optimization [15], and mathematical programming [1,7,8,25]. Although some of them are known to perform well in practice, they have no theoretical approximation guarantee at all. ...
Preprint
Modularity, introduced by Newman and Girvan (2004), is a quality function for community detection. Community detection in graphs is now often conducted through modularity maximization: given an undirected graph G=(V,E), we are asked to find a partition $\mathcal{C}$ of V that maximizes the modularity. Although numerous algorithms have been developed to date, most of them have no theoretical approximation guarantee. Recently, to overcome this issue, the design of modularity maximization algorithms with provable approximation guarantees has attracted significant attention in the computer science community. In this study, we further investigate the approximability of modularity maximization. More specifically, we propose a polynomial-time $\left(\cos\left(\frac{3-\sqrt{5}}{4}\pi\right) - \frac{1+\sqrt{5}}{8}\right)$-additive approximation algorithm for the modularity maximization problem. Note here that $\cos\left(\frac{3-\sqrt{5}}{4}\pi\right) - \frac{1+\sqrt{5}}{8} < 0.42084$ holds. This improves the current best additive approximation error of 0.4672, which was recently provided by Dinh, Li, and Thai (2015). Interestingly, our analysis also demonstrates that the proposed algorithm obtains a nearly-optimal solution for any instance with a very high modularity value. Moreover, we propose a polynomial-time 0.16598-additive approximation algorithm for the maximum modularity cut problem. It should be noted that this is the first non-trivial approximability result for the problem. Finally, we demonstrate that our approximation algorithm can be extended to some related problems.
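The stated constant is easy to verify numerically; in R:

    # Numeric check of the additive error bound quoted above
    cos((3 - sqrt(5)) / 4 * pi) - (1 + sqrt(5)) / 8   # 0.420831... < 0.42084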
Preprint
Full-text available
Disentangling the physiopathological mechanisms of biological systems through high-level integration of omics data has become a standard procedure in the life sciences. However, platform heterogeneity, batch effects, and the lack of unified methods for single- and multi-omics analyses are relevant drawbacks that hinder meaningful biological interpretation. Statistical meta-analysis is widely used to integrate several omics datasets of the same type, yielding robust molecular signatures within the investigated system. Conversely, statistical meta-analysis does not allow the simultaneous investigation of different molecular layers, and therefore the integration of multi-modal data derived from multi-omics experiments. Although a number of valid tools designed for multi-omics data integration have emerged in the last few years, they have never been combined with statistical meta-analysis tools in a single analytical solution that supports meaningful biological interpretation. Network science is at the forefront of systems biology, where the inference of molecular interactomes has allowed the investigation of perturbed biological systems by shedding light on the disrupted relationships that keep the homeostasis of complex systems. Here, we present MUUMI, an R package that unifies network-based data integration and statistical meta-analysis within a single analytical framework. MUUMI allows the identification of robust molecular signatures through multiple meta-analytic methods, inference and analysis of molecular interactomes, and the integration of multiple omics layers through similarity network fusion. We demonstrate the functionalities of MUUMI by presenting two case studies in which we analysed 1) 17 transcriptomic datasets on idiopathic pulmonary fibrosis (IPF) from both microarray and RNA-Seq platforms and 2) multi-omics data of THP-1 macrophages exposed to different polarising stimuli. In both examples, MUUMI revealed biologically coherent signatures, underscoring its value in elucidating complex biological processes.
Preprint
Full-text available
Background: The axenisation of phototrophic eukaryotic microalgae has been studied for over a century, with antibiotics commonly employed to achieve axenic cultures. However, this approach often yields inconsistent outcomes and may contribute to the emergence of antibiotic-resistant microbes. A comprehensive review of microalgal species and the methods used to achieve axeny could provide insights into potentially effective workflows and identify gaps for future exploration. Methods: Scholarly databases were systematically searched, supplemented by citation network analysis and AI-assisted tools, to collect studies on achieving axenic phototrophic eukaryotic microalgae cultures. Data about microalgal species, axenisation workflows, outcomes, and related factors (e.g., sampling locations, axenisation confirmation methods) were summarised. Network component analysis was used to identify clusters of commonly reported methods for diatoms, dinoflagellates, and green algae. A scoring framework was developed to assess the quality and reliability of evidence presented in the studies. Results: Emerging patterns suggest that workflows involving filtration, washing, and micropicking are frequently reported for diatoms; micropicking, subculturing, and flow cytometry for dinoflagellates; and anoxy, photosensitisation, and streak plating for green algae. Evidence from the literature indicates that a combination of microscopy (e.g., epifluorescence), cell counting (e.g., agar plating), and sequencing (16S and/or 18S) could enhance confidence in confirming axeny. Conclusion: While antibiotics dominate current practices, alternative pathways for achieving axenic cultures are identifiable through network component analysis. Higher confidence in these methods depends on improved experimental designs and high-quality reporting.
Article
Full-text available
Research on multi-objective optimization algorithms for community detection in complex networks has grown considerably in recent years. Community detection based on multi-objective algorithms (MOAs) is a fundamental task in complex social networks: it supports understanding the dynamics of a society, finding influential groups, and improving information dissemination. Traditional methodologies often cannot cope with the features that real-world networks usually present, which involve optimizing several, sometimes conflicting, objectives. This paper provides an overview of recent work on MOAs for community detection in complex social networks, exploring how objectives such as modularity, community size, and edge density are balanced across 15 different approaches selected from works published between 2019 and 2024. The strengths and limitations of the various MOAs are reviewed in a comparative analysis that provides insights into both the effectiveness and the computational efficiency of these methods. Current trends and future research directions are discussed, underlining the need for more adaptive and scalable solutions to cope with the steadily increasing complexity of social networks.
Article
Implementing Vehicle-to-Infrastructure communication through the upgraded electronic toll collection system (ETC2.0) has transformed Japan's transportation infrastructure, elevating it into one of the foremost Intelligent Transport Systems. Despite the wealth of data provided by ETC2.0, its application in studying urban mobility patterns under extreme weather conditions remains limited. This study examines the impact of heavy snowfall on the mobility network in Sapporo, Japan, using ETC2.0 probe data. By comparing mobility patterns on selected heavy-snow and normal days in February 2022, the study identifies significant changes in network structure and community distribution. Findings reveal that heavy snowfall causes fragmentation of mobility networks, with notable shifts in community locations and node centrality. The study underscores the importance of maintaining connectivity to industrial and commercial areas during extreme weather events and highlights the need for further research into the relationship between community structures and travel behavior.
Article
Full-text available
Bioregionalization consists of identifying spatial units with similar species composition and is a classical approach in the fields of biogeography and macroecology. The recent emergence of global databases, improvements in computational power and the development of clustering algorithms from network theory have led to several major updates of the bioregionalizations of many taxa. A typical bioregionalization workflow involves five steps: formatting the input data, computing a (dis)similarity matrix, selecting a bioregionalization algorithm, evaluating the resulting bioregionalization, and mapping and interpreting the bioregions. For most of these steps, many methods and R packages are available. Here, we present bioregion, a package that includes all the steps of a bioregionalization workflow under a single architecture, with an exhaustive list of the bioregionalization algorithms used in biogeography and macroecology. These include (non-)hierarchical algorithms as well as community detection algorithms from network theory. Some key methods from the literature, such as the network community detection algorithms Infomap and OSLOM (Order Statistics Local Optimization Method), which were not previously available in the R language, are included in bioregion. By allowing different methods from different fields to be combined and to communicate easily, bioregion enables a reproducible and complete comparison of the different bioregionalization methods, which is still missing in the literature.
Article
This study explores the significance of converting public buildings to wooden construction and employing wooden interiors when considering the expansion of wood utilization. It also examines how forest coverage and the presence or absence of forestry-related departments affect the conversion of main government buildings to wood construction in municipalities, along with the key issues related to this transition. The targeted municipalities were classified into "Wooden", "Wooden interior", and "Non-wooden" groups. According to the results, 10.4% of municipalities fell into the "Wooden" group, 83.2% into "Wooden interior", and 6.4% into "Non-wooden". A comparative analysis indicated that municipalities with higher forest coverage and more forestry-related departments were more inclined to adopt wood construction for their main government buildings. A quantitative text analysis of the minutes of committee and other meetings related to the construction of main government buildings highlighted issues such as the time required to procure local timber, earthquake and fire resistance, the seismic isolation performance of wooden structures, and construction costs, although the use of local timber is expected from the perspective of forestry promotion when constructing main government buildings.
Article
Advances in next‐generation sequencing technology have enabled the high‐throughput profiling of metagenomes and accelerated microbiome studies. Recently, there has been a rise in quantitative studies that aim to decipher the microbiome co‐occurrence network and its underlying community structure based on metagenomic sequence data. Uncovering the complex microbiome community structure is essential to understanding the role of the microbiome in disease progression and susceptibility. Taxonomic abundance data generated from metagenomic sequencing technologies are high‐dimensional and compositional, suffering from uneven sampling depth, over‐dispersion, and zero‐inflation. These characteristics often challenge the reliability of the current methods for microbiome community detection. To study the microbiome co‐occurrence network and perform community detection, we propose a generalized Bayesian stochastic block model that is tailored for microbiome data analysis where the data are transformed using the recently developed modified centered‐log ratio transformation. Our model also allows us to leverage taxonomic tree information using a Markov random field prior. The model parameters are jointly inferred by using Markov chain Monte Carlo sampling techniques. Our simulation study showed that the proposed approach performs better than competing methods even when taxonomic tree information is non‐informative. We applied our approach to a real urinary microbiome dataset from postmenopausal women. To the best of our knowledge, this is the first time the urinary microbiome co‐occurrence network structure in postmenopausal women has been studied. In summary, this statistical methodology provides a new tool for facilitating advanced microbiome studies.
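As an aside on the transformation step, one published variant of the modified centered-log ratio (mclr) applies CLR to the nonzero counts of each sample and shifts them to stay positive while leaving zeros at zero; the abstract does not specify the exact form used here, so the R sketch below is only indicative.

    # Minimal mclr sketch: CLR on nonzero counts, shifted positive, zeros kept
    mclr <- function(counts, eps = 0.1) {
      out <- numeric(length(counts))
      nz <- counts > 0
      clr <- log(counts[nz]) - mean(log(counts[nz]))
      out[nz] <- clr - min(clr) + eps   # shift so transformed values are > 0
      out
    }

    # Apply per sample (rows = samples, columns = taxa)
    x <- matrix(rpois(40, 3), nrow = 4)
    t(apply(x, 1, mclr))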
Article
Community detection algorithms have become a valuable tool for analyzing complex social networks: they help scholars comprehend network topology intuitively and find hidden human relations in social networks. Conversely, the community hiding problem arises because some organizations need to cooperate in social networks while avoiding discovery by community detection algorithms. To address this problem, a new safety-based community hiding algorithm is advanced in this paper, which hides the target community by disrupting a limited number of links. First, a safety gain function is designed to find appropriate links by measuring the denseness of the community. Subsequently, we conducted extensive experiments using eight datasets and five classical community detection algorithms, comparing our method with the latest community hiding algorithms. The experimental results confirm that the proposed algorithm is more efficient than previous community hiding algorithms.
Preprint
Full-text available
3D Gaussian Splatting (3DGS) has emerged as a transformative method in the field of real-time novel view synthesis. Building on 3DGS, recent advances cope with large-scale scenes via spatial partition strategies to reduce video memory and optimization time costs. In this work, we introduce a parallel Gaussian splatting method, termed PG-SAG, which fully exploits semantic cues for both partitioning and Gaussian kernel optimization, enabling fine-grained building surface reconstruction of large-scale urban areas without downsampling the original image resolution. First, the cross-modal model Language Segment Anything is leveraged to segment building masks. Then, the segmented building regions are grouped into sub-regions according to a visibility check across registered images. The Gaussian kernels for these sub-regions are optimized in parallel with masked pixels. In addition, the normal loss is re-formulated for the detected edges of masks to alleviate ambiguities in normal vectors on edges. Finally, to improve the optimization of the 3D Gaussians, we introduce a gradient-constrained balance-load loss that accounts for the complexity of the corresponding scenes, effectively minimizing thread waiting time in the pixel-parallel rendering stage as well as the reconstruction loss. Extensive experiments were conducted on various urban datasets; the results demonstrate the superior performance of our PG-SAG on building surface reconstruction compared to several state-of-the-art 3DGS-based methods. Project Web: https://github.com/TFWang-9527/PG-SAG.
Conference Paper
The rapid growth of large-scale datasets in fields like biology and social networks has driven the need for advanced graph analytics techniques. Community detection, a fundamental task in graph analytics, identifies closely connected groups of nodes within a network, providing valuable insights across various disciplines. This study focuses on two classic community detection methods, the Louvain algorithm and Markov Clustering (MCL), and evaluates the performance of two prominent distributed community detection algorithms: HiPDPL-GPU, our prior implementation, and HipMCL. We conduct experiments on the GPU-accelerated heterogeneous HPC systems Summit and Frontier to assess their performance under varying conditions. Our objective is to identify the strengths and weaknesses of these algorithms in terms of scalability and quality of solutions. We evaluate these algorithms on a diverse set of 70+ networks spanning 13 domains, with sizes ranging up to 4.2 billion edges. Our results demonstrate that HiPDPL-GPU consistently outperforms HipMCL, especially for large-scale networks. HiPDPL-GPU achieves significantly faster runtimes (47x to 1439x), higher modularity scores, and improved scalability. These findings highlight HiPDPL-GPU as a promising solution for efficient and effective large-scale graph analytics in diverse application domains, and provide insights into the feasibility of using MCL-based approaches for certain application domains.
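As a point of reference for the distributed systems compared above, the single-machine Louvain baseline is a one-liner in igraph for R; the network below is a small stand-in for the billion-edge graphs the paper targets.

    library(igraph)

    g <- sample_gnp(2000, 0.01)       # stand-in network
    lv <- cluster_louvain(g)          # single-machine Louvain
    modularity(lv)                    # solution quality, as compared in the paper
    length(lv)                        # number of communities found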
Article
Full-text available
Social network analysis (SNA) of social media content allows information transfer to be visualised, identifies influential actors, and reveals public opinion. However, to date no research has investigated content related to nutrition on X. This study examined the #nutrition conversation on X (formerly Twitter) utilising SNA and linguistic methods. NodeXL Pro was used for network, semantic and sentiment analyses of English-language posts including '#nutrition' collected between 1 and 21 March 2023. The #nutrition network included 17,129 vertices (users) with 26,809 edges (relationships). NodeXL Pro was used to assess the structure of the network and the actors involved by calculating the network metrics. The results show a low-density, dispersed network (graph density = 0.001), with most users communicating heavily with a small number of other users. These subgroup community cluster structures restrict information flow outside the subgroups (modularity = 0.79), and the network relies on influential users to share information (betweenness centrality range, 0 to 23,375,544). Notably, influential users were typically from personal and not-for-profit accounts. Semantic analysis identified 97,000 word-pair edges; the most frequently discussed topics related to health, healthy lifestyle and diet, and sentiment across the network was positive. Using SNA together with semantic and sentiment analyses, this study found a dispersed X network with a high proportion of unconnected users who had no relationships with other users in the network. The findings reveal a publicly driven debate focused on healthy diets and lifestyle, with information primarily propagated through reposting.
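The network metrics reported above (density, modularity, betweenness centrality) are standard and can be reproduced on any graph with igraph for R; the sketch below runs them on a random stand-in, since the X data are not public.

    library(igraph)

    g <- sample_gnp(300, 0.01)         # stand-in for the #nutrition network
    edge_density(g)                    # graph density
    cl <- cluster_louvain(g)           # subgroup community clusters
    modularity(cl)                     # modularity of the clustering
    range(betweenness(g))              # betweenness centrality range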
Article
This study evaluates the efficacy of forecasting models in predicting USD/TRY exchange rate fluctuations. We assess Support Vector Machine (SVM), XGBoost, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models with 96 and 21 feature sets. Data from 01.01.2010 to 30.04.2024 were sourced from Bloomberg, CBRT, and BDDK. Findings indicate that LSTM and GRU models outperform traditional models, with GRU showing the highest predictive accuracy. SVM performs poorly with high-dimensional data, while XGBoost offers moderate predictive power but falls short in capturing intricate patterns. This study highlights the importance of model and feature selection in financial time series forecasting and underscores the advantages of advanced neural networks. The results provide valuable insights for analysts and policymakers in developing robust economic forecasting models.