On Modularity Clustering

Univ. of Konstanz, Konstanz
IEEE Transactions on Knowledge and Data Engineering (Impact Factor: 1.89). 03/2008; DOI: 10.1109/TKDE.2007.190689
Source: IEEE Xplore

ABSTRACT Modularity is a recently introduced quality measure for graph clusterings. It has immediately received considerable attention in several disciplines, particularly in the complex systems literature, although its properties are not well understood. We study the problem of finding clusterings with maximum modularity, thus providing theoretical foundations for past and present work based on this measure. More precisely, we prove the conjectured hardness of maximizing modularity both in the general case and with the restriction to cuts and give an Integer Linear Programming formulation. This is complemented by first insights into the behavior and performance of the commonly applied greedy agglomerative approach.
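
The abstract does not restate the measure itself. As a minimal sketch, assuming the standard definition of modularity for an undirected, unweighted graph with m edges, where each cluster C contributes (intra-cluster edges of C)/m minus (total degree of C / 2m) squared, the quantity being maximized could be computed as follows. The names `modularity`, `edges`, and `cluster_of` are illustrative and do not come from the paper.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity of a clustering of an undirected, unweighted graph.

    edges: list of (u, v) pairs; assumed to contain no self-loops
           and no parallel edges.
    cluster_of: dict mapping every node to a cluster label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")

    intra = defaultdict(int)   # edges with both endpoints in the cluster
    degree = defaultdict(int)  # sum of node degrees in the cluster
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1

    # Coverage of each cluster minus its expected coverage.
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (hypothetical example graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {v: "all" for v in range(6)}

print(modularity(edges, two_clusters))  # 5/14, about 0.357
print(modularity(edges, one_cluster))   # 0.0
```

Putting every node into a single cluster always scores zero under this definition, which is why the example compares it against the two-triangle split.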

  • ABSTRACT: Detection of communities within social networks is a nontrivial problem. Allowing communities to overlap, i.e., letting nodes belong to more than one community simultaneously, further complicates the problem. Nevertheless, people do belong to multiple social groups simultaneously, and being able to detect overlapping communities is an important step toward understanding and analyzing social networks. A common practice in community detection (clustering) is to view the network (graph) as a whole and have a central control process determine how nodes are clustered. That central control, we believe, is a limitation to performance. In our previous work, we showed that the individual’s view of his or her social groups could be aggregated to produce communities. In this paper, we propose a unique approach to community detection that combines the individual’s view of a community, without having to view the graph as a whole, with swarm intelligence as a means of removing the central control mechanism. Our approach offers a community detection solution that finds overlapping communities while running in O(n log2 n) time.
    Social Network Analysis and Mining. 07/2014; 2(4).
  • ABSTRACT: Human behavior has long been recognized to display hierarchical structure: actions fit together into subtasks, which cohere into extended goal-directed activities. Arranging actions hierarchically has well-established benefits, allowing behaviors to be represented efficiently by the brain, and allowing solutions to new tasks to be discovered easily. However, these payoffs depend on the particular way in which actions are organized into a hierarchy, that is, the specific way in which tasks are carved up into subtasks. We provide a mathematical account for what makes some hierarchies better than others, an account that allows an optimal hierarchy to be identified for any set of tasks. We then present results from four behavioral experiments, suggesting that human learners spontaneously discover optimal action hierarchies.
    PLoS Computational Biology 08/2014; 10(8):e1003779. · 4.87 Impact Factor
  • ABSTRACT: The amount of graph-structured data has recently experienced enormous growth in many applications. To transform such data into useful information, fast analytics algorithms and software tools are necessary. One common graph analytics kernel is disjoint community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only a few parallel codes exist, although parallelism will be necessary to scale to the data volume of real-world applications. We address the deficit in computing capability with a flexible and extensible community detection framework with shared-memory parallelism. Within this framework we design and implement efficient parallel community detection heuristics: a parallel label propagation scheme; the first large-scale parallelization of the well-known Louvain method, as well as an extension of the method adding refinement; and an ensemble scheme combining the above. In extensive experiments driven by the algorithm engineering paradigm, we identify the most successful parameters and combinations of these algorithms. We also compare our implementations with state-of-the-art competitors. The processing rate of our fastest algorithm often reaches 50M edges/second, making it suitable for massive data sets with billions of edges. We recommend the parallel Louvain method and our variant with refinement as both qualitatively strong and fast.
