On Modularity Clustering

Univ. of Konstanz, Konstanz
IEEE Transactions on Knowledge and Data Engineering (Impact Factor: 1.89). 03/2008; DOI: 10.1109/TKDE.2007.190689
Source: IEEE Xplore

ABSTRACT: Modularity is a recently introduced quality measure for graph clusterings. It has immediately received considerable attention in several disciplines, particularly in the complex systems literature, although its properties are not well understood. We study the problem of finding clusterings with maximum modularity, thus providing theoretical foundations for past and present work based on this measure. More precisely, we prove the conjectured hardness of maximizing modularity both in the general case and with the restriction to cuts and give an Integer Linear Programming formulation. This is complemented by first insights into the behavior and performance of the commonly applied greedy agglomerative approach.
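For context, the modularity Q of a clustering of an undirected graph with m edges can be written as Q = Σ_C (e_C/m − (d_C/(2m))²), where e_C is the number of edges inside cluster C and d_C is the sum of degrees of its nodes. A minimal Python sketch of this computation (an illustration only, not code from the paper):

```python
def modularity(edges, clusters):
    """Newman-Girvan modularity of a clustering.

    edges: list of (u, v) pairs of an undirected graph.
    clusters: dict mapping each node to its cluster id.
    Q = sum over clusters C of (e_C / m - (d_C / 2m)^2), where e_C is the
    number of intra-cluster edges and d_C the total degree of cluster C.
    """
    m = len(edges)
    intra = {}   # e_C: edges with both endpoints in cluster C
    degree = {}  # d_C: summed node degrees of cluster C
    for u, v in edges:
        degree[clusters[u]] = degree.get(clusters[u], 0) + 1
        degree[clusters[v]] = degree.get(clusters[v], 0) + 1
        if clusters[u] == clusters[v]:
            intra[clusters[u]] = intra.get(clusters[u], 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())
```

For two triangles joined by a single bridge edge, placing each triangle in its own cluster gives Q = 5/14 ≈ 0.357, while the trivial one-cluster solution gives Q = 0.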

    ABSTRACT: Guillain-Barré syndrome (GBS) is a neurological disorder that has not previously been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset containing clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify four subtypes of GBS present in the dataset. We used the partitioning around medoids (PAM) clustering algorithm to form four clusters corresponding to the GBS subtypes, with the purity of each cluster as the evaluation measure. After experimentation, symmetrical uncertainty and information gain determined a feature subset of seven variables; a dataset formed from these variables, used as input to PAM, reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques.
    Computational and Mathematical Methods in Medicine 01/2014; 2014:432109. · 0.79 Impact Factor
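Purity, the evaluation measure used in the study above, labels each cluster with its most frequent true class and reports the fraction of points so matched. A small illustrative Python sketch (a hypothetical helper, not the authors' code):

```python
from collections import Counter

def purity(labels_true, labels_pred):
    """Cluster purity: each cluster votes for its majority true label.

    Returns (1/N) * sum over clusters of that cluster's majority-label count.
    """
    per_cluster = {}
    for true, pred in zip(labels_true, labels_pred):
        per_cluster.setdefault(pred, Counter())[true] += 1
    majority = sum(counts.most_common(1)[0][1]
                   for counts in per_cluster.values())
    return majority / len(labels_true)
```

A purity of 0.7984, as reported above, means roughly 80% of patients fall into the majority subtype of their assigned cluster.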
    ABSTRACT: This paper aims at a conceptual foundation for the development of advanced computational methods for analyzing co-offending networks to identify organized crime structures, that is, any static or dynamic characteristics of a co-offending network that potentially indicate organized crime or refer to criminal organizations. Specifically, we study networks derived from large real-world crime datasets using social network analysis and data mining techniques. Striving for a coherent and consistent framework for defining the problem scope and analysis methods, we propose a constructive approach that uses mathematical models of crime data and criminal activity as its underlying semantic foundation. Organized crime has been defined in a variety of ways, yet so far there is surprisingly little agreement about its meaning, at least not at the level of detail and precision required to express that meaning in abstract computational terms.
    ABSTRACT: The detection of communities (internally dense subgraphs) is a network analysis task with manifold applications. The special task of selective community detection is concerned with finding high-quality communities locally around seed nodes. Given the lack of conclusive experimental studies, we perform a systematic comparison of previously published as well as novel methods. In particular, we evaluate their performance on large complex networks, such as social networks. Algorithms are compared with respect to accuracy in detecting ground-truth communities, community quality measures, size of communities, and running time. We implement a generic greedy algorithm which subsumes several previous efforts in the field. Experimental evaluation of multiple objective functions and optimizations shows that the frequently proposed greedy approach is not adequate for large datasets. As a more scalable alternative, we propose selSCAN, our adaptation of a global, density-based community detection algorithm. In a novel combination with algebraic distances on graphs, query times can be strongly reduced through preprocessing. However, selSCAN is very sensitive to the choice of numeric parameters, limiting its practicality. The random-walk-based PageRankNibble emerges from the comparison as the most successful candidate.
    IEEE BigData '14 - BigGraphs Workshop; 10/2014
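The generic greedy scheme described in the last abstract, growing a community outward from a seed node by repeatedly adding the boundary neighbor that most improves an objective function, can be sketched as follows. Conductance is used here as the objective; this is an illustration of the general technique, not the authors' implementation:

```python
def conductance(adj, s):
    """Conductance of node set s: cut edges / min(vol(s), vol(rest))."""
    cut = sum(1 for v in s for n in adj[v] if n not in s)
    vol = sum(len(adj[v]) for v in s)
    total = sum(len(adj[v]) for v in adj)
    denom = min(vol, total - vol)
    return cut / denom if denom > 0 else 1.0

def greedy_expand(adj, seed):
    """Grow a community around seed while conductance strictly improves.

    adj: dict mapping each node to the set of its neighbors.
    """
    community = {seed}
    while True:
        boundary = {n for v in community for n in adj[v]} - community
        best, best_phi = None, conductance(adj, community)
        for n in sorted(boundary):
            phi = conductance(adj, community | {n})
            if phi < best_phi:
                best, best_phi = n, phi
        if best is None:          # no neighbor improves the objective: stop
            return community
        community.add(best)
```

On two triangles joined by a bridge edge, seeding at a triangle node recovers exactly that triangle, since adding the first node across the bridge would raise the conductance again.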
