Flexible Grid-Based Clustering

DOI: 10.1007/978-3-540-74976-9_33
Source: DBLP


Grid-based clustering is particularly appropriate to deal with massive datasets. The principle is to first summarize the dataset
with a grid representation, and then to merge grid cells in order to obtain clusters. All previous methods use grids with
hyper-rectangular cells. In this paper we propose a flexible grid built from arbitrarily shaped polyhedra for the data summary.
For the clustering step, a graph is then extracted from this representation. Its edges are weighted by combining density and
spatial information. The clusters are identified as the main connected components of this graph. We present experiments indicating
that our grid often leads to better results than an adaptive rectangular grid method.
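
The general principle stated above (summarize the data with a grid, then merge cells into clusters) can be illustrated with a minimal sketch. It assumes a plain regular rectangular grid, not the polyhedral grid or the weighted graph proposed in the paper. The function name `grid_cluster` and the parameters `cell_size` and `min_count` are hypothetical choices made for this illustration only.

```python
from collections import Counter, deque
from itertools import product


def grid_cluster(points, cell_size=1.0, min_count=2):
    """Cluster points by merging adjacent dense cells of a regular grid.

    Illustrative only: a fixed rectangular grid with a simple count
    threshold, not the flexible polyhedral grid of the paper.
    """
    # Step 1: summarize the dataset as per-cell point counts.
    cell_of = [tuple(int(c // cell_size) for c in p) for p in points]
    counts = Counter(cell_of)
    dense = {cell for cell, n in counts.items() if n >= min_count}

    # Step 2: merge dense cells that touch (including diagonally).
    # Each connected group of dense cells becomes one cluster.
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    label = {}
    next_label = 0
    for start in sorted(dense):
        if start in label:
            continue
        label[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cell, off))
                if nb in dense and nb not in label:
                    label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Points that fall in sparse cells are labelled -1 (noise).
    return [label.get(cell, -1) for cell in cell_of]


points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
          (5.0, 5.0), (5.2, 5.1), (9.0, 0.0)]
labels = grid_cluster(points, cell_size=1.0, min_count=2)
```

In this sketch the cost of the merging step depends on the number of occupied cells rather than on the number of points, which is why grid summaries suit massive datasets.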

  • ABSTRACT: In recent years, many measurement studies have shown the ubiquity of scanning activities in the Internet and the growing sophistication of probing techniques, which have become stealthier by stretching slowly over time or using spoofed source IP addresses. Scans are mainly generated by attackers trying to map the configuration of a target network and by computer worms trying to spread over the Internet. Although the problem of scan detection has been given a lot of attention by network security researchers, current state-of-the-art methods still suffer from a high percentage of false alarms or a low scan detection ratio. In this paper, we propose to detect changes in scanning patterns by monitoring the variation of the distribution of scan features in a space spanned by IP source address, IP destination address, source port number, and destination port number. This gives insight into the characteristics of scanning activities and exposes the presence of emerging scanning attacks and worms. To that end, we propose to use an information-theoretic approach to detect changes in distributions.
    NETWORKING 2009, 8th International IFIP-TC 6 Networking Conference, Aachen, Germany, May 11-15, 2009. Proceedings; 01/2009
  • ABSTRACT: The ensemble clustering algorithm ECCA (Ensemble of Combined Clustering Algorithms) for processing large datasets is proposed and theoretically substantiated. Results of an experimental study of the algorithm on simulated and real data proving its effectiveness are presented.
    Optoelectronics Instrumentation and Data Processing 06/2011; 47(3):245-252. DOI:10.3103/S8756699011030071
  • ABSTRACT: Cluster extraction is a vital part of data mining; however, humans and computers perform it very differently. Humans tend to estimate, perceive or visualize clusters cognitively, while digital computers either perform an exact extraction, follow a fuzzy approach, or organize the clusters in a hierarchical tree. In real data sets, the clusters are not only of different densities, but also contain embedded noise and are nested, which makes their extraction more challenging. In this paper, we propose a density-based technique for extracting connected rectangular clusters that may go undetected by traditional cluster extraction techniques. The proposed technique is inspired by the human cognition approach of appropriately scaling the level of detail, by going from a low level of detail, i.e., one-way clustering, to a high level of detail, i.e., biclustering, in the dimension of interest, as in online analytical processing. A number of experiments were performed using simulated and real data sets, and the proposed technique was compared with four popular cluster extraction techniques (DBSCAN, CLIQUE, k-medoids and k-means) with promising results.
    Cognitive Computation 02/2014; 7(1). DOI:10.1007/s12559-014-9281-0 · 1.44 Impact Factor