Conference Paper

Rates of convergence for the cluster tree.

Conference: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada.
Source: DBLP
  • ABSTRACT: The clusters of a distribution are often defined by the connected components of a density level set. However, this definition depends on the user-specified level. We address this issue by proposing a simple, generic algorithm, which uses an almost arbitrary level set estimator to estimate the smallest level at which there is more than one connected component. In the case where this algorithm is fed with histogram-based level set estimates, we provide a finite sample analysis, which is then used to show that the algorithm consistently estimates both the smallest level and the corresponding connected components. We further establish rates of convergence for the two estimation problems and, finally, present a simple yet adaptive strategy for determining the width parameter of the involved density estimator in a data-dependent way.
  • ABSTRACT: Many current problems dealing with big data can be cast efficiently as function approximation on graphs. The information in the graph structure can often be reorganized in the form of a tree; for example, using clustering techniques. The objective of this paper is to develop a new system of orthogonal functions on weighted trees. The system is local, easily implementable, and allows for scalable approximations without saturation. A novelty of our orthogonal system is that the Fourier projections are uniformly bounded in the supremum norm. We describe in detail a construction of wavelet-like representations and estimate the degree of approximation of functions on the trees.
    Applied and Computational Harmonic Analysis 07/2014; 38(3). DOI:10.1016/j.acha.2014.06.006 · 3.00 Impact Factor
  • ABSTRACT: Mode clustering is a nonparametric method for clustering that defines clusters using the basins of attraction of a density estimator's modes. We provide several enhancements to mode clustering: (i) a soft variant of cluster assignment, (ii) a measure of connectivity between clusters, (iii) a technique for choosing the bandwidth, (iv) a method for denoising small clusters, and (v) an approach to visualizing the clusters. Combining all these enhancements gives us a useful procedure for clustering in multivariate problems.
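
The basic idea of mode clustering described in the last abstract (assigning each point to the basin of attraction of a density mode) can be illustrated with a minimal sketch. It assumes a Gaussian kernel density estimate and the standard mean-shift iteration; neither choice is confirmed by the abstract, and none of the listed enhancements (soft assignment, connectivity, bandwidth selection, denoising, visualization) is implemented. The function names, the fixed `bandwidth` argument, and the `merge_radius` threshold are hypothetical.

```python
import numpy as np


def mean_shift_modes(points, bandwidth, n_iter=200, tol=1e-6):
    """Move every point uphill on a Gaussian KDE until it settles near a mode."""
    points = np.asarray(points, dtype=float)
    x = points.copy()
    for _ in range(n_iter):
        # Squared distances between current positions and the original data.
        d2 = ((x[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)  # Gaussian kernel weights
        # Mean-shift update: kernel-weighted average of the data.
        new_x = (w @ points) / w.sum(axis=1, keepdims=True)
        shift = np.abs(new_x - x).max()
        x = new_x
        if shift < tol:
            break
    return x


def mode_clusters(points, bandwidth, merge_radius=None):
    """Label each point by the mode its mean-shift path converges to."""
    if merge_radius is None:
        merge_radius = bandwidth / 2  # assumption: illustrative merging threshold
    ends = mean_shift_modes(points, bandwidth)
    modes = []
    labels = np.empty(len(ends), dtype=int)
    for i, end in enumerate(ends):
        for k, mode in enumerate(modes):
            # End points that land close together share one mode.
            if np.linalg.norm(end - mode) < merge_radius:
                labels[i] = k
                break
        else:
            modes.append(end)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)
```

Calling `mode_clusters(points, bandwidth)` on an `(n, d)` array returns one integer label per point and the array of detected modes. The bandwidth is taken as given here, whereas the abstract lists a technique for choosing it as one of its contributions.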

