
Experiments with Computing Geometric Minimum Spanning Trees


Abstract and Figures

... this paper, we use a different MargDistance (see Section 3.2), which is provably a better choice. Callahan and Kosaraju [1995] showed that for a set S of n points in <
[Figure: #BCP calls / n versus number of points (n), 10^3 to 10^6; y-axis 0 to 3000; series: 2-d, 3-d, 4-d, 5-d]
[Figure: #BCP calls / n versus number of points (n), 10^3 to 10^6; y-axis 1.3 to 1.42; series: 2-d, 3-d, 4-d, 5-d]
[Figure: Normalized CPU-time versus number of points (n), 10^3 to 10^6; panel 2-d; series: LEDA, Qhull, GeoMST, GeoMST2, Triangle]
[Figure: Normalized CPU-time versus number of points (n), 10^3 to 10^5; panel 3-d; series: Qhull, GeoMST, GeoMST2]
[Figure: Normalized CPU-time versus number of points (n), 10^3 to 10^5; panel 4-d; series: Qhull, Kruskal, GeoMST, GeoMST2]
[Figure: Normalized CPU-time versus number of points (n), 10^3 to 10^5; panel 5-d; series: Kruskal, GeoMST, GeoMST2]
... It can be shown that the WSPD has O(N) pairs of nodes, and that the MST is a subset of the edges formed between the closest pair of points in each pair of nodes. In 2000, Narasimhan and Zachariasen applied WSPD to compute neighbors of components for Boruvka's algorithm to find edges of the MST [19]. However, the constant in the O(N) size of the WSPD grows exponentially with the data dimension and is often very large in practice. ...
... However, the constant in the O(N) size of the WSPD grows exponentially with the data dimension and is often very large in practice. In 2010, March et al. presented a new dual-tree algorithm for efficiently computing the EMST [20]. It is superficially similar to the method in [19], except that the WSPD is replaced by the new dual-tree data structure, and it is referred to in the following as the FEMST algorithm. They used adaptive algorithm analysis to prove the tightest (and possibly optimal) runtime bound for the EMST problem to date. ...
Chapter
Euclidean minimum spanning tree algorithms typically run with quadratic computational complexity, which is not practical for large-scale, high-dimensional datasets. In this paper, we propose a new two-level approximate Euclidean minimum spanning tree algorithm for high-dimensional data. In the first level, we perform outlier detection on a given data set to identify a small number of boundary points and run the standard Prim’s algorithm on the reduced dataset. In the second level, we conduct a k-nearest neighbors search to complete the construction of an approximate Euclidean minimum spanning tree. Experimental results on sample data sets demonstrate the efficiency of the proposed method while keeping high approximation precision.
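For orientation, the quadratic baseline mentioned in this abstract can be sketched as a dense Prim's algorithm over all point pairs. This is a minimal illustration, not the two-level method the abstract proposes; the function name `prim_emst` is hypothetical and the points are assumed to be equal-length coordinate tuples.

```python
import math


def prim_emst(points):
    """Return (total_weight, edges) of a Euclidean MST via dense O(n^2) Prim.

    Illustrative sketch only: `points` is a list of equal-length coordinate
    tuples, and edges are returned as (parent_index, child_index) pairs.
    """
    n = len(points)
    if n == 0:
        return 0.0, []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known connection to the tree
    best_from = [-1] * n        # tree endpoint of that connection
    best_dist[0] = 0.0
    total, edges = 0.0, []
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
            total += best_dist[u]
        # Relax the distances of all remaining vertices against u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return total, edges
```

Every iteration scans all remaining vertices, so the cost is quadratic in the number of points regardless of dimension, which is the behaviour the abstract describes as impractical for large inputs.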
... In 1993, Callahan and Kosaraju proposed the concept of the Well-Separated Pair Decomposition (WSPD), which forms the basis of most recent EMST algorithms [17]. In 2000, Narasimhan and Zachariasen introduced WSPD to Boruvka's algorithm to find edges of an MST [18]. However, the constant in the O(N) size of the WSPD grows exponentially with the data dimension and is often very large in practice. ...
Chapter
Efficient Euclidean minimum spanning tree algorithms have been proposed for large-scale datasets; they typically run in time near linear in the size of the data but may not be feasible for high-dimensional data. For data consisting of sparse vectors in high-dimensional feature spaces, however, the calculation of an approximate EMST can be largely independent of the feature space dimension. Taking this observation into consideration, in this paper, we propose a new two-stage approximate Euclidean minimum spanning tree algorithm. In the first stage, we perform the standard Prim’s MST algorithm using the cosine similarity measure for high-dimensional sparse datasets to reduce the computational expense. In the second stage, we use the MST obtained in the first stage to complete the construction of an approximate Euclidean minimum spanning tree. Experimental results for color image segmentation demonstrate the efficiency of the proposed method, while keeping high approximation precision.
... The second step is equivalent to finding a minimum spanning tree, and Wong and Moore (2002) proposed an alternative implementation based on the GeoMST2 algorithm (Narasimhan et al., 2000). Though Wong and Moore showed the improvement of the CFF algorithm, their algorithm still requires n as an input. ...
Article
We present a fast clustering algorithm for density contour clusters (Hartigan, 1975) that is a modified version of the Cuevas, Febrero and Fraiman (2000) algorithm. By Hartigan's definition, clusters are the connected components of a level set S_c ≡ {f > c}, where f is the probability density function. We use kernel density estimators and orthogonal series estimators to estimate f and modify the Cuevas, Febrero and Fraiman (2000) algorithm to extract the connected components from level set estimators Ŝ_c ≡ {f̂ > c}. Unlike the original algorithm, our method does not require an extra smoothing parameter and can use the Fast Fourier Transform (FFT) to speed up the calculations. We show that the cosmological definition of clusters of galaxies is equivalent to density contour clusters and present an application in cosmology.
... It then returned the neighbors of the query point in the resulting MST. Source code used for computing geometric minimum spanning trees in arbitrary dimensions was provided by Giri Narasimhan [17]. In this code, the (un-normalized) Euclidean metric was used as the distance between instances, instead of HVDM. ...
Conference Paper
Previous experiments with low-dimensional data sets have shown that Gabriel graph methods for instance-based learning are among the best machine learning algorithms for pattern classification applications. However, as the dimensionality of the data grows large, all data points in the training set tend to become Gabriel neighbors of each other, bringing the efficacy of this method into question. Indeed, it has been conjectured that for high-dimensional data, proximity graph methods that use sparser graphs, such as relative neighborhood graphs (RNG) and minimum spanning trees (MST), would have to be employed in order to maintain their privileged status. Here, proximity graph methods for instance-based learning that employ Gabriel graphs, relative neighborhood graphs, and minimum spanning trees are compared experimentally on high-dimensional data sets. These methods are also compared empirically against the traditional k-NN rule and support vector machines (SVMs), the leading competitors of proximity graph methods.
... In the literature, several authors have proposed a variety of greedy algorithms [2, 3, 4]. The general drawback of these greedy algorithms is that they cannot handle large amounts of data and run into performance bottlenecks; more advanced algorithms were later developed to solve this problem [5]. In this paper, the authors evaluated the performance of the dual-tree algorithmic framework [6] using single-linkage clustering [7] and compared the performance of the dual-tree framework in the context of kd-tree and ball-tree. ...
Conference Paper
Many algorithms have been, and are still being, developed to solve the Euclidean Minimum Spanning Tree (EMST) problem, as its applicability is growing across a wide range of fields that involve spatial or spatio-temporal data, such as astronomy, where datasets contain millions of spatial points. To solve this problem, we present a technique that adopts the dual-tree algorithm for computing the EMST efficiently, and we experiment with it on a variety of real and synthetic datasets. This paper presents the experimental observations and the efficiency of the dual-tree framework, in the context of kd-tree and ball-tree, on spatial datasets of different dimensions.
Chapter
Outlier detection techniques and clustering techniques are important areas of data mining. Clustering is all about finding groups of data points, whereas outlier analysis is all about finding data points that are far away from these clusters. Clustering and outlier detection, therefore, share a well-known complementary relationship. A simplistic view is that every data point is either a member of a cluster or an outlier. Data points on the boundary regions of a cluster may also be considered weak outliers. However, the study of boundary points is sometimes more meaningful than that of clusters and outliers. Much research has been done on boundary point detection. However, as data become more and more complex, existing boundary point detection algorithms suffer from problems such as low precision, parameter dependence, and difficulty in separating outliers. In this chapter, we propose a boundary point detection algorithm, CENTROID-B, based on the concept of a kNN-based centroid, which has low dependence on parameters and high precision and can detect outliers at the same time. The experimental results on different types of data sets show that the proposed boundary point detection algorithm is effective and achieves high accuracy. Euclidean minimum spanning tree algorithms typically run with quadratic computational complexity, which is not practical for large-scale multi-dimensional datasets. In this chapter, we propose a new two-level approximate Euclidean minimum spanning tree algorithm for large-scale multi-dimensional datasets. In the first level, we perform the proposed outlier and boundary point detection on a given data set to identify a small number of boundary points. In the second level, we run the standard Prim’s algorithm on the reduced dataset to complete an approximate Euclidean minimum spanning tree. Experiments on sample data sets demonstrate the efficiency of the proposed method, while keeping high approximation precision.
Chapter
In this chapter, we present a fast minimum spanning tree-based clustering algorithm for image segmentation and object recognition tasks. We begin with an introduction to the concept of the minimum spanning tree in general and its application to clustering in specific. The proposed clustering method is next described. Finally, the performance evaluation of the proposed method is conducted on an image-patch-based visual percept detection task for an indoor environment and an outdoor environment.
Chapter
A Mobile Ad-Hoc Network (MANET) is a wireless network without infrastructure. The self-configurability and easy deployment of MANETs have led to numerous applications. Efficient routing protocols make MANETs reliable. The open and dynamic operational environment of a MANET makes it vulnerable to various network attacks. A common type of attack targets the underlying routing protocols. Malicious nodes have opportunities to modify or discard routing information or to advertise fake routes so that user data passes through them. The aim of the research is to protect the network using secure routing protocols and to study the performance of the secured network.
Article
Given a surface mesh F in R^3 with vertex set S and consisting of Delaunay triangles, we want to construct the Delaunay tetrahedralization of S. We present an algorithm which constructs the Delaunay tetrahedralization of S given a bounded degree spanning subgraph T of F. It accelerates the incremental Delaunay triangulation construction by exploiting the connectivity of the points on the surface. If the expected size of the Delaunay triangulation is linear, we prove that our algorithm runs in O(n log* n) expected time, speeding up the standard randomized incremental Delaunay triangulation algorithm, which is O(n log n) expected time in this case. We discuss how to find a bounded degree spanning subgraph T from surface mesh F and give a linear time algorithm which obtains a spanning subgraph from any triangulated surface with genus g with maximum degree at most 12g for g > 0 or three for g = 0.
Article
The convex hull of a set of points is the smallest convex set that contains the points. This article presents a practical convex hull algorithm that combines the two-dimensional Quickhull Algorithm with the general-dimension Beneath-Beyond Algorithm. It is similar to the randomized, incremental algorithms for convex hull and Delaunay triangulation. We provide empirical evidence that the algorithm runs faster when the input contains nonextreme points and that it uses less memory. Computational geometry algorithms have traditionally assumed that input sets are well behaved. When an algorithm is implemented with floating-point arithmetic, this assumption can lead to serious errors. We briefly describe a solution to this problem when computing the convex hull in two, three, or four dimensions. The output is a set of "thick" facets that contain all possible exact convex hulls of the input. A variation is effective in five or more dimensions.
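As a rough illustration of the two-dimensional Quickhull idea this abstract refers to, the sketch below recursively keeps the point farthest from a dividing segment. It is a plain 2-d toy in exact or floating-point Python arithmetic, not the general-dimension algorithm or the "thick" facet handling the abstract describes; the names `quickhull_2d`, `_cross` and `_hull_side` are hypothetical.

```python
def _cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(a, b, pts):
    """Hull vertices strictly left of the directed segment a->b, in order."""
    left = [p for p in pts if _cross(a, b, p) > 0]
    if not left:
        return []
    # Farthest point from line a-b; ties are broken lexicographically so that
    # an endpoint of a collinear run is chosen rather than an interior point.
    far = max(left, key=lambda p: (_cross(a, b, p), p))
    return _hull_side(a, far, left) + [far] + _hull_side(far, b, left)


def quickhull_2d(points):
    """Return the convex hull vertices of 2-d points in clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]  # lexicographically smallest and largest points
    return [a] + _hull_side(a, b, pts) + [b] + _hull_side(b, a, pts)
```

Points that fall inside the triangle formed at each step are discarded immediately, which gives an intuition for why inputs with many nonextreme points are handled quickly.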
Book
The lack of a standard library of the data structures and algorithms of combinatorial and geometric computing severely limits the impact of this area on computer science. To address this problem, the LEDA project was introduced in 1989 to build a library of the data types and algorithms of combinatorial and geometric computing. Among its many features, LEDA provides a sizable collection of data types and algorithms in a form that allows them to be used by non-experts. Sample applications are code optimization, motion planning, logic synthesis, scheduling, VLSI design, term rewriting systems, semantic nets, machine learning, image analysis, computational biology, etc.
Article
We survey results in geometric network design theory, including algorithms for constructing minimum spanning trees and low-dilation graphs.
Article
In the core computer science areas--data structures, graph and network algorithms, and computational geometry--LEDA is the first library to cover all material found in the standard textbooks. Written in C++ and freely available worldwide on a variety of hardware, the software is installed at hundreds of sites. This book, written by the main authors of LEDA, is the definitive account of how the system operates and how it can be used. The authors supply plentiful examples from a range of areas to show practical uses of the library, making the book essential for all researchers in algorithms, data structures and computational geometry.
Article
A general method is presented for determining the mathematical expectation of the combinatorial complexity and other properties of the Voronoi diagram of n independent and identically distributed points. The method is applied to derive exact asymptotic bounds on the expected number of vertices of the Voronoi diagram of points chosen from the uniform distribution on the interior of a d-dimensional ball; it is shown that in this case, the complexity of the diagram is Θ(n) for fixed d. An algorithm for constructing the Voronoi diagram is presented and analyzed. The algorithm is shown to require only Θ(n) time on average for random points from a d-ball assuming a real-RAM model of computation with a constant-time floor function. This algorithm is asymptotically faster than any previously known and optimal in the average-case sense.
Conference Paper
Skip lists are data structures that use probabilistic balancing rather than strictly enforced balancing. The structure of a skip list is determined only by the number of elements in the skip list and the results of consulting the random number generator. Skip lists can be used to perform the same kinds of operations that a balanced tree can perform, including the use of search fingers and ranking operations. The algorithms for insertion and deletion in skip lists are much simpler and faster than equivalent algorithms for balanced trees. Included in the article is an analysis of the probabilistic performance of skip lists.
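To make the probabilistic balancing concrete, here is a minimal skip list sketch with search, insertion and deletion. It is an illustration under assumptions, not the article's code: the class and method names, the default promotion probability p = 0.5, and the level cap of 16 are all choices made for this example, and search fingers and ranking are omitted.

```python
import random


class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    """Illustrative ordered set of unique keys backed by a skip list."""

    def __init__(self, max_level=16, p=0.5, seed=None):
        self.max_level = max_level
        self.p = p
        self.level = 1                      # number of levels currently in use
        self.head = _Node(None, max_level)  # sentinel that holds no key
        self._rng = random.Random(seed)

    def _random_level(self):
        # Each extra level is granted with probability p (a coin-flip tower).
        level = 1
        while level < self.max_level and self._rng.random() < self.p:
            level += 1
        return level

    def _find_predecessors(self, key):
        # update[i] is the last node on level i whose key is smaller than key.
        update = [self.head] * self.max_level
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def contains(self, key):
        node = self._find_predecessors(key)[0].forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = self._find_predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return False  # key already present
        level = self._random_level()
        self.level = max(self.level, level)
        node = _Node(key, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return True

    def delete(self, key):
        update = self._find_predecessors(key)
        node = update[0].forward[0]
        if node is None or node.key != key:
            return False
        for i in range(len(node.forward)):
            update[i].forward[i] = node.forward[i]
        # Drop levels that became empty.
        while self.level > 1 and self.head.forward[self.level - 1] is None:
            self.level -= 1
        return True

    def __iter__(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Insertion and deletion only splice pointers along the search path; there is no rotation or rebalancing step, which is the simplicity the abstract contrasts with balanced trees.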
Conference Paper
This work is the first to validate theoretically the suspicions of many researchers: that the “average” Voronoi diagram is combinatorially quite simple and can be constructed quickly. Specifically, assuming that dimension d is fixed, and that n input points are chosen independently from the uniform distribution on the unit d-ball, it is proved that the expected number of simplices of the dual of the Voronoi diagram is Θ(n) (exact constants are derived for the high-order term), and a relatively simple algorithm exists for constructing the Voronoi diagram in Θ(n) time. It is likely that the methods developed in the analysis will be applicable to other related quantities and other probability distributions.
Conference Paper
It is shown that a minimum spanning tree of n points in ℝ^d can be computed in optimal O(T_d(n, n)) time under any fixed L_t-metric, where T_d(n, m) denotes the time to find a bichromatic closest pair between n red points and m blue points. The previous bound was O(T_d(n, n) log n) and it was proved only for the L_2 (Euclidean) metric. Furthermore, for d = 3 it is shown that a minimum spanning tree can be found in optimal O(n log n) time under the L_1 and L_∞ metrics. The previous bound was O(n log n log log n).
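The bichromatic closest pair problem that this bound is stated in terms of can be pinned down with a brute-force sketch. The version below takes O(nm) time under the Euclidean metric and is meant only to define the problem, not to reflect the efficient algorithms the abstract has in mind; the function name is hypothetical.

```python
import math


def bichromatic_closest_pair(red, blue):
    """Return (distance, red_point, blue_point) minimising the Euclidean
    distance over all red-blue pairs, by checking every pair."""
    best = (math.inf, None, None)
    for r in red:
        for b in blue:
            d = math.dist(r, b)
            if d < best[0]:
                best = (d, r, b)
    return best
```

Pairs of the same colour are never compared, which is what distinguishes this from the ordinary closest pair problem.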