Conference Paper

Mining complex relationships in the SDSS SkyServer spatial database


Abstract

We describe the process of mining complex relationships in spatial databases using the maximal participation index (maxPI), which is able to discover low-support, high-confidence rules. Complex relationships are defined as those involving two or more of: multi-feature co-location, self-co-location, one-to-many relationships, self-exclusion and multi-feature exclusion. We report the results of mining complex relationships in data extracted from the Sloan Digital Sky Survey (SDSS) database.

... Recently, the co-location pattern has been extended to include positive relationships, self-co-location/self-exclusion relationships, one-to-many relationships, and multi-feature exclusive relationships [2]. Mining complex spatial patterns from large spatial datasets is an emerging area in data mining [12]. ...
... The other two features, which are the components of set (t − s), are close to each other and inside the Chromium 6 polluted water. FIND t WHERE t.size = 3; (Chromium 6 polluted water) ⊆ t; (t − s)[1] containedBy Chromium 6 polluted water; (t − s)[2] containedBy Chromium 6 polluted water; (t − s)[1] close to (t − s)[ ...
Conference Paper
The emerging interest in spatial pattern mining creates demand for a flexible spatial pattern mining language on which an easy-to-use and easy-to-understand visual pattern language could be built. This motivates us to define a pattern mining language called CSPML that allows users to specify the complex spatial patterns they are interested in mining from spatial datasets. We describe our proposed pattern mining language in this paper. Unlike general pattern languages proposed in the literature, our language is specifically designed for specifying spatial patterns. An interface that allows users to specify the patterns visually is designed. The visual language is based on, and goes beyond, the visual language proposed in the literature in the sense that users use CSPML to retrieve patterns instead of the results of a simple spatial query.
... Using spatial statistics measures, dedicated techniques such as cross k-functions with Monte Carlo simulations and the lattice method have been developed to test the collocation of two spatial features. Early studies also address the spatial data mining problem of how to extract a special type of proximity relationship, namely that of distinguishing two clusters of points based on the types of their neighboring features [8][9][10]. Classes of features are organized into concept hierarchies [11]. A reasonable and rather popular approach to spatial data mining is the use of clustering techniques to analyze the spatial distribution of data. ...
Conference Paper
Spatial data mining and spatial data visualization have both become popular techniques in recent years. In essence, both aim to reveal the geographic phenomena that spatial data express and to find the knowledge and laws implicit in geographic entities. It is necessary to combine the two organically to form a new research direction, Visualization Spatial Data Mining (VSDM). This paper discusses the key relationships between visualization and spatial data mining, the main applications of visualization theories and technologies in spatial data mining, and the main methods and examples of visualization spatial data mining. We propose a reference model called Visualization Spatial Data.
... Using spatial statistics measures, dedicated techniques such as cross k-functions with Monte Carlo simulations and the lattice method have been developed to test the collocation of two spatial features. Early studies also address the spatial data mining problem of how to extract a special type of proximity relationship, namely that of distinguishing two clusters of points based on the types of their neighboring features [2][6][8]. Classes of features are organized into concept hierarchies [3]. A reasonable and rather popular approach to spatial data mining is the use of clustering techniques to analyze the spatial distribution of data. ...
Article
Information systems enable us to capture the up-to-date effects of a disaster. It has been widely recognized that spatial data analysis capabilities have not kept up with the need to analyze the increasingly large volumes of geographic data of various themes that are currently being collected and archived. Our analysis concerns disaster management through spatial maps. Intelligent application algorithms are ideal for finding rules and unknown information in vast quantities of computer data. The intelligence system is to obtain and process the data, interpret the data, and design the algorithms for decision makers (Health Companion) as a basis for action. A spatial map for disaster identification is designed. The intelligence in each of these algorithms supplies the point and multi-point decision-making system with the capacity to evaluate the spread of dengue. Our contribution in this paper is to design spatial maps for dengue.
... Using spatial statistics measures, dedicated techniques such as cross k-functions with Monte Carlo simulations and the lattice method have been developed to test the collocation of two spatial features. Early studies also address the spatial data mining problem of how to extract a special type of proximity relationship, namely that of distinguishing two clusters of points based on the types of their neighboring features [5][6][7]. Classes of features are organized into concept hierarchies [8]. A reasonable and rather popular approach to spatial data mining is the use of clustering techniques to analyze the spatial distribution of data. ...
Article
Recent developments in information technology have enabled the collection and processing of vast amounts of personal data, business data and spatial data. It has been widely recognized that spatial data analysis capabilities have not kept up with the need to analyze the increasingly large volumes of geographic data of various themes that are currently being collected and archived. Our study aims to provide a mission-goal strategy (requirements) for predicting disasters. The co-location rules of spatial data mining prove to be appropriate for designing nuggets for disaster identification, and state-of-the-art and emerging scientific applications require fast access to large quantities of data. Because both resources and data are often distributed across wide-area networks with components administered locally and independently, a framework is suggested for this setting. Our contribution in this paper is to design a network architecture for disaster identification.
... However, this approach is affected by the distribution of the data, or more precisely, the number of cut neighbor relations. Arunasalam, Chawla, Sun, and Munro (2004) achieved the mining of complex relationships in a bigger dataset. Subsequently, also presented an improved algorithm called "join-less," which inputs the neighbor relationships into a compressed star model. ...
Article
Real space teems with potential feature patterns with instances that frequently appear in the same locations. As a member of the data-mining family, co-location can effectively find such feature patterns in space. However, given the constant expansion of data, efficiency and storage problems become difficult issues to address. Here, we propose a maximal-framework algorithm based on two improved strategies. First, we adopt a degeneracy-based maximal clique mining method to yield candidate maximal co-locations to achieve high-speed performance. Motivated by graph theory with parameterized complexity, we regard the prevalent size-2 co-locations as a sparse undirected graph and subsequently find all maximal cliques in this graph. Second, we introduce a hierarchical verification approach to construct a condensed instance tree for storing large instance cliques. This strategy further reduces computing and storage complexities. We use both synthetic and real facility data to compare the computational time and storage requirements of our algorithm with those of two other competitive maximal algorithms: “order-clique-based” and “MAXColoc”. The results show that our algorithm is both more efficient and requires less storage space than the other two algorithms.
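The candidate-generation idea in the abstract above (treat prevalent size-2 co-locations as edges of an undirected graph, then enumerate maximal cliques) can be sketched as follows. This is only an illustration under stated assumptions: it uses the classic Bron-Kerbosch algorithm with pivoting rather than the degeneracy-based method the authors describe, and the function names and the toy pair list are hypothetical.

```python
from collections import defaultdict


def build_adjacency(prevalent_pairs):
    """Build an undirected graph whose vertices are feature names and whose
    edges are the prevalent size-2 co-locations."""
    adjacency = defaultdict(set)
    for a, b in prevalent_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def maximal_cliques(adjacency):
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting.

    Each maximal clique is a candidate maximal co-location; a real miner
    would still have to verify its prevalence against the instance data.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        # Pivot on the vertex with the most neighbours among the candidates,
        # so that fewer branches need to be explored.
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for v in list(candidates - adjacency[pivot]):
            expand(current | {v},
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical prevalent size-2 co-locations over features A, B, C, D.
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
candidates = maximal_cliques(build_adjacency(pairs))
```

With the toy pairs above, the candidates are the feature sets {A, B, C} and {C, D}.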
... Using spatial statistics measures, dedicated techniques such as cross k-functions with Monte Carlo simulations and the lattice method have been developed to test the collocation of two spatial features. Early studies also address the spatial data mining problem of how to extract a special type of proximity relationship, namely that of distinguishing two clusters of points based on the types of their neighboring features [2][6][8]. Classes of features are organized into concept hierarchies [3]. A reasonable and rather popular approach to spatial data mining is the use of clustering techniques to analyze the spatial distribution of data. ...
Conference Paper
Information systems enable us to capture the up-to-date effects of a disaster. It has been widely recognized that spatial data analysis capabilities have not kept up with the need to analyze the increasingly large volumes of geographic data of various themes that are currently being collected and archived. Our analysis concerns disaster management through spatial maps. Intelligent application algorithms are ideal for finding rules and unknown information in vast quantities of computer data. The intelligence system is to obtain and process the data, interpret the data, and design the algorithms for decision makers (Health Companion) as a basis for action. A spatial map for disaster identification is designed. The intelligence in each of these algorithms supplies the point and multi-point decision-making system with the capacity to evaluate the spread of the dengue disaster.
Article
Recent developments in information technology have enabled the collection and processing of vast amounts of personal data, business data and spatial data. It has been widely recognized that spatial data analysis capabilities have not kept up with the need to analyze the increasingly large volumes of geographic data of various themes that are currently being collected and archived. Our study aims to provide a mission-goal strategy (requirements) for an intelligence system that predicts disasters. Data mining, or knowledge discovery, is becoming more important as more and more corporate data is computerized. Intelligent application algorithms are ideal for finding rules and unknown information in vast quantities of computer data. The intelligence system is to obtain and process the data, interpret the data, and design the algorithms for decision makers (Health Companion) as a basis for action. The distribution technique within the Self Adaptive Disaster Management System lays the groundwork for architectural implementation in a heterogeneous environment for computational, contextual cooperative design sets. A network architecture for disaster identification is designed. The intelligence in each of these algorithms supplies the point and multi-point decision-making system with the capacity to evaluate the spread of cholera and dengue. Our contribution in this paper is to design self-adaptive disaster algorithms that identify the spread of cholera and dengue.
Article
During the past few years, special-purpose data mining systems have drawn great attention in research and industry for their application in real environments, and more spatial data is being used as modern science and technology develop. Obtaining spatial knowledge therefore becomes more important and meaningful. Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Spatial data mining is the extraction of hidden, useful and interesting spatial or non-spatial patterns from large, incomplete and noisy spatial databases. Spatial data mining presents new challenges due to the large size of spatial data, the complexity of spatial data types, and the special nature of spatial access methods. By collecting patients' data, we analyze, predict and interpret the data for health organizations that conduct campaigns. Our contribution is to design a disaster prediction system that identifies cholera using data mining tools, i.e. SPSS Modeler, and data mining algorithms.
Conference Paper
Spatial kriging is a widely used predictive model for spatial datasets. In the spatial kriging model, the observations are assumed to be Gaussian for computational convenience. However, its predictive accuracy can be significantly compromised if the observations are contaminated by outliers. This deficiency can be systematically addressed by increasing the robustness of the spatial kriging model using heavy-tailed distributions, such as the Huber, Laplace, and Student's t distributions. This paper presents a novel Robust and Reduced Rank Spatial Kriging Model (R3-SKM), which is resilient to the influence of outliers and allows for fast spatial inference. Furthermore, three effective and efficient algorithms based on the R3-SKM framework are proposed that perform robust parameter estimation, spatial prediction, and spatial outlier detection with linear-order time complexity. Extensive experiments on both simulated and real data sets demonstrate the robustness and efficiency of the proposed techniques.
Article
This paper presents an efficient method for mining both positive and negative association rules in databases. The method extends traditional associations to include association rules of forms A ⇒ ¬ B, ¬ A ⇒ B, and ¬ A ⇒ ¬ B, which indicate negative associations between itemsets. With a pruning strategy and an interestingness measure, our method scales to large databases. The method has been evaluated using both synthetic and real-world databases, and our experimental results demonstrate its effectiveness and efficiency.
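To make the rule form A ⇒ ¬B concrete, the sketch below computes its confidence from ordinary supports, using the identity supp(A ∧ ¬B) = supp(A) − supp(A ∪ B). It is a minimal illustration only: the function names and toy transactions are assumptions, and the pruning strategy and interestingness measure mentioned in the abstract are not reproduced.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def negative_rule_confidence(transactions, antecedent, consequent):
    """Confidence of the rule `antecedent => NOT consequent`.

    Uses supp(A and not B) = supp(A) - supp(A union B), so only supports of
    ordinary (positive) itemsets have to be counted.
    """
    supp_a = support(transactions, antecedent)
    if supp_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    supp_ab = support(transactions, set(antecedent) | set(consequent))
    return (supp_a - supp_ab) / supp_a


# Hypothetical market-basket data, for illustration only.
transactions = [
    frozenset({"tea", "milk"}),
    frozenset({"tea"}),
    frozenset({"tea", "sugar"}),
    frozenset({"coffee", "milk"}),
]
conf_tea_not_milk = negative_rule_confidence(transactions, {"tea"}, {"milk"})
```

In the toy data, tea appears in three of four transactions and tea with milk in one, so the confidence of tea ⇒ ¬milk is (0.75 − 0.25) / 0.75, or two thirds.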
Article
An earlier paper (Szalay et al., "Designing and Mining MultiTerabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both the query load and a website for ad hoc access. This paper reports on the database design, describes the data loading pipeline, and reports on the query implementation and performance. The queries typically translated to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to interactively explore the database. This paper is an in-depth tour of those queries. Readers should first have studied the companion overview paper, Szalay et al., "The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data," ACM SIGMOD 2002.
Article
Mining co-location patterns from spatial databases may reveal types of spatial features that are likely to be located as neighbors in space. In this paper, we address the problem of mining confident co-location rules without a support threshold. First, we propose a novel measure called the maximal participation index. We show that every confident co-location rule corresponds to a co-location pattern with a high maximal participation index value. Second, we show that the maximal participation index is non-monotonic, and thus conventional Apriori-like pruning does not work directly. We identify an interesting weak monotonic property for the index and develop efficient algorithms to mine confident co-location rules. An extensive performance study shows that our method is both effective and efficient for large spatial databases.
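As a rough illustration of the measure described above, the sketch below computes the participation ratio of each feature in a co-location, then takes the minimum (the usual participation index) and the maximum (the maximal participation index). The function names, the row-instance representation and the toy counts are assumptions made for the example; the pruning based on the weak monotonic property is not shown.

```python
from collections import defaultdict


def participation_ratios(row_instances, feature_counts):
    """Participation ratio of each feature in a co-location.

    row_instances: one dict per neighbourhood instance of the co-location,
        mapping feature name -> id of the participating object.
    feature_counts: total number of objects of each feature in the dataset.
    """
    participating = defaultdict(set)
    for row in row_instances:
        for feature, object_id in row.items():
            participating[feature].add(object_id)
    return {
        feature: len(participating[feature]) / total
        for feature, total in feature_counts.items()
    }


def participation_index(ratios):
    """Minimum participation ratio (the support-like measure)."""
    return min(ratios.values())


def maximal_participation_index(ratios):
    """Maximum participation ratio (maxPI)."""
    return max(ratios.values())


# Hypothetical data: feature A is common, feature B is rare.
feature_counts = {"A": 100, "B": 2}
row_instances = [
    {"A": 1, "B": 1},
    {"A": 2, "B": 1},
    {"A": 3, "B": 2},
]
ratios = participation_ratios(row_instances, feature_counts)
pi = participation_index(ratios)
max_pi = maximal_participation_index(ratios)
```

Here only 3 of the 100 A objects take part, so the participation index is 0.03 and a support-style threshold would discard the pattern. Both B objects take part, so maxPI is 1.0, which reflects the high-confidence rule that every B has an A nearby.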
Article
This paper describes the need for mining complex relationships in spatial data. Complex relationships are defined as those involving two or more of: multi-feature co-location, self-co-location, one-to-many relationships, self-exclusion and multi-feature exclusion. We demonstrate that even in the mining of simple relationships, knowledge of complex relationships is necessary to accurately calculate the significance of results. We implement a representation of spatial data such that it contains 'weak-monotonic' properties, which are exploited for the efficient mining of complex relationships, and discuss the strengths and limitations of this representation.