Clustering With GIS: An Attempt to Classify Turkish District Data

TS 47 – GIS Applications – Special Issues
Ece Aksoy
Clustering With GIS: An Attempt to Classify Turkish District Data
Shaping the Change
XXIII FIG Congress
Munich, Germany, October 8-13, 2006
Clustering With GIS: An Attempt to Classify Turkish District Data
Ece AKSOY, Turkey
Key words: Spatial Clustering Techniques, Classification of Statistical Region Units,
Geographical Information Systems (GIS), SOM Algorithm.
SUMMARY
No clustering technique is universally applicable for discovering the variety of structures
present in data sets, and no single algorithm or approach is adequate for every clustering
problem. Many methods are available, the criteria they use differ, and different
classifications may therefore be obtained for the same data. As larger and larger amounts of
data are collected and stored in databases, the need for efficient and effective analysis
methods increases. Grouping or classification of measurements is the key element in these
data analysis procedures. Many non-spatial clustering techniques exist in various fields, but
spatial clustering techniques and software are less common. This study aims to compare
different software packages for non-spatial and spatial clustering in a GIS environment, and
to show how GIS facilitates regional and statistical studies. Such clustering can serve
different aims, such as forming regional policies, constructing statistical integrity or
analyzing the distribution of funds. All 923 districts of Turkey were chosen as the
application area. Some limitations, such as population, were specified for the clustering of
Turkey’s districts. First, different clustering techniques for spatial classification were
reviewed. Afterwards, a database of Turkey’s statistical data was built and analyzed jointly
with geographical data in the GIS environment. Based on the review of clustering techniques,
four software packages were applied: SPSS, ArcGIS, CrimeStat and Matlab. The Self-Organizing
Maps (SOM) algorithm, regarded here as one of the most effective and widely used spatial
clustering algorithms of recent years, and CrimeStat K-Means clustering were used as spatial
clustering methods. SPSS K-Means and ArcGIS reclassify were used as non-spatial examples.
1. INTRODUCTION
Classification is a basic human conceptual activity. Clustering is very important for the
analysis and visualization of spatial data. The development of improved clustering algorithms
has received a lot of attention in the last decade. Although interest in the use of clustering
methods is growing in pattern recognition, image processing and information retrieval,
clustering also has a rich history in other disciplines such as biology, psychiatry,
psychology, archaeology, geology, geography, and marketing.
Spatial analysis has been used for many years in various fields, but its connection to GIS
has emerged only recently. GIS provides the decision maker with a powerful set of tools for
the manipulation and analysis of spatial information. The use of GIS as a visual tool allows
the researcher to explore statistical output that would otherwise be difficult to interpret.
As a box of tools for handling geographical data, GIS is useful but not complete for
statistical and spatial studies. As the GIS community grows, the need to perform spatial
statistical analysis on GIS data will become greater. For that reason, it is critical to
integrate spatial statistical functions into GIS.
Analysis of spatial data is an important functional requirement of both GIS and data mining
that involves spatial data. Clustering is one of the important techniques in data mining and
geographic knowledge discovery. It organizes a set of objects into clusters such that objects
in the same group are similar to each other and different from those in other groups.
Clusters in large databases can be used for visualization, to help human analysts identify
groups and subgroups that have similar characteristics.
In this study, different clustering techniques for spatial classification were reviewed and a
classification experiment was performed for Turkey’s districts. All 923 districts of Turkey
were chosen as the application area. Geographical and tabular data for the districts were
collected, organized and linked to each other through the resulting database. Principal
component analysis (PCA) was used in the statistical processing of the database. The purpose
of PCA is to reduce the dimensionality of the data vectors and to summarize large data sets.
All collected geographical data were treated as limitations for clustering. In addition, the
population thresholds of NUTS (Nomenclature of Territorial Units for Statistics), a
statistical classification used to identify regional differences, were taken as a limitation
for district classification, which helped to specify the number of clusters. Based on the
calculation of population thresholds, 85 clusters were used in all analyses.
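The link between a population threshold and a cluster count can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions, not the calculation used in the study: the upper bound of 800,000 is the NUTS 3 maximum population, and the total of about 67.8 million is an approximate figure for Turkey's 2000 census; neither value is given in the text. The function names and the example cluster totals are hypothetical.

```python
import math

def min_clusters(total_population, upper_threshold):
    """Smallest number of clusters for which no cluster is forced above the threshold."""
    return math.ceil(total_population / upper_threshold)

def over_threshold(cluster_populations, upper_threshold):
    """Return the ids of clusters whose population exceeds the threshold."""
    return [cid for cid, pop in cluster_populations.items() if pop > upper_threshold]

# Assumed values (not taken from the paper): NUTS 3 upper bound and an
# approximate total population for Turkey in 2000.
NUTS3_UPPER = 800_000
TOTAL_POPULATION = 67_800_000

target = min_clusters(TOTAL_POPULATION, NUTS3_UPPER)
print(target)

# Hypothetical cluster totals, only to demonstrate the threshold check.
example = {"c1": 420_000, "c2": 910_000, "c3": 150_000}
print(over_threshold(example, NUTS3_UPPER))
```

With these assumed inputs the first call gives 85, which is consistent with the cluster number quoted above, although the study may have arrived at it differently.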
Four different clustering methods were applied, using SPSS, ArcGIS, CrimeStat and Matlab. The
Self-Organizing Maps (SOM) algorithm, regarded here as one of the most effective and widely
used spatial clustering algorithms of recent years, CrimeStat K-Means clustering, SPSS K-Means
and ArcGIS reclassify were used. The ‘Districting’ module of ArcGIS is the basis for all
methods. In short, four software packages were tested with the help of GIS, using the ArcGIS
Districting module.
2. CLUSTERING
2.1 Spatial Clustering
Many non-spatial clustering techniques exist in various fields, but spatial clustering
techniques and software are less common. Clustering is one of the most important tasks in
many fields, and spatial clustering has long been an important process in geographic
analysis. “The vast spatial data explosion of the late 1980s and 1990s caused by the GIS
revolution, the computerization of key information sources, and the availability of digital
map information has greatly increased the opportunity and need for good spatial
classification methods for both research and applied purposes.” (Openshaw and Turton, 1996)
According to them, the principal problems in spatial data classification are the following:
- Large numbers of areas
- Large numbers of variables
- Non-normal variable distributions (most geographic data usually have very complex
  frequency distributions)
- Non-linear relationships
- Spatial dependency
- Data uncertainty is an important feature
- Small number problems (it is very important that small zones and small number effects
  should not dominate or dictate the characteristics of the spatial classification)
- Variable-specific levels of uncertainty
- Systematic non-random variations in spatial representation
To identify clusters over geographical space, various approaches have been developed, based
on statistics, Delaunay triangulation, a density-based notion, a grid-based division, random
walks, a gravity-based division, etc. “However, existing spatial clustering methods can only
deal with a low-dimensional space (usually 3-D space: 2 spatial dimensions, e.g., location of a
city, and a non-spatial dimension, e.g., the population of the city). On the other hand,
general-purpose high-dimensional clustering methods mainly deal with non-spatial feature
spaces and have very limited power in recognizing spatial patterns that involve neighbors. For
spatial clustering, it is important to be able to identify high-dimensional spatial clusters,
which involves both the spatial dimensions and several non-spatial dimensions.” (Diansheng,
2002) To meet this need, it is very important to find a way to integrate spatial and
high-dimensional clustering methods.
2.1.1 K-Means Clustering Algorithm
K-means is one of the simplest unsupervised learning algorithms for solving the well-known
clustering problem. The method was developed by MacQueen in 1967. He proposed the name
K-means for his algorithm, which assigns each item to the cluster with the nearest centroid
(mean). The process consists of three steps:
1. Partition the items into k initial clusters.
2. Proceed through the list of items, assigning each item to the cluster whose centroid
   (mean) is nearest. Recalculate the centroid of the cluster receiving the new item and of
   the cluster losing the item.
3. Repeat step 2 until no more reassignments take place.
The K-Means procedure is a simple way to classify a given data set into a certain number of
clusters (say k) fixed a priori. The main idea is to define k centroids, one for each
cluster, and to refine them iteratively.
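The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the routine of any package used in the study. It is a batch variant that recomputes all centroids after each full pass, rather than after every single reassignment as in the description above; the function names and the sample coordinates are hypothetical.

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: euclid(point, centroids[j]))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, max_iter=100, seed=0):
    """Batch k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 1: k initial seeds
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]   # step 2: assign
        if new_labels == labels:                   # step 3: stop when nothing moves
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                            # an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    return labels, centroids

# Hypothetical point coordinates forming two well-separated groups.
pts = [(0.0, 0.0), (0.1, 0.2), (0.3, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(pts, k=2)
print(labels)
```

The number of clusters k has to be chosen before the run, which is why the population thresholds matter for fixing it.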
2.1.2 Kohonen Algorithm and Self Organizing Maps
The ‘Kohonen Algorithm’ and the ‘Self-Organizing Map (SOM)’ are among the most important
spatial clustering techniques. Kohonen formalized the self-organizing process in 1981 and
1982 into an algorithmic form, now called the Self-Organizing (Feature) Map (SOM), for
effectively creating globally ordered maps. Briefly, a Kohonen map is created using
Artificial Neural Network techniques. The SOM is an effective tool for the visualization of
high-dimensional data. The main applications of the SOM are:
- The visualization of complex data in a two-dimensional display
- The creation of abstractions, as in many clustering techniques
The advantages and disadvantages of the SOM can be summarized as follows.
Advantages:
- Very simple to implement
- “Topology-preserving” feature superior to k-means methods
- Can be very effective for visualizing high-dimensional spaces
- Fast learning
- Can incorporate new data quickly
Disadvantages:
- The output space topology is predefined
- Can converge to a poor clustering, depending on initialization and learning rate
Other authors describe further properties in the literature:
- The k-means algorithm and its neural implementation, the Kohonen net, are most
  successfully used on large data sets. This is because k-means algorithm is simple to
  implement and computationally attractive because of its linear time complexity. However,
  it is not feasible to use even this linear time algorithm on large data sets. (Jain et al,
  1999)
- With the self-organising map approach small zones and number effects can be readily
  handled so that classification is performed on noisy data. (Openshaw and Turton, 1996)
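How a Kohonen map is trained can be sketched as follows. This is a simplified illustration under stated assumptions, not the SOM Toolbox implementation: it uses a rectangular grid, random initial weights, a Gaussian neighbourhood and linearly decaying learning rate and radius. All function names, parameter values and data points are hypothetical.

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_matching_unit(weights, x):
    """Grid position (row, col) of the node whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = euclid(w, x)
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rectangular SOM; returns weights[row][col] as lists of floats."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # Random initialisation: one of the choices that can lead to a poor result.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):          # samples in random order
            frac = t / steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3       # shrinking neighbourhood radius
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):               # pull node towards the sample
                        w[i] += lr * h * (x[i] - w[i])
            t += 1
    return weights

# Hypothetical two-variable records forming two groups.
data = [(0.1, 0.1), (0.15, 0.05), (0.05, 0.2), (0.9, 0.9), (0.85, 0.95), (0.95, 0.8)]
som = train_som(data, rows=3, cols=3)
bmu_a = best_matching_unit(som, data[0])
bmu_b = best_matching_unit(som, data[3])
print(bmu_a, bmu_b)
```

Because neighbouring nodes are updated together, records that are similar tend to map to nearby nodes, which is the topology-preserving property listed among the advantages.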
2.2 Clustering With GIS
2.2.1 Usage of GIS Packages in Clustering Analysis
Various GIS packages are available on the market. GIS packages such as ArcInfo have very
good facilities for many types of analysis, but are currently weak in the statistical analysis
of spatial data and in the use of scientific visualisation techniques. In most GIS packages,
spatial analytical functionality lies mainly in the ability to perform deterministic overlay
and buffer functions. The clustering analysis functions of GIS packages need to be developed,
and the integration of GIS with analytical techniques will be a valuable addition to the GIS
toolbox. Further progress in clustering is inevitable, and future developments will continue
to place increasing emphasis on the analytical capabilities of GIS.
2.2.1.1 ArcGIS-Districting Module
The Districting Extension for ArcGIS allows you to create defined groupings of geographic
data, such as Census tracts, ZIP Codes, and precincts, by creating a districting plan. The
Districting Extension has a simple user interface for fast configuration of geographic
representation and analysis of configuration alternatives. Once you have the base data
established, such as counties or ZIP Codes, you can group the units by simply selecting them
in ArcGIS. The Districting Extension can help you analyze population densities, housing
breakdowns, income and race statistics, and other data.
During the districting or redistricting process, statistics are updated for each selection of
source geography units. You can assign the selected units immediately to the district or
preview the statistics before making the assignment to the district.
2.2.1.2 CrimeStat II – Hotspot Analysis II – K-Means Clustering
CrimeStat is a spatial statistics package that can analyze crime incident location data. Its
purpose is to provide a variety of tools for the spatial analysis of crime incidents or other point
locations. It can interface with most desktop geographic information systems. It is designed to
operate with large crime incident data sets collected by metropolitan police departments.
However, it can be used for other types of applications involving point locations, such as the
location of arrests, motor vehicle crashes, emergency medical service pickups, or facilities
(e.g., police stations).
The K-means routine in CrimeStat makes an initial guess about the K locations and then
optimizes the distribution locally. The procedure makes initial estimates of the locations of
the K clusters (seeds), assigns each point to its nearest seed location, recalculates a
center for each cluster, which becomes the new seed, and then repeats the procedure. It stops
when there are very few changes to the cluster composition. The default K-means clustering
routine groups every point location into one, and only one, of these K groups. Finally, the
K-means clustering routine (Kmeans) outputs the clusters as ellipses.
2.2.1.3 Matlab 6.2 – SomToolbox2
The SOM Toolbox for the Matlab computing environment is a public-domain software package
intended as a general-purpose SOM development tool. It was written by researchers of the
Laboratory of Computer and Information Science of Helsinki University of Technology.
“The Toolbox contains functions for the creation, visualization and analysis of
Self-Organizing Map.” “The Toolbox can be used to preprocess data, initialize and train SOMs
using a range of different kinds of topologies, visualize SOMs in various ways, and analyze
the properties of the SOMs and data….” [1]
The SOM Toolbox runs within Matlab, which offers a high-level programming language, powerful
visualization, graphical user interface tools and a very efficient implementation of matrix
calculus.
3. DATA AND PROCESSING
3.1 Geographical Data
The classification experiment in this study was carried out on Turkey’s district data. Some
limitations are inevitable when a clustering algorithm is run on the data. All collected
geographical data were treated as limitations for clustering. Seven types of maps were used
in this study:
- District Map (Polygon Data)
- Center of the Districts Map (Point Data with X, Y Coordinates)
- Contour Maps (Polygon Data)
  - Turkey’s Border Map
  - 300 Contour Map
  - 600 Contour Map
  - 1200 Contour Map
  - 1800 Contour Map
  - 2400 Contour Map
  - 3000 Contour Map
  - 3600 Contour Map
- Physical Map of Turkey (Image from Mineral Research and Exploration, MRE)
- Active Faults of Turkey Map (Image from Mineral Research and Exploration, MRE)
- Geographical Borders Map (1st Congress of Geography, 1943, Polyline Data)
- Basin Development Plans Maps (Polyline Data)
3.2 Tabular Data
Information on 142 variables was available at the district level. Some of the data could not
be used because the number of districts changed between the 1990 and 2000 censuses, and some
were left out because they fell outside the scope of the study. From the whole data set, 36
indicators were selected on the basis of their consistency and reliability.
Table 1: Selected Variables
CODE | VARIABLE | YEAR | UNIT
A3 | Annual Growth Rate of Population in 1990-2000 Period | 1990-2000 | Percent
A4 | Annual Growth Rate of Urban Population of District in 1990-2000 Period | 1990-2000 | Percent
A6 | Urban Population Percentage | 2000 | Percent
A11 | Total Population of District | 2000 | Persons
A14 | 0-4 Age Population Percentage | 2000 | Percent
A15 | 65+ Age Population Percentage | 2000 | Percent
A16 | 0-14 Age Population Percentage | 2000 | Percent
A17 | 15-64 Age Population Percentage | 2000 | Percent
A18 | Population Density | 2000 | Persons/km²
AP1 | Young Dependency Rate | 2000 | Percent
AP2 | Old Dependency Rate | 2000 | Percent
AP3 | Total/Population Dependency Rate | 2000 | Percent
B16 | Illiterate Population Percentage | 2000 | Percent
B17 | Illiterate Men Population Percentage | 2000 | Percent
B18 | Illiterate Women Population Percentage | 2000 | Percent
B19 | Primary Education Population Percentage | 2000 | Percent
B110 | High-School Education Pop. Percentage | 2000 | Percent
F1 | Book Number of Public Library per Thousand Person | 2000 | Per thousand
F2 | Literacy Rate Who Utilize Public Library | 2000 | Percent
G3 | Gross Domestic Product Per Capita | 1996 | US dollars
H12 | Proportion of H1 to HIF (Economically Active Population) | 2000 | Percent
H22 | Proportion of H2 to HIF | 2000 | Percent
H32 | Proportion of H3 to HIF | 2000 | Percent
H42 | Proportion of H4 to HIF | 2000 | Percent
H52 | Proportion of H5 to HIF | 2000 | Percent
H62 | Proportion of H6 to HIF | 2000 | Percent
H72 | Proportion of H7 to HIF | 2000 | Percent
H82 | Proportion of H8 to HIF | 2000 | Percent
H92 | Proportion of H9 to HIF | 2000 | Percent
H102 | Proportion of H10 to HIF | 2000 | Percent
H111 | Workers Who Work Outside of Agricultural Activities | 2000 | Percent
H222 | Women Workers, Who Work Outside of Agricultural Activities, Rate | 2000 | Percent
H333 | Proportion of Total Workers to Economically Active Population | 2000 | Percent
I1 | Average Household Size | 2000 | Persons
Y61 | Total Bed Number per Thousand Person | 2002 | Per thousand
Y91 | Total Doctor Number per Thousand Person | 2002 | Per thousand
3.3 Database Management Processing
The steps of the database management process are summarized in Schema 1.
Schema 1: Database Management Process. Tabular data were collected, processed and grouped,
and almost all of the grouped data were converted to proportions. Variables were then
selected and PCA was run; variables were added or removed and PCA was run again, which
yielded 9 components. The schema also contains the boxes ‘Generation of the geographic
database’ and ‘Output’.
There were many problems while entering the data because of the large number of districts
(923 in Turkey), and there were further problems with the time-series data. GIS allows
attribute data to be used by joining them to geographic data. SPSS was used to perform PCA
for data standardization and data reduction. PCA on the 36 input variables shows that 9
principal components explain more than 79% of the variance, and the variables load clearly on
each component, which makes the components interpretable. The analysis suggests that
Component 1, which groups the young-population and employment variables in the data set, is
the most important in the classification, explaining 33.2% of the variance in the data.
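The share of variance explained by a first principal component can be computed as in the sketch below. The study ran PCA in SPSS; this stdlib-only example is a stand-in that standardizes the variables, builds their correlation matrix and extracts the leading component by power iteration. The function names and the small indicator table are hypothetical and are not the district data.

```python
import math

def standardize(rows):
    """Z-score each column (population standard deviation)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / n for j in range(p)] for i in range(p)]

def first_component(corr, iterations=500):
    """Leading eigenvalue and unit eigenvector of corr by power iteration."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

# Hypothetical indicator table: rows are districts, columns are variables.
table = [
    [2.1, 35.0, 4.8],
    [1.4, 28.0, 4.1],
    [3.0, 41.0, 5.6],
    [0.9, 22.0, 3.7],
    [2.6, 30.0, 5.9],
]
z = standardize(table)
corr = correlation_matrix(z)
eigenvalue, loadings = first_component(corr)
explained = eigenvalue / len(corr)      # the trace of a correlation matrix equals p
scores = [sum(l * x for l, x in zip(loadings, row)) for row in z]
print(round(explained, 3))
```

The `scores` list holds one Component 1 value per row, which is the kind of single derived attribute that can then be joined to the district geometry.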
4. METHOD AND ANALYSIS
4.1 Method
Two different methods and algorithms were chosen in this study in order to produce different
alternatives for clustering Turkey’s districts. Each clustering technique was applied
individually, and its output was then grouped in the Districting module. In addition, the
geographical data were used as layers for each method in the Districting module. Each method
involves different software. The methods are described in detail in Schema 2.
4.2 Analyses
This section covers two main types of algorithm and method, and three analyses. The two
outputs of these analyses that were clustered using the ArcGIS Districting module are:
- CrimeStat K-Means Classification Analysis Map (Polygon Data)
- Matlab SOM Algorithm Map (Thematic Map, Polygon Data)
Specifying the number of clusters is an important point. NUTS population thresholds were
taken as a limitation for district classification, which helped to specify the number of
clusters.
4.2.1 Method 1: Clustering through K-Means in CrimeStat and Using Its Output in
Districting Module
Component 1, which was obtained from PCA, was used in the K-Means clustering analysis in
CrimeStat. Component 1 was first joined to the database of the center of the districts map
using the district code, and the joined data were then used in the analysis.
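A join on the district code can be sketched as follows. In the study the join was done inside the GIS; this is only an illustration of the idea, and the function name, field names, codes, coordinates and scores are all hypothetical.

```python
def join_by_code(centres, scores):
    """Attach a score to each district centre through the shared district code.

    centres: {code: (x, y)}; scores: {code: value}.
    Returns the joined rows and the codes that had no matching score.
    """
    joined, unmatched = [], []
    for code, (x, y) in centres.items():
        if code in scores:
            joined.append({"code": code, "x": x, "y": y, "component1": scores[code]})
        else:
            unmatched.append(code)
    return joined, unmatched

# Hypothetical district codes, centre coordinates and Component 1 scores.
centres = {"0601": (32.85, 39.93), "3401": (28.97, 41.01), "3502": (27.14, 38.42)}
scores = {"0601": 0.42, "3401": 1.87}

rows, missing = join_by_code(centres, scores)
print(len(rows), missing)
```

Reporting the unmatched codes is a deliberate choice here: with 923 districts and census boundary changes, a silent mismatch between the tabular and geographic data would be easy to overlook.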
Figure 1: CrimeStat Analysis Map
Schema 2: Methods
Method 1: inputs (center of the districts map, PCA Component 1) -> join -> CrimeStat K-Means
clustering -> thematic map of this joining -> ArcGIS Districting module.
Method 2: input (PCA All) -> Matlab SOM algorithm -> new variable (U-matrix) -> SPSS K-Means
clustering -> ArcGIS Districting module.
Layers used in the ArcGIS Districting module: contour maps, physical map of Turkey, active
faults of Turkey map, geographic borders map, basin development plans map.
It is necessary to see which points belong to which ellipse. Because the K-means analysis in
CrimeStat assigns each point to one, and only one, cluster, a thematic map of the ellipses
can be produced by joining the output of this analysis with the District map in ArcGIS. In
this way, all points were assigned to clusters represented as polygons. As Figure 2 shows,
districts in different ellipses have different colors.
Figure 2: Thematic Map of CrimeStat Analysis Map
This CrimeStat thematic map, the center of the districts thematic map, the other geographical
data and the districting plan created for this analysis output were opened in the same view,
as seen in Figure 3.
Figure 3: All Layers In the Same View Before Starting to Create CrimeStat Groupings
Clustering was performed in the ArcGIS Districting environment by grouping units with the
same characteristics, as indicated by color. The steps of selecting units, assigning them and
creating new districts were carried out. Figure 4 shows the final districts according to the
CrimeStat classification method. There are 84 new districts after classification.
Figure 4: Final Districts According to Method 1
4.2.2 Method 2: Using SOM Algorithm for Clustering in Matlab Environment and Using Its
Output in Districting Module
Tabular data with 36 variables were used in this analysis. The code needed to build the SOM
was then written, and the SOM was visualized using the corresponding code. The U-matrix is
shown along with all component planes.
Figure 5: Visualization of the SOM of Turkey’s Districts Data.
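What a U-matrix measures can be shown with a small calculation. The sketch below is a simplified version, assuming a rectangular grid and 4-connected neighbours; the SOM Toolbox visualization differs in detail (for example in grid topology). The function names and the weight values are hypothetical.

```python
import math

def euclid(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def u_matrix(weights):
    """Mean distance from each node's weight vector to its 4-connected grid neighbours."""
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(euclid(weights[r][c], weights[nr][nc]))
            result[r][c] = sum(dists) / len(dists)
    return result

# Hypothetical 2 x 3 map: the two left columns are similar, the right column differs.
weights = [
    [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]],
    [[0.0, 0.1], [0.1, 0.1], [1.0, 0.9]],
]
um = u_matrix(weights)
for row in um:
    print([round(v, 3) for v in row])
```

High values mark nodes that lie far from their neighbours in data space, so ridges of high values in a U-matrix separate groups of similar nodes.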
However, the clustering and U-matrix visualizations are schematic; they are not maps. After
processing, the SOM assigns a new value to each unit and district. These new values are the
final attribute that combines all variables. A difficulty arises in the step of joining the
final output of the SOM algorithm with the geographical data: after the SOM procedure, a way
to visualize those values is still needed. The new values for each district were therefore
clustered in SPSS. Finally, this new database was joined with the district map, and the
thematic map of the SOM algorithm was produced.
Figure 6: Final Thematic Map of SOM Algorithm Classification
Figure 7: All Layers In the Same View Before Starting to Create SOM Algorithm Groupings
Figure 8 shows the final districts according to the SOM algorithm classification method.
There are 87 new districts after classification.
Figure 8: Final Districts According to Method 2
5. CONCLUSION
Method 1, which used the K-Means algorithm of CrimeStat, produced 84 districts, and Method 2,
which used the SOM algorithm in Matlab, produced 87 districts. The groupings differ between
the two methods, as seen in Figure 9.
Method 1 produces clusters that are more regular in size, because each district is assigned
to the group with the closest centroid, so the clusters tend to extend a similar distance
from their centers. Method 2 produces clusters of varying size.
Figure 9: Final Districts for each Method one by one (panels: Method 1, Method 2).
Method 1 was the quickest and easiest method, because the districts were already grouped by
the ‘spatial’ clustering routine of CrimeStat. The routine tries to find the best position
for each center and then assigns each point to the nearest center. In practice, those
groupings only had to be divided according to the population thresholds. For example, the
classification by closeness produced a single grouping for the whole of Istanbul, which
became 6 new districts after districting. However, this method exceeds the limit of the
population thresholds.
Method 2, the SOM algorithm classification, produced well-suited, visually clear groupings
and the most logical classification. It was easy to take the geographical limitations into
account while clustering the Method 2 outputs; in other words, the clusters that best fit the
geography were obtained with Method 2. Because the SOM algorithm is also a ‘spatial
clustering’ method, it takes neighborhood into consideration while calculating new values for
the whole data set. These new values are the final attribute that combines all variables.
However, joining the final output of the SOM algorithm with the geographical data remains a
problem: after the SOM procedure, a way to visualize those values is still needed. SPSS
K-Means classification was used to classify the new values. A new method for joining SOM
algorithm output with geography is needed for better visualization.
The use of GIS as a visual tool allows the researcher to explore statistical output that
would otherwise be difficult to interpret. Spatial analysis has been used in various fields,
but its connection to GIS has emerged only recently. As the GIS community grows, the need to
perform spatial statistical analysis on GIS data will become greater. For that reason, it is
critical to integrate spatial statistical functions into GIS. This is also very important for
social and economic analysis, such as proposals for ‘regions for statistical purposes only’,
since data are aggregated for different geographical areas or zones, like census tracts and
counties. In addition, clustering analysis in GIS needs to be improved.
Kohonen’s self-organizing maps can be used effectively in geo-demographic studies and
spatial clustering. Although the approach has some disadvantages, it can cope with the
problems of spatial data and makes the work considerably easier. More productive studies can
be carried out, and the disadvantages or shortcomings of the Kohonen algorithm can be
eliminated, by integrating it with other methods and developing it further while keeping
Kohonen’s algorithm as the basis. Furthermore, a new method for joining SOM algorithm output
with geography is needed for better visualization, because the step of joining the final
output of the SOM algorithm with the geographical data is problematic. In this study, SPSS
K-Means classification was used to classify the new values.
REFERENCES
Diansheng, G. (2002) “Spatial Cluster Ordering and Encoding for High-Dimensional
Geographic Knowledge Discovery”, UCGIS2, Summer 2002. Available at:
http://www.cobblestoneconcepts.com/ucgis2summer2002/guo/guo.html (last accessed 30 May
2004)
Jain, A.K., Murty, M.N. and Flynn, P.J. (1999) “Data Clustering: A Review”, ACM
Computing Surveys, Vol. 31, No. 3, September 1999
Kohonen, T. (2001) “Self-Organizing Maps”, Springer Series in Information Sciences, 3rd
Edition, Berlin, Germany, 2001
Openshaw, S. and Turton, I. (1996) “A parallel Kohonen algorithm for the classification of
large spatial datasets”, Centre for Computational Geography, School of Geography,
Leeds University, Leeds, Taylor and Francis, London, 1996
Web addresses: [1] http://www.cis.hut.fi/projects/somtoolbox (last accessed 15 June 2004)
TS 47 – GIS Applications – Special Issues
Ece Aksoy
Clustering With GIS: An Attempt to Classify Turkish District Data
Shaping the Change
XXIII FIG Congress
Munich, Germany, October 8-13, 2006
CONTACTS
Ece Aksoy
Akdeniz University
Akdeniz University Agricultural Faculty Soil Department 07070
Antalya
TURKEY
Tel. + 90 2423102411/6537
Fax + 90 2422274564
Email: eceaksoy@hotmail.com