Proceedings of the 13th SEGJ International Symposium, 2018
Machine Learning for Land Classification - A SOM Case Study of Broken Hill
Tasman Gillfeather-Clark(1) and Luke Smith(2)
(1) University of Western Australia (tasman.gc@gmail.com), (2) Macquarie University (LukeSmith.geo@gmail.com).
ABSTRACT
Machine learning (ML) has come to refer to a diverse
range of algorithms and functions designed with the
intent of learning a given problem and producing some
evaluable result, as opposed to being explicitly
programmed to solve said problem. Typically, they are
iterative, and improve their conclusions over time. The
definition commonly introduced is paraphrased from
the work of Samuel (1959), who worked on some of
the earliest examples of ML, by applying what we now
refer to as a decision tree to teach a computer to play
checkers. The usage of ML in the geoscience industry
has the potential to become an indispensable tool for
geoscientists in all stages of the exploration process. As
the mining and resource evaluation industry faces an
ever-growing data glut, ML presents a range of tools to
work with increasingly large, multivariate datasets.
Self-Organising Maps (SOM) is an unsupervised
learning algorithm, used in this work to complete a
landmass classification analysis of the area to the north
of Broken Hill. An examination of current ML
landmass classification methodologies is introduced,
followed by a brief review of SOM. Applications of
SOM for mineralisation targeting and data QC are
identified in a data-rich setting. The results of the study
confirm the efficiency of the SOM algorithm for
clustering lithological groups in land classification
studies. Perhaps most notable is SOM’s ability to
highlight variation in cover without needing to assign
labels. Understanding cover has been identified as a key
aspect of Australia’s mining future, given the vast
expanses of Australia that are composed of
subcropping rock.
KEY WORDS: SOM, Machine Learning, Data
Integration, Broken Hill, Remote Sensing
INTRODUCTION
Broadly, ML aims to classify data, predict trends or
simplify visualisation. As we enter an age of
information-rich geoscience datasets, large-scale
mining and environmental programs accrue diverse
and expansive sets of data.
SOM is by no means a recent innovation, and
has been used for many different goals: cluster
analysis in data mining (Fraser et al., 2006),
overcoming gaps in datasets via the
production of ‘fuzzy’ observations (Wang, 2003), and
the analysis of ecological communities, both for exploring
the ordination of a species and for providing a
visualisation of that species’ abundance (Giraudel and
Lek, 2001). SOM has had a noted impact on the ML
space; however, it remains lesser known than the
techniques identified in the next section.
LANDMASS CLASSIFICATION WITH ML
An issue perfectly suited to machine learning is the
classification of landmass using remote sensing data.
This classification process has many outputs, from land
usage classification for urban planning, through agricultural and
biological management via the mapping of invasive
plant species, to the obvious geological application of
surface geological mapping.
Cracknell and Reading used Random Forests (RF) and
Support Vector Machines (SVM) to identify
lithological contact zones from airborne geophysics
and satellite data, and went on to compare other ML
algorithms in an extremely thorough review (Cracknell
and Reading, 2013, 2014). For more information on
SVM and remote sensing, see Mountrakis, Im and
Ogole (2011) and Shao and Lunetta (2012).
Examples of land
classification problems include: mapping the lithology
in Canada’s Arctic using Neural Networks (NN), SVM,
RF and Maximum Likelihood Classifier (MLC) (He et
al., 2015), mapping seabed sediments with RF (Diesing
et al., 2014), and tracking airborne dust particulates
and their sources using SOM (Lary et al., 2016). There
are also applications to mapping landslides, which has
significant value in hazard reduction (Stumpf and
Kerle, 2011). Discussion is ongoing on the
best way to deploy ML for land classification, and a
recent analysis of major issues and future directions
can be found in Khatami, Mountrakis and Stehman
(2016). Typically, remote sensing data are continuous
and quite dense, making for excellent input to ML
algorithms. However, understanding the clustering or
classification process is a prerequisite for
interpreting the resulting grouping.
This paper is based on the SOM technique
described in Vesanto, Himberg and Alhoniemi (1999),
which has been used extensively (Dickson,
1995; Gulson et al., 2007). This work outlines and
provides users with a workflow that allows for the
SOM process to operate on any multivariate dataset
that can be loaded into it. Our project aimed to
establish a simple workflow, which integrates well
with existing exploration industry software. The goal
was to transition data from industry standard sources,
through the SOM process in MATLAB, and then back
out into a usable format for industry end users.
SELF ORGANISING MAPS
SOM is a type of Artificial Neural Network. Unlike
networks built from multiple layers of perceptrons, a SOM
consists of a single grid of nodes (map units) that is fitted
to the data through competitive learning. Details about SOM
as a ML technique can be found in Agarwal and Skupin (2008).
The main strength of SOM is dimensionality reduction
and being able to visualise attribute distributions via
component planes. As the SOM operates, it assigns data
to best matching units (BMU) without human
supervision. Each BMU is effectively a data cluster that
reflects inherent structure in the data itself. In our process this
dimensionality reduction allows multiple layers of
information to be visualised as a single colour. For
example, for a given point all the information shown in
Figure 2 is now associated with a single colour in the
BMU key; see examples in Figure 3.
This is the strength of a clustering algorithm like
SOM compared to a classification algorithm like
Random Forests (RF), which requires supervision
during training and can introduce operator bias.
However, the strength of SOM is also its weakness: a
lack of user input can mean that you can’t target a
feature you may be interested in. You also can’t train a
SOM on one area and apply it to another. It is
important to consider the desired outcome of your
analysis before choosing your ML technique.
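The BMU assignment described in this section can be sketched in a few lines. The following is a minimal NumPy illustration of online SOM training on a rectangular grid; it is not the SOM Toolbox implementation used in this study, and the function names (train_som, assign_bmus), the linear decay schedules and the default parameter values are assumptions chosen for brevity.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM online; returns a (rows*cols, n_features) codebook."""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    # Grid position of every map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the codebook from randomly chosen samples (an assumption).
    codebook = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
        x = data[rng.integers(0, n_samples)]
        # Best matching unit: the map unit whose weight vector is closest to x.
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
        # Gaussian neighbourhood around the BMU, measured on the map grid.
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its grid neighbours towards the sample.
        codebook += lr * h[:, None] * (x - codebook)
    return codebook

def assign_bmus(data, codebook):
    """Return the index of the best matching unit for every sample."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy usage: 200 random points with 5 attributes mapped onto a 4 x 5 grid.
rng = np.random.default_rng(42)
toy = rng.normal(size=(200, 5))
codebook = train_som(toy, rows=4, cols=5)
bmus = assign_bmus(toy, codebook)   # one BMU index per input point
```

No labels enter the training loop at any point, which is what makes the grouping unsupervised. Colouring each point by its BMU index is what collapses the attribute layers into a single map.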
METHOD
Our dataset was provided as part of the Frank Arnott
Award by the Department of Primary Industries, NSW.
The relative wealth of data in this area is in part due to
the Broken Hill Exploration Initiative (BHEI). The
high density of overlapping datasets makes the area a
good subject for our SOM process.
Figure 1 - (Left) From top left to bottom right: Radiometrics, Hymapper, Density Modelling, Surface Geology, Mag (Analytical Signal), Digital Terrain Model, Mineral occurrence maps and Regolith Map. (Above) Point-sampled grid, coloured using geology.
The data were loaded as overlapping layers in Discover
(Fig. 1, left) to create a point grid (Fig. 1, right).
Effectively, each point comes with two grid
coordinates (X, Y), as well as 25 other values
(Mag AS, Density, K-Count, etc.). This dataset is
saved as a delimited list and then loaded into a SOM
Toolbox for Matlab 5 data structure for use in
MATLAB. These values are not restricted to being
numerical; they can be any standard data type, or null
values. Missing data carry no
weight in the SOM process and are thus ignored. In
total, the number of points fed into the SOM is in
excess of 500,000. SOMs with various numbers of
BMUs can be seen in Fig. 2.
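One way to picture how missing values can be ignored is to compute the distance to each map unit over the known components only. The sketch below assumes a comma-delimited point list in which empty fields mark missing data; the helper names, the column names and the two map units are hypothetical, and this is an illustration rather than the toolbox code.

```python
import io
import numpy as np

def load_point_grid(text):
    """Parse a comma-delimited point list; empty fields become NaN."""
    return np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1)

def masked_bmu(x, codebook):
    """Index of the best matching unit for x, skipping NaN components of x."""
    known = ~np.isnan(x)
    if not known.any():
        raise ValueError("sample has no known components")
    diff = codebook[:, known] - x[known]
    return int(np.argmin((diff ** 2).sum(axis=1)))

# Hypothetical columns: X, Y, MagAS, Density (the real grid has 25 value columns).
sample_text = "X,Y,MagAS,Density\n0,0,1.5,2.7\n0,1,,2.6\n"
points = load_point_grid(sample_text)
values = points[:, 2:]   # assumption: the X, Y coordinates are not SOM inputs
units = np.array([[1.4, 2.7], [9.0, 2.6]])   # two made-up map units
bmus = [masked_bmu(v, units) for v in values]
```

The second point has no MagAS value, so only its Density contributes to the match. A point with no known values at all cannot be matched, which is why the helper raises an error in that case.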
Figure 2 - Comparison of the same SOM area with different numbers of BMUs. Note the changing complexity of the maps with
increasing numbers of BMUs. The key is provided in the bottom right of each map to give an approximate visualisation of the
various nodes. This would be used in conjunction with the component plane key to characterise the different BMUs.
Each variable or column of the CSV is then
normalised, allowing the varied datasets to be
compared with one another. Without this normalisation,
a dataset that contains larger numbers would
simply overwhelm the others during SOM
computation. The operation of the SOM process is
described in Vesanto, Himberg and Alhoniemi (1999),
and a more practical explanation can be found in
Dickson (1995).
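The normalisation scheme used is not stated here. A common choice, assumed in the sketch below, is to scale each column to zero mean and unit variance while skipping missing values. The function name and the example numbers are hypothetical.

```python
import numpy as np

def normalise_columns(table):
    """Scale each column to zero mean and unit variance, ignoring NaN entries."""
    mean = np.nanmean(table, axis=0)
    std = np.nanstd(table, axis=0)
    # Leave constant columns unscaled instead of dividing by zero.
    std = np.where(std > 0, std, 1.0)
    return (table - mean) / std

# Two made-up attributes on very different scales, with one missing value.
raw = np.array([[1000.0, 0.1],
                [2000.0, 0.2],
                [3000.0, np.nan]])
scaled = normalise_columns(raw)
```

Before scaling, a Euclidean distance between rows of this table is dominated by the first column. After scaling, both columns contribute on a comparable footing, and the missing entry remains NaN.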
We used the SOM Toolbox for Matlab 5, an
open-source implementation of the SOM process
(Vesanto, Himberg and Alhoniemi, 1999). The dataset
was processed on a quad-core Intel i7-2677M, base
clock 1.8 GHz, with 4 GB RAM (a standard laptop
computer) and returned a result in under 10 seconds,
with the import and export of data being the main CPU
time sinks. During the initialisation process the
operational parameters are automatically calculated,
requiring only the number of expected map units (we
used 250). We performed no subsequent clustering
because of our interest in gradational change in the cover.
RESULTS
Figure 3 shows the completed SOM map. It is
clear that what is produced is not a conventional
geological map. We see some firm unit boundaries,
likely caused by different neighbouring geological
units; however, it’s clear that multiple processes are
contributing to the grouping we see in the map. This is
expected, as surface mapping data doesn’t just capture
data about geological boundaries but also variation in
terrain and sedimentary cover. This makes it
challenging to identify the process that gave rise to a
given BMU, although some features stand out: for example,
it’s possible to identify drainage systems clearly.
Figure 3 - The resultant SOM map. The SOM toolbox
creates a SOM domain representation of how
component clusters correlate. It does not give any
spatial information. Thus the grouping of the colours is
linked only to ‘true correlation’ of the data.
The SOM process also produces an error map referred
to as quantization error (Q-error), which measures how
well each data point fits the BMU it has been matched
with, with higher values corresponding to greater variance
from that BMU.
The greyscale image in Fig. 4 shows the quantization
error, with brighter spots having higher Q-error values.
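As an illustration of what a per-point Q-error map contains, the sketch below takes the Q-error of a point to be its Euclidean distance to its BMU, computed over the known components only. This is an assumed definition for illustration, with a hypothetical function name and toy values; the toolbox's exact formula may differ.

```python
import numpy as np

def quantization_error(data, codebook):
    """Return (bmu index, distance to that BMU) for every sample.

    NaN components of a sample are excluded from its distances.
    For very large grids this should be run in chunks to limit memory use.
    """
    known = ~np.isnan(data)
    filled = np.where(known, data, 0.0)
    # Component-wise differences, zeroed wherever the sample value is missing.
    diff = (filled[:, None, :] - codebook[None, :, :]) * known[:, None, :]
    d2 = (diff ** 2).sum(axis=2)
    bmus = d2.argmin(axis=1)
    return bmus, np.sqrt(d2[np.arange(len(data)), bmus])

# Toy case: two map units and three points, one with a missing value.
units = np.array([[0.0, 0.0], [10.0, 10.0]])
pts = np.array([[0.0, 1.0], [10.0, 10.0], [4.0, np.nan]])
bmu_idx, q_err = quantization_error(pts, units)
```

Plotting q_err at each point's (X, Y) position gives a map of the kind shown in Fig. 4. Points that sit far from every map unit, such as mixed material, show up as high values.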
It is of note that the Silver King Formation,
outlined in red, is anomalously high, indicating the
formation does not entirely fit with the best matching
unit it was mapped with. Streams also stand out across
the map, arising from poor fits to their respective
BMUs. You would expect this for rivers and streams as
they act as the sink for all the weathering products
from the units they act as catchments for. Thus, you are
seeing a mix of different BMUs all in a specific
location.
Figure 4 - Q-error map showing areas that vary notably from the BMU they were mapped as. The area outlined in red is a single geological unit (Silver King Formation).
This has the potential to identify anomalous
areas meriting further investigation. Further analysis
can be found in Smith and Gillfeather-Clark (2018).
Our analysis revealed several areas that merited
follow-up, and even identified mapping errors in the
geological maps we were provided with.
CONCLUSIONS
Our project took a large amount of complex data and
reduced it to a single map with available software and
existing tools. It also showed the practical value of
applying unsupervised learning
techniques to trend mapping in the GIS space.
Ultimately, this technique’s application is limited to
areas with high-quality overlapping datasets. We
foresee, however, that such areas will become more and
more common, and SOM may represent an effective
way to interpret and rapidly characterise areas of
interest as datasets cover more ground and overlap each other.
Thus, we find that SOM is a useful tool for any analyst
working with GIS mapping moving forward.
ACKNOWLEDGEMENTS
This work was completed as part of the Frank Arnott
Award under the apprentice category, in which we
ultimately placed 3rd. As such we’d like to thank all
those who’ve contributed to the completion of this
work. We thank David Pratt of Tensor Research, who mentored us
through this project and not only shared his time and
counsel but also his software. We’d also like to thank
our co-mentor Bruce Dickson who, through his
thorough knowledge of the SOM technique, gave
invaluable insight into the process of SOM. We’d also
like to thank Pitney Bowes, who provided their
software free of charge for the duration of this
competition, as well as providing us with great
technical and operational assistance to fast-track our
ability to utilise the software. Finally, we’d like to
thank the NSW Dept. of Primary Industries, including Dr
John Greenfield, Astrid Carlton and Rosemary
Hegarty, who provided us with access to the dataset
which made this work possible, and also for their
feedback on the output maps.
REFERENCES
Agarwal, P. and Skupin, A. (2008) Self-Organising
Maps: Applications in Geographic Information
Science.
Cracknell, M. J. and Reading, A. M. (2013) ‘The
upside of uncertainty: Identification of lithology
contact zones from airborne geophysics and
satellite data using random forests and support
vector machines’, Geophysics, 78(3), pp.
WB113-WB126.
Cracknell, M. J. and Reading, A. M. (2014)
‘Geological mapping using remote sensing data:
A comparison of five machine learning
algorithms, their response to variations in the
spatial distribution of training data and the use of
explicit spatial information’, Computers and
Geosciences. Elsevier, 63, pp. 22-33.
Dickson, B. L. (1995) ‘Analysis and Visualization of
Multiple Data Sets using Self-Organizing Maps’,
CSIRO Exploration & Mining, pp. 1-4.
Diesing, M. et al. (2014) ‘Mapping seabed sediments:
Comparison of manual, geostatistical, object-
based image analysis and machine learning
approaches’, Continental Shelf Research.
Elsevier, 84, pp. 107-119.
Fraser, S. J. et al. (2006) ‘Data mining mining data -
Ordered vector quantisation and examples of its
application to mine geotechnical data sets’, 6th
International Mining Geology Conference,
Rising to the Challenge, (August), pp. 259-268.
Giraudel, J. L. and Lek, S. (2001) ‘A comparison of
self-organizing map algorithm and some
conventional statistical methods for ecological
community ordination’, Ecological Modelling,
146(1-3), pp. 329-339.
Gulson, B. et al. (2007) ‘Comparison of lead isotopes
with source apportionment models, including
SOM, for air particulates’, Science of the Total
Environment, 381(1-3), pp. 169-179.
He, J. et al. (2015) ‘A comparison of classification
algorithms using Landsat-7 and Landsat-8 data
for mapping lithology in Canada’s Arctic’,
International Journal of Remote Sensing, 36(8),
pp. 2252-2276.
Khatami, R., Mountrakis, G. and Stehman, S. V.
(2016) ‘A meta-analysis of remote sensing
research on supervised pixel-based land-cover
image classification processes: General
guidelines for practitioners and future research’,
Remote Sensing of Environment. Elsevier Inc.,
177, pp. 89-100.
Lary, D. J. et al. (2016) ‘Machine learning in
geosciences and remote sensing’, Geoscience
Frontiers. Elsevier Ltd, 7(1), pp. 3-10.
Mountrakis, G., Im, J. and Ogole, C. (2011) ‘Support
vector machines in remote sensing: A review’,
ISPRS Journal of Photogrammetry and Remote
Sensing. Elsevier B.V., 66(3), pp. 247-259.
Samuel, A. L. (1959) ‘Some studies in machine
learning using the game of checkers’, IBM
Journal, 3(3), p. 210.
Shao, Y. and Lunetta, R. S. (2012) ‘Comparison of
support vector machine, neural network, and
CART algorithms for the land-cover
classification using limited training data points’,
ISPRS Journal of Photogrammetry and Remote
Sensing. International Society for
Photogrammetry and Remote Sensing, Inc.
(ISPRS), 70, pp. 78-87.
Smith, L. and Gillfeather-Clark, T. (2018) ‘Self
Organising Maps - A Case Study of Broken Hill’,
ASEG Extended Abstracts, 2018(1), pp. 1-6.
Stumpf, A. and Kerle, N. (2011) ‘Object-oriented
mapping of landslides using Random Forests’,
Remote Sensing of Environment. Elsevier Inc.,
115(10), pp. 2564-2577.
Vesanto, J., Himberg, J. and Alhoniemi, E. (1999)
‘Self-organizing map in Matlab: the SOM
Toolbox’, Proceedings of the Matlab DSP
conference, 99.
Wang, S. (2003) ‘Application of self-organising maps
for data mining with incomplete data sets’,
Neural Computing and Applications, 12(1), pp.
42-48.
... Previous work on the automatic detection of geological structures on maps is limited. A greater body of work exists on the related problem of automatic classification of lithology from remote sensing and airborne geophysical data, as in de Carvalho Carneiro et al. (2012), Reading (2013, 2014), Kuhn et al. (2018), Gillfeather-Clark and Smith (2018), and Bressan et al. (2020), and some work on fault and lineament detection from such datasets (e.g., Vasuki et al., 2014;Middleton et al., 2015;Aghaee et al., 2021). Another problem that has received greater attention is automatic interpretation of seismic reflection data, including the identification of faults (e.g., Wu et al., 2019Wu et al., , 2020Cunha et al., 2020;An et al., 2021An et al., , 2023Gao et al., 2022;Wang et al., 2023) and salt structures (e.g., Shi et al., 2019;Muller et al., 2022). ...
Article
Full-text available
The increasing availability of large geological datasets and modern methods of data analysis facilitate a data science approach to geology in which inferences are drawn from geological data using automated methods based on statistics and machine learning. Such methods offer the potential for faster and less subjective interpretations of geological data than are possible from a human interpreter, but translating the understanding of a trained geologist to an algorithm is not straightforward. In this paper, we present automated workflows for detecting geological folds from map data using both unsupervised and supervised machine learning. For the unsupervised case, we use regular expression matching to identify map patterns suggestive of folds along lines crossing the map. We then use the HDBSCAN clustering algorithm to cluster these possible fold identifications into a smaller number of distinct folds. This clustering algorithm is chosen because it does not require the number of clusters to be known a priori. For the supervised learning case, we use synthetic models of folds to train a convolutional neural network to identify folds using map and topographic data. We test both methods on synthetic and real datasets, where they both prove capable of identifying folds. We also find that distinguishing folds from similar map patterns produced by topography is a major issue that must be accounted for with both methods. The unsupervised method has advantages, including the explainability of its results, and provides clearly better results in one of the two real-world test datasets, while the supervised learning method is more fully automated and likely more easily extensible to other structures. Both methods demonstrate the ability of machine learning to interpret folds on geological maps and have potential for further development targeting a wider range of structures and datasets.
Preprint
Full-text available
The increasing availability of large geological datasets together with modern methods of data analysis facilitate a data science approach to geology in which inferences are drawn from geological data using automated methods based on statistics and machine learning. Such methods offer the potential for faster and less subjective interpretations of geological data than are possible from a human interpreter, but translating the understanding of a trained geologist to an algorithm is not straightforward. In this paper, we present automated workflows for detecting geological folds from map data using both unsupervised and supervised machine learning. For the unsupervised case, we use regular expression matching to identify map patterns suggestive of folds along lines crossing the map. We then use the hdbscan clustering algorithm to cluster these possible fold identifications into a smaller number of distinct folds, the number of which is not known a priori. For the supervised learning case, we use synthetic models of folds to train a convolutional neural network to identify folds using map and topographic data. We test both methods on synthetic and real datasets.
Article
Full-text available
The principle aim of the research was to overcome the challenges faced by modern geophysical data analysts, particularly those working with large multivariate datasets using Self Organising Maps (SOM). SOM is an unsupervised learning technique for multivariate data, which works by taking multiple geophysical datasets for an area of interest, and integrating them to illustrate trends. Once developed, our method drastically lowered the time required for an analyst to examine and identify trends and relations across a broad range of geophysical, geochemical and other data layers. It also revealed hidden relations and distinct populations within correlated layers. Our study shows that SOM continues to be a powerful tool in accelerating the interpretation process. This includes the separation of features into distinct geological units, even without any preliminary map inputs to the SOM process. It also highlights SOM’s ability to highlight variation in cover, which has been identified as a key aspect moving forward in Australia’s mining future, when considering the vast expanses of Australia covered in sub cropping rock. In the future as data continue to grow and overlap, SOM will play an important role in highlighting these relations in soil cover and outcrop geology.
Article
Full-text available
Learning incorporates a broad range of complex procedures. Machine learning (ML) is a subdivision of artificial intelligence based on the biological learning process. The ML approach deals with the design of algorithms to learn from machine readable data. ML covers main domains such as data mining, difficult-to-program applications, and software applications. It is a collection of a variety of algorithms (e.g. neural networks, support vector machines, self-organizing map, decision trees, random forests, case-based reasoning, genetic programming, etc.) that can provide multivariate, nonlinear, nonparametric regression or classification. The modeling capabilities of the ML-based methods have resulted in their extensive applications in science and engineering. Herein, the role of ML as an effective approach for solving problems in geosciences and remote sensing will be highlighted. The unique features of some of the ML techniques will be outlined with a specific attention to genetic programming paradigm. Furthermore, nonparametric regression and classification illustrative examples are presented to demonstrate the efficiency of ML for tackling the geosciences and remote sensing problems. Ó 2015, China University of Geosciences (Beijing) and Peking University. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/ licenses/by-nc-nd/4.0/).
Article
Full-text available
To map Arctic lithology in central Victoria Island, Canada, the relative performance of advanced classifiers (Neural Network (NN), Support Vector Machine (SVM), and Random Forest (RF)) were compared to Maximum Likelihood Classifier (MLC) results using Landsat-7 and Landsat-8 imagery. A ten-repetition cross-validation classification approach was applied. Classification performance was evaluated visually and statistically using the global classification accuracy, producer’s and user’s accuracies for each individual lithological/spectral class, and cross-comparison agreement. The advanced classifiers outperformed MLC, especially when training data were not normally distributed. The Landsat-8 classification results were comparable to Landsat-7 using the advanced classifiers but differences were more pronounced when using MLC. Rescaling the Landsat-8 data from 16 bit to 8 bit substantially increased classification accuracy when MLC was applied but had little impact on results from the advanced classifiers.
Article
Full-text available
Marine spatial planning and conservation need underpinning with sufficiently detailed and accurate seabed substrate and habitat maps. Although multibeam echosounders enable us to map the seabed with high resolution and spatial accuracy, there is still a lack of fit-for-purpose seabed maps. This is due to the high costs involved in carrying out systematic seabed mapping programmes and the fact that the development of validated, repeatable, quantitative and objective methods of swath acoustic data interpretation is still in its infancy. We compared a wide spectrum of approaches including manual interpretation, geostatistics, object-based image analysis and machine-learning to gain further insights into the accuracy and comparability of acoustic data interpretation approaches based on multibeam echosounder data (bathymetry, backscatter and derivatives) and seabed samples with the aim to derive seabed substrate maps. Sample data were split into a training and validation data set to allow us to carry out an accuracy assessment. Overall thematic classification accuracy ranged from 67% to 76% and Cohen’s kappa varied between 0.34 and 0.52. However, these differences were not statistically significant at the 5% level. Misclassifications were mainly associated with uncommon classes, which were rarely sampled. Map outputs were between 68% and 87% identical. To improve classification accuracy in seabed mapping, we suggest that more studies on the effects of factors affecting the classification performance as well as comparative studies testing the performance of different approaches need to be carried out with a view to developing guidelines for selecting an appropriate method for a given dataset. In the meantime, classification accuracy might be improved by combining different techniques to hybrid approaches and multi-method ensembles.
Article
Full-text available
Machine learning algorithms (MLAs) are a powerful group of data-driven inference tools that offer an automated means of recognizing patterns in high-dimensional data. Hence, there is much scope for the application of MLAs to the rapidly increasing volumes of remotely sensed geophysical data for geological mapping problems. We carry out a rigorous comparison of five MLAs: Naive Bayes, k-Nearest Neighbors, Random Forests, Support Vector Machines, and Artificial Neural Networks, in the context of a supervised lithology classification task using widely available and spatially constrained remotely sensed geophysical data. We make a further comparison of MLAs based on their sensitivity to variations in the degree of spatial clustering of training data, and their response to the inclusion of explicit spatial information (spatial coordinates). Our work identifies Random Forests as a good first choice algorithm for the supervised classification of lithology using remotely sensed geophysical data. Random Forests is straightforward to train, computationally efficient, highly stable with respect to variations in classification model parameter values, and as accurate as, or substantially more accurate than the other MLAs trialed. The results of our study indicate that as training data becomes increasingly dispersed across the region under investigation, MLA predictive accuracy improves dramatically. The use of explicit spatial information generates accurate lithology predictions but should be used in conjunction with geophysical data in order to generate geologically plausible predictions. MLAs, such as Random Forests, are valuable tools for generating reliable first-pass predictions for practical geological mapping applications that combine widely available geophysical data.
Article
Classification of remotely sensed imagery for land-cover mapping purposes has attracted significant attention from researchers and practitioners. Numerous studies conducted over several decades have investigated a broad array of input data and classification methods. However, this vast assemblage of research results has not been synthesized to provide coherent guidance on the relative performance of different classification processes for generating land cover products. To address this problem, we completed a statistical meta-analysis of the past 15 years of research on supervised per-pixel image classification published in five high-impact remote sensing journals. The two general factors evaluated were classification algorithms and input data manipulation as these are factors that can be controlled by analysts to improve classification accuracy. The meta-analysis revealed that inclusion of texture information yielded the greatest improvement in overall accuracy of land-cover classification with an average increase of 12.1%. This increase in accuracy can be attributed to the additional spatial context information provided by including texture. Inclusion of ancillary data, multi-angle and time images also provided significant improvement in classification overall accuracy, with 8.5%, 8.0%, and 6.9% of average improvements, respectively. In contrast, other manipulation of spectral information such as index creation (e.g. Normalized Difference Vegetation Index) and feature extraction (e.g. Principal Components Analysis) offered much smaller improvements in accuracy. In terms of classification algorithms, support vector machines achieved the greatest accuracy, followed by neural network methods. The random forest classifier performed considerably better than the traditional decision tree classifier. Maximum likelihood classifiers, often used as benchmarking algorithms, offered low accuracy. 
Our findings will help guide practitioners to decide which classification to implement and also provide direction to researchers regarding comparative studies that will further solidify our understanding of different classification processes. However, these general guidelines do not preclude an analyst from incorporating personal preferences or considering specific algorithmic benefits that may be pertinent to a particular application.
Article
Computational techniques are needed to assist in the analysis and interpretation of the increasing amounts of geoscientific and mining related data and information that are routinely gathered at mine sites. New interpretation methods are needed to provide an integrated approach so as to establish relationships (cause and effect) or to be predictive about possible outcomes based on acquired data. Traditional multivariate statistical methods are often confused by variable relationships that are non-linear, data distributions that are non-normal, and by the data themselves that may contain missing values, text and both continuous and discontinuous numeric values. Modern data mining methods are available to understand the relationships within and between diverse data sets, and identify relationships or trends associated with processes, measurements and mine design parameters in mines. The self-organizing map (SOM) is a data mining approach with the advantage that all input data samples are represented as vectors in a data-space defined by the observations (variables). The SOM procedure is an exploratory analysis tool that highlights patterns and relationships. Results are internally derived, in an unsupervised fashion based on measures of vector similarity. Our SOM outputs are highly visual, which assists in understanding and illustrating the data’s structure and internal relationships. Two studies are presented. The first uses SOM to investigate the relationships between a range of measured and derived rock property parameters and their observed behaviours during mining. Our aim was to determine whether a SOM analysis would allow a better understanding of observed and predicted rock behaviours from their rock properties. The SOM analysis showed that there were indeed relationships between ‘brittle’, ‘sugary’, ‘slabby’ and ‘squeezing’ rock behaviours and a number of the input parameters. 
The computed SOM ‘structure’ was then used to assess the likely behaviour of unknown samples by assigning them into behavioural fields based on their measured and derived geotechnical responses. The second study was aimed at assessing whether a SOM analysis of microseismic data at the Kalgoorlie Consolidated Gold Mine’s (KCGM) Mt Charlotte mine can contribute towards understanding and potentially forecasting/minimising seismicity at the mine-design stage. The database comprised 40 sample ‘clusters’ of microseismic parameters from the MS-RAP software package, which were derived from 2500 individual seismic events. The SOM analysis showed that some of the initial mining conditions are related to the seismic parameters. In each of the studies, the SOM procedures assisted in understanding the data and provided new insights. Because of its ‘ordered vector quantisation’ foundation, the SOM procedure is a substantial improvement over traditional statistical methods in three main areas:
• its ability to analyse data from disparate sources, including different data types;
• its ability to analyse sparse data sets with missing values; and
• its ability to analyse and visualise non-linear relationships within a data set.
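As a rough illustration of the procedure summarised above (samples treated as vectors, nodes updated by vector similarity, and unknown samples assigned to the nearest node), the following is a minimal SOM sketch in plain Python. It assumes a small rectangular grid, squared Euclidean distance, a Gaussian neighbourhood and linearly decaying learning rate and radius. The function names, grid size and parameter values are illustrative assumptions and are not taken from the studies described here.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Return the (row, col) of the node whose weight vector is closest to the sample."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Squared Euclidean distance as the measure of vector similarity.
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, sample))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(samples, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; all default values are assumptions chosen for illustration."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Random initial weights in [0, 1); inputs are assumed to be scaled to that range.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood measured on the grid, not in data space.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

Once trained, an unlabelled sample can be placed on the map with `best_matching_unit(weights, sample)`; if nodes have been annotated with known behaviours, the sample inherits the label of its node. Missing values are not handled in this sketch.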
Article
Support vector machine (SVM) was applied for land-cover characterization using MODIS time-series data. Classification performance was examined with respect to training sample size, sample variability, and landscape homogeneity (purity). The results were compared to two conventional nonparametric image classification algorithms: multilayer perceptron neural networks (NN) and classification and regression trees (CART). For 2001 MODIS time-series data, SVM generated overall accuracies ranging from 77% to 80% for training sample sizes from 20 to 800 pixels per class, compared to 67-76% and 62-73% for NN and CART, respectively. These results indicated that SVMs had superior generalization capability, particularly with respect to small training sample sizes. There was also less variability of SVM performance when classification trials were repeated using different training sets. Additionally, classification accuracies were directly related to sample homogeneity/heterogeneity. The overall accuracies for the SVM algorithm were 91% (Kappa = 0.77) and 64% (Kappa = 0.34) for homogeneous and heterogeneous pixels, respectively. The inclusion of heterogeneous pixels in the training sample did not increase overall accuracies. Also, the SVM performance was examined for the classification of multiple-year MODIS time-series data at annual intervals. Finally, using only the SVM output values, a method was developed to directly classify pixel purity. Approximately 65% of pixels within the Albemarle-Pamlico Basin study area were labeled as "functionally homogeneous" with an overall classification accuracy of 91% (Kappa = 0.79). The results indicated a high potential for regional scale operational land-cover characterization applications.
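The abstract above reports both overall accuracy and Kappa. As a reminder of how the two relate, the sketch below computes them from a confusion matrix using the standard Cohen's kappa formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the 2x2 matrix in the usage note are hypothetical and are not data from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that were assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement: product of the marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For a hypothetical matrix `[[45, 5], [10, 40]]`, the observed agreement is 0.85 and the chance agreement is 0.5, giving a kappa of 0.7. Accuracy and kappa can diverge when class sizes are uneven, which is one reason both are usually reported.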
Book
Self-Organising Maps: Applications in GI Science brings together the latest geographical research where extensive use has been made of the SOM algorithm, and provides readers with a snapshot of these tools that can then be adapted and used in new research projects. The book begins with an overview of the SOM technique and the most commonly used (and freely available) software; it is then sectioned to look at the different uses of the technique, namely clustering, data mining and cartography, from a range of application areas in the biophysical and socio-economic environments. It is the only book that takes the SOM algorithm to the GIS and Geography research communities. The Editors draw together expert contributors from the UK, Europe, USA, New Zealand, and South Africa. It covers a range of techniques in clustering, data mining and cartography, all featuring an appropriate case study.