Proceedings of the 13th SEGJ International Symposium, 2018
Machine Learning for Land Classification - A SOM Case Study of Broken Hill
Tasman Gillfeather-Clark(1) and Luke Smith(2)
(1) University of Western Australia (tasman.gc@gmail.com), (2) Macquarie University (LukeSmith.geo@gmail.com).
ABSTRACT
Machine learning (ML) has come to refer to a diverse
range of algorithms and functions designed with the
intent of learning a given problem and producing some
evaluable result, as opposed to being explicitly
programmed to solve said problem. Typically, they are
iterative, and improve their conclusions over time. The
definition commonly introduced is paraphrased from
the work of Samuel (1959), who worked on some of
the earliest examples of ML, by applying what we now
refer to as a decision tree to teach a computer to play
checkers. In the geoscience industry, ML has the
potential to become an indispensable tool for
geoscientists in all stages of the exploration process. As
the mining and resource evaluation industry faces an
ever-growing data glut, ML presents a range of tools to
work with increasingly large, multivariate datasets.
Self-Organising Maps (SOM) is an unsupervised
learning algorithm, used in this work to complete
landmass classification analysis of the area to the North
of Broken Hill. An examination of current ML
landmass classification methodologies is introduced,
followed by a brief review of SOM. Applications of
SOM for mineralisation targeting and data QC are
identified in a data-rich setting. The results of the study
confirm the efficiency of the SOM algorithm for
clustering lithological groups in land classification
studies. Perhaps most notable is SOM’s ability to
highlight variation in cover without needing to assign
labels, which has been identified as a key consideration
for Australia’s mining future, given the vast expanses
of Australia that are composed of subcropping rock.
KEY WORDS: SOM, Machine Learning, Data
Integration, Broken Hill, Remote Sensing
INTRODUCTION
Broadly, ML aims to classify data, predict trends or
simplify visualisation. As we enter an age of
information-rich geoscience datasets, large-scale
mining and environmental programs accrue diverse
and expansive sets of data.
SOM is by no means a recent innovation, and
has been used for many different goals: cluster
analysis in data mining (Fraser et al., 2006),
overcoming gaps in datasets via the production of
‘fuzzy’ observations (Wang, 2003), and the analysis of
ecological communities, both for exploring the
ordination of a species and for visualising that
species’ abundance (Giraudel and Lek, 2001). SOM has
had a noted impact on the ML space; however, it
remains less well known than the techniques
identified in the next section.
LANDMASS CLASSIFICATION WITH ML
An issue perfectly suited to machine learning is the
classification of landmass using remote sensing data.
This classification process has many applications, from
land usage classification for urban planning, through
agricultural and biological management via the mapping
of invasive plant species, to the obvious geological
application of surface geological mapping.
Cracknell and Reading used Random Forests (RF) and
Support Vector Machines (SVM) to identify
lithological contact zones from airborne geophysics
and satellite data, and went on to compare other ML
algorithms in an extremely thorough review (Cracknell
and Reading, 2013, 2014). For more information on
SVM and remote sensing, see Mountrakis, Im and
Ogole (2011) and Shao and Lunetta (2012).
Examples of land classification applications
include mapping the lithology of Canada’s Arctic
using Neural Networks (NN), SVM, RF and Maximum
Likelihood Classifier (MLC) (He et al., 2015),
mapping seabed sediments with RF (Diesing et al.,
2014), and tracking airborne dust particulates and
their sources using SOM (Lary et al., 2016). There
are also applications to mapping landslides, which has
significant value in hazard reduction (Stumpf and
Kerle, 2011). Discussion is ongoing on the best way
to deploy ML for land classification, and a recent
analysis of major issues and future directions can be
found in Khatami, Mountrakis and Stehman (2016).
Typically, remote sensing data are continuous and
quite dense, making for excellent input to ML
algorithms. However, understanding the clustering or
classification process is a prerequisite for
interpreting the resulting grouping.
This paper is based on the SOM technique
described in Vesanto, Himberg and Alhoniemi (1999),
which has been used extensively since then (Dickson,
1995; Gulson et al., 2007). This work outlines and
provides users with a workflow that allows the
SOM process to operate on any multivariate dataset
that can be loaded into it. Our project aimed to
establish a simple workflow, which integrates well
with existing exploration industry software. The goal
was to transition data from industry standard sources,
through the SOM process in MATLAB, and then back
out into a usable format for industry end users.
SELF ORGANISING MAPS
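SOM is a type of artificial neural network, a family
of methods that use interconnected nodes to simulate
the neurons of the brain. Unlike a multilayer
perceptron, a SOM arranges its nodes in a single,
typically two-dimensional, map. Details about SOM as
a ML technique can be found in Agarwal and Skupin
(2008).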
The main strength of SOM is dimensionality reduction
and the ability to visualise attribute distributions via
component planes. As the SOM operates, it assigns the
data to best matching units (BMUs) without human
supervision. Each BMU is simply a data cluster, which
is an inherent feature of the data itself. In our process
this dimensionality reduction allows multiple layers of
information to be visualised as a single colour. For
example, for a given point, all of the information shown
in Figure 2 is now associated with a single colour in the
BMU key; see the examples in Figure 3.
This is the strength of a clustering algorithm like
SOM compared to a classification algorithm like
Random Forests (RF), which requires supervision
during training and can introduce operator bias.
However, the strength of SOM is also its weakness: a
lack of user input can mean that you can’t target a
feature you may be interested in. You also can’t train a
SOM on one area and apply it to another. It is
important to consider the desired outcome of your
analysis before choosing your ML technique.
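To make the idea of a best matching unit more concrete, the following is a minimal Python sketch of a single SOM training step on a toy map. It is an illustration only, not the SOM Toolbox implementation used in this work: the function names, the 3 x 3 map, the learning rate and the neighbourhood radius are all hypothetical, and the sequential (one sample at a time) update with a Gaussian neighbourhood is just one common variant of the algorithm.

```python
import math
import random


def find_bmu(codebook, sample):
    """Return the index of the map unit whose weights are closest to the sample."""
    best_index, best_dist = 0, float("inf")
    for index, weights in enumerate(codebook):
        dist = sum((w - x) ** 2 for w, x in zip(weights, sample))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def train_step(codebook, grid, sample, learning_rate, radius):
    """Move the BMU and its map neighbours towards the sample, in place."""
    bmu = find_bmu(codebook, sample)
    bmu_row, bmu_col = grid[bmu]
    for index, weights in enumerate(codebook):
        row, col = grid[index]
        grid_dist_sq = (row - bmu_row) ** 2 + (col - bmu_col) ** 2
        # Gaussian neighbourhood: units far from the BMU on the map move less.
        influence = math.exp(-grid_dist_sq / (2.0 * radius ** 2))
        codebook[index] = [
            w + learning_rate * influence * (x - w) for w, x in zip(weights, sample)
        ]
    return bmu


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Hypothetical toy map: 3 x 3 units, each holding 2 attribute weights.
rng = random.Random(0)
grid = [(r, c) for r in range(3) for c in range(3)]
codebook = [[rng.random(), rng.random()] for _ in grid]
sample = [0.9, 0.1]

bmu_before = find_bmu(codebook, sample)
dist_before = squared_distance(codebook[bmu_before], sample)
bmu_after = train_step(codebook, grid, sample, learning_rate=0.5, radius=1.0)
dist_after = squared_distance(codebook[bmu_after], sample)
print(bmu_before, bmu_after, dist_before, dist_after)
```

After the step, the winning unit sits closer to the sample and its neighbours on the map have moved a little in the same direction, which is what lets similar data end up in nearby units without any labels being supplied.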
METHOD
Our dataset was provided as part of the Frank Arnott
Award by the Department of Primary Industries, NSW.
The relative wealth of data in this area is in part due to
the Broken Hill Exploration Initiative (BHEI). The
high density of overlapping datasets makes the area a
great subject for our SOM process.
Figure 1 – (Left) Shows from top left to bottom right:
Radiometrics, Hymapper, Density Modelling, Surface
Geology, Mag (Analytical Signal), Digital Terrain Model,
Mineral occurrence maps and Regolith Map. (Above) Point
sampled grid, coloured using geology.
The data were loaded as overlapping layers in Discover
(Fig. 1, left) to create a point grid (Fig. 1, right).
Effectively, each point comes with two grid
coordinates (X, Y), as well as 25 other values
(Mag AS, Density, K-Count, etc.). This dataset is
saved as a delimited list and then loaded into a SOM
Toolbox for Matlab 5 data structure for interpretation
in MATLAB. These values aren’t restricted to being
numerical; they can be any standard data type, or Null
values. Missing data carry no weight in the SOM
process and are thus ignored. In total, the number of
points fed into the SOM is in excess of 500,000. SOMs
with various numbers of BMUs can be seen in Fig. 2.
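As a rough illustration of the point grid described above, the sketch below reads a small delimited list into coordinates and attribute values, treating empty or Null entries as missing. It is not the export format or loader used in this work: the column names (`mag_as`, `density`, `k_count`), the coordinate values and the set of missing-value tokens are hypothetical, and only numerical attributes are handled.

```python
import csv
import io
import math

# Hypothetical tokens that mark a missing value in the exported list.
MISSING_TOKENS = {"", "null", "nan"}


def parse_value(text):
    """Convert one cell to a float, using NaN to flag a missing value."""
    token = text.strip()
    if token.lower() in MISSING_TOKENS:
        return math.nan
    return float(token)


def load_point_grid(handle):
    """Read a delimited point grid into attribute names, coordinates and values."""
    reader = csv.DictReader(handle)
    value_names = [name for name in reader.fieldnames if name not in ("X", "Y")]
    coords, values = [], []
    for row in reader:
        coords.append((float(row["X"]), float(row["Y"])))
        values.append([parse_value(row[name]) for name in value_names])
    return value_names, coords, values


# Two made-up points with three of the attribute columns.
sample_text = (
    "X,Y,mag_as,density,k_count\n"
    "543000,6470000,12.5,2.71,140\n"
    "543050,6470000,,2.69,Null\n"
)
names, coords, values = load_point_grid(io.StringIO(sample_text))
print(names)
print(coords)
print(values)
```

Keeping the coordinates separate from the attribute values means the SOM only ever sees the attributes, while the coordinates are retained for writing the results back to a map.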
Figure 2 - Comparison of the same SOM area with different numbers of BMUs. Note the changing complexity of the maps with
increasing numbers of BMUs. The key is provided in the bottom right of each map to give an approximate visualisation of the
various nodes. This would be used in conjunction with the component plane key to characterise the different BMUs.
Each variable, or column of the CSV, is then
normalised, allowing the varied datasets to be
compared with one another. Without weighting each
dataset in this way, a dataset that contains larger
numbers would simply overwhelm the others during SOM
computation. The operation of the SOM process is
described in Vesanto, Himberg and Alhoniemi (1999),
and a more practical explanation can be found in
Dickson (1995).
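The effect of normalisation can be sketched in a few lines of Python. The example below scales each column to zero mean and unit variance while skipping missing values. This is one common choice and is shown as an assumption only; the normalisation method applied in this work is not specified here, and the function name and sample numbers are hypothetical.

```python
import math


def normalise_columns(rows):
    """Scale each column to zero mean and unit variance, ignoring NaN entries."""
    n_cols = len(rows[0])
    stats = []
    for col in range(n_cols):
        present = [row[col] for row in rows if not math.isnan(row[col])]
        if not present:
            # Column with no data at all: leave it unscaled.
            stats.append((0.0, 1.0))
            continue
        mean = sum(present) / len(present)
        if len(present) > 1:
            variance = sum((v - mean) ** 2 for v in present) / (len(present) - 1)
        else:
            variance = 0.0
        # Guard against constant columns, which would otherwise divide by zero.
        std = math.sqrt(variance) or 1.0
        stats.append((mean, std))
    scaled = [
        [(value - stats[col][0]) / stats[col][1] for col, value in enumerate(row)]
        for row in rows
    ]
    return scaled, stats


# Made-up example: one column in the thousands, one in tenths, one value missing.
rows = [
    [1000.0, 0.1],
    [2000.0, math.nan],
    [3000.0, 0.3],
]
scaled, stats = normalise_columns(rows)
print(scaled)
```

After scaling, both columns span a similar numerical range, so neither dominates the distance calculations, and the missing entry stays flagged as missing rather than being filled in.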
We used the SOM Toolbox for Matlab 5, an
open-source implementation of the SOM process
(Vesanto, Himberg and Alhoniemi, 1999). The dataset
was processed on a quad-core Intel i7-2677M with a
base clock of 1.8 GHz and 4 GB RAM (a standard laptop
computer) and returned a result in under 10 seconds,
with the import and export of data being the main CPU
time sinks. During the initialisation process the
operational parameters are calculated automatically,
requiring only the number of expected map units (we
used 250). We performed no subsequent clustering
because of our interest in gradational change in the
cover.
RESULTS
Figure 3 shows the completed SOM map, in which it is
clear that what is produced is not a conventional
geological map. We see some firm unit boundaries,
likely caused by different neighbouring geological
units; however, it’s clear that multiple processes are
contributing to the grouping we see in the map. This is
expected, as surface mapping data doesn’t just capture
data about geological boundaries but also variation in
terrain and sedimentary cover. This makes it
challenging to identify the process that gave rise to a
given BMU, although some features stand out: for
example, it’s possible to identify drainage systems
clearly.
Figure 3 - The resultant SOM map. The SOM toolbox
creates a SOM domain representation of how
component clusters correlate. It does not give any
spatial information. Thus the grouping of the colours is
linked only to ‘true correlation’ of the data.
The SOM process also produces an error map referred
to as the quantization error (Q-error), which measures
how well each data point fits the BMU it has been
matched with, with higher values corresponding to
greater variance. The grayscale image in Fig. 4 shows
the quantization error, with brighter spots having
higher Q-error values.
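A per-point Q-error of this kind can be sketched as follows. Here it is taken to be the Euclidean distance between a (normalised) data point and the weights of its BMU, with missing components left out of the distance. Both the definition and the handling of missing values are assumptions made for illustration, and the toy codebook and samples are hypothetical rather than taken from the Broken Hill data.

```python
import math


def masked_distance(weights, sample):
    """Euclidean distance that skips components missing (NaN) in the sample."""
    total = 0.0
    for w, x in zip(weights, sample):
        if not math.isnan(x):
            total += (w - x) ** 2
    return math.sqrt(total)


def quantization_errors(codebook, samples):
    """Return (BMU index, Q-error) for every sample."""
    results = []
    for sample in samples:
        distances = [masked_distance(weights, sample) for weights in codebook]
        bmu = min(range(len(codebook)), key=distances.__getitem__)
        results.append((bmu, distances[bmu]))
    return results


# Toy map with two units and three points: a good fit, a point with a
# missing attribute, and a point lying between the two units.
codebook = [[0.0, 0.0], [1.0, 1.0]]
samples = [
    [0.1, 0.0],
    [1.0, math.nan],
    [0.5, 0.5],
]
results = quantization_errors(codebook, samples)
for (bmu, q_error), sample in zip(results, samples):
    print(sample, "-> BMU", bmu, "Q-error", round(q_error, 3))
```

The point lying between the two units receives the largest Q-error, which is the same behaviour that makes mixed or poorly represented areas appear bright in an error map.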
It is of note that the Silver King formation,
outlined in red, is anomalously high, indicating the
formation does not entirely fit the best matching
unit it was mapped to. Streams also stand out across
the map, arising from poor fits to their respective
BMUs. You would expect this for rivers and streams as
they act as the sink for all the weathering products
from the units they act as catchments for. Thus, you are
seeing a mix of different BMUs all in a specific
location.
Figure 4 – Q-error map showing areas that vary
notably from the BMU they were mapped as. The area
outlined in red is a single geological unit (Silver King
Frm.)
This has the potential to identify anomalous
areas meriting further investigation. Further analysis
can be found in Smith and Gillfeather-Clark (2018).
Our analysis revealed several areas that merited
follow-up, even identifying mapping errors in the
geological maps we were provided with.
CONCLUSIONS
Our project took a large amount of complex data and
reduced it to a single map with available software and
existing tools. It also showed the value of applying
unsupervised learning techniques to trend mapping in
the GIS space. Ultimately, this technique’s application
is limited to areas with high-quality overlapping
datasets. We foresee, however, that such areas will
become more and more common, and SOM may represent an
effective way to interpret and rapidly characterise
areas of interest as datasets grow in coverage and
overlap each other. Thus, we find that SOM is a useful
tool for any analyst working with GIS mapping moving
forward.
ACKNOWLEDGEMENTS
This work was completed as part of the Frank Arnott
Award under the apprentice category, in which we
ultimately placed 3rd. As such we’d like to thank all
those who’ve contributed to the completion of this
work. We thank David Pratt of Tensor Research, who
mentored us through this project and shared not only
his time and counsel but also his software. We’d also
like to thank our co-mentor Bruce Dickson who, through
his thorough knowledge of the SOM technique, gave
invaluable insight into the process of SOM. We’d also
like to thank Pitney Bowes, who provided their
software free of charge for the duration of this
competition, as well as providing us with great
technical and operational assistance to fast track our
ability to utilise the software. Finally, we’d like to
thank the NSW Dept. of Primary Industry, including Dr
John Greenfield, Astrid Carlton and Rosemary
Hegarty, who provided us with access to the dataset
which made this work possible, and for their feedback
on the output maps.
REFERENCES
Agarwal, P. and Skupin, A. (2008) Self-Organising
Maps: Applications in Geographic Information
Science.
Cracknell, M. J. and Reading, A. M. (2013) ‘The
upside of uncertainty: Identification of lithology
contact zones from airborne geophysics and
satellite data using random forests and support
vector machines’, Geophysics, 78(3), pp.
WB113-WB126.
Cracknell, M. J. and Reading, A. M. (2014)
‘Geological mapping using remote sensing data:
A comparison of five machine learning
algorithms, their response to variations in the
spatial distribution of training data and the use of
explicit spatial information’, Computers and
Geosciences. Elsevier, 63, pp. 22–33.
Dickson, B. L. (1995) ‘Analysis and Visualization of
Multiple Data Sets using Self-Organizing Maps’,
CSIRO Exploration & Mining, pp. 1–4.
Diesing, M. et al. (2014) ‘Mapping seabed sediments:
Comparison of manual, geostatistical, object-
based image analysis and machine learning
approaches’, Continental Shelf Research.
Elsevier, 84, pp. 107–119.
Fraser, S. J. et al. (2006) ‘Data mining mining data -
Ordered vector quantisation and examples of its
application to mine geotechnical data sets’, 6th
International Mining Geology Conference,
Rising to the Challenge, (August), pp. 259–268.
Giraudel, J. L. and Lek, S. (2001) ‘A comparison of
self-organizing map algorithm and some
conventional statistical methods for ecological
community ordination’, Ecological Modelling,
146(1–3), pp. 329–339.
Gulson, B. et al. (2007) ‘Comparison of lead isotopes
with source apportionment models, including
SOM, for air particulates’, Science of the Total
Environment, 381(1–3), pp. 169–179.
He, J. et al. (2015) ‘A comparison of classification
algorithms using Landsat-7 and Landsat-8 data
for mapping lithology in Canada’s Arctic’,
International Journal of Remote Sensing, 36(8),
pp. 2252–2276.
Khatami, R., Mountrakis, G. and Stehman, S. V.
(2016) ‘A meta-analysis of remote sensing
research on supervised pixel-based land-cover
image classification processes: General
guidelines for practitioners and future research’,
Remote Sensing of Environment. Elsevier Inc.,
177, pp. 89–100.
Lary, D. J. et al. (2016) ‘Machine learning in
geosciences and remote sensing’, Geoscience
Frontiers. Elsevier Ltd, 7(1), pp. 3–10.
Mountrakis, G., Im, J. and Ogole, C. (2011) ‘Support
vector machines in remote sensing: A review’,
ISPRS Journal of Photogrammetry and Remote
Sensing. Elsevier B.V., 66(3), pp. 247–259.
Samuel, A. L. (1959) ‘Some studies in machine
learning using the game of checkers’, IBM
Journal, 3(3), p. 210.
Shao, Y. and Lunetta, R. S. (2012) ‘Comparison of
support vector machine, neural network, and
CART algorithms for the land-cover
classification using limited training data points’,
ISPRS Journal of Photogrammetry and Remote
Sensing. International Society for
Photogrammetry and Remote Sensing, Inc.
(ISPRS), 70, pp. 78–87.
Smith, L. and Gillfeather-Clark, T. (2018) ‘Self
Organising Maps - A Case Study of Broken Hill’,
ASEG Extended Abstracts, 2018(1), pp. 1–6.
Stumpf, A. and Kerle, N. (2011) ‘Object-oriented
mapping of landslides using Random Forests’,
Remote Sensing of Environment. Elsevier Inc.,
115(10), pp. 2564–2577.
Vesanto, J., Himberg, J. and Alhoniemi, E. (1999)
‘Self-organizing map in Matlab: the SOM
Toolbox’, Proceedings of the Matlab DSP
conference, 99.
Wang, S. (2003) ‘Application of self-organising maps
for data mining with incomplete data sets’,
Neural Computing and Applications, 12(1), pp.
42–48.