Conference Paper

Application of SOMs and k-means clustering to geophysical mapping: Lessons learned

Authors:
Angela Carter-McAuslan* and Colin Farquharson, Memorial University of Newfoundland
Summary
Machine learning techniques are of growing interest to the
geosciences. We discuss the use of self-organizing maps and
k-means clustering as techniques for the analysis of potential
field and radiometric data in the production of predictive
maps. By looking at examples of predictive maps produced
for surface geology on the Baie Verte Peninsula,
Newfoundland, Canada, as well as basement geology of the
mid-continent rift in Decorah, Iowa, USA, we show the
benefits of a combined SOM – k-means clustering method
over using SOMs or k-means clustering as stand-alone
methods.
Introduction
Machine learning is a broad and fascinating area of study
with varied applications within the earth sciences. One such
application is the generation of automated predictive maps
from remote sensing and geophysical data. In this
presentation we will be discussing two types of unsupervised
machine learning techniques: self-organizing maps (SOMs)
and k-means clustering. K-means clustering (MacQueen,
1967) is a well understood and simple machine learning
technique. However, k-means clustering cannot be applied
to incomplete datasets. Self-organizing maps, first developed
by Kohonen (1982), are a type of unsupervised neural network
algorithm used for the analysis, visualization and
interpretation of multi-dimensional datasets. They have been
applied to a number of geoscience problems, including
bedrock mapping (Carneiro et al., 2012), but are less
straightforward than k-means clustering. In this study we look at the
use of SOMs and k-means clustering separately and in
conjunction with one another for the task of producing
predictive geological maps.
We will use examples from two studies, one mapping
surface geology on the Baie Verte Peninsula, Newfoundland,
from geophysical data, and one mapping buried geology in
Decorah, Iowa. The Baie Verte study allows for a
comparison between the results of the SOM process and the
combined SOM – k-means clustering process when applied
to legacy geophysical data for the purposes of mapping
near-surface geology. Due to issues with data completeness,
k-means clustering as a stand-alone process is not applicable there.
The Decorah, Iowa, study allows for a comparison between
maps produced using the SOM process, pure k-means
clustering, and the combined SOM – k-means clustering
process on relatively modern co-located datasets.
Self-Organizing Maps Theory
In this presentation, we will be showing maps produced
using the CSIRO SOM implementation SiroSOM. SiroSOM
is based on the MATLAB Toolkit (Leväniemi et al., 2017)
developed by Kohonen et al. (1996). The following is a
general mathematical description of the SOM algorithm
based primarily on Kohonen et al. (1996) and Kohonen
(1998).
All applications of SOM algorithms begin with a set of p
data vectors
$$X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_p\},$$
where each $\vec{x}_i$ is a D-dimensional vector
$$\vec{x}_i = \{x_1, x_2, \ldots, x_D\}.$$
Each element, $x_j$, is the value of a different data type. In this
study the $x_j$ are different types of geophysical data.
SOM neural nets consist of a set of N neurons (also called
computational units or model vectors) of D dimensions,
$$M = \{\vec{m}_1, \vec{m}_2, \ldots, \vec{m}_N\},$$
where N < p. The goal is to “train” the neural net such that
the neural vectors exist on a D-dimensional data space
mimicking the distribution of the observation vectors, X.
Training is a recursive regression where each step involves
the presentation of a sample observation vector $\vec{x}_t$ to the
neural net to determine the neural vector most similar to
$\vec{x}_t$ (which then becomes known as the best matching unit, or
BMU, associated with $\vec{x}_t$), denoted $\vec{m}_{c,t}$. The BMU is
determined using the criterion
$$\lVert \vec{m}_{c,t} - \vec{x}_t \rVert \le \lVert \vec{m}_{i,t} - \vec{x}_t \rVert \quad \text{for all } i.$$
The neural net is updated according to the rule
$$\vec{m}_{i,t+1} = \vec{m}_{i,t} + h_{c,i} \left[ \vec{x}_t - \vec{m}_{i,t} \right].$$
This update rule combines competitive and co-operative
learning through the neighborhood function $h_{c,i}$. The
neighborhood function encodes the magnitude of
the change to $\vec{m}_{i,t}$ based on its proximity on the neural net to
$\vec{m}_{c,t}$. The exact form that the neighborhood function takes is
specific to each SOM implementation. However, in all
cases, the BMU is modified the most. The amount
of change to other neurons in the net is dependent on their
proximity in neural net space to the BMU. The learning rate
(amount of modification to the neural vectors) and size of
the affected neighborhoods generally decrease with each
iteration of training. In this implementation both the learning
rate and neighborhood function decrease linearly (Kohonen
et al., 1996).
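The training procedure described above can be sketched in a few lines of Python. This is an illustration only, not SiroSOM's code: the Gaussian form of the neighborhood function, the rectangular (non-toroidal) grid, the random initialization, and all names and default values (`train_som`, `lr0`, `n_iter`) are assumptions made for the example. Only the linear decay of the learning rate and neighborhood size follows the text.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns (neurons, grid)."""
    rng = np.random.default_rng(seed)
    p, d = data.shape
    # Grid coordinates of each neuron on the 2-D neural net.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    # Initialise the neuron vectors from randomly chosen observations.
    neurons = data[rng.choice(p, rows * cols, replace=True)].astype(float)
    radius0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = 1.0 - t / n_iter                  # linear decay from 1 towards 0
        lr = lr0 * frac                          # learning rate
        radius = max(radius0 * frac, 0.5)        # neighbourhood radius (floored)
        x = data[rng.integers(p)]                # present one sample vector
        c = np.argmin(np.sum((neurons - x) ** 2, axis=1))    # BMU index
        grid_d2 = np.sum((grid - grid[c]) ** 2, axis=1)      # distance on the net
        h = lr * np.exp(-grid_d2 / (2.0 * radius ** 2))      # assumed Gaussian h
        neurons += h[:, None] * (x - neurons)    # m_i <- m_i + h_ci (x - m_i)
    return neurons, grid

# Synthetic two-population data set (hypothetical values, 3 data types).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (100, 3)),
                  rng.normal(3.0, 0.3, (100, 3))])
neurons, grid = train_som(data, rows=4, cols=4)
```

Because `h` is largest at the BMU and falls off with distance on the grid, the BMU is always the neuron modified the most, as stated above.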
Once trained, the data are clustered using the neural net by
grouping data points according to their BMU. As such, the
neurons in the neural net become representative of small
clusters of data. If trained correctly, the neural net should
mimic the topology of the original dataset in data space (i.e.
similar data which are close together in data space should
have BMUs that are close together on the neural net). If a
more in-depth explanation of SOM theory is desired, please
refer to Kohonen (1982).
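Grouping data points by BMU can be sketched as follows. The treatment of gaps shown here, where missing components (NaN) are simply left out of the distance, is one common way for a SOM to tolerate incomplete data; whether SiroSOM does exactly this is an assumption. The function name `assign_bmus` and the toy values are hypothetical.

```python
import numpy as np

def assign_bmus(data, neurons):
    """Index of the best matching unit for every data vector.

    NaN components of a sample are ignored in the distance, so a sample
    with gaps is matched on its observed data types only. A sample with
    no observed components at all would tie on every neuron.
    """
    # Squared differences between every sample and every neuron: (p, N, D)
    diff2 = (data[:, None, :] - neurons[None, :, :]) ** 2
    d2 = np.nansum(diff2, axis=2)        # NaN terms contribute zero
    return np.argmin(d2, axis=1)

neurons = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # hypothetical trained net
data = np.array([[0.1, 0.1],
                 [0.9, -0.1],
                 [0.2, 0.8],
                 [1.1, np.nan]])         # last sample is missing one data type
bmu = assign_bmus(data, neurons)
# Group sample indices by the neuron that represents them.
groups = {int(n): np.flatnonzero(bmu == n).tolist() for n in np.unique(bmu)}
```

The broadcasted array has p x N x D entries, so for large surveys the samples would need to be processed in chunks.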
K-means Clustering Theory
K-means clustering (MacQueen, 1967) is a simple, well
understood machine learning technique for the partitioning
of a set of p observations
$$X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_p\}$$
into k clusters represented by a set of centroids
$$C = \{\vec{c}_1, \vec{c}_2, \ldots, \vec{c}_k\},$$
where $\vec{c}_j = \{c_1, c_2, \ldots, c_D\}$, by minimizing the objective function
$$J = \sum_{j=1}^{k} \sum_{\vec{x}_i \in S_j} \lVert \vec{x}_i - \vec{c}_j \rVert^2,$$
where $S_j$ is the set of observations assigned to centroid $\vec{c}_j$.
The type of optimization used to minimize J is specific to the
k-means clustering implementation.
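As an illustration of one such optimization, the sketch below uses Lloyd's algorithm, which alternates between assigning each observation to its nearest centroid and moving each centroid to the mean of its members. It is not necessarily the optimizer used in this study. The function name `kmeans`, the random initialization and the toy data are assumptions. Here J is the sum of squared distances from each observation to its assigned centroid.

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels, J)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centroids = data[rng.choice(len(data), k, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = np.sum((data[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        labels = np.argmin(d2, axis=1)             # assignment step
        new = centroids.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members):                       # keep old centroid if cluster is empty
                new[j] = members.mean(axis=0)      # update step
        if np.allclose(new, centroids):            # converged
            break
        centroids = new
    d2 = np.sum((data[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    labels = np.argmin(d2, axis=1)
    J = float(d2[np.arange(len(data)), labels].sum())   # objective function value
    return centroids, labels, J

# Two well separated pairs of points (hypothetical values).
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, labels, J = kmeans(data, k=2)
```

Both the distance and the mean are undefined for a vector with a missing component, which is why this kind of clustering needs complete data.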
Example 1: Baie Verte, Newfoundland, Canada
The Baie Verte Peninsula is a region of complex geology
consisting of siliciclastic schist and felsic plutonic rocks
associated with the coastline of ancestral Laurentia separated
by the Baie Verte Line, a suture associated with the Taconic
orogeny, from seafloor and ocean island arc rocks in the east.
The area is host to base metal and gold deposits both
historically mined and currently in production.
A suite of legacy geophysical data (Figure 1) is available
from the Geological Survey of Canada for the peninsula. In
this study we used gravity, reduced-to-pole magnetic, and
radiometric data compiled from the 1987 Springdale Survey,
the 1988 Baie Verte Peninsula Survey, and the 2007 Baie
Verte Survey.
Figure 1: Datasets used for the Baie Verte SOM.
The SOM predictive maps were created using a neural net of
52 × 43 neurons, and the k-means clustering was carried out
using 14 centroids. Figures 2b and 2c show the results for the
stand-alone SOM and the combined SOM – k-means clustering
processes, respectively.
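A minimal sketch of how the two stages can be chained is given below. It assumes the common two-stage arrangement in which k-means is run on the trained neuron vectors and every observation then inherits the cluster of its BMU. This is an illustration rather than the SiroSOM workflow, and the function names (`nearest`, `cluster_neurons`, `som_kmeans_labels`) and toy values are hypothetical.

```python
import numpy as np

def nearest(points, refs):
    """Index of the nearest reference vector for each point."""
    d2 = np.sum((points[:, None, :] - refs[None, :, :]) ** 2, axis=2)
    return np.argmin(d2, axis=1)

def cluster_neurons(neurons, k, n_iter=50, seed=0):
    """Plain k-means on the neuron vectors; returns one label per neuron."""
    rng = np.random.default_rng(seed)
    centroids = neurons[rng.choice(len(neurons), k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest(neurons, centroids)
        for j in range(k):
            if np.any(labels == j):                # skip empty clusters
                centroids[j] = neurons[labels == j].mean(axis=0)
    return nearest(neurons, centroids)

def som_kmeans_labels(data, neurons, k):
    """Cluster label for each data vector, obtained via its BMU."""
    neuron_labels = cluster_neurons(neurons, k)    # secondary clustering of the net
    bmu = nearest(data, neurons)                   # BMU of every observation
    return neuron_labels[bmu]

# Hypothetical trained net: four neurons forming two obvious groups.
neurons = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]])
data = np.array([[0.1, 0.0], [3.9, 4.2], [0.3, 0.2]])
labels = som_kmeans_labels(data, neurons, k=2)
```

Under this arrangement the k-means stage only ever sees neuron vectors, so the number of final map classes (14 here) is decoupled from the number of neurons (52 × 43).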
Figure 2: (a) Generalized geology of the Baie Verte Peninsula
(after Coleman-Sadd, 1996) and the results of the SOM carried
out using (b) only the SOM process and (c) a combination of the
SOM and k-means clustering.
The stand-alone SOM result (Figure 2b) is largely successful
in differentiating the lithological
units, particularly the Dunamagan Granite (labelled C in
Figure 2a), and in distinguishing the schist from the felsic
intrusive rocks of the Humber zone. With the addition of
secondary k-means clustering (Figure 2c), additional units
are delineated. For example, the Lushes Bight and
Springdale groups (labelled E and D respectively on Figure
2a) are clearly differentiated with the extent of the groups
well replicated by the k-means clusters.
Example 2: Decorah, Iowa, USA
The mid-continent rift is a failed Precambrian rift system
(Stein et al., 2014). A suite of high-resolution airborne
magnetic, gravity and gravity gradiometry data were
collected by the USGS (Figure 3) over the Decorah area in
northeast Iowa and southeast Minnesota where the basement
rocks, associated with the mid-continent rift, are buried
beneath up to 700 m of Paleozoic limestones of the Michigan
basin. Drenth et al. (2015) produced a traditional geological
interpretation from the geophysical data (Figure 4a).
Figure 3: Data used in the Decorah SOM.
The SOM and k-means clustering procedures were applied
to the high-resolution USGS magnetic, gravity, and gravity
gradiometry datasets. The SOM predictive maps were
created using a neural map of 62 × 59 neurons in a
hexagonal formation laid out on the surface of a toroid. The
k-means clustering was carried out using 7 cluster centroids.
Figures 4b and 4c show the results of the stand-alone SOM
process and the combined SOM and k-means clustering
process, respectively.
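Laying the neural map out on a toroid means that the edges of the grid wrap around, so neurons on opposite edges are neighbors. The sketch below shows the wrap-around distance that a neighborhood function would use on such a map. For simplicity it treats the grid as rectangular and ignores the hexagonal offsets, which is an assumption, and the function name `toroidal_grid_distance` is hypothetical.

```python
import numpy as np

def toroidal_grid_distance(a, b, shape):
    """Euclidean distance between grid positions a and b on a wrapped lattice."""
    a, b, shape = np.asarray(a), np.asarray(b), np.asarray(shape)
    delta = np.abs(a - b)
    delta = np.minimum(delta, shape - delta)     # go the short way round each axis
    return float(np.sqrt(np.sum(delta ** 2)))

shape = (62, 59)                                 # net size used for the Decorah SOM
# Opposite corners are one step apart in each direction once the grid wraps.
d_wrapped = toroidal_grid_distance((0, 0), (61, 58), shape)
# Away from the edges the distance is the ordinary grid distance.
d_interior = toroidal_grid_distance((10, 10), (13, 14), shape)
```

With this distance no neuron sits on an edge of the map, so every neuron has a full neighborhood during training.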
The stand-alone SOM results (Figure 4b) and the SOM
results with secondary clustering (Figure 4c) produce fairly
comparable results. Both locate and delineate many of the
geological units including the Decorah complex (labelled A)
and the mafic intrusions (labelled C, D, F, and G). However,
the results with secondary clustering delineate the silicic
pluton (labelled E) better and differentiate the Decorah
complex from the mafic intrusions.
Figure 4: (a) Geology of the Decorah, Iowa, area (after Drenth
et al., 2015) and the results of the SOM carried out using (b) only
the SOM process and (c) a combination of the SOM and k-means
clustering.
Conclusions
When good choices are made for the SOM and k-means
clustering parameters, both are useful machine learning tools
for the interpretation of geophysical data. Careful selection
of the input data, as well as of the parameters
for running the algorithms, allows for the production of good
predictive maps. In both examples presented, the stand-alone
SOM was able to reproduce the geology to some degree, but
the addition of the k-means clustering resulted in a clear
improvement.
Acknowledgments
We would like to acknowledge CSIRO for their provision of
SiroSOM at reduced rates as well as for their technical
support.