
© Commonwealth of Australia (Geoscience Australia) 2018. This material is released under the Creative Commons Attribution 4.0 International Licence. (http://creativecommons.org/licenses/by/4.0/legalcode)
A machine learning approach to national
scale predictive geochemical maps
For Further Information: John Wilford
Email: john.wilford@ga.gov.au
Ph: +61 2 6249 9455 Web: www.ga.gov.au
1. Why model surface geochemistry?
Understanding the distribution of surface geochemistry at broad scales is a
significant challenge. This is due to the sparse nature of point data observations
and the complex characteristics of the critical zone, the near-surface zone
between the treetops and groundwater aquifers. This is where the biogeochemical
processes that alter the substrate occur. Creating predictive maps of the major
elements deepens our understanding of their spatial distribution within exposed
rocks and regolith. These maps have applications in understanding the near-
surface processes of the Australian continent as well as improving geochemical
exploration and environmental management. They also provide insights into the
complex interactions and processes occurring within the critical zone.
2. The Uncover ML approach
The predictive modelling approach we designed is called Uncover ML. It employs
high-performance computing infrastructure and ‘big data’ analytics to leverage a
suite of machine learning algorithms, including random forests and Cubist. We
used a large number of environmental datasets as covariates in conjunction with
this model to understand their relationships with point observations of the major
elements. The covariate datasets were pre-processed using multi-scaling to
improve the detection of patterns at different landscape levels. Uncover ML
models continuous geochemical surfaces that provide a more nuanced
understanding of the geochemical landscape than typical interpolation methods
such as kriging or inverse distance weighted interpolation.
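For contrast, inverse distance weighted interpolation, one of the typical methods named above, can be sketched as follows. It estimates a value from nearby observations alone and uses no environmental covariates. This is a minimal illustration and not part of Uncover ML; the function name `idw_predict`, the default power of 2, and the sample values are assumptions made for the example.

```python
import math


def idw_predict(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at location (x, y).

    samples: iterable of (x, y, value) point observations.
    power: distance exponent; 2 is a common default (an assumption here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            # The target sits exactly on an observation: return it unchanged.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one sample is required")
    return weighted_sum / weight_total


# Hypothetical observations: (x, y, K2O in weight percent).
samples = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0), (0.0, 10.0, 2.0)]
estimate = idw_predict(samples, 5.0, 0.0)
print(estimate)
```

Because the estimate is a weighted average of the observations, it can never fall outside their range and cannot respond to landscape features between sample sites, which is the limitation that covariate-driven models aim to address.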
Sean Chua, John Wilford, Patrice de Caritat, David Champion
GA PP-xxxx | eCat xxxxx
1 The model is open source and accessible at:
https://github.com/GeoscienceAustralia/uncover-ml
Figure 1: Predictive map of surface potassium oxide (K2O)
3. Data pre-processing
A key limitation is the reliability of the geochemical point observations used as the
training dataset. A significant effort was made to clean this data and increase the
reliability of our findings. A large number of datasets from State and national
governments along with private contractors were collated. This training dataset
was processed using a robust methodology that standardised units and checked for
consistent chemical analysis methods and grain size, outliers and GPS reliability. We
also ensured that the input datasets shared a projected coordinate system and
took the median of geochemical observations where duplicates were
recorded across the input data. This data cleaning improved the
reliability of our results by an average of 20%.
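Two of the cleaning steps described above, unit standardisation and taking the median of duplicate observations, can be sketched as follows. This is an illustration under stated assumptions, not the workflow's actual code: the record layout, the conversion table `TO_WT_PERCENT`, the function names, and the rule that records are duplicates when their rounded coordinates match are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical conversion factors to weight percent (1 wt% = 10,000 ppm).
# The source does not list which units the input datasets used.
TO_WT_PERCENT = {"wt%": 1.0, "ppm": 1e-4}


def standardise(value, unit):
    """Convert a concentration to weight percent."""
    return value * TO_WT_PERCENT[unit]


def median_of_duplicates(records, decimals=5):
    """Collapse records that share a location into one median value.

    records: iterable of (lon, lat, value, unit) tuples.
    decimals: coordinate rounding used to decide that two records are
        duplicates (an assumption made for this sketch).
    """
    groups = defaultdict(list)
    for lon, lat, value, unit in records:
        key = (round(lon, decimals), round(lat, decimals))
        groups[key].append(standardise(value, unit))
    return {key: median(values) for key, values in groups.items()}


# Made-up records: three observations at one site, one at another.
records = [
    (133.5, -25.0, 2.0, "wt%"),
    (133.5, -25.0, 30000.0, "ppm"),  # about 3.0 wt% after conversion
    (133.5, -25.0, 2.5, "wt%"),
    (140.0, -30.0, 1.2, "wt%"),
]
cleaned = median_of_duplicates(records)
print(cleaned)
```

The median is less sensitive than the mean to a single erroneous duplicate, which makes it a reasonable choice when the same site appears in several source datasets.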
4. Initial findings
We generated national predictive maps for each of the major elements at 90 m
resolution. This approach provides uncertainty estimates for the model predictions
and a repeatable and transparent workflow. The process can therefore be
systematically applied to updated covariates and target data as they become
available. Figure 1 shows a national map of K2O as an example output of our
modelling workflow. The model in Figure 1 was created using a multi-random
forests algorithm with 10 forests and 20 estimators. This map produced an R² of
0.57 from a 10-fold cross-validation process. Our research demonstrates the value
of big data analytics and high-performance computing in the field of geochemical
modelling.
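The cross-validated score reported above can be illustrated with a short sketch of k-fold cross-validation and the coefficient of determination. It assumes that predictions from all held-out folds are pooled before the score is computed; the source does not say whether the score was pooled or averaged per fold. A one-covariate least-squares line stands in for the multi-random forests model so that the example needs only the standard library. All function names and the synthetic data are hypothetical.

```python
import random


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Stand-in model (not a random forest): least squares on one covariate."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs, ys, k=10, seed=0):
    """Predict every sample from a model that never saw it, then score."""
    predictions = [0.0] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in fold:
            predictions[i] = slope * xs[i] + intercept
    return r_squared(ys, predictions)


# Synthetic covariate and target values, for illustration only.
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
print(round(cross_validated_r2(xs, ys), 2))
```

Scoring only held-out predictions gives an estimate of how the model performs at locations it was not trained on, which is the relevant measure for a predictive map.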