
Application of geophysical and multispectral imagery data for predictive mapping of a complex geo-tectonic unit: a case study of the East Vardar Ophiolite Zone, North-Macedonia

Abstract

The Random Forest (RF) and K nearest neighbors (KNN) machine learning (ML) algorithms were evaluated for their ability to predict ophiolite occurrences in the East Vardar Zone (EVZ) of central North Macedonia. A predictive map of the investigated area was created using three data sources: geophysical data (digital elevation model, gravity and geomagnetic), multispectral optical satellite images (Landsat 7 ETM+ and their derivatives), and geological data (distance to fault map and ophiolite outcrops map). The research included a comparison and discussion of the statistical and geological findings derived from different training dataset class ratios in relation to a testing dataset characterized by significant class imbalance. The results suggest that the precise selection of a suitable class balance for the training dataset is a critical factor in achieving accurate ophiolite prediction with RF and KNN algorithms. The analysis of feature importance revealed that the Bouguer gravity anomaly map, total intensity of the Earth’s magnetic field reduced to the pole map, distance to fault map, band ratio BR3 map obtained from multispectral satellite images, and digital elevation model are the most significant features for predicting ophiolites within the EVZ. KNN showed poorer results compared to RF in terms of both the evaluation metrics and visual analysis of prediction maps. The methods used in this research can be applied to predictive mapping of complex geo-tectonic units covered by dense vegetation, and may indicate the presence of these units even if they were not previously mapped, particularly when geophysical data are used as features.
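The idea of varying the training class ratio while the test set stays imbalanced can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `undersample_to_ratio`, the toy pixel counts, and the 1:1 and 1:3 ratios are all assumptions chosen for illustration, since the abstract does not state which ratios were tested.

```python
import random

def undersample_to_ratio(samples, labels, ratio, seed=0):
    """Keep every positive sample (label 1) and randomly draw
    `ratio` negative samples (label 0) per positive sample."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than are available.
    n_neg = min(len(neg), int(round(ratio * len(pos))))
    keep = pos + rng.sample(neg, n_neg)
    rng.shuffle(keep)
    return [samples[i] for i in keep], [labels[i] for i in keep]

# Hypothetical toy data: 10 "ophiolite" pixels and 90 "non-ophiolite" pixels.
labels = [1] * 10 + [0] * 90
samples = [[float(i)] for i in range(100)]

# Two candidate training sets with different class ratios (assumed values).
X_bal, y_bal = undersample_to_ratio(samples, labels, ratio=1.0)
X_1to3, y_1to3 = undersample_to_ratio(samples, labels, ratio=3.0)
```

Each resampled training set could then be passed to a classifier of choice and scored against an untouched, imbalanced test set, which is the kind of comparison the abstract describes.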
Earth Science Informatics (2024) 17:1625–1644
https://doi.org/10.1007/s12145-024-01243-4
RESEARCH
Application ofgeophysical andmultispectral imagery data
forpredictive mapping ofacomplex geo‑tectonic unit: acase study
oftheEast Vardar Ophiolite Zone, North‑Macedonia
FilipArnaut1· DraganaĐurić2· UrošĐurić3· MilevaSamardžić‑Petrović3· IgorPeshevski4
Received: 12 July 2023 / Accepted: 28 January 2024 / Published online: 20 February 2024
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024
Keywords: Random Forest · K nearest neighbors · Remote Sensing · Geophysical data · Predictive mapping · East Vardar Zone
Introduction
In machine learning (ML) algorithms, an automatic inductive approach is used to recognize patterns in data, and the learned pattern relationships are then applied to other similar data, or to the same datasets in different domains, to generate predictions for data-driven classification and regression problems (Cracknell and Reading 2014). In situations involving the prediction of spatially dispersed categories in extremely complex processes, these algorithms have proven to be immensely useful (Kanevski et al. 2009). The Random Forest (RF) algorithm is widely used for predictive
Communicated by H. Babaie.
* Filip Arnaut
filip.arnaut@ipb.ac.rs
Dragana Đurić
dragana.djuric@rgf.bg.ac.rs
Uroš Đurić
udjuric@grf.bg.ac.rs
Mileva Samardžić-Petrović
mimas@grf.bg.ac.rs
Igor Peshevski
pesevski@gf.ukim.edu.mk
1 University ofBelgrade, Institute ofPhysics Belgrade,
Pregrevica 118, 11080Belgrade, Serbia
2 University ofBelgrade, Faculty ofMining andGeology,
Đušina 7, 11000Belgrade, Serbia
3 University ofBelgrade, Faculty ofCivil Engineering,
Bulevar Kralja Aleksandra 73/1, 11000Belgrade, Serbia
4 Ss. Cyril andMethodius University inSkopje, Faculty
ofCivil Engineering MK, Bulevar Partizanski Odredi 24,
1000Skopje, NorthMacedonia
... The RF model was introduced by Breiman in 2001 [34] and has since become one of the most widely used ML algorithms. The versatility of the RF algorithm in the Earth sciences can be seen in its wide utilization across many fields, such as near-Earth physics [35][36][37][38], lithological prediction [39,40], mineral prospectivity [41][42][43], and land classification [44]. The RF model is a tree-based model that can be seen as a progression and enhancement of decision trees (DTs). ...
Article
Forecasting the future levels of air pollution provides valuable information that holds importance for the general public, vulnerable populations, and policymakers. High-quality data are essential for precise and reliable forecasts and investigations of air pollution. Missing observations arise when the sensors used for assessing air quality parameters malfunction, which results in erroneous measurements or gaps in the dataset and degrades the data quality. This research paper presents a novel univariate approach for imputing missing values in air quality data. The algorithm employs the random forest (RF) algorithm to impute missing observations in a bi-directional (forward and reverse in time) manner for air quality (particulate matter less than 2.5 μm (PM2.5)) data from the Republic of Serbia. The algorithm was evaluated against simple methods, such as the mean and median imputation methods, for missing observations over durations of 24, 48, and 72 h. The results indicate that our algorithm yielded error rates comparable to those of the median imputation method for all periods when imputing the PM2.5 data. Ultimately, the algorithm’s higher computational complexity was not justified given the minimal error decrease it achieved compared with the simpler methods. However, for future improvement, additional research is needed, such as utilizing low-code machine learning libraries and time-series forecasting techniques.
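The simple baselines mentioned above (mean and median imputation) can be sketched as follows. This is only an illustration of the baselines and of scoring an imputation on artificially removed values; it does not reproduce the bi-directional RF method. The series values, the gap positions, and the helper names `impute` and `mean_absolute_error` are hypothetical.

```python
from statistics import mean, median

def impute(series, strategy="median"):
    """Replace None gaps with the mean or median of the observed values."""
    observed = [v for v in series if v is not None]
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in series]

def mean_absolute_error(truth, estimate, gap_idx):
    """Average absolute error, computed only at the artificially removed positions."""
    return sum(abs(truth[i] - estimate[i]) for i in gap_idx) / len(gap_idx)

# Made-up PM2.5-like series; positions 2..4 are removed to simulate a gap.
truth = [12.0, 15.0, 14.0, 30.0, 18.0, 16.0, 13.0, 11.0]
gap_idx = [2, 3, 4]
with_gaps = [None if i in gap_idx else v for i, v in enumerate(truth)]

mae_median = mean_absolute_error(truth, impute(with_gaps, "median"), gap_idx)
mae_mean = mean_absolute_error(truth, impute(with_gaps, "mean"), gap_idx)
```

Scoring on values that were deliberately hidden makes it possible to compare any imputation method against these baselines on the same gaps.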
Article
Remotely sensed data have become an effective, operative and applicable tool that provides critical support for geological surveys and studies by reducing costs and increasing precision. Advances in remote-sensing data analysis methods, like machine learning algorithms, allow for easy and impartial geological mapping. This study aims to carry out a rigorous comparison of the performance of three supervised classification methods (Random Forest, k-Nearest Neighbor and maximum likelihood) using remote sensing data and additional information in the Souk El Had N’Befourna region. The enhancement of remote sensing geological classification by using geomorphometric features, principal component analysis, gray level co-occurrence matrix (GLCM) and multispectral data of the Sentinel-2A imagery was highlighted. The Random Forest algorithm showed reliable results and discriminated limestone, dolomite, conglomerate, sandstone and rhyolite, silt and alluvium, ignimbrite, granodiorite, lutite, granite, and quartzite. The best overall accuracy (~91%) was achieved by the Random Forest algorithm.
Article
Remote sensing technology has advanced significantly, making it useful for geological applications such as structural and lithological mapping, as well as mineral prospecting and mapping. This study looks at how certain enhancement techniques on Landsat 7 ETM+ data can be used to map surface geology, resulting in color composite imagery that can be interpreted and validated through field mapping exercises. Because some of these rocks are poorly exposed and some portions of the research region are inaccessible, field mapping was supplemented by remote sensing lithological mapping techniques. False Color Composite images for bands (7:4:1 and 7:5:4), Principal Component analysis (PC1, 2, 3 and PC4, 5, 6) and RGB composite images of Landsat band ratios (1/3:5/7:3/5 and 5/1:5/7:4) proved useful in determining the approximate boundaries of the various lithologies in the research area. Gneiss and quartzites occur in the central and western quadrants of the study area, mylonites and schist in the eastern quadrant, and amphibolite in the southern and southeastern quadrants, according to GPS sample locations of the individual rock types found in the study area plotted on the processed images. The contact relationships between these rocks are mostly gradational and interlayered. As a result, a thorough examination of satellite optical imagery, such as Landsat data, can greatly aid lithological inquiry and the creation of more detailed geological maps in areas that are inadequately understood.
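A band ratio such as 5/7 is a pixel-wise division of one band by another. The sketch below shows one possible way to compute it on small nested lists; the pixel values are made up, and the choice to write a `nodata` value where the denominator is zero is an assumption, not something stated in the abstract.

```python
def band_ratio(numerator, denominator, nodata=0.0):
    """Pixel-wise ratio of two equally sized 2-D bands.
    Pixels with a zero denominator are set to `nodata`."""
    return [
        [n / d if d != 0 else nodata for n, d in zip(row_n, row_d)]
        for row_n, row_d in zip(numerator, denominator)
    ]

# Hypothetical 2x2 digital-number grids standing in for Landsat bands 5 and 7.
band5 = [[80.0, 60.0], [40.0, 0.0]]
band7 = [[40.0, 30.0], [0.0, 10.0]]

ratio_5_7 = band_ratio(band5, band7)
```

Three such ratio grids, for example 1/3, 5/7 and 3/5, could then be assigned to the red, green and blue channels to form a ratio composite like those listed above.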
Article
Remotely sensed data such as satellite photos and radar images can be used to produce geological maps of arid regions, where the vegetation coverage does not have a significant effect. In central Tunisia, the Jebel Meloussi area has unique geological features and characteristic morphology (i.e. flat areas with dune fields in contrast with hills of folded and eroded stratigraphic sequences), which makes it an ideal area for testing new methods of automatic terrain classification. For this, data from the Sentinel 2 satellite sensor and the SRTM-based MERIT DEM (digital elevation model) were used in the present study. Using R scripts and the random forest classification method, modelling was performed on four lithological variables, derived from the different bands of the Sentinel 2 images, and two morphometric parameters for the area of the 1:50,000 geological map sheet no. 103. The four lithological variables were chosen to highlight the iron-bearing minerals since the spectral parameters of the Sentinel 2 sensors are especially useful for this purpose. The training areas of the classification were selected on the geological map. The results of the modelling identified Eocene and Cretaceous evaporite-bearing sedimentary series (such as the Jebs and the Bouhedma Formations) with the highest producer accuracy (> 60% of the predicted pixels match with the map). The pyritic argillites of the Sidi Khalif Formation were also recognized with the same accuracy, and the Quaternary sebkhas and dunes were also well predicted. The study concludes that the classification-based geological map is useful for field geologists prior to field surveys.
Article
Lithological maps derived from traditional pixel-based image analysis are widely established in remote sensing. Recent development of new segmentation-based approaches such as object-based image analysis (OBIA) has led to significant improvements of accuracy in land cover classifications. However, additional enhancements have been reported by coupling OBIA with machine learning (ML) techniques. ML, which automates computer algorithms and model building, enables learning, reasoning, and decision making through analysis and interpretation of data patterns and structures. This study evaluates and compares the performance of four ML classifiers: support vector machine (SVM), normal Bayes (NB), k-nearest neighbor (k-NN), and random forest (RF). This approach was tested with medium resolution imagery using an object-based classification procedure in a metamorphic complex, SW Iran. The multi-band input datasets were assembled from individual and multi-sensor layers from ASTER, Landsat 8 OLI, Sentinel-1 and Sentinel-2A. Results revealed that combined data sets obtain more reasonable classes for lithology, and the combination of ASTER + simulated panchromatic of Sentinel-2A data shows the most efficient results. We also showed that, in general, RF and SVM algorithms were the most advantageous classifiers when using an object-based image analysis approach. When the data sets were integrated to improve spatial resolution, the maximum accuracies of the RF, SVM, k-NN and NB algorithms increased from 90% to 98%, 81% to 97%, 52% to 75%, and 40% to 64%, respectively. We also revealed that the integration of Sentinel-1 and Sentinel-2A data was promising and allowed for additional extraction of rock relief.
Article
The importance of land use and cover change (LUCC) has gradually attracted more attention due to its influence on the climate and ecosystem. Consequently, the necessity of accurate LUCC mapping has become increasingly apparent. Over the past decades, although a large number of machine learning algorithms have been developed to improve the accuracy and reliability of remote sensing image classification, especially for LUCC classification, there is a lack of studies that assess the performance of machine learning algorithms in arid desert-oasis mosaic landscapes. In this study, the main objective is to provide a reference for the extraction of LUCC information in dryland regions with oasis-desert mosaic landscapes by comparing the performances of the k-nearest neighbor (KNN), random forest (RF), support vector machine (SVM) and artificial neural network (ANN) for the LUCC classification of the Dengkou Oasis, China. Landsat-8 Operational Land Imager (OLI) image data were used with spectral indices and auxiliary variables that were derived from a digital terrain model to classify 7 different land cover categories. The highest overall accuracy was produced by the ANN (97.16%), which was closely followed by the RF (96.92%), SVM (96.20%), and finally KNN (93.98%); statistically similar accuracies were obtained for the ANN, SVM and RF. The RF algorithm performed well across several aspects, such as stability, ease of use and processing time during the parameter tuning. Overall, the random forest algorithm is a good first choice method for land-cover classification in this study area, and the elevation and some spectral indices, such as the NDVI, MSAVI2 and MNDWI, should be used as variables to improve the overall accuracy.
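Two of the spectral indices named above, NDVI and MNDWI, are normalized differences of two bands: NDVI = (NIR − Red) / (NIR + Red) and MNDWI = (Green − SWIR) / (Green + SWIR). A minimal per-pixel sketch follows. The reflectance values are made up, and returning 0.0 when both bands are zero is an assumed convention.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the sum is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)

def mndwi(green, swir):
    """Modified Normalized Difference Water Index."""
    return normalized_difference(green, swir)

# Hypothetical surface-reflectance values for a single pixel.
pixel = {"green": 0.25, "red": 0.125, "nir": 0.5, "swir": 0.25}

pixel_ndvi = ndvi(pixel["nir"], pixel["red"])
pixel_mndwi = mndwi(pixel["green"], pixel["swir"])
```

Index values computed this way for every pixel can be stacked with the original bands and terrain variables as additional classifier inputs.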
Article
The study reports and synthesizes the available geological and geochemical data on the East Vardar ophiolites comprising most known occurrences from the South Apuseni Mountains in Romania to the tip of the Chalkidiki Peninsula in Greece. The summarized geological data suggest that the East Vardar ophiolites are mostly composed of the magmatic sequences, whereas the mantle rocks are very subordinate. The members of the magmatic sequences are characterized by the presence of abundant acid and intermediate volcanic and intrusive rocks. The age of these ophiolites is paleontologically and radiometrically constrained and these data suggest that the East Vardar ophiolite formed as a short-lived oceanic realm that was emplaced before the uppermost Kimmeridgian. A relatively weak adakitic affinity is shown by intra-ophiolitic acid and intermediate rocks in many East Vardar provinces. It can be taken as evidence that subduction of a young and hot slab occurred, most likely along the earlier spreading ridge. A paleo-tectonic reconstruction consisting of four stages is proposed. It involves: a) an early/mid-Jurassic north-northeastward subduction of the West Vardar oceanic plate; b) the formation of a mid-Jurassic volcanic arc and a narrow back-arc oceanic stripe of East Vardar behind it; c) the mid-/Upper Jurassic initiation of East Vardar subduction along the ridge axis, and d) complex and heterogeneous emplacement of the East Vardar ophiolites. The data available so far give a relatively clear picture of the origin and evolution of the East Vardar ophiolites. However, to better understand all aspects of their evolution, additional questions need to be answered concerning the true structural position of the East Vardar ophiolite slices in Serbia, the exact nature of the subduction that caused back-arc spreading (intraoceanic vs subduction under continent?), and the full significance of the adakitic signature shown by rocks in the East Vardar provinces other than Demir Kapija.
Article
As a source of data continuity between Landsat and SPOT, Sentinel-2 is an Earth observation mission developed by the European Space Agency (ESA), which acquires 13 bands in the visible and near-infrared (VNIR) to shortwave infrared (SWIR) range. In this study, a Sentinel-2A imager was utilized to assess its ability to perform lithological classification in the Shibanjing ophiolite complex in Inner Mongolia, China. Five conventional machine learning methods, including artificial neural network (ANN), k-nearest neighbor (k-NN), maximum likelihood classification (MLC), random forest classifier (RFC), and support vector machine (SVM), were compared in order to find an optimal classifier for lithological mapping. The experiment revealed that the MLC method offered the highest overall accuracy. The Sentinel-2A image was then compared with the common multispectral data ASTER and Landsat-8 OLI (operational land imager) for lithological mapping using the MLC method. The comparison results showed that the Sentinel-2A imagery yielded a classification accuracy of 74.5%, which was 2.5% and 5.08% higher than those of the ASTER and OLI imagery, respectively, indicating that Sentinel-2A imagery is adequate for lithological discrimination, due to its high spectral resolution in the VNIR to SWIR range. Moreover, different data combinations of Sentinel-2A + ASTER + DEM (digital elevation model) and OLI + ASTER + DEM data were tested on lithological mapping using the MLC method. The best mapping result was obtained from the Sentinel-2A + ASTER + DEM dataset, demonstrating that OLI can be replaced by Sentinel-2A, which, when combined with ASTER, can achieve sufficient bandpasses for lithological classification.
Article
We briefly review the state-of-the-art machine learning (ML) algorithms for mineral exploration, which mainly include random forest (RF), convolutional neural network (CNN), and graph convolutional network (GCN). In recent years, RF, a representative shallow machine learning algorithm, and CNN, a representative deep learning approach, have been proved to be powerful tools for ML-based mapping for mineral exploration. In the future, GCN deserves more attention for ML-based mapping for mineral exploration because of its ability to capture the spatial anisotropy of mineralization and its applicability within irregular study areas. Finally, we summarize the original contributions of the six papers comprising this special issue.
Article
Multidisciplinary exploration data have been widely and successfully applied when using machine learning methods to conduct geological mapping. However, in covered areas such as Jining, Inner Mongolia, China, where remote sensing and geophysical data are unavailable or difficult to obtain, geochemical data become more important. In addition, previous studies have often selected data labels based on geological maps, which are generally obtained by interpolation or extrapolation of field lithological points and so are inherently uncertain. This study collected seven types of 2341 field lithological points and evaluated the errors of each lithological unit, based on a confusion matrix. Using these field lithological points, we applied the random forest (RF) and support vector machine (SVM) methods to delineate basalt in the Jining region by integrating 1:50,000 stream sediment geochemical data. The evaluation indexes of accuracy, precision, recall, and the receiver operating characteristic curve (ROC) all indicated that RF outperformed SVM. Based on the predictions of RF, five types of target areas were generated, which were further verified using Sentinel-2 images. This research highlights that using lithological points as data labels and trace-element stream sediment data as a training dataset can provide encouraging results when conducting lithological mapping in covered areas.
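The evaluation indexes named above (accuracy, precision, recall) all follow from the four cells of a binary confusion matrix. The sketch below computes them for a made-up set of labels; the label vectors and the function name `binary_metrics` are hypothetical and do not come from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall from paired true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical imbalanced test set: 3 target samples (1) and 7 background samples (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

scores = binary_metrics(y_true, y_pred)
```

On imbalanced data, accuracy alone can look high even when the minority class is poorly detected, which is why precision and recall are usually reported alongside it.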
Book
The second edition of The Chemistry of Soils, published in 2008, has been used as a main text in soil-science courses across the world, and the book is widely cited as a reference for researchers in geoscience, agriculture, and ecology. The book introduces soil into its context within geoscience and chemistry, addresses the effects of global climate change on soil, and provides insight into the chemical behavior of pollutants in soils. Since 2008, the field of soil science has developed in three key ways that Sposito addresses in this third edition. First, research related to the Critical Zone (the material extending downward from vegetation canopy to groundwater) has undergone widespread reorganization as it becomes better understood as a key resource to human life. Second, scientists have greatly increased their understanding of how organic matter in soil functions in chemical reactions. Finally, the study of microorganisms as they relate to soil science has significantly expanded. The new edition still comprises twelve chapters, introducing students to the principal components of soil, discussing a wide range of chemical reactions, and surveying important human applications. The chapters also contain completely revised annotated reading lists and problem sets.