
Swarm optimization based heterogeneous machine learning techniques for enhanced landslide susceptibility assessment with comprehensive uncertainty quantification

Abstract

Landslide susceptibility assessment is a comprehensive tool for decision makers. However, the efficacy of a susceptibility model depends on factor selection, and the scientific trustworthiness of the results varies. This research aimed to select the factors for model construction through an ensemble of a genetic algorithm and the Boruta algorithm. A total of 1,888 landslide and 1,888 non-landslide points were collected and randomly split in a 70:30 ratio for model training and validation. Twenty selected environmental factors were used for model construction. Six advanced machine learning models, Sparse Partial Least Squares, Bayesian Generalized Linear Model, Neural Network with Principal Component Analysis, Multivariate Adaptive Regression Splines (MARS), Boosted Decision Tree, and Extreme Gradient Boosting (XGBoost), were used for susceptibility map preparation, with their hyperparameters optimized through Particle Swarm Optimization. On the testing dataset, the models attained AUCROC scores of 0.84, 0.85, 0.89, 0.89, 0.87, and 0.95, respectively. Following AUCROC, model performance was validated through the Quality Sum Index (Q’s), which was highest for the XGBoost model (3.54). The models' discrimination capability was quantified through Kolmogorov-Smirnov (KS) statistics, which showed XGBoost to be the most efficient model with a KS value of 95.8%, followed by the MARS model with a KS value of 65.9%. Furthermore, the uncertainty of the model was computed and a confidence map (CNFM) was generated for the actual susceptibility map. Regional policy makers for disaster mitigation will benefit from the findings of this research.
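The hyperparameter search mentioned in the abstract can be illustrated with a minimal global-best Particle Swarm Optimization sketch. This is not the authors' implementation: the inertia and acceleration coefficients, the swarm size, and the `toy_validation_loss` function (a hypothetical stand-in for "1 minus validation AUC" of a model with two hyperparameters) are all assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_loss(params):
    """Hypothetical loss surface with its minimum at (0.1, 6.0)."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6.0) ** 2


best_params, best_loss = pso_minimize(
    toy_validation_loss, bounds=[(0.01, 0.5), (2.0, 12.0)]
)
print(best_params, best_loss)
```

In a real workflow the objective would train a model with the candidate hyperparameters and return a validation error, which makes each evaluation far more expensive than in this toy case.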
Earth Science Informatics (2025) 18:145
https://doi.org/10.1007/s12145-024-01617-8
RESEARCH
SumonDey1,2 · SwarupDas2
Received: 30 June 2024 / Accepted: 21 September 2024
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025
Keywords Landslide susceptibility assessment · Genetic algorithm-Boruta algorithm ensemble · Machine learning · Particle Swarm Optimization · Model performance evaluation · Uncertainty analysis
Introduction
Landslides are mass movements of earth materials downward and outward, driven by triggering factors that may be topographic, geological, or anthropogenic in nature (Poudyal et al. 2010; Kalantar et al. 2017). The consequences of landslides can be devastating, leading to loss of lives, community displacement, and economic loss (Chen and Chen 2021). Landslides directly affect hilly regions throughout the world. According to the Centre for Research on the Epidemiology of Disasters (CRED), approximately 3,800 major landslide events took place between 1995 and 2014, with more than 11,000 deaths and approximately 1,63,000 fatalities (https://www.cred.be/sites). As documented by Glade et al. (2006), nearly 95% of all landslide events took place in underdeveloped nations, where the resulting damage amounts to nearly 0.05% of a country’s annual income. Consequently, a proper understanding of an area’s susceptibility to landslides is crucial for assessing the associated risk and for identifying high-risk zones. Additionally, it may contribute substantially to the development of comprehensive early warning systems
Communicated by: Hassan Babaie
* Swarup Das
sd.csa@nbu.ac.in
1 Department ofComputer Science & Engineering, Akal
College ofEngineering & Technology, Eternal University,
BaruSahib, HimachalPradesh, India
2 Department ofComputer Science & Technology, University
ofNorth Bengal, Raja Rammohunpur, Darjeeling,
WestBengal, India
Article
Natural disasters such as landslides affect livelihoods and nature. To mitigate this hazard, scientists developed Landslide Susceptibility Mapping (LSM), which helps to identify landslide-prone zones. With advancements in Geographical Information Systems, machine learning approaches have taken over from heuristic techniques for LSM. However, model uncertainty has yet to be considered. This study used uncertainty analysis to generate more precise LSM. The present study considered twenty-one geo-environmental factors to evaluate LSM in the Darjeeling Himalayas. 1,888 landslide locations were used to prepare the landslide inventory, and 1,888 non-landslide points were carefully created for model training purposes. Seven advanced machine learning methods, viz., naive Bayes, boosted decision tree, linear discriminant analysis (LDA), flexible discriminant analysis, monotone multilayer perceptron, gradient boosting machine (GBM), and extreme gradient boosting (XGBoost), were utilized for preparing landslide susceptibility maps. The constructed maps were then categorized into five susceptibility classes, viz., very low, low, moderate, high, and very high, and these were validated through the Area Under Receiver Operating Characteristics curve, Kolmogorov–Smirnov statistics, and Quality Sum method. The machine learning models' performance was evaluated through classification metrics, viz., overall accuracy, sensitivity (recall), specificity, precision, and F1-score. With AUCROC values greater than 0.90 for both the training and testing datasets, KS statistics values of 94.6 and 74.5, respectively, and Quality Sum Index values of 2.671 and 2.058, respectively, XGBoost and GBM were found to perform better than the rest of the utilized models. An uncertainty analysis was attempted using the coefficient-of-variation method and aleatoric uncertainty (lowest value of 0.024 for XGBoost and highest value of 0.25 for LDA). A confidence map for each susceptibility map was generated, which can be utilized as a reference for policymakers to formulate landslide mitigation strategies on a regional scale.
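Two of the measures named in this abstract, the Kolmogorov–Smirnov statistic and the coefficient of variation, can be sketched in a few lines. The code below is only an illustration under stated assumptions: the scores, labels, and per-cell predictions are made-up values, the KS statistic is computed as the largest gap between the empirical score distributions of the two classes, and the coefficient of variation uses the population standard deviation, which may differ from the study's exact choice.

```python
import statistics
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of class 1 and class 0."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    best = 0.0
    for t in sorted(set(scores)):
        # Fraction of each class with a score less than or equal to t.
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


# Hypothetical susceptibility scores for four landslide (1) and four
# non-landslide (0) points.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
ks = ks_statistic(scores, labels)

# Hypothetical predictions for one map cell from three repeated model runs.
cv = coefficient_of_variation([0.8, 0.9, 1.0])
print(ks, cv)
```

A KS value is often reported as a percentage, so the 0.75 obtained here would be quoted as 75. A low coefficient of variation for a cell means the repeated predictions agree closely there, which is the intuition behind treating it as a confidence indicator.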
Article
Landslide susceptibility mapping (LSM) is essential for determining risk regions and guiding mitigation strategies. Machine learning (ML) techniques have been broadly utilized, but the uncertainty and interpretability of these models have not been well-studied. This study conducted a comparative analysis and uncertainty assessment of five ML algorithms—Random Forest (RF), Light Gradient-Boosting Machine (LGB), Extreme Gradient Boosting (XGB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM)—for LSM in Inje area, South Korea. We optimized these models using Bayesian optimization, a method that refines model performance through probabilistic model-based tuning of hyperparameters. The performance of these algorithms was evaluated using accuracy, Kappa score, and F1 score, with accuracy in detecting landslide-prone locations ranging from 0.916 to 0.947. Among them, the tree-based models (RF, LGB, XGB) showed competitive performance and outperformed the other models. Prediction uncertainty was quantified using bootstrapping and Monte Carlo simulation methods, with the latter providing a more consistent estimate across models. Further, the interpretability of ML predictions was analyzed through sensitivity analysis and SHAP values. We also expanded our investigation to include both the inclusion and exclusion of predictors, providing insights into each significant variable through a comprehensive sensitivity analysis. This paper provides insights into the predictive uncertainty and interpretability of ML algorithms for LSM, contributing to future research in South Korea and beyond.
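As a rough illustration of the bootstrapping idea mentioned above, the sketch below resamples a test set with replacement and reports a percentile interval for accuracy. It is one simple form of bootstrap and not necessarily the procedure used in the study; the labels, predictions, number of resamples, and 95% level are assumptions made for the example.

```python
import random


def bootstrap_accuracy_interval(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for classification accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    accuracies = []
    for _ in range(n_boot):
        # Draw n test samples with replacement and score them.
        idx = [rng.randrange(n) for _ in range(n)]
        correct = sum(y_true[i] == y_pred[i] for i in idx)
        accuracies.append(correct / n)
    accuracies.sort()
    # Clamp the percentile positions so they are always valid indices.
    lo_pos = max(0, int((alpha / 2) * n_boot))
    hi_pos = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return accuracies[lo_pos], accuracies[hi_pos]


# Hypothetical test set: 100 points, 90 of them classified correctly.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 45 + [0] * 5 + [0] * 45 + [1] * 5
low, high = bootstrap_accuracy_interval(y_true, y_pred)
print(low, high)
```

The width of the interval gives a sense of how much the reported accuracy could move if a different test sample had been drawn.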
Article
Rainfall-induced landslides are a major hazard in the Three Gorges Reservoir area (TGRA) of China, encompassing 19 districts and counties with extensive coverage and significant spatial variation in terrain. This study introduces the Gradient Boosting Decision Tree (GBDT) model, implemented on the Google Earth Engine (GEE) cloud platform, to dynamically assess landslide risks within the TGRA. Utilizing the GBDT model for landslide susceptibility analysis, the results show high accuracy with a prediction precision of 86.2% and a recall rate of 95.7%. Furthermore, leveraging GEE’s powerful computational capabilities and real-time updated rainfall data, we dynamically mapped landslide hazards across the TGRA. The integration of the GBDT with GEE enabled near-real-time processing of remote sensing and meteorological radar data from the significant “8–31” 2014 rainstorm event, achieving dynamic and accurate hazard assessments. This study provides a scalable solution applicable globally to similar regions, making a significant contribution to the field of geohazard analysis by improving real-time landslide hazard assessment and mitigation strategies.
Article
Landslide susceptibility mapping (LSM) constitutes a valuable analytical instrument for estimating the likelihood of landslide occurrence, thereby furnishing a scientific foundation for the prevention of natural hazards, land-use planning, and economic development in landslide-prone areas. Existing LSM methods are predominantly data-driven, allowing for significantly enhanced monitoring accuracy. However, these methods often overlook the consideration of landslide mechanisms and uncertainties associated with non-landslide samples, resulting in lower model reliability. To effectively address this issue, a knowledge-guided landslide susceptibility assessment framework is proposed in this study to enhance the interpretability and monitoring accuracy of LSM. First, a landslide knowledge graph is constructed to model the relationships between landslide entities and summarize landslide susceptibility rules. Next, combining the obtained landslide rules with geographic similarity principles, high-confidence non-landslide samples are selected to optimize the quality of the samples. Subsequently, a Landslide Knowledge Fusion Cell (LKF-Cell) is utilized to couple landslide data with landslide knowledge, resulting in the acquisition of informative and semantically rich landslide event features. Finally, a precise and credible landslide susceptibility assessment model is built based on a convolutional neural network (CNN), and landslide susceptibility spatial distribution levels are mapped. The research findings indicate that the CNN-based model outperforms traditional machine learning algorithms in predicting landslide probability; in particular, the Area Under the Curve (AUC) of the model was improved by 3–6% after sample optimization, and the AUC value of the LKF-Cell method was 6–11% higher than the baseline method.
Article
This research applies and compares the performance of three machine learning algorithms, Naive Bayes (NB), kernel logistic regression (KLR), and alternating decision tree (ADT), to produce landslide susceptibility maps for Pengyang County, a landslide-prone area in Ningxia Hui Autonomous Region, China. In the first phase, we constructed a landslide inventory map consisting of 972 landslides and the same number of non-landslides based on digital elevation model analysis, survey data, and satellite images, then combined the two datasets and randomly divided them into training and validation subsets at a ratio of 70:30. Secondly, 13 conditioning factors were prepared, and feature selection was performed using average merit. Subsequently, we used the area under the receiver operating characteristic curve (AUC), root mean square error, mean squared error, and frequency ratio precision to test the validity and prediction ability of the models. The results demonstrated that all three models are predictive and can generate adequate results for the study area, and that the ADT model performs best, with AUC values of 0.844 for the training dataset and 0.838 for the validation dataset. Next is KLR (0.811 for the training dataset, 0.814 for the validation dataset) and then NB (0.808 for the training dataset, 0.797 for the validation dataset). Meanwhile, the frequency ratio precision of the ADT model is 0.971, which is higher than that of KLR (0.844) and NB (0.810). The suggested landslide susceptibility map and corresponding method can support researchers and local authorities in future decision-making for geological disaster prevention and mitigation.
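The 70:30 split and the AUC evaluation described above can be sketched as follows. Two points are assumptions rather than details from the study: the split here is stratified by class (the abstract only says the split was random), and AUC is computed directly as the probability that a randomly chosen landslide point scores higher than a randomly chosen non-landslide point, with ties counted as one half. The example scores are made up.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)


def auc_roc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# 972 landslide (1) and 972 non-landslide (0) points, as in the abstract.
labels = [1] * 972 + [0] * 972
train_idx, test_idx = stratified_split(labels)

# Hypothetical model scores for a small validation sample.
auc = auc_roc([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1],
              [1, 1, 1, 1, 0, 0, 0, 0])
print(len(train_idx), len(test_idx), auc)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a few thousand points; a rank-based formula is the usual choice for larger datasets.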
Article
Over the past three decades, Sri Lanka has observed a substantial rise in landslide occurrences linked to intensified rainfall. However, the lack of comprehensive landslide inventories has hampered the development of effective risk analysis and simulation systems, requiring Sri Lanka to rely heavily on foreign-developed models, despite the difficulty of fully examining the similarities between the characteristics of landslides in Sri Lanka and those in the areas where the models were developed. Satellite images have become readily available in recent years and have provided information about the Earth’s surface conditions over the past few decades. Thus, this study verifies the utility of satellite images as a cost-effective remote-sensing method to clarify the commonalities and differences in the characteristics of landslides in two regions, Ikawa, Japan, and Sabaragamuwa, Sri Lanka, which exhibit different geological formations despite similar annual rainfall. Using Google Earth satellite images from 2013 to 2023, we evaluated landslide density, types, and geometry. The findings reveal that Ikawa exhibits a higher landslide density and experiences multiple-type landslides. Both areas have similar initiation areas; however, Sabaragamuwa predominantly experiences single landslides that are widespread and mobile. The findings also reveal that the various characteristics of landslides are mainly influenced by varied topography. We confirmed that even in areas where comprehensive information on landslides is conventionally lacking, the characteristics of landslides can be understood by comparing landslide geometry between sites using satellite imagery.
Article
Deriving rainfall thresholds is one of the most convenient and effective empirical methods for formulating landslide warnings. Previous rainfall threshold models only considered the threshold values for areas with landslide data. This study focuses on obtaining a threshold for each single landslide via the geostatistical interpolation of historical landslide–rainfall data. We collect the occurrence times and locations of landslides, along with the hourly rainfall data, for Dazhou. We integrate the short-term and long-term rainfall data preceding the landslide occurrences, categorizing them into four groups for analysis: 1 h–7 days (H1–D7), 12 h–7 days (H12–D7), 24 h–7 days (H24–D7), and 72 h–7 days (H72–D7). Then, we construct a rainfall threshold distribution map based on the 2014–2020 data by means of Kriging interpolation. This process involves applying different splitting coefficients to distinguish the landslides triggered by short-term versus long-term rainfall. Subsequently, we validate these thresholds and splitting coefficients using the dataset for 2021. The results show that the best splitting coefficients for H1–D7, H12–D7, H24–D7, and H72–D7 are around 0.19, 0.52, 0.55, and 0.80, respectively. The accuracy of the predictions increases with the duration of the short-term rainfall, from 48% for H1–D7 to 67% for H72–D7. The performance of these threshold models indicates their potential for practical application in the sustainable development of geo-hazard prevention. Finally, we discuss the reliability and applicability of this method by considering various factors, including the influence of the interpolation techniques, data quality, weather forecast, and human activities.
Article
This work explores the role of landslide sampling strategies in landslide susceptibility modelling (LSM), viz. (a) samples from the landslide scarp, (b) the centroid of the landslide body, and (c) samples from the debris accumulation zone, and discusses the mechanism and predictive capacity of each type in the LSM output. The evaluation took place in the surroundings of the Koyna reservoir region, a highly vulnerable zone in the Western Ghats, India, that had not undergone a comparable assessment previously. For this, an inventory dataset featuring over 3000 landslide polygons was mapped following the July 2021 extreme rainfall event, including details on source-accumulation zone separation, using high-resolution satellite data. Fourteen landslide conditioning factors (LCF) (topographic, hydrologic, and climatic) were then identified as predictors and used to train and test four widely used machine learning (ML) models. Our findings reveal substantial differences in the areal percentage of landslides within identical classes of LCF when employing distinct sampling strategies, implying potential differences in predictive accuracies. Results show that the LSM prepared from scarp zones demonstrated higher predictive power (AUC = 0.95), and that random forest outperforms all other ML models. The outcomes of our study aid landslide investigators in evaluating the suitability of landslide data types and models, as they can significantly impact the accuracy of the resulting LSMs.
Chapter
Landslides along highways are unanticipated obstacles in hilly regions, as their occurrence cuts off the connection between hilly areas and the plains. Landslides endanger human lives and cause large economic losses. Susceptibility assessment and mapping are crucial aspects of the mitigation and management of landslide catastrophes in a region. This study was undertaken to assess the performance of a bagging ensemble (Random Forest) and a boosting ensemble (extreme gradient boosting) in landslide susceptibility mapping along the NH-10 (previously NH-31A), from Sivok (West Bengal) to Rangpoo (Sikkim), a stretch of approximately 51.5 km. In total, eleven geo-environmental factors were taken into account. A total of 309 landslide locations were identified for the assessment, 70% of which were used as training samples and the remaining 30% as testing samples. Feature importance was computed with the game theory-based SHapley Additive exPlanations (SHAP) approach. The proposed methods were evaluated with the area under the receiver operating characteristics curve (AUROC), in which the Random Forest and extreme gradient boosting methodologies attained 99.17% and 99.45% accuracy, respectively, with the training samples, whereas the attained accuracies for the validation datasets were 95.86% (Random Forest) and 96.58% (extreme gradient boosting), respectively. The constructed maps will be beneficial to government policy- and decision-makers in the region in dealing with and mitigating landslide catastrophes.
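The game-theoretic idea behind SHAP can be illustrated by computing exact Shapley values for a tiny model. This sketch does not use the SHAP library and is not the study's procedure: it enumerates every feature subset, which is only feasible for a handful of features, and the toy model, its feature names, the instance, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function `value(frozenset) -> float`."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of coalitions of size k that exclude feature i.
            weight = (factorial(k) * factorial(n_features - k - 1)
                      / factorial(n_features))
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical instance: (slope, rainfall, distance_to_road); baseline is zero.
instance = [1.0, 2.0, 5.0]


def toy_model(slope, rainfall, distance_to_road):
    """Hypothetical model with one interaction term and one unused feature."""
    return 3.0 * slope + 1.0 * rainfall + 2.0 * slope * rainfall


def coalition_value(subset):
    """Model output when only the features in `subset` take their real values."""
    x = [instance[j] if j in subset else 0.0 for j in range(len(instance))]
    return toy_model(*x)


phi = shapley_values(coalition_value, n_features=3)
print(phi)
```

The three values sum to the difference between the model output for the instance and for the baseline, the interaction term is shared equally between slope and rainfall, and the unused feature receives zero.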
Article
The assessment of landslide susceptibility often overlooks the influence of forests on shallow landslide mobility, despite its significance. This study examined the impact of forest presence on shallow landslide mobility during intense rainfall in Mengdong, China. Field investigations were coupled with the analysis of pre- and post-rainfall remote sensing (RS) images to delineate landslides. The ratio of landslide height (H) to travel distance (L), derived from a digital elevation model (DEM), was used to calculate landslide mobility. Forest coverage before the event was evaluated using the normalized difference vegetation index (NDVI) derived from multiband RS images. The research identified 1531 shallow landslides in the area, revealing a higher concentration of landslides on slopes with elevated NDVI. Results indicated that disparities in soil permeability and cohesion, generating pore water pressure (PWP), triggered clusters of shallow landslides. The dimensions (height and area) of the identified shallow landslides typically exhibit a positive correlation with NDVI, consequently resulting in longer travel distances for landslides occurring on higher-NDVI slopes. The average H/L ratio of all identified landslides was about 0.63. H/L generally increases with NDVI and decreases with landslide area. However, due to river channel restrictions, H/L increases with slope gradient. The findings suggest that the high permeability of areas with tree roots poses a risk to the shallow stability of slopes, yet trees contribute to mitigating landslide mobility.
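The two quantities this abstract relies on are simple ratios, sketched below. NDVI is computed with its standard definition from near-infrared and red reflectance, and the mobility index is the height-to-travel-distance ratio H/L. All input values are hypothetical and chosen only to show the arithmetic.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NDVI is undefined when NIR + red is zero")
    return (nir - red) / total


def mobility_ratio(height_m, travel_distance_m):
    """H/L ratio: landslide height divided by horizontal travel distance."""
    if travel_distance_m <= 0:
        raise ValueError("travel distance must be positive")
    return height_m / travel_distance_m


# Hypothetical forested slope: high NIR reflectance, low red reflectance.
vegetation_index = ndvi(nir=0.5, red=0.1)

# Hypothetical landslide dropping 63 m over a 100 m travel distance.
h_over_l = mobility_ratio(height_m=63.0, travel_distance_m=100.0)
print(vegetation_index, h_over_l)
```

A lower H/L means the material travelled farther for a given drop, so it indicates higher mobility.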