
Attribute selection using correlations and principal components for artificial neural networks employment for landslide susceptibility assessment


Abstract and Figures

Landslide susceptibility maps can be developed with artificial neural networks (ANNs). To train our ANNs, a digital elevation model (DEM) and a scar map of one previous event were used. Eleven attributes are generated from the DEM, possibly containing redundant information. Our base model consists essentially of one input (the DEM), eleven attributes, 30 hidden neurons, and one output (susceptibility). Principal components (PCs) concentrate information in the first projected variables, so the last ones may be expendable. In the present paper, four groups of models were trained: one with the eleven attributes generated from the DEM; one with 8 of the 11 attributes, 3 having been eliminated because of their high correlation with others; one with the data projected onto its 11 PCs; and one using 8 of the 11 PCs. The number of neurons in the hidden layer, 30, was calibrated with a complexity analysis developed in-house. The ANN models trained with the original data produced better statistical results than their counterparts trained with the PC-transformed input. Keeping the original 11 attributes provided the best metrics among all models, showing that eliminating attributes also eliminates information used by the model. Using 11 PC-transformed attributes hindered training. However, for the model with eight PCs, training was much faster than its counterpart, with little accuracy loss. The metrics and maps achieved were considered acceptable, conveying the power of our ANN-based model, which uses essentially one input (the DEM) to map areas susceptible to mass movements.
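The two dimensionality-reduction routes described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the data are synthetic, the function names `drop_correlated` and `pca_project` are hypothetical, the greedy elimination rule is an assumption, and the 0.7 threshold is borrowed only as an illustrative value.

```python
import numpy as np

def drop_correlated(X, threshold=0.7):
    """Greedily keep columns whose |Pearson r| with every kept column is <= threshold.

    The greedy order (first column wins) is an assumption made for this sketch.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

def pca_project(X, n_components=None):
    """Project standardized data onto its principal components, largest variance first."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # eigh returns ascending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs
    if n_components is not None:
        scores = scores[:, :n_components]      # discard the last components
    return scores, eigvals / eigvals.sum()

# Synthetic stand-in for 11 DEM-derived attributes: 8 independent columns
# plus 3 noisy copies of the first three (the "redundant" attributes).
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 8))
redundant = base[:, :3] + 0.05 * rng.normal(size=(500, 3))
X = np.hstack([base, redundant])

kept = drop_correlated(X, threshold=0.7)            # indices of retained attributes
scores, explained = pca_project(X, n_components=8)  # first 8 PC scores, variance shares
```

Either `X[:, kept]` or `scores` would then serve as the network input, mirroring the "8 of 11 attributes" and "8 of 11 PCs" model groups.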
Published online: 21 January 2020
Luísa Vieira Lucchese · Guilherme Garcia de Oliveira · Olavo Correa Pedrollo
Received: 12 May 2019 / Accepted: 11 November 2019
© Springer Nature Switzerland AG 2020
Luísa Vieira Lucchese · Olavo Correa Pedrollo
Instituto de Pesquisas Hidráulicas, Universidade Federal do Rio Grande do Sul, Av. Bento Gonçalves, 9500, Porto Alegre, Brazil
Guilherme Garcia de Oliveira
Departamento Interdisciplinar, Universidade Federal do Rio Grande do Sul, Rodovia RS 030, 11700, km 92, Emboaba, Tramandaí, RS, 95590-000, Brazil
Keywords Landslide · Multilayer perceptron · Dimensionality reduction · Susceptibility map
Landslides are natural hazards that occur when a mass of soil detaches from its place and slides down a slope (Cruden 1991), possibly causing loss of life and damage to property. With worldwide population growth, human occupation of hazardous areas has substantially increased over the past decades, and the impact of natural disasters has been largely magnified in both industrialized and developing countries (Guzzetti et al. 1999). Between 1971 and 1974, nearly 600 people per year were killed by landslides (Schuster and Fleming 1986). The fatality rate increased to 4617 people per year between 2004 and 2010, during which 32,322
Environmental Monitoring and Assessment (2020) 192: 129
... Some of the attributes present high intercorrelations with each other. However, in Lucchese et al. (2020), it was shown that even attributes that were intercorrelated by a coefficient higher than 0.7 brought useful information to the model. ...
... The type of ANN used is a multilayer perceptron (MLP) with one hidden layer (Fig. 3), consisting of nh = 30 neurons. The number of neurons in the hidden layer is determined by using a novel approach (Lucchese et al., 2020). It consists in choosing the minimum number of neurons in the hidden layer for which the relationship between the input and the output variables is well represented. ...
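The idea of picking the smallest hidden layer that still represents the input-output relationship well can be sketched as a simple search. This is only an illustration under stated assumptions, not the method of Lucchese et al. (2020): `smallest_adequate_size`, the tolerance, and the toy score curve are all hypothetical, and in practice `validation_score` would train a network of the given size and evaluate it on a validation sample.

```python
import math

def smallest_adequate_size(validation_score, oversized, tolerance=0.01):
    """Return the smallest hidden-layer size whose validation score is within
    `tolerance` of the score of a purposely oversized network."""
    reference = validation_score(oversized)
    for n_hidden in range(1, oversized + 1):
        if validation_score(n_hidden) >= reference - tolerance:
            return n_hidden
    return oversized

def toy_score(n_hidden):
    # Hypothetical stand-in for "train an MLP with n_hidden neurons and
    # score it on the validation sample"; it saturates as the network grows.
    return 0.9 * (1.0 - math.exp(-n_hidden / 8.0))

best = smallest_adequate_size(toy_score, oversized=60, tolerance=0.01)
```

With a real, noisy training procedure, each size would typically be trained several times before the comparison, since a single run can be unrepresentative.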
Two Artificial Intelligence (AI) methods, Fuzzy Inference System (FIS) and Artificial Neural Network (ANN), are applied to Landslide Susceptibility Mapping (LSM), to compare complementary aspects of the potentials of the two methods and to extract physical relationships from data. An index is proposed in order to rank and filter the FIS rules, selecting a certain number of readable rules for further interpretation of the physical relationships among variables. The area of study is Rolante river basin, southern Brazil. Eleven attributes are generated from a Digital Elevation Model (DEM), and landslide scars from an extreme rainfall event are used. Average accuracy and area under Receiver Operating Characteristic curve (AUC) resulted, respectively, in 81.27% and 0.8886 for FIS, and 89.45% and 0.9409 for ANN. ANN provides a map with more amplitude of outputs and less area classified as high susceptibility. Among the 40 (10%) best-ranked FIS rules, 13 have high susceptibility output, while 27 have low; a cause is that low susceptibility areas are larger on the map. Slope is highly connected to susceptibility. Elevation, when high (plateau) or low (floodplain), inhibits high susceptibility. Six attributes show the same fuzzy set for the 18 best-ranked rules, meaning this fuzzy set is common on the map. Overall findings point out that ANN is best suited for LSM map generation, but, based on them, using FIS is important to help researchers understand more about AI models for LSM and about landslide phenomenon.
... A Geographic Information System (GIS) is an effective tool for landslide assessment. GIS-based qualitative and quantitative models are commonly used to generate landslide susceptibility maps [15]. Qualitative models largely depend on experts' observations (expert knowledge-driven models) [16,17]. ...
... Machine Learning Algorithms (MLAs) and artificial intelligence methods have become popular for various applications in multiple fields in recent years. MLAs can be used to design self-improving models in order to obtain desired results [15,20-24]. Each of the models has its advantages and disadvantages. ...
Among natural hazards, landslides are devastating in China. However, little is known regarding potential landslide-prone areas in Maoxian County. The goal of this study was to apply four deep learning algorithms, including the Convolutional Neural Network (CNN), Deep Neural Network (DNN), Long Short-Term Memory (LSTM) networks, and Recurrent Neural Network (RNN) in evaluating the possibility of landslides throughout Maoxian County, Sichuan, China. A total of 1,290 landslide records was developed using historical records, field observations, and remote sensing techniques. The landslide susceptibility maps showed that most susceptible areas were along the Minjiang River and in some parts of the southeastern portion of the study area. Slope, rainfall, and distance to faults were the most influential factors affecting landslide occurrence. Results revealed that proportion of landslide susceptible areas in Maoxian County was as follows: identified landslides (13.65–23.71%) and non-landslides (76.29–86.35%). The resultant maps were tested against known landslide locations using the Area Under the Curve (AUC). This study indicated that DNN algorithm performed better than LSTM, CNN, and RNN in identifying landslides in Maoxian County, with AUC values (for prediction accuracy) of 87.30%, 86.50%, 85.60%, and 82.90%, respectively. The results of this study are useful for future landslide risk reduction along with devising sustainable land use planning in the study area.
... Previous studies adopted various statistical measures and methods to evaluate the performance of a model. For instance, Petschko et al. (2014), Pineda et al. (2016), and Rodrigues et al. (2021) employed the area under the receiver operating characteristic curve (AUROC) to assess a model's predictive ability; Lin et al. (2019), Tanyas et al. (2019), and Lucchese et al. (2020) used the proportion of true positives and true negatives among the total number of samples examined in a confusion matrix, namely the overall accuracy (ACC) of models trained and validated by different inventories, to assess model performance; Chung and Fabbri (2008), Ozioko and Igwe (2020), and Tien Bui et al. (2020) used the area under the success rate curve (AUSRC) to evaluate predictive ability. Moreover, Knevels et al. (2020) and Lei et al. (2020) performed the Wilcoxon signed-rank test to compare the performances of susceptibility models constructed by various methods. ...
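Two of the measures named above, overall accuracy (ACC) and AUROC, can be computed directly from labels and model outputs. The sketch below is a minimal NumPy illustration with made-up labels and scores. The 0.5 cutoff is an assumption, and the AUROC uses the standard pairwise (rank-based) definition rather than any particular paper's procedure.

```python
import numpy as np

def overall_accuracy(y_true, y_score, cutoff=0.5):
    """(TP + TN) / N after thresholding the scores at `cutoff`."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score) >= cutoff
    tp = np.sum(y_pred & y_true)      # landslide cells predicted as landslide
    tn = np.sum(~y_pred & ~y_true)    # stable cells predicted as stable
    return (tp + tn) / y_true.size

def auroc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count half)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

labels = [1, 1, 1, 0, 0, 0]               # 1 = occurrence, 0 = nonoccurrence
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]   # illustrative susceptibility outputs
acc = overall_accuracy(labels, scores)    # 4 of 6 correct
auc = auroc(labels, scores)               # 8 of 9 positive-negative pairs ordered correctly
```

ACC depends on the chosen cutoff, whereas AUROC summarizes ranking quality over all cutoffs, which is one reason studies often report both.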
When performing a landslide susceptibility analysis, a model is usually established on the basis of a multi-temporal or event-triggered landslide inventory. Because multi-temporal landslide inventories are rarely available for most areas, an event-triggered landslide inventory is often used, but the result depends on which single event is selected. In order to establish a landslide susceptibility model with good prediction performance, the present study sought to determine how to select a single event-triggered landslide inventory, and investigated the effect of various combinations of event inventories. We selected the Shihmen reservoir watershed as the research area, conducted a logistic regression analysis to build 23 event-based landslide susceptibility models and one multi-year landslide susceptibility model, and estimated the performance of these models. In addition, this study assessed the influence of event characteristics on model prediction performance, used the above results to merge two different events, and then established models based on these combinations. The results indicated that when establishing an event-based landslide susceptibility model, selecting events with suitable rainfall return periods and landslide density can yield robust models with relatively high predictive ability. Furthermore, combining two events whose rainfall spatial distributions are negatively correlated with each other can enhance a model's predictive ability and modeling efficiency.
... The weight values are defined during the training process. The number of neurons in the hidden layer (m) determines the ANN complexity and is assessed based on a search process for a network with the least complexity that still obtains the same performance, with a validation sample, as an initial network, that is purposefully over-sized [77]. The ANNs were developed with the most common inputs used in models for G computation [43][44][45][46][47]. ...
Soil heat flux (G) is an important component for the closure of the surface energy balance (SEB) and the estimation of evapotranspiration (ET) by remote sensing algorithms. Over the last decades, efforts have been focused on parameterizing empirical models for G prediction, based on biophysical parameters estimated by remote sensing. However, due to the existing models' empirical nature and the restricted conditions in which they were developed, using these models in large-scale applications may lead to significant errors. Thus, the objective of this study was to assess the ability of the artificial neural network (ANN) to predict mid-morning G using extensive remote sensing and meteorological reanalysis data over a broad range of climates and land covers in South America. Surface temperature (Ts), albedo (α), and enhanced vegetation index (EVI), obtained from a moderate resolution imaging spectroradiometer (MODIS), and net radiation (Rn) from the global land data assimilation system 2.1 (GLDAS 2.1) product, were used as inputs. The ANN's predictions were validated against measurements obtained by 23 flux towers over multiple land cover types in South America, and their performance was compared to that of existing and commonly used models. The Jackson et al. (1987) and Bastiaanssen (1995) G prediction models were calibrated using the flux tower data for quadratic errors minimization. The ANN outperformed existing models, with mean absolute error (MAE) reductions of 43% and 36%, respectively. Additionally, the inclusion of land cover information as an input in the ANN reduced MAE by 22%. This study indicates that the ANN's structure is more suited for large-scale G prediction than existing models, which can potentially refine SEB fluxes and ET estimates in South America.
... Additionally, we investigated the number of hidden neurons necessary to present a validation set performance similar to an oversized ANN, preventing the development of overly complex models. In Lucchese et al. (2020), a complexity analysis was performed based on a set of repetitions to select the network configuration that resulted in the best validation performance. Here, the complexity analysis was carried out according to the optimal initial weights obtained by the EA result (oriented search). ...
Reservoirs are operated following specific policies, constrained by hydrological and structural conditions. When modeling anthropized water systems with reservoirs, incorporating existing operating policies is important to improve model capability. However, operating policies are not always available or easy to identify within large-scale multi-reservoir systems, where operation derives from a large number of variables and constraints rather than a clear-cut local objective function. This study applies Artificial Neural Networks (ANNs) to analyze whether local variables (inflow, storage level, and evaporation) of a sub-system that is part of a large-scale coordinated multi-reservoir system are sufficient predictors of the operational behavior (release decisions) at a daily time step. The sub-system includes the Luiz Gonzaga and Sobradinho reservoirs. Results pointed to a Nash–Sutcliffe efficiency coefficient (NS) of 0.67 to 0.74 and a coefficient of determination (r²) of 0.75, showing that the sub-system's operational behavior can be predicted most of the time, although some outflow peaks are underpredicted.
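For readers unfamiliar with the two scores quoted above, the following sketch computes the Nash–Sutcliffe efficiency and r² with NumPy using their standard definitions. The observed and simulated series are invented for illustration, and r² is taken here as the squared Pearson correlation, which is an assumption about how the cited study defines it.

```python
import numpy as np

def nash_sutcliffe(observed, simulated):
    """NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1 is a perfect fit; 0 means no better than predicting the observed mean.
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread

def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated series."""
    return np.corrcoef(observed, simulated)[0, 1] ** 2

obs = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative observed releases
sim = [1.1, 1.9, 3.2, 3.8, 5.0]   # illustrative simulated releases
ns = nash_sutcliffe(obs, sim)
r2 = r_squared(obs, sim)
```

Unlike r², NS penalizes systematic bias, so a model that misses peaks can show a noticeably lower NS than r².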
... For example, support vector machine (SVM) [6][7][8], decision tree analysis [9], random forest [9,10], and logistic regression [11,12] have been adopted to produce landslide susceptibility maps with high prediction accuracies. Recent developments in deep learning algorithms have also provided a basis for landslide analysis [13][14][15], offering better performance and higher landslide prediction accuracy than conventional ML algorithms. ...
This paper proposes a novel method to incorporate unfavorable orientations of discontinuities into machine learning (ML) landslide prediction by using GIS-based kinematic analysis. Discontinuities, detected from photogrammetric and aerial LiDAR surveys, were included in the assessment of potential rock slope instability through GIS-based kinematic analysis. Results from the kinematic analysis, coupled with several commonly used landslide influencing factors, were adopted as input variables in ML models to predict landslides. In this paper, various ML models, such as random forest (RF), support vector machine (SVM), multilayer perceptron (MLP) and deep learning neural network (DLNN) models were evaluated. Results of two validation methods (confusion matrix and ROC curve) show that the involvement of discontinuity-related variables significantly improved the landslide predictive capability of these four models. Their addition demonstrated a minimum of 6% and 4% increase in the overall prediction accuracy and the area under curve (AUC), respectively. In addition, frequency ratio (FR) analysis showed good consistency between landslide probability that was characterized by FR values and discontinuity-related variables, indicating a high correlation. Both results of model validation and FR analysis highlight that inclusion of discontinuities into ML models can improve landslide prediction accuracy.
... ANN internal complexity was addressed, based on a formulated conceptual paradigm (Lucchese et al. 2020). The objective is to identify a minimum number of hidden neurons resulting in a model as good as an oversized one which has in this study 20 hidden neurons. ...
Real-time forecasting plays a valuable role in the early warning system framework by reducing damage. However, signal loss in telemetric monitoring networks tends to occur during extreme events, precisely when data are needed for forecasting. We present an original approach, consisting of a tree of artificial neural networks (ANNs), with complete and partial models to deal with signal loss scenarios, where we also tested a new type of filter (GWMA – Gamma-Weighted Moving Average) to aggregate data in time and reduce the number of model inputs. In addition to this filter, we tested UWMA (Uniformly Weighted Moving Average), EWMA (Exponentially Weighted Moving Average) and MD (Moving Difference). Novel concepts were used to reduce ANN internal complexity and to identify a training dataset size corresponding to an ideal amount of information, which does not hinder training. We developed a model to forecast the water level up to 24 h ahead at Encantado, in the Taquari-Antas River basin, southern Brazil. The data period comprises hourly records from 26/11/2015 to 24/04/2019. The verification dataset performances of the partial models are compared to the complete model, indicating no substantial loss. The mean absolute error and the Nash–Sutcliffe of the complete model for the lead times of 4, 10, and 20 h are 5.4, 17.7, and 19.4 cm; and 0.99, 0.95, and 0.92, respectively. Therefore, the ANN tree is confirmed as a viable alternative to cope with signal loss scenarios.
... Resources associated with artificial intelligence have aided the processing and use of a wide variety of data for applications in modern hydrology (e.g. Lucchese et al., 2020; Kadam et al., 2019; Firat, 2008). However, some relevant data may be excluded from the analysis by the algorithms because they do not fit a common pattern. ...
Knowledge depends on the existence of data. Modern hydrology has been based primarily on quantitative data from monitoring stations. However, other sources can provide data relevant to the advancement of hydrology, as researchers have demonstrated since the 1970s. A general overview of the subject reveals a lack of standardization in the terminology for data types and sources, which can lead to misunderstandings. In addition, there is still no consensus on the use of data from sources other than monitoring stations. In this context, the present study suggests standardizing the naming of data and of their sources. To this end, the characteristics of the record were taken into account, classifying data as systematic or non-systematic. As for data sources, a classification according to origin was suggested: in-situ instrumental evidence, orbital instrumental evidence, physical evidence, and documentary evidence. Considering the relevance of non-systematic data, especially with regard to extreme hydrological events, the joint use of systematic and non-systematic data through the data triangulation method is suggested.
The traditional landslide risk research mostly focuses on the spatial distribution of landslide occurrence probability, but the research on the time distribution of landslide occurrence probability is not in-depth, and there is no effective professional management advice for specific risk areas, so it has great limitations. This paper takes the landslide-prone area along the Qingjiang River in Jianshi County, Hubei Province as the research area, takes the slope as the unit, and uses the information value method to complete the landslide geological disaster risk assessment in the study area. Through the characteristics of landslide risk in time and space, the dynamic management measures of landslide risk in different periods of the study area are formulated. The map of landslide prevention and control planning measures in different periods in the study area can be obtained, which provides a basis for realizing the balance between safety and economy of landslide prevention and control planning. This has obvious guiding significance for landslide prevention and control planning in other regions.
Landslide susceptibility assessment using Artificial Neural Networks (ANNs) requires occurrence (landslide) and nonoccurrence (not prone to landslide) samples for ANN training. We present empirical evidence that a priori intervention on the nonoccurrence samples can produce models that generalize poorly. Thirteen nonoccurrence cases based on GIS data from the Rolante River basin (828.26 km²) in Brazil are studied, divided into three groups. The first group (BO) was based on six combinations of buffers with different minimum and maximum distances from the mapped scars. The second group (RO) acquired nonoccurrence samples only from a rectangle in the lowlands, known for not being susceptible to landslides. The third group (BR) comprised six alternatives corresponding to those of BO, with the addition of nonoccurrence samples acquired from the same rectangle used for RO. Accuracy (acc) and the Area Under the Receiver Operating Characteristic Curve (AUC) were calculated. RO resulted in perfect discrimination between susceptible and not susceptible to landslides (acc = 1 and AUC = 1). This occurred because the model simply assigned the susceptible class to points whose attributes differ from those in the rectangle, harming the classification of nonoccurrence sampling points outside the rectangle. The RO map shows large areas classified as susceptible which are known to be non-susceptible. In BR, sampling points from the rectangle, which are easy to classify, were added to the verification sample. Average acc for BO 00 m (minimum buffer distance to scars of 0 m): 89.45%, average acc for BR 00 m: 92.33%, average AUC for BO 00 m: 0.9409, average AUC for BR 00 m: 0.9616. Maps of groups BO and BR were alike. This indicates that metrics can be artificially inflated if biased samples are added, although the final map is not visibly affected. To avoid this effect, easily classifiable samples generated from expert knowledge should be employed carefully.
We prepared a landslide susceptibility map for the Sarkhoon watershed, Chaharmahal-w-bakhtiari, Iran, using novel ensemble artificial intelligence approaches. A support vector machine (SVM) was employed as the base classifier, and four meta/ensemble classifiers, including Adaboost (AB), bagging (BA), rotation forest (RF), and random subspace (RS), were used to construct new ensemble models. SVM has been used previously to spatially predict landslides, but not together with its ensembles. We selected 20 conditioning factors and randomly partitioned 98 landslide locations into training (70%) and validation (30%) groups. Several statistical metrics, including sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC), were used for model comparison and validation. Using the One-R Attribute Evaluation (ORAE) technique, we found that all 20 conditioning factors were significant in identifying landslide locations, but "distance to road" was found to be the most important. The RS (AUC = 0.837) and RF (AUC = 0.834) significantly improved the goodness-of-fit and prediction accuracy of the SVM (AUC = 0.810), whereas the BA (AUC = 0.807) and AB (AUC = 0.779) did not. The random subspace based support vector machine (RSSVM) model is a promising technique for helping to better manage land in landslide-prone areas.
Landslide susceptibility mapping is vital for landslide risk management and urban planning. In this study, we used three statistical models [frequency ratio, certainty factor and index of entropy (IOE)] and a machine learning model [random forest (RF)] for landslide susceptibility mapping in Wanzhou County, China. First, a landslide inventory map was prepared using earlier geotechnical investigation reports, aerial images, and field surveys. Then, the redundant factors were excluded from the initial fourteen landslide causal factors via factor correlation analysis. To determine the most effective causal factors, landslide susceptibility evaluations were performed based on four cases with different combinations of factors (“cases”). In the analysis, 465 (70%) landslide locations were randomly selected for model training, and 200 (30%) landslide locations were selected for verification. The results showed that case 3 produced the best performance for the statistical models and that case 2 produced the best performance for the RF model. Finally, the receiver operating characteristic (ROC) curve was used to verify the accuracy of each model’s results for its respective optimal case. The ROC curve analysis showed that the machine learning model performed better than the other three models, and among the three statistical models, the IOE model with weight coefficients was superior.
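The random 70/30 partition of landslide locations described above can be sketched in a few lines. This is an illustrative assumption rather than the study's procedure: the total of 665 locations is inferred from 465 + 200, the seed is arbitrary, and `split_indices` is a hypothetical helper.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and verification sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(train_fraction * n_samples)   # truncates toward zero
    return order[:n_train], order[n_train:]

train_idx, test_idx = split_indices(665)        # 665 = 465 + 200 locations
```

Fixing the seed keeps the partition reproducible, so that different factor combinations are compared on the same training and verification locations.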
Landslides are typically triggered by earthquakes or rainfall, and occasionally by a rainfall event followed by an earthquake or vice versa. Yet most of the work presented in the past decade has focused largely on single-event susceptibility models. Such modeling is insufficient in places where the triggering mechanism involves both factors, as in the Chuetsu region, Japan. Generally, a single-event model provides only limited insight into landslide spatial distribution and thus understates the potential combined effect of earthquake- and rainfall-triggered landslides. This study explores the combined effect of landslides triggered by the Chuetsu-Niigata earthquake followed by a heavy rainfall event, examining multiple traditional statistical models and data mining methods to understand the coupling effects. This paper aims to compare the abilities of the statistical probabilistic likelihood-frequency ratio (PLFR) model, information value (InV) method, certainty factors (CF), artificial neural network (ANN) and ensemble support vector machine (SVM) for landslide susceptibility mapping (LSM) using a high-resolution light detection and ranging digital elevation model (LiDAR DEM). Firstly, the landslide inventory map, including 8459 landslide polygons, was compiled from multiple aerial photographs and satellite images. These datasets were then randomly split into two parts: 70% of the landslide polygons (5921) for model training and the remaining polygons (2538) for validation. Next, seven causative factors were classified into three categories, namely topographic, hydrological and geological factors. We then identified the associations between landslide occurrence and causative factors to produce the LSM. Finally, the accuracies of the five models were validated by the area under the curve (AUC) method. The AUC values of the five models vary from 0.77 to 0.87. Regarding performance, the proposed SVM is promising for mapping regional landslide-prone areas using both types of landslides. Additionally, the results of our LSM can be applied to similar areas that have experienced both rainfall- and earthquake-triggered landslides.
The aim of this paper was to identify and analyze the areas susceptible to debris flow in the Taquari-Antas River basin. We developed a spatial model with a probabilistic approach, involving morphometric analysis of areas where debris flows occurred, to map susceptible areas. The sites were inventoried from satellite images and on-site expeditions, and 193 scars were mapped. Most scars refer to the event that occurred in January 2010 in the Forqueta river basin. We defined three morphometric attributes for modeling: (i) the average slope filtered in a 5x5 window; (ii) altimetry slope of the ramp; (iii) altimetry slope of the hill. These attributes showed a well-defined central tendency, low data dispersion and low correlation with each other. The mapped landslide scars have a total area of 27.3 ha, most of them with a length of more than 150 m and a width of around 10 m. The average altimetric slope of the hills with mass movements was 317 m, with a mean slope of 39%. The results indicate that the areas susceptible to debris flow, 8.147 km² (30% of the basin), are located principally along the erosive escarpment lines, at the contact between the Serra Geral and the adjacent geomorphological units. The erosive escarpment lines are located on the slopes of the das Antas, da Prata, São Marcos, Carreiro, Guaporé, Forqueta, Fão and Taquari river valleys. In absolute terms, the municipalities with the most susceptible areas are Bom Jesus, Jaquirana and Fontoura Xavier. About 40 municipalities present more than 50% of their areas as susceptible to debris flows.
In the present study, Rotation Forest ensemble was integrated with different base classifiers to develop different hybrid models namely Rotation Forest based Support Vector Machines (RFSVM), Rotation Forest based Artificial Neural Networks (RFANN), Rotation Forest based Decision Trees (RFDT), and Rotation Forest based Naïve Bayes (RFNB) for landslide susceptibility modelling. The validity of these models was evaluated using statistical methods such as Root Mean Square Error (RMSE), Kappa index, accuracy, and the area under the success rate and predictive rate curves (AUC). Part of the landslide prone area of Pithoragarh district, Uttarakhand, Himalaya, India was selected as the study area. Results indicate that the RFDT is the best model showing the highest predictive capability (AUC =0.741) in comparison to RFANN (AUC =0.710), RFSVM (AUC =0.701), and RFNB (AUC =0.640) models. The present study would be helpful in the selection of best model for landslide susceptibility mapping.
Landslides represent a part of the cascade of geological hazards in a wide range of geo-environments. In this study, we aim to investigate and compare the performance of two state-of-the-art machine learning models, i.e., decision tree (DT) and random forest (RF) approaches, to model the massive rainfall-triggered landslide occurrences in the Izu-Oshima Volcanic Island, Japan at a regional scale. At first, a landslide inventory map is prepared consisting of 44 landslide polygons (10,444 pixels) from aerial photo-interpretation and field surveys. To estimate the robustness of the models, we randomly selected two different samples (S1 and S2), comprising both positive and negative cells (70% of total landslides, 7293 pixels) for training and the remaining cells (30%, 3151 pixels) for validation. Twelve causative factors including altitude, slope angle, slope aspect, plan curvature, total curvature, compound topographic index, stream power index, distance to drainage network, drainage density, distance to geological boundaries, lithology and cumulative rainfall were selected as predictors to implement the landslide susceptibility model. The area under the receiver operating characteristics (ROC) curves (AUC) and other statistical signifiers were used to verify the model accuracies. The result shows that the DT and RF models achieved remarkable predictive performance (AUC > 0.9), producing near accurate susceptibility maps. The overall efficiency of RF (AUC = 0.956) is found to be significantly higher than that of DT (AUC = 0.928). Additionally, we noticed that the performance of RF for modeling landslide susceptibility is very robust even though the training and validation samples are altered.
Considering the performances, we suggest that both RF and DT models can be used in other similar non-eruption-related landslide studies in the tephra-deposited rich volcanoes, as they are capable of rapidly generating accurate and stable landslide susceptibility maps for risk mitigation, management practices, and decision-making. Moreover, the RF-based model is promising enough to be recommended as a method to map regional landslide susceptibility.
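The validation scheme described in the abstract above (a random 70/30 split of cells, then AUC on the held-out part) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_train_validation` and `roc_auc`, the seed, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the plain pairwise-ranking definition rather than any particular library routine.

```python
import random


def split_train_validation(cells, train_fraction=0.7, seed=0):
    """Shuffle the cells and split them into training and validation subsets.

    `train_fraction` and `seed` are illustrative defaults (hypothetical).
    """
    rng = random.Random(seed)
    shuffled = list(cells)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen landslide cell (label 1)
    receives a higher score than a randomly chosen non-landslide cell
    (label 0); ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative cell")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four cells with made-up susceptibility scores.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)

# Toy split of ten cell indices into 7 training and 3 validation cells.
train_cells, validation_cells = split_train_validation(range(10))
```

The pairwise loop is quadratic in the number of cells, so for inventories with thousands of pixels a rank-based or library implementation would be the practical choice; the sketch only shows what the reported AUC values measure.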
The present study deals with the preparation of a landslide susceptibility map of Darjeeling Himalaya with the help of GIS tools and an artificial neural network (ANN) model. Fifteen landslide causative factors, i.e. elevation, slope aspect, slope angle, slope curvature, geology, soil, lineament density, distance to lineament, drainage density, distance to drainage, stream power index (SPI), topographic wetness index (TWI), rainfall, normalized difference vegetation index (NDVI) and land use and land cover (LULC), were considered to produce the landslide susceptibility zonation map. To generate the maps of these causative factors, topographical maps, a geological map, a soil map, satellite imagery, Google Earth images and some other authorized maps were processed and assembled into a spatial database using GIS and image processing techniques. The back-propagation method was applied to estimate the factors' weights, and the landslide hazard indices were derived from the trained back-propagation weights. Then, the landslide susceptibility zonation map of Darjeeling Himalaya was made using GIS tools and classified into five classes, i.e. very low, low, moderate, high, and very high landslide susceptibility. To validate the prepared landslide susceptibility map, the landslide inventory was used and the accuracy was obtained from the ROC curve. The accuracy of the landslide susceptibility map was 81.5%, which is desirable.
Statistically based landslide susceptibility mapping has become an important research area in the last decades, and several bivariate and multivariate statistical approaches to landslide susceptibility assessments have been applied and compared in all regions of the world. The aim of this study was to compare different statistical approaches and to analyse the degree of spatial agreement between the landslide susceptibility maps produced. To this end, we selected seven statistical methods for comparison, namely, landslide density, likelihood ratio, information value, Bayesian model, weights of evidence, logistic regression and discriminant analysis, and then applied these to an inventory comprising 940 translational landslides, in the southeast region of Minas Gerais state in Brazil, at the western edge of the Quadrilátero Ferrífero (642.13 km²). In some statistical approaches, modifications were made to the input dependent variables. The landslides registered in the inventory map have been used in point and polygon form. Six factors were considered as input landslide predisposing factors: slope angle, geomorphological units, slope curvature, lithological units, slope aspect and inverse wetness index. The combination order of the landslide predisposing factors was established based on a sensitivity analysis, which gave rise to five different cartographic combinations. In total, 58 statistical models of landslide susceptibility were produced, and the results were validated using success and prediction rate curves. The spatial agreement evaluation between the model results was carried out with kappa statistics. There were 214 comparisons of spatial agreement involving classified models at three relative degrees of susceptibility (high, medium and low landslide susceptibility classes). The results showed that all of the models so produced had satisfactory validation rates.
The best landslide susceptibility models obtained areas under the curve of > 0.80 in the success and prediction rate curves, with emphasis on the weights of evidence, the information value and the likelihood ratio statistical methods. These statistical approaches were performed with the landslides mapped in the form of points. The landslide susceptibility classes of these models visually demonstrated a slightly more irregular spatial distribution when compared to the models performed with landslide polygons. The likelihood ratio model performed with landslide points presented one of the smallest areas for the high susceptibility class and the largest area for the low susceptibility class. The analysis of the spatial agreement showed that the models produced with a polygonal dependent variable tend to be more concordant, regardless of the statistical technique used. Moreover, we verified that spatial agreement tends to increase with increasing accuracy of the models. Despite the discrepancies found, most of the models compared showed a substantial or almost perfect degree of agreement.
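The spatial agreement measure named in the abstract above, the kappa statistic, can be sketched for two classified susceptibility maps. This is a minimal illustration of Cohen's kappa under the assumption that each map is flattened to a list of per-cell class labels; the function name `cohen_kappa` and the toy maps are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(map_a, map_b):
    """Cohen's kappa between two classified maps given as equal-length
    sequences of per-cell class labels (e.g. "high", "medium", "low").

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of
    cells with the same class in both maps and p_e is the agreement expected
    by chance from the class frequencies of each map.
    """
    if len(map_a) != len(map_b) or len(map_a) == 0:
        raise ValueError("maps must be non-empty and of equal length")
    n = len(map_a)
    observed = sum(a == b for a, b in zip(map_a, map_b)) / n
    counts_a = Counter(map_a)
    counts_b = Counter(map_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both maps assign every cell to the same single class.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: two four-cell maps that disagree on one cell.
toy_map_a = ["high", "high", "low", "low"]
toy_map_b = ["high", "low", "low", "low"]
toy_kappa = cohen_kappa(toy_map_a, toy_map_b)
```

In this toy case three of four cells agree (p_o = 0.75) while chance agreement is 0.5, giving kappa = 0.5. The "substantial" and "almost perfect" wording in the abstract matches the widely used Landis and Koch interpretation scale, on which kappa values of 0.61 to 0.80 are read as substantial and values above 0.80 as almost perfect; whether the authors applied exactly that scale is an assumption here.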