Minute-scale Prediction of Soil Movement using Machine-Learning Techniques
K Agrawal1, S Agrawal2, P Chaturvedi1, 3, *, N Mali1,4, VU Kala1,4, V Dutt1
1Applied Cognitive Science Laboratory, Indian Institute of Technology, Mandi – 175005, India
2Centre for Converging Technologies, University of Rajasthan, Jaipur – 302004, India
3Defence Terrain Research Laboratory, Defence Research and Development Organization, Delhi – 110054, India
4Construction Material laboratory, Indian Institute of Technology Mandi – 175005, India
*corresponding author Email: prateek@dtrl.drdo.in
Abstract
Changes in the Earth's climate are likely to increase natural hazards like landslides in the hilly
regions of north India. Thus, forecasting these events at a local scale will help improve the
preparedness of society in facing landslide disasters. There has been prior machine-learning
research to predict landslide occurrence based on the statistical analysis of historical data and
different triggering factors. While these attempts have shown promising results, they have been
limited to predicting landslides at a daily-scale. In this paper, we overcome the daily-scale
limitation and focus on minute-scale prediction of landslides by monitoring several soil and
weather properties at a landslide site at Kamand, Himachal Pradesh. Data about temperature,
humidity, rain, atmospheric pressure, light intensity, soil moisture, soil pressure, and soil movement
were collected every 11 minutes from a landslide location on the Indian Institute of Technology
Mandi campus at Kamand, Himachal Pradesh, over a 10-day period in August 2017. The data
contained a total of 842 instances, which were used to train several supervised machine-learning
(ML) techniques. These included logistic regression, C4.5 decision tree, Naive Bayes, random
forest, and support vector machine with a non-linear polynomial kernel function. These models
predicted soil movement as a binary classification problem, where the positive class corresponded
to soil movement and the negative class to no movement. As the movement data had many
instances of no-movement (732 instances) and few cases of movement (110 instances; i.e.,
class-imbalance), accuracy was not a good measure of classification (classification accuracy is
likely to be high due to the majority no-movement class). Thus, we assessed different ML
techniques using metrics like the True Positive (TP) rate and False Positive (FP) rate. Results
revealed that the C4.5 decision tree had the highest TP rate (= 61%) and a low FP rate (= 2%)
among all algorithms. Thus, the C4.5 decision tree algorithm performed best among the different
classifiers. As part of our future research, we plan to explore techniques to correct the
class-imbalance in the data and improve our current predictions. Additionally, since our data is a
time series, we also plan to investigate time-series forecasting using traditional and deep-learning
models.
ILC2017_NO_33
Keywords: Minute-Scale Prediction, Soil Movement, Machine-Learning, Landslides, True-
positives, False-positives
Introduction
Landslides cause considerable damage to life and property, block roads, and disrupt the
transportation of goods and services, especially in the Himalayan region of India (Chaturvedi,
Shrivastava, & Kaur, 2017). For places at very high altitudes, where everything from food to
clothing is brought in from cities, the blocking of roads by landslides is a critical problem. These
reasons, among others, make landslide prediction a problem that needs to be addressed urgently.
Machine-learning (ML) techniques, i.e., techniques that enable computers to learn patterns in data,
have been gaining popularity across several real-world domains (Brenning, 2015). In fact,
ML algorithms have recently been used in predicting landslides (Catani, Lagomarsino, Segoni, &
Tofani, 2013; Agrawal, Baweja, Dwivedi, Saha, Prasad, Agrawal, Kapoor, Chaturvedi, Mali, Kala,
& Dutt, 2017). These attempts have not only enhanced the accuracy of prediction, but
have also made the role of the different factors involved in triggering a landslide much
easier to interpret (Catani et al. 2013). With the widespread use of ML algorithms and the advent
of very high computational power, machine-learning techniques have become a more
analytics-friendly tool compared to the physics- and geology-based traditional mathematical tools
for predicting landmass movement (Agrawal et al. 2017).
Recent ML research (Pham, Bui, Pourghasemi, Indra, & Dholakia, 2017; Goetz, Brenning,
Petschko, & Leopold, 2015; Bui, Pradhan, Lofman, & Revhaug, 2012) has emphasized predicting
landslides at a daily-scale; however, little research has been done on predicting landslides at a
minute-scale. Predicting landslides at a minute-scale is important because minute-scale predictions
help to warn people about impending landslides in time. Such real-time tracking can also
help in knowing how active a site is in terms of its susceptibility to landslides.
Furthermore, machine-learning algorithms could help us understand the rate of change in
site-specific soil and weather properties that contribute to the triggering of soil movement.
The primary goal of this paper is to predict site-specific soil-movement at the minute-scale by using
traditional ML techniques. We use several ML algorithms, namely logistic regression (Brenning et al.
2005), C4.5 decision trees (Quinlan, 1986), Naïve Bayes (Pham et al. 2017), random forests
(Breiman, 2001), and Support Vector Machines (Vapnik, 1998), for predicting soil-movement at the
minute level. The data used in this study were collected using low-cost sensors from one of the
landslide-prone sites in Kamand, Himachal Pradesh. Since landslides are a rare phenomenon, the
instances where soil-movements are recorded (positive class) are relatively few compared to no
soil-movements (negative class). In such class-imbalanced datasets, accuracy may be a misleading
performance measure. Thus, we use more specific performance measures like the
True Positive (TP) rate and False Positive (FP) rate for evaluating the different
ML techniques.
In what follows, first, we provide a brief overview of the research that has been conducted on
sensors for real-time monitoring of on-site soil and weather properties. This overview is followed
by a description of traditional ML techniques that have been popularly used in literature. Next, we
detail the study area and data that was collected from the study area using low-cost sensors. Then,
we provide a comparison of different ML techniques in accounting for soil-movement at a minute-
scale at the study area. Finally, we close the paper by highlighting the implications of our results for
predicting soil-movement at a minute-scale and future research directions.
Previous Work
Prior research has used different methods for site-specific real-time monitoring of soil properties,
soil movement, and weather (Ramesh, 2014). Some of these methods include visual interpretation
of stereoscopic aerial photographs (Podolszki, 2014), satellite technology (Pham et al. 2017),
Unmanned Aerial Vehicle (UAV)-based remote sensing (Neithammer, James, & Rothmund,
2012), Digital Elevation Models (DEMs) from airborne laser altimetry data (McKean & Roering,
2004), and Brillouin Optical Time-Domain Reflectometry (BOTDR) (Zhang, Bin, & Hong-Zhoung,
2004). In India, several research organizations like the Geological Survey of India, Central Building
Research Institute, Defence Terrain Research Laboratory, and Amrita University have worked in
the field of landslide monitoring and warning using conventional sensors and systems for
monitoring various soil and weather parameters (Kanungo, Maletha, Singh, & Sharma, 2017).
However, the cost of these sensors and systems is presently very high, and their accuracy for
minute-scale landslide prediction is unknown. These limitations restrict the
large-scale deployment of current landslide monitoring sensors and systems (McKean et al. 2004;
Chaturvedi et al. 2017).
Furthermore, several studies have used state-of-the-art machine-learning
techniques for predicting soil movements (Pham et al. 2017; Goetz et al. 2015; Mathew,
Babu, Kundu, Kumar, & Pant, 2014; Catani et al. 2013). For example, one study used
machine-learning algorithms like Multilayer Perceptron, Functional Trees, and Naïve Bayes models
for mapping the susceptibility of 430 landslide locations using attributes like slope angle, slope
aspect, elevation, and rainfall (Pham et al. 2017). Another attempt has shown that Random Forests,
an ensemble technique, performs better than other machine-learning techniques for landslide
susceptibility mapping (Goetz et al. 2015; Catani et al. 2013). Furthermore, some researchers have
used a logistic regression model to predict slope-failure initiation using the antecedent 30-day
and 15-day rainfall (Mathew et al. 2014). This logistic regression model was further validated
through Receiver Operating Characteristic (ROC) curve analyses using a set of samples that had not
been used for training the classifier. The model showed an accuracy of 95.1% (Mathew et al. 2014).
Subsequently, Agrawal et al. (2017), in their study of predicting landslides at a daily-scale along
National Highway-21, also used several machine-learning algorithms along with
class-imbalance correction techniques to improve the efficacy of the classifiers. While these studies
show that machine-learning techniques have produced promising results in landslide
susceptibility mapping, little research has investigated the problem of predicting
landslides at a minute-scale. Minute-scale predictions are important for warning people about
landslides in time.
In this paper, we use low-cost sensor technology for sensing different weather and soil parameters
in real-time at a minute-scale. Furthermore, we investigate different ML techniques for predicting
landslides at a minute-scale. As part of this study, we compare five machine-learning algorithms:
logistic regression (Brenning et al. 2005), C4.5 decision trees (Quinlan, 1986),
Naïve Bayes (Tien Bui et al. 2012), random forests (Breiman, 2001), and support vector
machines (Vapnik, 1998). We evaluate the performance of these algorithms using the standard
10-fold cross-validation technique, where data is randomly and repeatedly divided into
non-overlapping training and test sets (Duda, 2014). The choice of these machine-learning
algorithms is based upon their prior use for landslide predictions (Pham et al. 2017; Goetz et al.
2015; Mathew et al. 2014; Catani et al. 2013). As decision-tree algorithms have performed well at
predicting landslides at a daily-scale (Goetz et al. 2015; Catani et al. 2013), we expected that the
Decision Tree and Random Forest algorithms would perform well in predicting soil movement at a
minute-scale. Also, decision-tree algorithms are much easier to interpret than other
machine-learning techniques, making them apt for understanding the factors that contribute to the
triggering of soil movements.
Landslide site and data collection
The dataset used for this research has been collected from a landslide-prone hill located on the
Indian Institute of Technology Mandi campus at Kamand, Himachal Pradesh (see Figure 1A). An
initial site inspection revealed that a crack had started developing on the top of the hill at the
selected site and a small section of the soil mass had started separating from the hill mass (see
Figure 1B).
Figure 1. Pictures of the selected landslide site at Kamand, Himachal Pradesh. A. Elevation view of
the landslide site. B. Cracking of the soil at the top of the selected landslide. Different wired sensors
can be seen deployed on the sliding soil mass beyond the crack.
To study the patterns of soil-movement at the site, we deployed different sensors on top of the hill,
and data were collected every 11 minutes. The system consisted of two types of sensors: surface
sensors and buried sensors. The surface sensors included a temperature and humidity sensor, a
barometric pressure sensor, a light intensity sensor, and a rain gauge. The buried sensors included a
soil-moisture sensor, a force sensor, and an accelerometer. The soil-moisture sensor used the
resistance property to measure water content in the soil surrounding its electrodes. Resistance is
inversely proportional to soil moisture and output voltage. When the sensor is dry, a high value of
resistance is recorded. The force sensor measured the pressure (in Newtons) exerted internally by
the soil and moisture. The temperature and humidity sensor measured temperature in °C and
humidity in % volume of water in a thin cylindrical volume of soil surrounding the soil-moisture
probe. Similarly, the light and pressure sensors sensed the induced light in flux and the atmospheric
pressure in kilo-Pascal (kPa). The rain gauge recorded rain (in inches) at the site, and these readings
were reported every 11 minutes. However, one of the sensors used in this study, the accelerometer,
was programmed differently: whenever it recorded movement in the soil, the rate of change in the
angular position (i.e., angular velocity, $\omega$) was measured.
The accelerometer reported values as a vector where the first three tuples corresponded to
the x-, y-, and z-axis acceleration components. In addition, the next three tuples sensed the
non-zero angular velocity ($\omega$) along the three axes (x, y, z). Whenever any soil-movement
was observed, the angular rotations were summed in these three tuples
($\omega_x$, $\omega_y$, $\omega_z$). Every 11 minutes
these tuples were reset to record fresh accelerations and angular movements for a new cycle. The
data collection at the site was done over a 10-day monsoon period between 11th August 2017 and
21st August 2017. The dataset contained 842 data points, where each point recorded the different
sensor values for one 11-minute interval. We discuss the data-cleaning and preprocessing
techniques in the next section, followed by a brief description of the machine-learning classifiers
used in this study.
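The accumulate-and-reset behaviour of the accelerometer described above can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not taken from the deployed firmware; the sketch only assumes that angular velocities are summed per axis whenever movement is detected and that the sums are reported and cleared at the end of each 11-minute cycle.

```python
from dataclasses import dataclass, field


@dataclass
class AngularAccumulator:
    """Hypothetical per-cycle accumulator for the three angular-velocity tuples."""
    totals: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def add(self, wx, wy, wz):
        # Called whenever the accelerometer registers soil movement:
        # add the sensed angular velocity to the running sum of each axis.
        for axis, w in enumerate((wx, wy, wz)):
            self.totals[axis] += w

    def flush(self):
        # Called at the end of every 11-minute cycle:
        # report the sums and reset them for a fresh cycle.
        report = tuple(self.totals)
        self.totals = [0.0, 0.0, 0.0]
        return report


acc = AngularAccumulator()
acc.add(0.5, 0.0, -0.25)   # first movement event in the cycle
acc.add(0.5, 0.0, 0.0)     # second movement event in the same cycle
cycle_report = acc.flush()  # sums for this cycle; accumulator is now reset
```

A cycle in which no movement occurs would report zeros on all three axes, which corresponds to the no-movement instances in the dataset.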
Methodology
In this section, we describe the techniques that we used for cleaning the data before feeding it to the
machine-learning classifiers. Next, we discuss the machine-learning algorithms used in this study.
Finally, we mention the performance metrics used for evaluating the performance of different ML
algorithms.
1. Data Cleaning
After collecting data from the landslide site, we ran a filter on our data to validate the
recorded rain accumulation, temperature, and humidity values. This validation was done against
multiple weather websites as well as another local weather station installed at Kamand, Himachal
Pradesh. These additional data sources helped us confirm that the weather data collected from our
sensors were accurate. In addition, we calibrated the buried sensors before installing them
on site. The calibration ensured that the data reported by these sensors were accurate.
A machine-learning problem can typically be defined as a mathematical function which takes in
input variables (independent variables) and outputs a decision variable (in our case, the decision is a
value of “Yes (Y)” for soil-movement and a value of “No (N)” for no soil-movement). A data
instance was labeled as ‘Y’ if any of the x-, y-, or z- angular velocities were non-zero. Thus, the
decision variable that we used for classifying soil-movements can be mathematically expressed as:
$$\omega_{tot} = |\omega_x| + |\omega_y| + |\omega_z|$$
where $\omega_{tot}$ is the decision variable. If $\omega_{tot} > 0$, then we classified an
instance as Y or soil-movement; else we classified it as N or no soil-movement. Thus,
$\omega_{tot}$ was the decision variable and all other sensed data like temperature, humidity,
light intensity, soil moisture, and force were the independent variables that contributed to the
formation of the machine-learning problem. It is important to note that accelerations, pitch, and
roll were not taken as input variables because they are directly correlated to $\omega_{tot}$
and the presence of these attributes may make the classifiers biased
towards them. In this study, however, we are interested in how different
soil and weather properties influence soil-movement.
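The labeling rule described above (an instance is labeled Y if any of the three angular velocities is non-zero) can be sketched as follows. The function names are illustrative assumptions; summing absolute values is one way to express "any component is non-zero" as a single quantity.

```python
def total_angular_velocity(wx, wy, wz):
    # Sum of absolute angular velocities along the x-, y-, and z-axes.
    # It is zero only when all three components are zero.
    return abs(wx) + abs(wy) + abs(wz)


def label_instance(wx, wy, wz):
    # "Y" (soil-movement) if any angular velocity is non-zero, else "N".
    return "Y" if total_angular_velocity(wx, wy, wz) > 0 else "N"


labels = [
    label_instance(0.0, 0.0, 0.0),    # no rotation on any axis
    label_instance(0.0, -0.3, 0.0),   # rotation about the y-axis only
]
```

Using absolute values prevents rotations of opposite sign on different axes from cancelling out and producing a false no-movement label.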
2. Machine-learning Algorithms
Here, we discuss different machine-learning approaches that have been successful in the past at
predicting landslides with high accuracy. In this paper, we have compared several popular
machine-learning techniques: logistic regression (Brenning et al. 2005), C4.5 decision tree
(Quinlan, 1986), Naive Bayes (Tien Bui et al. 2012), random forest (Breiman, 2001), and support
vector machine with a polynomial kernel function (Vapnik, 1998).
Logistic regression has been particularly used in modeling landslides as it provides a probability of
landslide occurrence for every data point using the logit model (Brenning et al. 2005). This
algorithm has been widely used in landslide susceptibility mapping (Mathew et al. 2014). A
decision tree is a hierarchical model composed of decision rules that recursively split the
independent variables into zones such that each split yields zones that are as homogeneous as
possible with respect to the class (Quinlan, 1986). The advantage of decision trees is that they can
handle categorical as well as numeric variables and can incorporate them without strict
assumptions on data (Tien Bui et al. 2012). In this study, we have used the J48 algorithm, which is
a Java implementation of the C4.5 algorithm (Frank, Hall, & Witten, 2016). C4.5 uses an
entropy-based measure as the attribute-selection criterion at the tree nodes, as does its
predecessor, the ID3 algorithm (Quinlan, 1986). Given a training dataset T with subsets T_i,
i = 1, 2, ..., s, the C4.5 algorithm constructs a decision tree using a top-down,
recursive-splitting technique, starting with the attributes with the maximum gain
(Quinlan, 1986).
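To make the entropy-based selection criterion concrete, the following minimal sketch computes the information gain and the gain ratio of a single threshold split on a numeric attribute. It is an illustration under stated assumptions, not the J48 implementation: the function names and the toy rain readings are hypothetical, and only one candidate threshold is evaluated.

```python
import math
from collections import Counter


def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(values, labels, threshold):
    # Information gain and gain ratio of the split "value <= threshold".
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0, 0.0  # the split does not separate anything
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain, gain / split_info


# Toy data (hypothetical): rain readings and the movement label of each instance.
rain = [0.0, 0.0, 1.0, 1.0]
movement = ["N", "N", "Y", "Y"]
gain, gain_ratio = split_gain(rain, movement, threshold=0.5)
```

In this toy case the threshold separates the two classes perfectly, so the gain equals the full entropy of the labels. A tree learner would evaluate many attributes and thresholds in this way and split on the best one at each node.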
A Naïve Bayes (NB) classifier is a classification system based on Bayes' theorem that assumes that
all the attributes are fully independent given the output class, which is called the conditional
independence assumption (Tien Bui et al. 2012). The main advantage of the NB classifier is that it
is very easy to construct without needing any complicated iterative parameter estimation schemes
(Tien Bui et al. 2012). In the NB classifier, the posterior probability is first calculated for each
output class (Y, N), and the instance is then assigned to the class with the largest posterior
probability.
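The classification rule described above can be written compactly. In the following, $x_1, \dots, x_n$ denote the sensed attribute values of one instance; this notation is assumed here for illustration and is not taken from the paper.

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \{Y,\, N\}} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product form follows directly from the conditional independence assumption; the normalizing term $P(x_1, \dots, x_n)$ is the same for both classes and can therefore be dropped when comparing posteriors.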
Random forest (RF) is an ensemble technique that utilizes many classification trees (a ‘forest’) to
stabilize the model predictions (Breiman, 2001). The RF algorithm exploits random binary
trees, each built on a subset of the data obtained through bootstrapping: from the original data
set, a random selection of instances is drawn and used to build a tree (with a random subset of the
attributes considered at each split), and the data not included is referred to as “out-of-bag” (OOB)
(Breiman, 2001). Each tree is developed to minimize classification errors; however, the random
selection influences the results, making a single-tree
classification very unstable. For this reason, the RF method makes use of an ensemble of trees (the
so-called “forest”), thereby ensuring model stability (Breiman, 2001). The RF algorithm has
been used in the landslide prediction domain and in susceptibility modeling by several studies
(Goetz et al. 2015; Catani et al. 2013).
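The bootstrapping step and the resulting out-of-bag data can be sketched as follows. This assumes the standard bagging procedure, in which training instances are drawn with replacement for each tree; the function name and the seed are illustrative, and the sketch is not the Weka implementation.

```python
import random


def bootstrap_split(n_instances, rng):
    # Draw n_instances indices with replacement (the "in-bag" sample for one tree).
    in_bag = [rng.randrange(n_instances) for _ in range(n_instances)]
    # Instances never drawn form the out-of-bag (OOB) set for that tree.
    out_of_bag = sorted(set(range(n_instances)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)                      # fixed seed only for repeatability
in_bag, oob = bootstrap_split(842, rng)     # 842 = number of instances in the dataset
```

On average, roughly 37% of the instances end up out-of-bag for any one tree, so each tree sees a different portion of the data. This variation between trees is what the ensemble averages over.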
Support Vector Machine (SVM) is a supervised learning method based on statistical learning theory
and the structural risk minimization principle (Vapnik, 1998). Using the training data, SVM
implicitly maps the original input space into a high-dimensional feature space. Subsequently, in the
feature space, the optimal hyperplane is determined by maximizing the margin between the
classes. We chose a non-linear polynomial kernel function in this paper since it has outperformed
other kernels in prior research (Vapnik, 1998).
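A polynomial kernel lets the SVM work in the high-dimensional feature space without computing the mapping explicitly. The sketch below shows one common form of the kernel; the parameter names `degree` and `coef0` and their values are illustrative assumptions and are not the settings used in this study.

```python
def polynomial_kernel(x, z, degree=2, coef0=1.0):
    # K(x, z) = (x . z + coef0) ** degree
    # The dot product is taken in the original input space; the result equals
    # an inner product in a higher-dimensional space of polynomial terms.
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree


k = polynomial_kernel([1.0, 2.0], [3.0, 4.0])  # (1*3 + 2*4 + 1) ** 2
```

With `degree=1` the kernel reduces to a linear one, so the non-linearity comes entirely from choosing a degree greater than one.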
While each of these machine-learning algorithms could be used with a variety of settings and
procedures for model selection, we chose configurations that we considered typical based
upon prior applications. All techniques mentioned above were run in the Java-based Weka
package with default parameter settings and using a 10-fold cross-validation approach (Frank et al.
2016; Duda et al. 2004).
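The 10-fold cross-validation procedure can be sketched in a few lines. This is a plain, unstratified illustration with hypothetical names; it is not the Weka routine, which may partition the data differently.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    # Shuffle the instance indices once, then deal them into k folds.
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    # Each fold serves as the test set exactly once; the rest form the training set.
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(842))  # 842 = number of instances in the dataset
```

Every instance is used for testing exactly once and for training nine times, so the reported metrics are based on predictions for all 842 instances rather than a single held-out subset.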
3. Analysis Methodology
Accuracy is the most straightforward way to describe the performance of classifiers. It is defined as
the ratio of the instances (both positive and negative) correctly classified to the total number of
instances present in the dataset. However, accuracy can be misleading in predicting natural hazards
like landslides (Batista, Prati, & Monard, 2004), because soil-movement (landslide)
occurrence is a rare phenomenon. This property makes landslide prediction a class-imbalanced
problem. In our study, the two classes, Y and N, make up 13% and 87% of the data,
respectively. If a trained classifier is biased towards the N class and labels every instance as
belonging to the N class, then the classifier’s accuracy would be 87%. As a classifier may fail to
predict the Y class and still have a high accuracy, we used more specific
performance measures, the true-positive (TP) rate and false-positive (FP) rate, to compare different
classifiers (Batista et al. 2004). The TP rate is the percentage of landslide instances correctly
classified by the classifier as landslides, and the FP rate is the percentage of no-landslide instances
that are classified as landslides by the classifier. Thus, it is desirable for a classifier to possess a
high TP rate and a low FP rate. In the next section, we present the results from different classifiers
using a ten-fold cross-validation approach.
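The argument above can be checked with a short sketch that computes accuracy, TP rate, and FP rate for a classifier that labels every instance as N. The class counts (110 movement, 732 no-movement) are those reported for the dataset; the function name is an illustrative assumption.

```python
def rates(actual, predicted, positive="Y"):
    # Count the four confusion-matrix cells for a binary problem.
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    accuracy = (tp + tn) / len(actual)
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0  # share of movements detected
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0  # share of false alarms
    return accuracy, tp_rate, fp_rate


actual = ["Y"] * 110 + ["N"] * 732   # class counts of the dataset
always_n = ["N"] * len(actual)       # classifier that never predicts movement
accuracy, tp_rate, fp_rate = rates(actual, always_n)
```

Such a classifier reaches about 87% accuracy while detecting no soil movement at all, which is why the TP and FP rates are the more informative measures here.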
Results
In this section, we report the results of each classifier and their comparison in correctly predicting
the soil-movements in the dataset.
Table 1: Ten-fold cross-validation comparisons of different classifiers on the landslide dataset
Classifier                 Accuracy (%)    True Positive Rate
Logistic Regression        92.16           0.48
C4.5 Decision Tree         92.87           0.61
Naïve Bayes                91.45           0.53
Random Forests             89.07           0.17
Support Vector Machines    90.26           0.27
Table 1 shows the ten-fold cross-validated results from different classifiers. We can observe that the
highest accuracy was obtained for the C4.5 decision tree, followed by logistic regression. In terms
of interpretability, the C4.5 decision tree is a very user-friendly technique, as we can print the
decision tree to see the attributes (in levels) that were picked by the algorithm for classifying the
data. Other classifiers also produced a high accuracy; however, accuracy is likely a biased measure
due to the class-imbalance present in the data set. Thus, we next evaluated the TP and FP rates to
compare different classifiers.
Table 1 shows that the C4.5 decision tree had the highest True Positive rate of 0.61 and a
low FP rate of 0.02. This result indicated that the C4.5 decision tree is correct 61% of the time
in predicting soil-movement and at the same time has a relatively low false-alarm rate.
Such a combination is desirable in the landslide prediction domain, as we want the
true-positive rate to be high and the false-alarm rate to be low. The Naïve Bayes algorithm had the
second highest TP rate of 0.53, however, with a relatively high FP rate of 0.28. These results
suggest that while the Naïve Bayes algorithm correctly predicted soil-movement more than
50% of the time, it misclassified no-landslides as landslides in 28% of the instances. Lastly, both
Support Vector Machines and Random Forests had very low FP and TP rates, which indicated that
almost all the instances in the data set were classified as belonging to the N class by these
classifiers.
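To relate the reported rates to the size of the dataset, the rates can be converted into approximate instance counts. The sketch below assumes that the rates are pooled over all 842 instances; because the reported rates are rounded, the resulting counts are approximate. The function name is hypothetical.

```python
N_MOVEMENT, N_NO_MOVEMENT = 110, 732  # class counts reported for the dataset


def approx_counts(tp_rate, fp_rate):
    # Approximate number of detected movements and of false alarms.
    true_positives = round(tp_rate * N_MOVEMENT)
    false_positives = round(fp_rate * N_NO_MOVEMENT)
    return true_positives, false_positives


c45_tp, c45_fp = approx_counts(0.61, 0.02)  # rates reported for the C4.5 decision tree
```

Under these assumptions, the C4.5 decision tree detects roughly 67 of the 110 movement instances while raising roughly 15 false alarms among the 732 no-movement instances.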
C4.5 Decision Tree
Figure 2 shows the resulting C4.5 decision tree for predicting soil-movement in the data set. As can
be seen in Figure 2, the decision tree suggested that the primary attribute for distinguishing the
movement class from the no-movement class was the rain recorded in the last 11 minutes. This
result shows that rain was one of the primary reasons for triggering soil movements at the chosen
location. On descending further, one can observe that force (pressure due to moist soil),
humidity, temperature, and time were other important attributes in the decreasing order of
importance. Interestingly, other variables like light and soil-moisture did not enter the decision tree.
Thus, there variables did not influence the soil movement compared to other attributes.
To understand the structure of the tree, we used a depth-first search approach wherein we descended
along one path of the tree and interpreted each node as we proceeded. Thus, if no rain has been
recorded and no force has been recorded, then movement of soil is unlikely. In contrast, if the
force is non-zero but rain is zero, then humidity is used as a splitting attribute. One can observe
in the structure that force occurs three times along the route with different critical thresholds,
and each successive threshold is greater than the preceding one. This result is notable as it may
suggest that three different force thresholds correspond to three different magnitudes of
soil-movement, i.e., no-movement, moderate movement, and severe movement, which extends the scope
for future research to a multi-class problem. Furthermore, the thresholds of humidity and
temperature along this path suggest that soil-movements were more likely when the temperature and
humidity were higher than 26 °C and 69%, respectively; that is, soil-movements were more frequent at
higher levels of relative humidity and temperature. Finally, time was used as a splitting attribute
along this route (i.e., when rain was zero), with a threshold of 3:15 am. This result suggested that
movements were more likely to occur after this time than earlier.
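The depth-first reading of a tree described above can be sketched as a small rule extractor. The
tree below is a hypothetical toy: the attribute names follow the paper, but its shape is
illustrative and is not the tree of Figure 2. Only its first rule (no rain and no force implies
no-movement) mirrors a rule stated in the text.

```python
# Hypothetical toy tree: internal nodes are dicts, leaves are class labels.
# The structure is illustrative only and does NOT reproduce Figure 2.
toy_tree = {
    "attr": "rain", "threshold": 0.0,
    "left": {
        "attr": "force", "threshold": 0.0,
        "left": "no-movement",
        "right": {
            "attr": "humidity", "threshold": 69.0,
            "left": "no-movement",
            "right": "movement",
        },
    },
    "right": "movement",
}


def extract_rules(node, path=()):
    """Depth-first traversal that yields one (conditions, label) pair per leaf."""
    if isinstance(node, str):  # leaf reached: emit the conditions collected so far
        yield list(path), node
        return
    attr, thr = node["attr"], node["threshold"]
    # Descend the left branch completely before visiting the right branch.
    yield from extract_rules(node["left"], path + (f"{attr} <= {thr}",))
    yield from extract_rules(node["right"], path + (f"{attr} > {thr}",))


rules = list(extract_rules(toy_tree))
for conditions, label in rules:
    print(" AND ".join(conditions), "=>", label)
```

Each printed line is one complete root-to-leaf rule, which is the form in which the paths of the
tree are read in this section.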
On traversing the tree from the top towards the right branch, one can observe that force (pressure
exerted by moist soil) was again an important attribute. A non-zero force coupled with non-zero
rain was indicative of soil-movement along this route. Finally, if the force is non-zero, then
time is used as the final attribute. The critical time calculated by the C4.5 algorithm along this
route was 2 pm, i.e., soil-movements were more likely to occur after 2 pm than before.
Fig 2: Decision Tree produced by C4.5 algorithm with Ten-Fold Cross Validation
Discussion and Conclusions
Until recently, machine-learning (ML) techniques had been used to predict landslides on a daily
scale (Goetz et al. 2015; Catani et al. 2013). In this paper, our primary goal was to evaluate
different ML algorithms for making minute-scale predictions of soil-movements at a landslide site.
We compared the performance of five ML algorithms that have worked well in prior research
(Pham et al. 2017; Goetz et al. 2015; Mathew et al. 2014; Catani et al. 2013). We observed that the
C4.5 decision tree algorithm outperformed the other machine-learning techniques in predicting
soil-movements at the minute-scale. This result agrees with prior literature, where non-parametric
algorithms like Random Forest and C4.5 decision tree performed accurately for daily-scale
prediction of landslides (Pham et al. 2017; Goetz et al. 2015).
In the C4.5 decision tree, we found that rain and force (pressure due to moist soil) were the top
decision attributes for splitting the movement and no-movement classes. Instances with non-zero
force and non-zero rain were predicted as the movement class, whereas instances with zero rain and
zero force were predicted as the no-movement class. This result may seem primitive; however, it is
important as it supports our key hypothesis that rainfall and soil-pressure are indicative of
soil-movements. Furthermore, force occurred at three different levels of the tree, and at each
succeeding level the critical threshold of force was greater than the preceding one. Although we
can only speculate currently, this result perhaps indicates that the three critical soil-pressure
thresholds correspond to three different magnitudes of soil-movement occurring at the site. Lastly,
two attributes, namely light-intensity and soil-moisture, did not enter the decision tree. This
result indicates that these sensed values were less important than the other attributes for
evaluating soil-movements on a minute-scale. While light-intensity may not be a good factor for
evaluating soil-movements, soil-moisture, as per our expectation and prior research, could be a key
factor in determining soil-movements. One explanation for this discrepancy might be the consistent
seepage of water from other internal sources along the hill. Such internal seepage would have
resulted in consistently high moisture values irrespective of the rainfall. Another explanation
could be that the force attribute, which measured the pressure due to moist soil, accounted for
soil-moisture indirectly as well.
Furthermore, our research also sheds light on the methodology to follow while evaluating the
performance of different machine-learning classifiers on data sets involving class imbalance. In
cases of class imbalance, accuracy is likely to be high simply because of the majority class, and
one needs to rely upon class-specific performance measures like true-positive and false-positive
rates for evaluating a machine-learning algorithm's performance. These measures enable us to check
the performance of classifiers across both classes of a binary classification problem.
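The point about accuracy can be illustrated with a trivial baseline. The sketch below scores a
hypothetical classifier that always predicts no-movement, assuming the class sizes reported in the
abstract (732 no-movement and 110 movement instances). The function and variable names are
illustrative.

```python
def evaluate(tp, fn, fp, tn):
    """Compute accuracy, TP rate, and FP rate from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all instances classified correctly
        "tp_rate": tp / (tp + fn),       # share of movement instances detected
        "fp_rate": fp / (fp + tn),       # share of no-movement instances flagged
    }


# Hypothetical baseline: every instance is labelled no-movement, so all 110
# movement instances are missed and all 732 no-movement instances are "correct".
always_no_movement = evaluate(tp=0, fn=110, fp=0, tn=732)
print(always_no_movement)
```

This baseline reaches an accuracy of about 0.87 while detecting no movement at all, which is why
the TP and FP rates are reported instead of accuracy.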
Several ideas for future research could help us improve our current results.
First, we plan to deploy our sensors at several other landslide-prone sites beyond the one selected
in this paper. We then plan to apply the C4.5 decision tree trained in the above study to these
different datasets to measure the robustness of the algorithm. Subsequently, we plan to integrate
the site-specific data sets collected from different hills into a unified dataset. This
unified dataset could include several soil properties like texture, structure, pore space, and
consistence. These properties could also be complemented by other attributes like local weather,
lighting, soil-moisture, and force.
Second, as part of our future research, we would like to evaluate time-series forecasting
techniques like the Auto-Regressive Integrated Moving Average (ARIMA) model (Khashei & Bijari,
2011) and recurrent neural networks like Long Short-Term Memory (LSTM) models
(Mikolov, Karafiát, Burget, Cernocký, & Khudanpur, 2010) for soil-movement predictions.
ARIMA models have been widely used in financial forecasting where the data is a time-series, like
our landslide dataset (Khashei & Bijari, 2011). Similarly, LSTM models are recurrent models that
retain a memory of past events and use this memory to inform current predictions
(Mikolov et al. 2010). Furthermore, we would also like to use more sophisticated performance
measures for model comparison as part of our future research. Measures like the area under the
Receiver Operating Characteristic (ROC) curve (Japkowicz & Stephen, 2002) and the sensitivity index
(d’) (Macmillan & Creelman, 2010) may provide alternative measures for comparing
the performance of different ML techniques.
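As an illustration of the sensitivity index, the sketch below computes d’ from a TP rate and an FP
rate using the standard equal-variance Gaussian formula d’ = z(TP rate) - z(FP rate), where z is the
inverse of the standard normal cumulative distribution. Applying it to the Naïve Bayes rates
reported above (0.53 and 0.28) is our own worked example, not a result from the paper.

```python
from statistics import NormalDist


def d_prime(tp_rate, fp_rate):
    """Sensitivity index d' = z(TP rate) - z(FP rate).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at exactly 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    return z(tp_rate) - z(fp_rate)


# Worked example with the Naive Bayes rates reported in the results.
nb_d_prime = d_prime(0.53, 0.28)
print(round(nb_d_prime, 2))
```

A classifier whose TP rate equals its FP rate has d’ = 0, so the value of about 0.66 obtained here
indicates only modest separation between the movement and no-movement classes.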
References
[1] S. L. Gariano, F. Guzzetti, 2016, Landslides in a changing climate, Earth-Science Reviews, 162,
pp 227-252.
[2] B. T. Pham, D. T. Bui, H. R. Pourghasemi, P. Indra, M. B. Dholakia, 2017, Landslide
susceptibility assessment in the Uttarakhand area (India) using GIS: a comparison study of
prediction capability of naïve bayes, multilayer perceptron neural networks, and functional trees
methods, Theoretical and Applied Climatology, 128(1-2), pp 255-273.
[3] J. N. Goetz, A. Brenning, H. Petschko, P. Leopold, 2015, Evaluating machine learning and
statistical prediction techniques for landslide susceptibility modeling, Computers &
Geosciences, 81, pp 1-11.
[4] J. Mathew, D. G. Babu, S. Kundu, K. V. Kumar, C. C. Pant, 2014, Integrating
intensity–duration-based rainfall threshold and antecedent rainfall-based probability estimate
towards generating early warning for rainfall-induced landslides in parts of the Garhwal Himalaya,
India, Landslides, 11(4), pp 575-588.
[5] A. Brenning, 2005, Spatial prediction models for landslide hazards: review, comparison and
evaluation, Natural Hazards and Earth System Science, vol. 5, no. 6, pp. 853-862.
[6] J. Quinlan, 1986, Induction of decision trees, Machine Learning, vol. 1, no. 1, pp. 81-106.
[7] L. Breiman, 2001, Random forests, Machine Learning, vol. 45, no. 1, pp. 5-32.
[8] Tien Bui, B. Pradhan, O. Lofman, I. Revhaug, 2012, Landslide Susceptibility Assessment in
Vietnam Using Support Vector Machines, Decision Tree, and Naïve Bayes Models, Mathematical
Problems in Engineering, vol. 2012, pp. 1-26.
[9] V. Vapnik, 1998, Statistical learning theory. New York: J. Wiley.
[10] G. E. Batista, R. Prati, M. Monard, 2004, A study of the behavior of several methods for
balancing machine learning training data, ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, p.
20.
[11] T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, S. Khudanpur, 2010, Recurrent neural
network based language model, In Interspeech, 2, pp 3.
[12] M. Khashei, M. Bijari, 2011, A novel hybridization of artificial neural networks and ARIMA
models for time series forecasting. Applied Soft Computing, 11(2), pp 2664-2675.
[13] F. Catani, D. Lagomarsino, S. Segoni, V. Tofani, 2013, Landslide susceptibility estimation by
random forests technique: sensitivity and scaling issues, Natural Hazards and Earth System Science,
vol. 13, no. 11, pp. 2815-2831.
[14] L. Podolszki, 2014, Stereoscopic analysis of landslides and landslide susceptibility on the
southern slopes of the Medvednica Mt, Retrieved from: http://hrcak.srce.hr/file/219540
[15] U. Niethammer, M. R. James, S. Rothmund, 2012, UAV-based remote sensing of the Super-
Sauze landslide: Evaluation and results, Engineering Geology, 128, 2-11.
[16] D. Zhang, S. H. I. Bin, X. Hong-Zhong, 2004, Experimental study on the deformation
monitoring of reinforced concrete T-beam using BOTDR, Journal of Southeast University (Natural
Science Edition), 4, 12.
[17] J. McKean, J. Roering, 2004, Objective landslide detection and surface morphology mapping
using high-resolution airborne laser altimetry, Geomorphology, 57(3), 331-351.
[18] P. Chaturvedi, S. Shrivastava, P. Kaur, 2017, Landslide Early Warning System Development
using Statistical Analysis of Sensors’ Data at Tangni Landslide, Uttarakhand, India, Advances in
Intelligent Systems and Computing, Springer International Publishing, 547
[19] D. P. Kanungo, A. K. Maletha, M. Singh, N. Sharma, 2017, Ground Based Wireless
Instrumentation and Real Time Monitoring of Pakhi Landslide, Garhwal Himalayas, Uttarakhand
(India), In Workshop on World Landslide Forum, pp. 293-300
[20] M. V. Ramesh, 2014, Design, development, and deployment of a wireless sensor network for
detection of landslides. Ad Hoc Networks, 13, 2-18.
[21] E. Frank, M. Hall, I. Witten, 2016, Data Mining: Practical Machine Learning Tools and
Techniques, 4th ed. Morgan Kaufmann.
[22] R. Duda, 2004, Pattern Classification 2nd Edition with Computer Manual 2nd Edition Set.
John Wiley & Sons.
[23] N. Macmillan, C. Creelman, 2010, Detection theory. New York, NJ [u.a.]: Psychology Press.