Road Surface Recognition Based on DeepSense Neural Network using
Accelerometer Data
(Preprint Version)
Shan Wu1 and Amnir Hadachi1
1 ITS Lab, Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia. Shan.Wu@ut.ee
Abstract— Smartphones play an important role in our lives, which makes them a good sensor platform for perceiving our environment. Therefore, many applications have emerged that use mobile sensors to solve different problems related to activity recognition, health monitoring, transportation, etc. One of the intriguing issues in transportation is mapping the quality and types of roads in our road network, or discovering unmapped roads, which is very costly to maintain and examine. In this paper, we propose a methodology to recognize different road types using smartphone accelerometer data. The approach is based on the DeepSense neural network with customised preprocessing and feature engineering steps. In addition, we compared our method's performance against a Convolutional Neural Network, a Fully-connected Neural Network, a Support Vector Machine, and a RandomForest classifier. Our approach outperformed all four methods and was capable of distinguishing three road types (asphalt roads, stone roads, and off-roads).
Source Code & Dataset: https://github.com/simonwu53/RoadSurfaceRecognition
I. INTRODUCTION
Road surface recognition is a widely investigated problem in which most solutions rely on sensor data [1] from cameras, accelerometers, lasers, etc. With the rapid technological development in transportation and mobility, it is crucial to ensure drivers' safety under different road conditions. Well-maintained roads can reduce the number of accidents or incidents and provide commuters with comfort during their journey [2]. However, there are plenty of claims for pothole-related vehicle damage, which has costly financial consequences for drivers [3] and can even lead to injuries.
Moreover, it is not easy to examine and recognize road surfaces: the labor cost of maintaining the road network is high, and so is the cost of specialized road examination equipment [4]. Another problem is the installation of the sensors. Traditional methods require setting up and calibrating the sensors before using them, which makes the process complicated [5].
Nowadays, smartphones have penetrated our daily life [6]. These small devices are affordable and provide us with good sensors to utilize: motion sensors such as the accelerometer and gyroscope; environmental sensors such as barometers and photometers; and position sensors such as magnetometers and orientation sensors [7]. Based on the built-in accelerometer, dozens of applications have emerged in various domains, including road surface recognition. The vibration collected from a smartphone's accelerometer can be processed and analyzed to find the patterns of different road types.
One of the traditional ways to process accelerometer data is to use threshold-based techniques such as Z-peak, Z-diff, and XZ-ratio. These algorithms are convenient and fast to implement but do not achieve satisfactory accuracy due to noise [8]. Besides, traditional methods and classifiers are hard to generalize across various road surfaces [9]. Thus, we need a solution that addresses these challenges.
Another popular topic is deep learning, which is heavily studied by researchers worldwide. Deep learning methods use multiple computational layers to compress the input data into an abstract representation. With the help of deep convolutional neural networks, breakthroughs have been achieved [10] in image processing, speech recognition, genomics, etc. Recurrent networks are good at learning sequential data, also known as time-series data [11]. Hence, deep learning networks are capable of solving intricate problems by using different structural models that include convolutional and recurrent layers.
In this paper, we adopted a deep convolutional neural network [12] that does not require much computational power and modified it to learn the patterns in accelerometer data for asphalt roads, stone roads, and off-roads. The features are computed and selected by the FRESH algorithm [13], which automatically analyses the correlation between features and labels in time-series data. We experimented with our proposed method in Tartu, Estonia, and evaluated it using ten-fold cross-validation. As a result, our method outperformed the other traditional methods that we tested in terms of inference accuracy.
II. RELATED WORK
Acceleration signal processing and analysis have been studied by many researchers in applications such as human activity recognition [14], gait analysis [15], and road roughness estimation [16]. Besides, acquiring accelerometer data has become more convenient with the penetration of smartphones. Thus, we can let smartphones provide the data we need by using the Android APIs for the accelerometer and gravity sensor [17].
Based on the review in [1], the general processing pipeline consists of five parts: sensor data collection, preprocessing, the main process, post-processing, and performance evaluation. In preprocessing, various filters are applied to reduce noise, and the data is transformed according to task-specific requirements. For the main process, common choices include threshold-based techniques, Dynamic Time Warping (DTW), and vision-based techniques. Machine learning and 3D reconstruction methods have also become popular recently due to the increasing capacity of computing resources and advances in sensors [5].
Mednis et al. [18] designed a real-time road condition detection algorithm for potholes using the smartphone's accelerometer. They used MansOS-based software to acquire the acceleration measurements, which were transmitted to a laptop through USB after data collection. They introduced a new G-ZERO algorithm to detect when the vehicle is in free fall (all three axis values close to zero). They compared the results with the existing Z-THRESH (detect peaks in the Z-axis measurements above a threshold), Z-DIFF (detect changes within a particular time window above a threshold), and STDDEV(Z) (detect the standard deviation within a time window) algorithms and found that all algorithms can produce acceptable results after spending time fine-tuning the thresholds.
Nericell, created by Mohan et al. [8], is a system that uses a smartphone's sensors to monitor roads. A variety of sensors are used in the system, such as the accelerometer, GPS, microphone, and GSM radio, to detect road bumps, anomalies, honking, and braking. The whole system consists of three components: one Windows Mobile platform device, one smartphone for cellular transmission, and one device equipped with an accelerometer. In their system, they describe a mechanism to reorient the phone's coordinate system to the vehicle's coordinate system. This feature needs GPS turned on, but it can auto-calibrate the frames afterward. Z-THRESH is used to detect anomalies at low speed, and Z-SUS (look for a sustained dip in the measurements over a specific time window) is used at high speed. Both simple techniques suffered from high false-positive rates.
Singh et al. [9] proposed a Smart Patrolling system for road surface monitoring, which uses multiple smartphones via crowdsourcing. A central server is used to analyze, filter, and aggregate the data collected from the smartphones. Virtual reorientation is performed in the system to align the coordinate systems of the smartphone and the vehicle. Several filters are applied to smooth the raw data, and only the Z-axis measurements are used in the algorithm. They use Dynamic Time Warping (DTW) to compare the accelerometer measurements with pre-collected template data. This technique can match two time series that vary in speed. The results showed that the DTW technique outperforms the previous heuristic threshold-based techniques and reduces false-negative rates. Nevertheless, the template data needs to be created beforehand, and it may vary among different types of roads.
Harikrishnan et al. [19] researched road surface monitoring using the smartphone's accelerometer data. They focus on the X- and Z-axis data and apply Max-Abs filtering before the abnormal event detection and X-Z ratio filtering after detection. The detection is based on a Gaussian model with a threshold. However, the system contains many components that depend on manually selected threshold values, which could be challenging to adapt to other scenarios.
Aside from the previous threshold-based techniques, some researchers implemented basic classifiers to perform the task. Wolverine, proposed by Bhoraskar et al. [20], uses accelerometer data collected from smartphones combined with a Support Vector Machine (SVM) to classify two types of road surfaces. They segmented the data into one-second windows and used K-means clustering to group those windows into two categories. Manual labeling is conducted after the auto-labeling to further correct the labels. The features they used are the mean and the standard deviation along the three coordinate axes, but they found that the deviation of the Z-axis, which is aligned with the vehicle's vertical displacement, dominates among all features.
The RoadSense system proposed by Allouch et al. [21] compared three different classifiers for road condition estimation: the C4.5 decision tree, SVM, and Naive Bayes. They applied a low-pass filter to the collected sensory data and segmented the data with a window size of 64. After the filtering process, feature extraction and selection are conducted on both the time-series data and the Fourier-transformed data. Only the relevant features are kept for the later training process. The results showed that C4.5 outperformed the other classifiers on the two-class detection task.
Hoffmann et al. [22] also used Naive Bayes and K-Nearest Neighbor classifiers to detect three types of road surfaces. Speed, inclination, acceleration mean, acceleration variance, and the standard deviation were selected as the features for the classifiers. However, the results suffer when distinguishing bumpy roads from rough roads.
Among all the techniques discussed above, threshold-based techniques generally have faster execution speed. However, they cannot distinguish bumpy events well and are not adaptable to real road conditions [23]. Besides, those classifiers usually require fine-tuning of their parameters for data from different roads [24].
III. METHODOLOGY
As shown in Figure 1, in the proposed method we first collect fixed-length tri-axial accelerometer measurements from a smartphone and add a label to each recorded data fragment. During the preprocessing stage, we clean the data and segment it with a fixed 10-second window. The following massive feature extraction and selection process is performed on the segmented data. Then, we keep only the features that are relevant to the labels and reshape them to fit the subsequent neural network structure. The final output of the neural network is the prediction of the road type.
A. Dataset
Fig. 1. The pipeline of our proposed methodology, which includes an Android app and a desktop computer.
The accelerometer data is collected with a collector app installed on an Android smartphone. Three sensors are used in the app: the accelerometer, the gyroscope, and GPS, all sampled at the same rate of 10 Hz (10 samples per second). In the end, the collected data is written in CSV format with the following fields:
Timestamp: recorded in milliseconds, starting from 00:00:00.0, 1st of January 1970 UTC.
Raw Data: raw accelerometer data for the X, Y, and Z axes, respectively, as measured by the Android API.
Rectified Data: virtually-oriented accelerometer data, transformed from the phone's coordinate frame to the world coordinate frame.
B. Coordinate Systems
Although both the smartphone and the Earth use an orthonormal coordinate system (shown in Figure 2), the two are usually not aligned. Thus, the vibration measured in a vehicle may vary depending on the fixed position of the smartphone inside the car. This problem can be solved by manually aligning the smartphone's coordinate system with the world's coordinate system. However, if the virtual transformation is applied automatically while collecting the data, we no longer need to align the coordinate systems perfectly, which makes the analysis much easier. The transformation in Algorithm 1 was integrated into the Android collector.
Algorithm 1 The transformation from the phone's coordinate system to the world's coordinate system in Android.
Gravity ← gravity values
Magnetic ← magnetic values
Acc_phone ← acceleration in the phone's coordinate system
R ← getRotationMatrix(Gravity, Magnetic)
R_inv ← invert(R)
Acc_world ← multiply(Acc_phone, R_inv)
Fig. 2. A figure showing the differences between a phone's coordinate system and the world's coordinate system.
Based on Algorithm 1, a rotation matrix R transforms the coordinate space of the device into the 3D world coordinate space. The gravity values and the geomagnetic field values are first acquired from the Android APIs, and a rotation matrix is derived from these two vectors. Next, the rotation matrix needs to be transposed before use because OpenGL matrices are column-major. Then, the required rotation matrix can be computed conveniently using the OpenGL inverse function. Finally, the inverted rotation matrix and the measurements along the three axes are multiplied to obtain the measurements in the world's coordinate system.
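For illustration, the following is a minimal NumPy sketch of the same idea rather than the Android API itself: it builds an orthonormal east/north/up basis from hypothetical gravity and geomagnetic readings and rotates a phone-frame acceleration into the world frame. Whether the matrix or its inverse must be applied depends on the convention of the API in use, as discussed above.

import numpy as np

def rotation_matrix(gravity, magnetic):
    # Build an orthonormal basis (east, north, up) from the gravity and
    # geomagnetic vectors, mirroring the role of getRotationMatrix().
    up = gravity / np.linalg.norm(gravity)
    east = np.cross(magnetic, up)
    east /= np.linalg.norm(east)
    north = np.cross(up, east)
    return np.vstack([east, north, up])   # rows: world axes in phone coordinates

# Hypothetical sensor readings in the phone's frame.
gravity = np.array([0.3, 0.2, 9.8])       # m/s^2
magnetic = np.array([22.0, 5.0, -43.0])   # microtesla
acc_phone = np.array([0.1, 0.4, 9.9])     # m/s^2

R = rotation_matrix(gravity, magnetic)
acc_world = R @ acc_phone                 # acceleration in the world frame
print(acc_world)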
Fig. 3. Demonstration of the filtering process on Z-axis accelerometer
data collected from a stone road. (a) accelerometer data before sifting.
The measurements inside the red ellipse are considered as outliers. (b)
accelerometer data after sifting.
C. Preprocessing
The collected acceleration measurements in the world's coordinate system form time-series data, i.e., a series of data points indexed in time order, where the interval between two successive measurements is roughly 0.1 second (10 Hz). After transmitting the measurements to the desktop computer, we noticed some noise and outliers in the data. As a consequence, we performed a filtering process to clean the data. Figure 3(a) shows a fragment of acceleration values from a stone road in which we can see some quiet segments (inside the red ellipse) that should not appear on a stone road. This situation may be caused by the car stopping or waiting at a traffic light.
We first examined all the data by slicing it into 10-second time windows and calculating the standard deviation of the Z-axis measurements in each window, since the Z-axis is aligned with the normal vector of the ground plane. In Figure 3(b), the outliers have been removed by an empirical threshold. Next, we split the filtered data into small pieces in order to extract and select features for machine learning. Time windows of various lengths can be selected depending on the project. For example, a 0.25-second window was used for the human activity recognition (HAR) problem addressed by Yao et al. [12]. However, vehicles usually keep driving on the same road for a longer time compared to intricate human movements. Hence, we selected an empirical 5-second time window in this paper to segment the collected dataset. New labels were generated with the same size as the segmented dataset.
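For illustration, a minimal Python sketch of this filtering and segmentation step is given below; the file name, column names, and threshold value are assumptions rather than the exact ones used in our pipeline.

import numpy as np
import pandas as pd

# Hypothetical CSV layout following the collector output described in Section III-A.
df = pd.read_csv("trip_stone_road.csv")   # columns: timestamp, x, y, z (world frame)
fs = 10                                   # 10 Hz sampling rate
filter_win = 10 * fs                      # 10-second filtering window
std_threshold = 0.05                      # assumed empirical threshold

# 1) Outlier filtering: drop 10-second windows whose Z-axis standard deviation
#    falls below the threshold (e.g., the car standing at a traffic light).
kept = []
for start in range(0, len(df) - filter_win + 1, filter_win):
    chunk = df.iloc[start:start + filter_win]
    if chunk["z"].std() > std_threshold:
        kept.append(chunk)
filtered = pd.concat(kept, ignore_index=True)

# 2) Segmentation: split the filtered signal into fixed 5-second segments,
#    each inheriting the label of the recording.
seg_len = 5 * fs
segments = [filtered[["x", "y", "z"]].iloc[i:i + seg_len].to_numpy()
            for i in range(0, len(filtered) - seg_len + 1, seg_len)]
labels = ["stone"] * len(segments)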
D. Feature Extraction And Selection
Christ et al. [25] proposed a method designed for time-series data that uses massive FeatuRe Extraction based on Scalable Hypothesis tests (FRESH). The authors provide a useful library that is extendable with customized feature mappings.
The process consists of two connected components: extendable feature mappings and feature selection based on scalable hypothesis tests. In the first stage, features are extracted from the tri-axial segmented accelerometer data using predefined feature mappings. A feature mapping θ : R^m → R characterizes the time-series data and reduces its dimension. For example, a feature mapping can be the maximum function that returns the maximum value of the given sequence, θ_max(s) = max{s_1, s_2, s_3, ...}. We used in total 65 feature mappings to uniquely describe every data segment in each axis, such as the mean, the standard deviation, and their Fourier-transformed statistics. The full list of feature mappings can be found in [26].
After the first stage, all data segments are transformed into 1D vectors X = {θ_1, θ_2, ...}, where each element of the vector is the result of one feature mapping. However, a massive quantity of features is calculated, and we only need to keep the features that are relevant to the labels, which also improves the inference speed. Christ et al.'s theorem [13] on time-series feature extraction states that a feature X_θ is relevant or meaningful for the prediction of Y if and only if X_θ and Y are not statistically independent. Hence, feature selection can be realized by hypothesis testing, a means to determine the probability that a given hypothesis is true.

H0 = {X_θ is irrelevant for the label Y}
H1 = {X_θ is relevant for the label Y}    (1)
Algorithm 2 Feature selection process.
f_selected ← set()
for l in unique(labels) do
    l_binary ← (labels == l)
    f_sifted ← hypothesis_test(f_train, l_binary)
    f_selected ← union(f_selected, f_sifted)
end for
D_selected ← D_all[f_selected]
We initiate the two hypotheses in Equation (1) for each feature. In this process, every feature is examined by one hypothesis test chosen according to the feature's character (binary or not, continuous or not). Then, following [13], we keep only the features for which H0 is rejected. The procedure is shown in Algorithm 2, where f denotes features.
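These two stages map directly onto the tsfresh library from [25]. The snippet below is a minimal sketch with synthetic stand-in data and the library's default feature mappings rather than our 65 custom ones; variable names are illustrative only.

import numpy as np
import pandas as pd
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute

# Synthetic stand-in for the segmented accelerometer data: 20 segments of
# 50 samples (5 s at 10 Hz) in long format with a segment id and a time index.
rng = np.random.default_rng(53)
rows = [{"id": seg, "time": t,
         "x": rng.normal(), "y": rng.normal(), "z": rng.normal()}
        for seg in range(20) for t in range(50)]
long_df = pd.DataFrame(rows)
y = pd.Series(np.arange(20) % 2, index=range(20))   # placeholder labels

# Stage 1: apply the feature mappings to every segment and axis.
X = extract_features(long_df, column_id="id", column_sort="time")
X = impute(X)                  # some mappings can produce NaN/inf values

# Stage 2: keep only the features that pass the hypothesis tests vs. the labels.
X_selected = select_features(X, y)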
The final step of feature selection is Principal Component Analysis (PCA), used to further reduce the feature dimension [27]. It is a widely used tool to convert a set of features into principal components, where the first component maximizes the variance. In this method, ten components are kept, which form an (N, 10) feature matrix for all data segments. Next, we combine every two consecutive rows into one training sample, which transforms the feature matrix into our training data D^T_S with shape (N/2, 1, 10, 2), where S is the sensory modality and T is the number of time windows.
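A sketch of this dimensionality reduction and reshaping, continuing from the selected feature matrix X_selected of the previous step (variable names illustrative), could look as follows.

import numpy as np
from sklearn.decomposition import PCA

# X_selected: sifted feature matrix from the previous step, shape (N, n_features).
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X_selected)                 # shape (N, 10)

# Pair every two consecutive 5-second segments into one training sample with
# two time windows, matching the (N/2, 1, 10, 2) layout described above.
N = (X_pca.shape[0] // 2) * 2
X_train = X_pca[:N].reshape(-1, 2, 10)                # (N/2, T=2, 10 features)
X_train = X_train.transpose(0, 2, 1)[:, np.newaxis]   # (N/2, 1, 10, 2)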
E. DeepSense Learning Framework
Inspired by DeepSense [12], a unified deep learning framework for time-series data that addresses the challenges of noise, feature customization, and different sensory modalities, we implemented this framework and adjusted it to our case. Moreover, we defined the neural network specifications in a detailed manner while respecting the original structure. It has an individual convolutional subnet for each sensor, a merged convolutional subnet, recurrent layers, and final output layers tailored to the specific task. Since we only use the smartphone's accelerometer in our method, there is only one individual subnet, which is directly connected to the merged subnet. The general architecture of DeepSense is shown in Figure 4.
Each input sample has the shape (d, n_features, T), where d is the dimension of the sensor measurements and T is the number of time windows per input. As mentioned above, our training data has the shape (N/2, 1, 10, 2): N/2 samples, one sensor measurement dimension after feature engineering, ten features, and two time windows.
1) Individual Convolutional Subnet: This subnet first processes one time window D^{T1}_{S1} of an input sample. There are three 2D convolution layers with kernel sizes (d, conv1a), (1, conv1b), and (1, conv1c), respectively, to learn the relations among the extracted features. The output is an abstract representation of the original features of that time window. Next, we flatten the output into a 1D feature vector V^{T1}_{S1}. This feature vector could be stacked with the feature vectors V^{T1}_{S2...k} produced by other sensors' individual convolutional subnets to form a (k, flatten1) feature map, where k is the number of sensor modalities and flatten1 is the length of the flattened feature vector.
Fig. 4. The architecture of the DeepSense deep learning framework, which includes individual convolutional subnets, a merged convolutional subnet, recurrent layers, and output layers.
2) Merged Convolutional Subnet: This subnet has three 2D convolution layers, in a fashion similar to the individual convolutional subnet, with kernel sizes (k, conv2a), (1, conv2b), and (1, conv2c), respectively, to learn the patterns among the sensor modalities. After that, the output is flattened again into a 1D feature vector V^{T1}_{S}, which contains the condensed features from all sensor modalities at this specific time window. The time window size τ is appended to the end of V^{T1}_{S}. Then, we repeat the process of these two subnets for the other time windows and stack all 1D feature vectors V^{T2...t}_{S} together to form a (T, flatten2) feature map, where T is the number of time windows and flatten2 is the length of the flattened feature vector. In our experiment, we used one sensor modality but still followed the process of the two subnets above. Batch normalization is added between all convolutional layers to reduce internal covariate shift [28], and ReLU is used as the activation function.
3) Recurrent Layers: Since our accelerometer data is time-series data organized in fixed-length time steps, we want to find the temporal acceleration patterns across different time steps by using recurrent layers. Stacked Gated Recurrent Unit (GRU) layers are used in the DeepSense implementation because they are more efficient and have almost equivalent performance compared with Long Short-Term Memory (LSTM) layers [29]. There are two layers, containing ten units and three units, respectively. The three units correspond to our three categories, and the output of the recurrent layers includes features from all time steps (shape (T, gru2)). A dropout layer with a rate of 0.5 is added between the stacked GRU layers.
4) Output Layers: The output layers can vary depending on the desired output for the specific task. First, we average the output of the GRU layers over the time steps according to Equation 2. Next, a softmax layer is used to generate the probabilities of the three categories. The full specification of the above-mentioned DeepSense neural network is shown in Table I.

X = (Σ_{t=1}^{T} x_t) / T    (2)
Layer     Kernel   Channels   Input
conv1a    1×5      128        D^1_{S1}
conv1b    1×3      128        conv1a
conv1c    1×3      128        conv1b
flatten1  None     256        conv1c
conv2a    1×5      128        flatten1
conv2b    1×3      128        conv2a
conv2c    1×3      128        conv2b
flatten2  None     31744      conv2c
concat    None     None       flatten2*

Layer     Output shape   Units   Input
gru1      2×10           10      concat
gru2      2×3            3       gru1
avg       1×3            None    gru2
softmax   1×3            None    avg

TABLE I
Specification of the DeepSense neural network. The * sign indicates a collection of the output.
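For reference, the following is a minimal Keras sketch consistent with Table I for our single accelerometer modality (k = 1); the time axis is placed first so that the per-window subnets can be expressed with TimeDistributed. The released implementation may differ in framework and detail, so this should be read as an illustrative reconstruction rather than the exact code.

import tensorflow as tf
from tensorflow.keras import layers, models

T, d, n_feat, k = 2, 1, 10, 1                 # time windows, sensor dim, features, modalities
inp = layers.Input(shape=(T, d, n_feat, 1))   # training tensors transposed to put T first

def conv_block(x, kernel):
    # Convolution + batch normalization + ReLU, applied per time window.
    x = layers.TimeDistributed(layers.Conv2D(128, kernel))(x)
    x = layers.TimeDistributed(layers.BatchNormalization())(x)
    return layers.TimeDistributed(layers.Activation("relu"))(x)

# Individual convolutional subnet (conv1a-c).
x = conv_block(inp, (d, 5))
x = conv_block(x, (1, 3))
x = conv_block(x, (1, 3))
x = layers.TimeDistributed(layers.Flatten())(x)        # (T, 256)

# Merged convolutional subnet (conv2a-c) over the k stacked modality vectors.
x = layers.Reshape((T, k, 256, 1))(x)
x = conv_block(x, (k, 5))
x = conv_block(x, (1, 3))
x = conv_block(x, (1, 3))
x = layers.TimeDistributed(layers.Flatten())(x)        # (T, 31744)

# Stacked GRU layers over the T time windows, with dropout in between.
x = layers.GRU(10, return_sequences=True)(x)
x = layers.Dropout(0.5)(x)
x = layers.GRU(3, return_sequences=True)(x)            # (T, 3)

# Average over time (Equation 2) and classify into the three road types.
x = layers.GlobalAveragePooling1D()(x)
out = layers.Activation("softmax")(x)

model = models.Model(inp, out)
model.compile(optimizer=tf.keras.optimizers.RMSprop(0.01),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])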
IV. EVALUATION
In this section, the testing and evaluation process is described. All data was collected around the University of Tartu, Tartu, Estonia, in April, using a "Nexus 5" and a "Samsung S5". We selected several road surfaces covering asphalt, stone, and off-roads. Examples of the road types are shown in Figure 5:
Asphalt Road: road surfaces covered with asphalt with almost no anomalies, such as freeways and highways.
Stone Road: road surfaces covered with stones of various shapes and sizes, which can be found in the old towns of many cities.
Off-Road: muddy road surfaces with no asphalt on top, typically found in the countryside or unpopulated areas.
A compact SUV was used in the data acquisition process, gathering accelerometer data with a mobile application developed within our lab. The phone was positioned inside the vehicle in a phone holder attached to the windshield. We did not change the smartphone's position during the data acquisition, and we drove at around 30 km/h on urban asphalt roads, stone roads, and off-roads, and at around 80 km/h on asphalt-covered freeways. The length of each accelerometer recording varies from 10 or 20 seconds up to 500 seconds, but all of them can be segmented into continuous 10-second fragments.
Fig. 5. The road surfaces we experimented with: (a) a stone road, (b) an asphalt road, (c) an off-road.
Altogether, we collected 98 accelerometer data files. The desktop environment used in this research had two Intel Xeon Silver 4108 CPUs @ 1.80 GHz and a GeForce RTX 2080 Ti GPU. This ample computational power enabled fast deployment and testing and reduced the time needed for the results to converge.
After preprocessing and balancing the three categories, there were in total 369 samples for the subsequent ten-fold cross-validation, which provides a comprehensive evaluation of the model. In each round, we shuffled the data and held out one fold for testing. The evaluation metrics used are the overall accuracy, its standard deviation, and the accuracy on each category, as shown in Equations 3 and 4. We averaged the accuracy on the testing data over all ten rounds to obtain the overall accuracy.
Acc = Correct Predictions / All Predictions    (3)

σ_acc = sqrt( (1 / (N−1)) · Σ_{i=1}^{N} (x_i − x̄)² )    (4)

where x_i is the result of one fold, x̄ is the averaged result, and N is the total number of results.
The DeepSense neural network is recreated in every fold, and we trained it for 1000 epochs with the RMSprop optimizer and a learning rate of 0.01. We also applied a learning-rate decay factor of 0.8 when there was no improvement in accuracy over 50 epochs. Figure 6 illustrates the training process in terms of accuracy and loss for both the training and testing stages.
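A sketch of this cross-validation and training loop is shown below; it assumes a hypothetical helper build_deepsense() wrapping the Keras model sketched after Table I, X_train and integer labels y_codes from the feature pipeline above, and an assumed batch size and monitored metric.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.callbacks import ReduceLROnPlateau

X = X_train.transpose(0, 3, 1, 2)[..., np.newaxis]   # (N/2, T=2, 1, 10, 1) for the sketch above
y = np.asarray(y_codes)                              # hypothetical integer labels (0, 1, 2)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=53)
fold_acc = []
for train_idx, test_idx in skf.split(X, y):
    model = build_deepsense()                        # a fresh network in every fold
    lr_decay = ReduceLROnPlateau(monitor="val_accuracy", factor=0.8, patience=50)
    model.fit(X[train_idx], y[train_idx], epochs=1000, batch_size=32,
              validation_data=(X[test_idx], y[test_idx]),
              callbacks=[lr_decay], verbose=0)
    _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
    fold_acc.append(acc)

print(f"overall accuracy: {np.mean(fold_acc):.4f} +/- {np.std(fold_acc, ddof=1):.4f}")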
As shown in Table II, the overall accuracy of the ten-fold cross-validation was 84.81% with a 4.14% standard deviation. The accuracy on each category is 94.36%, 73.14%, and 86.92%, respectively. Our proposed method distinguishes asphalt road surfaces well and is also capable of separating stone-road and off-road surfaces. Because the accelerometer signals on stone roads and off-roads are indeed close, the standard deviations of those two categories are a bit higher than for the asphalt road type, but the accuracy is satisfying.
Fig. 6. An illustration of the training process in one of the ten folds (number of epochs on the horizontal axis and accuracy/loss on the vertical axis): (a) Training accuracy. (b) Training loss. (c) Testing accuracy. (d) Testing loss. The training converged around 800 epochs.
          Overall   Asphalt   Stone    Off-Road
Accuracy  84.81%    94.36%    73.14%   86.92%
Std.      4.14%     5.05%     11.64%   9.82%

TABLE II
The performance of the proposed method on each category.
To compare with our proposed method, we implemented four baseline methods: a Convolutional Neural Network (CNN), a Fully-connected Neural Network (NN), a Support Vector Machine (SVM), and a RandomForest (RF) classifier. All of the competitors used the same features from the same pipeline, reshaped to their respective input requirements. Both the CNN and the NN are stacked six-layer models. The CNN uses three convolutional layers with the same kernels as the DeepSense individual subnet and three fully-connected layers with 128, 32, and 3 neurons, while the NN uses only fully-connected layers with 64, 128, 256, 128, 32, and 3 neurons, respectively. The output is connected to a softmax layer to infer the categories. We used the SVM classifier with the RBF kernel and the penalty parameter C equal to 1, and the RF classifier with 100 estimators. Scikit-learn provides convenient tools to deploy these classifiers in the Python environment.
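The two scikit-learn baselines take only a few lines; the snippet below is a sketch with the hyperparameters stated above, where X_flat (the selected features flattened per sample) and y are placeholders for the prepared data.

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# X_flat: selected features flattened to (n_samples, n_features); y: road-type labels.
X_tr, X_te, y_tr, y_te = train_test_split(X_flat, y, test_size=0.1, stratify=y)

svm = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)

print("SVM accuracy:", svm.score(X_te, y_te))
print("RF  accuracy:", rf.score(X_te, y_te))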
As shown in Figure 7, our proposed method is the only one that achieved over 80% accuracy, and it outperformed all the other techniques. All of them achieved satisfactory performance on asphalt road surfaces but struggled to distinguish stone-road from off-road surfaces. The NN is the overall second-best method in our experiment, even superior to the CNN, thanks to the engineered and sifted input features, which are well suited to approximation by a fully-connected network. The SVM achieved the best results in recognizing off-road surfaces; however, it had the worst performance in recognizing bumpy road surfaces, which means that the SVM's predictions are somewhat biased toward the off-road class. The RF ranked last in our experiment.
Fig. 7. Overall evaluation of the performance of our approach against other techniques in recognizing road surface types.
As we know, convolutional neural networks are able to extract higher-level abstractions of the input data, so we could feed in raw input data rather than preprocessed features. In our experiment, however, using preprocessed features achieved better performance. The reasons for following this approach are as follows. Firstly, raw data contains a certain amount of noise, which implies the need for a considerably larger CNN architecture to obtain the same results. Secondly, because of the noisy data, the feature engineering process was used to provide a better description of the input data; the process is designed for time-series data, and we select the features that contribute more to the recognition task via hypothesis tests. Thirdly, the idea of the DeepSense network is to provide a unified architecture for multiple modalities in time-series data. The internal relations within the sensor data and the external relations across sensors are learned by the individual convolutional layers and the merged convolutional layers, respectively. The stacked GRU layers then find the patterns across different time steps; they have fewer parameters than LSTM layers and more capacity than a single layer. In the end, we averaged the outputs over all time steps to get stable results. Nevertheless, any other method of generating the final results is acceptable if it is reasonable for the specific task.
V. CONCLUSIONS
Road surface recognition is considerably important nowadays. A good understanding of the road surface can ensure drivers' safety and comfort. To this end, we proposed a new method adapted to accelerometer signal recognition based on the DeepSense framework. The proposed design performs better than traditional techniques in detecting different types of road surfaces. The results showed that our method achieved the highest overall accuracy of 84.81% across all three categories, which are asphalt roads, stone roads, and off-roads. It is also capable of separating road types that have almost similar signal signatures (stone and off-roads) with accuracies of 73% and 86%, respectively.
However, there is still room for improvement. For example, we may improve the preprocessing stage of our dataset by modeling the noise. The features can be further sifted and reshaped to have better compatibility with the neural network structure. Last but not least, the method could be ported to portable devices deployed in regular vehicles to run directly and simultaneously, which would allow large-scale mapping of road types in our road networks.
ACKNOWLEDGEMENT
This research work was supported by the IUT34-4 "Data Science Methods and Applications" (DSMA) project.
REFERENCES
[1] S. Sattar, S. Li, and M. Chapman, “Road surface monitoring using
smartphone sensors: A review,” Sensors, vol. 18, no. 11, p. 3845,
2018.
[2] J. Eriksson, L. Girod, B. Hull, R. Newton, S. Madden, and H. Balakrishnan, “The pothole patrol: Using a mobile sensor network for road surface monitoring,” in Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services. ACM, 2008, pp. 29–39.
[3] A. Exchange, “Potholes and vehicle damage,” https://exchange.aaa.com/automotive/automotive-trends/potholes-vehicle-damage/, Last accessed on 2019-11-12.
[4] T. Fwa, W. Chan, and C. Tan, “Genetic-algorithm programming
of road maintenance and rehabilitation,” Journal of Transportation
Engineering, vol. 122, no. 3, pp. 246–253, 1996.
[5] T. Kim and S.-K. Ryu, “Review and analysis of pothole detection
methods,” Journal of Emerging Trends in Computing and Information
Sciences, vol. 5, no. 8, pp. 603–608, 2014.
[6] A. Berenguer, J. Goncalves, S. Hosio, D. Ferreira, T. Anagnostopoulos,
and V. Kostakos, “Are smartphones ubiquitous?: An in-depth survey
of smartphone adoption by seniors,” IEEE Consumer Electronics
Magazine, vol. 6, no. 1, pp. 104–110, 2016.
[7] J. Tillu, “Mobile sensors: The components that make our smartphones smarter,” 2018, https://medium.com/jay-tillu/mobile-sensors-the-components-that-make-our-smartphones-smarter-4174a7a2bfc3, Last accessed on 2019-11-12.
[8] P. Mohan, V. N. Padmanabhan, and R. Ramjee, “Nericell: Rich monitoring of road and traffic conditions using mobile smartphones,” in Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems. ACM, 2008, pp. 323–336.
[9] G. Singh, D. Bansal, S. Sofat, and N. Aggarwal, “Smart patrolling: An efficient road surface monitoring using smartphone sensors and crowdsourcing,” Pervasive and Mobile Computing, vol. 40, pp. 71–88, 2017.
[10] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol.
521, no. 7553, pp. 436–444, 2015.
[11] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[12] S. Yao, S. Hu, Y. Zhao, A. Zhang, and T. Abdelzaher, “Deepsense: A
unified deep learning framework for time-series mobile sensing data
processing,” in Proceedings of the 26th International Conference on
World Wide Web. International World Wide Web Conferences Steering
Committee, 2017, pp. 351–360.
[13] M. Christ, A. W. Kempa-Liehr, and M. Feindt, “Distributed and parallel time series feature extraction for industrial big data applications,” arXiv:1610.07717 [cs], 2017.
[14] O. D. Lara, A. J. Pérez, M. A. Labrador, and J. D. Posada, “Centinela: A human activity recognition system based on acceleration and vital sign data,” Pervasive and Mobile Computing, vol. 8, no. 5, pp. 717–729, 2012.
[15] R. Takeda, S. Tadano, M. Todoh, M. Morikawa, M. Nakayasu, and
S. Yoshinari, “Gait analysis using gravitational acceleration measured
by wearable sensors,” Journal of biomechanics, vol. 42, no. 3, pp.
223–233, 2009.
[16] A. González, E. J. O’Brien, Y.-Y. Li, and K. Cashell, “The use of vehicle acceleration measurements to estimate road roughness,” Vehicle System Dynamics, vol. 46, no. 6, pp. 483–499, 2008.
[17] A. Google, “Api reference – android developers,” 2019, https://
developer.android.com/reference, Last accessed on 2019-11-12.
[18] A. Mednis, G. Strazdins, R. Zviedris, G. Kanonirs, and L. Selavo, “Real time pothole detection using Android smartphones with accelerometers,” in 2011 International Conference on Distributed Computing in Sensor Systems and Workshops (DCOSS). IEEE, 2011, pp. 1–6.
[19] P. Harikrishnan and V. P. Gopi, “Vehicle vibration signal processing
for road surface monitoring,” IEEE Sensors Journal, vol. 17, no. 16,
pp. 5192–5197, 2017.
[20] R. Bhoraskar, N. Vankadhara, B. Raman, and P. Kulkarni, “Wolverine:
Traffic and road condition estimation using smartphone sensors,” in
2012 Fourth International Conference on Communication Systems and
Networks (COMSNETS 2012). IEEE, 2012, pp. 1–6.
[21] A. Allouch, A. Koubâa, T. Abbes, and A. Ammar, “RoadSense: Smartphone application to estimate road conditions using accelerometer and gyroscope,” IEEE Sensors Journal, vol. 17, no. 13, pp. 4231–4238, 2017.
[22] M. Hoffmann, M. Mock, and M. May, “Road-quality classification and bump detection with bicycle-mounted smartphones,” in Proceedings of the 3rd International Conference on Ubiquitous Data Mining - Volume 1088. CEUR-WS.org, 2013, pp. 39–43.
[23] É. Renault, V. H. Ha et al., “Road anomaly detection using smartphone: A brief analysis,” in International Conference on Mobile, Secure, and Programmable Networking. Springer, 2018, pp. 86–97.
[24] B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA, USA: MIT Press, 2001.
[25] M. Christ, N. Braun, J. Neuffer, and A. W. Kempa-Liehr, “Time series feature extraction on basis of scalable hypothesis tests (tsfresh – a Python package),” Neurocomputing, vol. 307, pp. 72–77, 2018.
[26] M. Christ, N. Braun, J. Neuffer, and A. W. Kempa-Liehr, “Overview on extracted features – tsfresh,” 2017, https://tsfresh.readthedocs.io/en/latest/text/list_of_features.html, Last accessed on 2019-11-12.
[27] M. Ringnér, “What is principal component analysis?” Nature Biotechnology, vol. 26, no. 3, p. 303, 2008.
[28] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
[29] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation
of gated recurrent neural networks on sequence modeling,” arXiv
preprint arXiv:1412.3555, 2014.