
VOL. X, NO. XX, XXX 2022 0000000

Sensor Applications

Learning Car Speed Using Inertial Sensors for Dead Reckoning Navigation

Maxim Freydin, Member, IEEE and Barak Or, Member, IEEE

Maxim Freydin and Barak Or are with ALMA Technologies, Haifa, 3100400, Israel

Manuscript accepted to IEEE Sensors Letters on August 22, 2022. This is a preprint version.

Abstract—A deep neural network (DNN) is trained to estimate the speed of a car driving in an urban area using as input a stream of measurements from a low-cost six-axis inertial measurement unit (IMU). Three hours of data were collected by driving through the city of Ashdod, Israel, in a car equipped with a global navigation satellite system (GNSS) real time kinematic (RTK) positioning device and a synchronized IMU. Ground truth labels for the car speed were calculated using the position measurements obtained at the high rate of 50 Hz. A DNN architecture with long short-term memory layers is proposed to enable high-frequency speed estimation that accounts for the history of previous inputs and the nonlinear relation between speed, acceleration, and angular velocity. A simplified aided dead reckoning localization scheme is formulated to assess the trained model, which provides the speed pseudo-measurement. The trained model is shown to substantially improve the position accuracy during a 4-minute drive without the use of GNSS position updates.

Index Terms—Machine learning, supervised learning, long short-term memory, dead reckoning, real-time kinematic positioning, inertial measurement unit.

I. INTRODUCTION

Accurate localization is a critical capability for automated driving [1]–[4]. Existing solutions are based on sensor fusion techniques that utilize inputs from cameras, radars, global navigation satellite system (GNSS) receivers, and inertial measurement units (IMUs) [5]–[7]. However, some environments, such as indoor parking lots and dense urban areas, remain challenging to localize in. Cameras and radars require line-of-sight toward the object of interest, and GNSS receivers cannot operate indoors. IMU sensors, on the other hand, are immune to these problems and remain available in all settings. The shortcoming of IMUs for localization is that existing integration methods accumulate error rapidly over time [8], [9].

In this work, a deep neural network (DNN) [10] model is trained to predict the speed of a car from IMU measurements, which may then be used for localization (by an additional integration) or as a speedometer. This work is aimed at creating a high-frequency real-time speed estimator using a single low-cost IMU sensor. To this end, we establish a data-driven approach that consists of two steps: (1) collecting data of IMU signals from multiple car drives with corresponding speed measurements as labels, using a real time kinematic (RTK)-GNSS device, and (2) applying supervised learning to train a DNN-based model to estimate the car speed in real time. The proposed approach may improve existing dead reckoning (DR) techniques in GNSS-denied areas [11] and enable localization based only on a single IMU sensor for longer time periods while maintaining an acceptable positioning error.

Previous attempts to apply supervised learning in estimation problems with localization applications focused on learning the sensor noise models. Mahdi et al. used low-cost sensors as input together with expensive sensors to generate the true labels for the measurement noise [12]. Azzam et al. trained long short-term memory (LSTM) models to improve the trajectory accuracy obtained using a simultaneous localization and mapping (SLAM) algorithm [13]. However, they used the pose estimated by the SLAM algorithm as input and not direct sensor measurements. Srinivasan et al. [14] learned speed estimation for an autonomous racing car. The true label for the speed was produced using a specially developed mixed Kalman filter (KF) [15] to fuse inputs from multiple sensors of different types. In [16], Reina et al. used a convolutional neural network (CNN)-based architecture to train a regressor to predict speed. However, this approach is not able to create memory between samples. Also, the large window size of the input makes the model unfit for high-frequency real-time applications. Gu et al. used a similar approach to estimate speed with a low-cost IMU as input [17]. Their solution relied on position measurements from a common GNSS receiver to calculate the true speed, which suffers from relatively low accuracy and a low sampling rate. In addition, the input was the IMU signal sampled at the low frequency of 10 Hz with a window size of 4 seconds, which is not sufficient for most real-time applications. Karlsson et al. also used a CNN-based architecture to train a regressor to predict speed from an accelerometer and the yaw rate [18]. They also considered a large input time window (more than 2 seconds) and an architecture that is not able to establish memory between samples. This work aims to address the shortcomings of previous work and apply the trained model in an inertial localization scheme.

Corresponding author: M. Freydin (e-mail: maxim@almatechnologies.com).

Digital Object Identifier 10.1109/LSENS.2017.0000000

Previous works on DR techniques emphasize the importance of accurate speed updates to reduce the positioning error. Brossard et al. proposed special speed updates when specific events are detected using machine learning or other methods [19]. A key example is the zero velocity update (ZUPT) when a full stop is detected. Other examples include nullifying the vertical and horizontal components of the velocity of a four-wheeled vehicle when no slip is detected. The detection of special events, with respective corrections to the velocity vector, is used to nullify accumulated integration errors. The approach proposed in this work is a generalization of this idea because the trained model can learn all possible detectable events with their respective speed updates to automatically reset accumulated errors.

The rest of the paper is organized as follows: Section II defines the DR problem with speed as input; Section III presents the learning approach, including a highly accurate data collection method and the DNN architecture; Section IV presents the model training results with hold-out set evaluation and their application as input for DR localization; and Section V gives the conclusions.

arXiv:2205.07883v2 [cs.LG] 23 Aug 2022

II. DEAD RECKONING WITH SPEED INPUT

For testing the proposed approach, a two-dimensional DR scheme is used. The speed and the heading angle of the car (both with respect to the navigation frame) are used in a single integration step to obtain the position. Consider the following two-dimensional linear kinematic equation of motion for the car position

$$\begin{bmatrix} p_x^n \\ p_y^n \end{bmatrix}[k] = \begin{bmatrix} p_x^n \\ p_y^n \end{bmatrix}[k-1] + \Delta t \begin{bmatrix} v_x^n \\ v_y^n \end{bmatrix}[k] \qquad (1)$$

where $p_i^n[k]$ is the car position and $v_i^n[k]$ is the car velocity in the $i \in \{x, y\}$ axis at time step $k$. The superscript $n$ denotes the navigation coordinate frame and $\Delta t$ is the time step size. The car velocity is provided by

$$\begin{bmatrix} v_x^n \\ v_y^n \end{bmatrix}[k] = \begin{bmatrix} \cos\psi \\ \sin\psi \end{bmatrix}[k] \, s^b[k] \qquad (2)$$

where $\psi[k]$ is the car heading angle and $s^b[k]$ is the car speed at time step $k$ in the body frame. The car heading angle $\psi[k]$ is obtained using an attitude estimation algorithm [7], which also provides the roll and pitch angles used to transform IMU measurements from the body to the navigation frame. The car speed $s^b[k]$ can be obtained in two ways: (1) from the output of the trained DNN model or (2) by direct integration of the noisy acceleration signal (in the navigation frame).
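To make the scheme concrete, a minimal Python sketch of the update in (1) and (2) follows. The function name, variable names, and the constant-heading example values are ours, not from the paper:

```python
import numpy as np

def dr_step(p_prev, psi, s, dt):
    """One 2D dead-reckoning step following (1) and (2).

    p_prev : previous position [p_x^n, p_y^n] in the navigation frame
    psi    : heading angle psi[k] in radians
    s      : body-frame speed s^b[k] (e.g., the DNN output)
    dt     : time step size
    """
    v = s * np.array([np.cos(psi), np.sin(psi)])  # (2): speed -> navigation-frame velocity
    return p_prev + dt * v                        # (1): single integration step

# Example: constant 10 m/s heading east (psi = 0) for 1 s at 100 Hz
p = np.zeros(2)
for _ in range(100):
    p = dr_step(p, psi=0.0, s=10.0, dt=0.01)
print(p)  # -> [10.  0.]
```

The speed input would come from either the trained model or the noisy acceleration integral, as described above.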

III. LEARNING APPROACH

A. Motivation

It is widely known that direct integration of the acceleration measured with a low-cost IMU sensor to obtain speed (or velocity) results in rapid divergence due to measurement noise [7]. This section describes the development of a DNN-based car speed estimator with noisy IMU measurements as input. A supervised learning approach is presented in which the IMU measurements are labeled with highly accurate speed measurements, obtained using the RTK-GNSS device, which the DNN model is trained to predict.

B. Dataset collection and generation

Data was collected with a WitMotion BWT61CL IMU sensor [20] and a STONEX RTK-GNSS device connected to a Windows PC through serial port connections (Figure 1). Data was recorded at 50 Hz and 100 Hz rates with the RTK-GNSS and IMU sensors, respectively. The sensors were synchronized to a single clock. Accurate ground truth labels are critical in supervised learning tasks. The STONEX RTK-GNSS device provides position measurements with a 2 cm average positioning error, in contrast to a common GNSS receiver, which has an average error on the order of meters [7]. A total of 180 minutes of driving was recorded in Ashdod, Israel. The recordings were split into train and validation sets with a ratio of 85:15. Figure 2 shows the trajectories of the recorded drives for the train and validation sets (without overlap). The data includes diverse driving with numerous turns and roundabouts at speeds typical of city driving.

Speed was obtained from the position measurements by taking the derivative with respect to time of each component and then calculating the vector norm, as shown in Figure 3. Then, the 50 Hz speed signal was upsampled to match the 100 Hz rate of the IMU. Train and validation samples were created by matching windows of 20 measurements of the 6-channel IMU with their respective windows of 20 samples of speed. Acceleration measurements were rotated to the navigation frame to remove the gravity component and then rotated back to the body frame.
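The label-generation step above can be sketched as follows. The function name and sampling-rate arguments are ours, and linear interpolation is assumed for the upsampling (the paper does not state the interpolation method):

```python
import numpy as np

def speed_labels(pos_xy, fs_gnss=50.0, fs_imu=100.0):
    """Speed labels from RTK-GNSS positions (sketch).

    pos_xy : (N, 2) planar positions sampled at fs_gnss [Hz]
    Returns the scalar speed resampled to the IMU rate fs_imu [Hz].
    """
    # Per-component time derivative, then the vector norm
    vel = np.gradient(pos_xy, 1.0 / fs_gnss, axis=0)
    speed = np.linalg.norm(vel, axis=1)
    # Upsample 50 Hz -> 100 Hz to match the IMU rate
    t_gnss = np.arange(len(speed)) / fs_gnss
    t_imu = np.arange(round(t_gnss[-1] * fs_imu) + 1) / fs_imu
    return np.interp(t_imu, t_gnss, speed)

# Sanity check: a straight drive at 5 m/s east, positions every 0.02 s
pos = np.stack([np.arange(100) * 5.0 / 50.0, np.zeros(100)], axis=1)
labels = speed_labels(pos)
print(labels[:3])  # -> [5. 5. 5.]
```

Slicing the 100 Hz output into windows of 20 samples, paired with the matching 6×20 IMU windows, would then produce the training samples described above.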

Fig. 1. STONEX RTK-GNSS device (left) and WitMotion BWT61CL

IMU sensor (right).

Fig. 2. Car trajectory for the train and validation sets in an urban area

(Ashdod, Israel). No overlap between the train and validation data sets.

C. LSTM based-model architecture

A DNN architecture was selected comprising one LSTM layer, two bi-directional LSTM layers, and a single dense layer. The model contains a total of 12,860 trainable parameters. The model is of the "many-to-many" type, i.e., each prediction expects 20 samples of the 6-channel IMU as input and returns 20 samples of speed prediction. The architecture and input dimensions are summarized in Figure 4. The LSTM layers were configured to pass the hidden state between samples to allow long-term memory learning between predictions.

Fig. 3. Car speed regressor training diagram. The IMU and GNSS-RTK signals are input to the loss function, where the first goes through the DNN and the latter is used to derive the car speed.

Fig. 4. DNN architecture with a many-to-many input-output relation: it receives 20 samples of a 6-channel IMU as input and returns 20 samples of speed prediction.
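A Keras sketch of such an architecture is shown below. The per-layer hidden sizes are our assumptions (the paper reports 12,860 trainable parameters in total but not the layer widths), and only the first LSTM is marked stateful here for illustration; the paper configures all LSTM layers to pass hidden state between batches:

```python
import tensorflow as tf

def build_model(batch_size=4, window=20, channels=6):
    """Many-to-many speed regressor sketch: one LSTM, two bidirectional
    LSTM layers, and a dense output layer (hidden sizes assumed)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(window, channels), batch_size=batch_size),
        # stateful=True carries hidden states between consecutive batches
        # of the same drive, enabling long-term memory across windows
        tf.keras.layers.LSTM(16, return_sequences=True, stateful=True),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(16, return_sequences=True)),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(16, return_sequences=True)),
        # one speed value per input sample: "many-to-many"
        tf.keras.layers.Dense(1),
    ])

model = build_model()
print(model.output_shape)  # (4, 20, 1)
```

With `return_sequences=True` on every recurrent layer, each 20-sample IMU window maps to 20 speed predictions, matching the input-output relation of Figure 4.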

D. Loss function and training

The following mean square error (MSE) loss function was minimized in the training stage:

$$\mathcal{L} = \frac{1}{2N}\sum_{j=1}^{N}\left\| \mathbf{s}_j^{GT} - \hat{\mathbf{s}}_j(\mathbf{a}, \boldsymbol{\omega}; \mathbf{W}) \right\|^2 \qquad (3)$$

Here, $\hat{\mathbf{s}}$ is a function of $\mathbf{a}$, $\boldsymbol{\omega}$, and the network weights $\mathbf{W}$, where $\mathbf{a}$ and $\boldsymbol{\omega}$ are the accelerometer and gyroscope batch readings. The training data was divided into 4 drives, each about 40 minutes long. Each drive was broken into batches ordered by acquisition time to allow the model to learn long-term memory. Each training batch consisted of 4 windows of 6×20 IMU measurements (with the gravity component subtracted) that followed the previous batch and passed forward the LSTM hidden states. Shorter drives were padded with zeros (which the loss function was modified to ignore) to match the drive time. At the start of each epoch, the LSTM hidden states were reset to zeros.
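A sketch of the padding-aware loss of (3) follows; the mask convention (1 for real samples, 0 for padding) is our assumption about how the modified loss ignores padded samples:

```python
import numpy as np

def masked_mse(s_gt, s_hat, mask):
    """MSE loss of (3), modified to ignore zero-padded samples.

    s_gt, s_hat : (N,) ground-truth and predicted speed samples
    mask        : (N,) 1.0 for real samples, 0.0 for padding
    """
    sq_err = (s_gt - s_hat) ** 2 * mask        # padded terms contribute zero
    return sq_err.sum() / (2.0 * mask.sum())   # 1/(2N) over real samples only

s_gt = np.array([5.0, 6.0, 0.0, 0.0])    # last two samples are padding
s_hat = np.array([5.5, 5.5, 1.0, 2.0])   # predictions on padding must not count
mask = np.array([1.0, 1.0, 0.0, 0.0])
print(masked_mse(s_gt, s_hat, mask))  # -> 0.125
```

Note that normalizing by the mask sum, rather than the batch length, keeps the loss scale independent of how much padding a short drive required.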

IV. RESULTS

A. Training results

The model was trained until convergence for 200 epochs. The root MSE (RMSE) reached a value of 0.83 m/s on the train set and 1.21 m/s on the validation set. Figure 5 shows a representative example of speed prediction from the validation set. The above RMSE on the validation set may be interpreted as a 12-meter positioning error during a 10 s drive. For a drive at an average speed of 15 m/s, the car travels a total of 150 ± 12 m. In this case, the positioning error is 8% of the driven distance.
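The arithmetic behind the 8% figure can be checked directly:

```python
rmse = 1.21       # validation RMSE, m/s
t = 10.0          # drive duration, s
avg_speed = 15.0  # average speed, m/s

pos_err = rmse * t        # accumulated position error bound: ~12.1 m
distance = avg_speed * t  # distance driven: 150 m
print(round(pos_err / distance * 100))  # -> 8
```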

B. Model application in aided DR navigation

The trained DNN model is tested on a two-dimensional navigation problem using a hold-out dataset. A 4-minute drive in the Matam high-tech park, Haifa, was recorded using the same equipment and car. The train/validation and hold-out sets were recorded in relatively flat urban areas. Position is obtained by integrating the car speed after decomposition into a velocity vector using the heading angle, as shown in Equations (1) and (2). For the integration, speed was obtained in two ways: (1) using the trained model and (2) by direct integration of the acceleration in the navigation frame (used as reference).

Fig. 5. Estimated (DNN model) and true car speed vs. time for a single drive from the validation set (RMSE = 0.90 m/s).

Figure 6 shows the speed of the car versus time obtained by direct integration of the acceleration (in the navigation frame), the DNN prediction, and the ground truth (RTK-GNSS). The car starts stationary, and the DNN prediction takes several seconds to nullify despite starting with hidden states set to zero. The first acceleration starts near t = 10 s, and the DR speed diverges rapidly from the ground truth. The DNN speed follows the ground truth with good agreement until t = 110 s (RMSE(0 s < t < 110 s) = 1.08 m/s). At that point, the road quality was observed to be low and significant external vibrations were introduced to the signal (RMSE(110 s < t < 150 s) = 3.30 m/s). At t = 150 s, the car finished a roundabout and returned to the high quality asphalt, which resulted in better speed prediction accuracy (RMSE(150 s < t < 233 s) = 1.72 m/s).

Fig. 6. Car speed vs. time (test set): dead reckoning (DR), trained

DNN model, and ground truth (GT).

Figure 7 shows the car trajectory obtained using the two different methods and the ground truth. The plain (without DNN aid) DR fails to follow the true trajectory of the car at the start of the drive, which was expected due to the speed error shown in Figure 6. At later time steps, the shape of the DR curve becomes more similar to the ground truth. However, this is mainly because of the heading angle accuracy, which defines the shape but not the length of the segments. The aided DR (DNN-aided) shows good agreement with the ground truth and a substantial improvement over plain DR. A significant deviation from the true speed is observed near the roundabout located at the southern point of the trajectory. Figure 8 shows the error versus time for the two position solutions. The error is defined as the distance between the position solution and the ground truth. The plain DR diverges rapidly over time, while the DNN-aided scheme maintains less than 20 m of error for more than a minute and less than 120 m during the whole 4-minute drive.

Fig. 7. Car trajectory calculated using plain and aided DR vs. ground

truth.

Fig. 8. Position error vs. time using plain and aided DR.

V. CONCLUSIONS

A deep neural network model was trained to estimate the speed of a car from noisy IMU measurements. A new method for collecting accurate speed measurements at high frequency was presented using an RTK-GNSS device. The trained model is light (fewer than 13,000 trainable parameters) and suitable for real-time applications. A simplified aided dead reckoning localization scheme was used to assess the performance of the trained model. The DNN provided speed pseudo-measurements which substantially improved the positioning accuracy over the direct integration approach. The proposed methodology can be adapted and extended to other ground vehicles and pedestrians, as well as aerial and water vehicles. In future work, more data will be collected to introduce additional driving scenarios, rich terrain properties, and non-flat roads. Additionally, a real-time implementation for GNSS-free navigation is currently under development.

REFERENCES

[1] T. Luettel, M. Himmelsbach, and H.-J. Wuensche, "Autonomous ground vehicles—concepts and a path to the future," Proceedings of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1831–1839, 2012.

[2] C. Badue, R. Guidolini, R. V. Carneiro, P. Azevedo, V. B. Cardoso, A. Forechi,

L. Jesus, R. Berriel, T. M. Paixao, F. Mutz et al., “Self-driving cars: A survey,”

Expert Systems with Applications, vol. 165, p. 113816, 2021.

[3] M. M. Rana, “Attack resilient wireless sensor networks for smart electric vehicles,”

IEEE Sensors Letters, vol. 1, no. 2, pp. 1–4, 2017.

[4] P. H. Chan, G. Dhadyalla, and V. Donzella, “A framework to analyze noise factors

of automotive perception sensors,” IEEE Sensors Letters, vol. 4, no. 6, pp. 1–4,

2020.

[5] F. Gustafsson, F. Gunnarsson, N. Bergman, U. Forssell, J. Jansson, R. Karlsson, and P.-J. Nordlund, "Particle filters for positioning, navigation, and tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 425–437, 2002.

[6] A. Mikov, A. Cloitre, and I. Prikhodko, “Stereo-vision-aided inertial navigation

for automotive applications,” IEEE Sensors Letters, vol. 5, no. 2, pp. 1–4, 2021.

[7] J. Farrell, Aided navigation: GPS with high rate sensors. McGraw-Hill, Inc.,

2008, pp: 393-395.

[8] X. Liu, Q. Zhou, X. Chen, L. Fan, and C.-T. Cheng, “Bias-error accumulation

analysis for inertial navigation methods,” IEEE Signal Processing Letters, 2021.

[9] S. Guerrier, R. Molinari, and Y. Stebler, “Theoretical limitations of allan variance-

based regression for time series model estimation,” IEEE Signal Processing Letters,

vol. 23, no. 5, pp. 597–601, 2016.

[10] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553,

pp. 436–444, 2015.

[11] F. Luo, S. Wang, Y. Gong, X. Jing, and L. Zhang, “Geographical information

enhanced cooperative localization in vehicular ad-hoc networks,” IEEE Signal

Processing Letters, vol. 25, no. 4, pp. 556–560, 2018.

[12] A. E. Mahdi, A. Azouz, A. E. Abdalla, and A. Abosekeen, “A machine learning

approach for an improved inertial navigation system solution,” Sensors, vol. 22,

no. 4, 2022. [Online]. Available: https://www.mdpi.com/1424-8220/22/4/1687

[13] R. Azzam, Y. Alkendi, T. Taha, S. Huang, and Y. Zweiri, “A stacked LSTM-based

approach for reducing semantic pose estimation error,” IEEE Transactions on

Instrumentation and Measurement, vol. 70, pp. 1–14, 2021.

[14] S. Srinivasan, I. Sa, A. Zyner, V. Reijgwart, M. I. Valls, and R. Siegwart, “End-to-

end velocity estimation for autonomous racing,” IEEE Robotics and Automation

Letters, vol. 5, no. 4, pp. 6869–6875, 2020.

[15] G. Welch, G. Bishop et al., "An introduction to the Kalman filter," 1995.

[16] S. C. Reina, A. Solin, and J. Kannala, “Deep learning based speed estimation

for constraining strapdown inertial navigation on smartphones,” 2018 IEEE 28th

International Workshop on Machine Learning for Signal Processing (MLSP), pp.

1–6, 2018.

[17] Y. Gu, Q. Wang, and S. Kamijo, “Intelligent driving data recorder in smartphone

using deep neural network-based speedometer and scene understanding,” IEEE

Sensors Journal, vol. 19, no. 1, pp. 287–296, 2019.

[18] R. Karlsson and G. Hendeby, “Speed estimation from vibrations using a deep

learning cnn approach,” IEEE Sensors Letters, vol. 5, no. 3, pp. 1–4, 2021.

[19] M. Brossard, A. Barrau, and S. Bonnabel, “RINS-W: Robust inertial navigation

system on wheels,” in 2019 IEEE/RSJ International Conference on Intelligent

Robots and Systems (IROS), 2019, pp. 2068–2075.

[20] "WitMotion BWT61CL product page," https://www.wit-motion.com/6-axis/witmotion-bluetooth-2-0.html, accessed: 5-7-2022.