
I KNOW HOW YOU DRIVE!

DRIVING STYLE PROFILE VIA SMARTPHONE

A PREPRINT

Issac Lee∗

Department of Statistics and Actuarial Science

The University of Iowa

Iowa City, IA 52246

sak-lee@uiowa.edu

Nariankadu D. Shyamalkumar

Department of Statistics and Actuarial Science

The University of Iowa

Iowa City, IA 52246

shyamal-kumar@uiowa.edu

July 29, 2019

ABSTRACT

Accessibility to telematics data has changed auto insurance pricing. Actuarial research has focused solely on the use of GPS-based telematics data. Such data suffer from limited accuracy and slow updates, which could inaccurately reflect the driving styles of drivers. Our focus is on the use of complementary data from the inertial measurement unit (IMU) sensors in smartphones that are relevant to vehicle kinematics. Interestingly, such data require careful modeling as road conditions and driver behavior can easily bias them. In this paper, we discuss the preparation of IMU data, including necessary bias-corrections, for telematics analysis. Combining information from the two independent sensors via Kalman filter, we suggest a longitudinal-lateral acceleration density plot as a richer and more reliable object for driver profiling.

Keywords telematics · inertial measurement unit (IMU) · global positioning system (GPS) · Kalman filter · accelerometer · smartphone

1 The thriving telematics insurance industry

When it comes to the property-casualty (P&C) insurance market, auto insurance is the essential line of business for insurance companies. In 2017, the total earned premium of the automobile insurance industry was about $267 billion, which accounts for about 42% of the total premium volume of the U.S. P&C industry (National Association of Insurance Commissioners, 2018, p. 3). In the auto insurance industry, it is hard to ignore the changes in the pricing paradigm driven by accessibility to the individual driver's telematics data. The usage-based pricing model refers to an insurance pricing framework that calculates premiums by considering not only traditional insurance factors such as gender, age, and driving experience in years, but also the individual's driving habits such as hard braking and acceleration, or time of day.

The first usage-based insurance (UBI) business in North America was started by Progressive Insurance Company in 2004 (Progressive Casualty Insurance Company, 2019). During the past decade, many different types of UBIs have appeared due to technological development. Among the various UBI products, there are two main categories of UBI in the market now: pay-as-you-drive (PAYD) and pay-how-you-drive (PHYD). The differences between PAYD

and PHYD are related to the information contained in the data used in pricing. PAYD focuses on the driving habits of

the driver, such as the average driven distance, driving experience, and the time of day at which the insured is driving.

On the other hand, PHYD uses the data related to the driving style of the driver. These include factors such as hard

brake or hard acceleration during driving, and how the insured driver turns the vehicle in the turning events (Verbelen

et al., 2018; Tselentis et al., 2016). An excellent example of the thriving telematics insurance industry is Root Insurance

Company, whose pricing method for auto insurance is solely based on UBI factors. Root uses four categories for

∗https://issaclee.netlify.com

A PREPRINT - JULY 29, 2019

insurance pricing: braking, hours, turns, and consistency. Since 2015, Root Insurance has increased its valuation to $1

billion and has expanded into 25 U.S. states.

2 Actuarial research related to telematics analysis

Along with the development of the telematics industry, there is an abundance of actuarial research related to telematics

analysis. In 2015, the AXA insurance company published its customer’s telematics data on Kaggle, a popular website

among the machine learning community, for a competition (AXA, 2014). This competition was held to develop a model

for ﬁnding a "telematic ﬁngerprint," which is a set of features distinguishing a driver’s driving habits. The competitors

used this published set of global positioning system (GPS) data of random driving trips from 2736 anonymous drivers.

Each anonymous driver's data consists of 200 trips, and each trip can be thought of as a multi-dimensional time series with two variables: the $x$ and $y$ positions of a vehicle recorded every second. The data was deleted at the request

of AXA after the competition and is no longer available to the public. However, we believe this competition itself has

brought a positive impact on the actuarial community, having given access to this new type of data in auto insurance

and related research topics.

After the AXA competition, actuarial papers about telematics analysis have been produced. Interestingly, Ayuso et al.

(2016) sought to examine gender discrimination in UBI by looking at the effect of the distance traveled on the risk

of accidents among young male and female drivers with PAYD policies. This research can be connected to the work

of Verbelen et al. (2018), who suggest a statistical modelling approach using generalized additive models. These

models utilize PAYD factors such as total distance driven or total time driven as telematics variables on expected claim

frequency modeling.

On the other hand, there is research related to PHYD insurance products. Nikulin (2016) tried to construct a driving profile using speed, acceleration, deceleration, and turning speed, which are factors used to detect bad drivers from the

drivers pool in AXA data. Weidner et al. (2016) and Weidner et al. (2017) present the possible application of pattern

recognition techniques and Fourier analysis to telematics data to differentiate driving behaviour and provide a good

summary of the difference between traditional and telematic pricing framework. Wüthrich (2017) suggested a heatmap

of velocity and acceleration, so-called ‘v-a heat map,’ as an object for the telematics analysis. Further research about

the v-a heatmap clustering continues in Gao and Wüthrich (2018), which suggests that the v-a heat map object can be

reconstructed by continuous low-dimensional representations using the singular value decomposition (SVD) or the

bottleneck neural network. The reason for ﬁnding the lower-dimensional representation of the v-a heatmap is so that it is

easier and more helpful to use these extracted features for clustering the driving styles instead of merely using heatmap

itself. Furthermore, Gao et al. (2019a) investigate the predictive power of the extracted features from the heatmap

object. The paper shows that the ﬁrst principal component of SVD and the bottleneck activation have greater predictive

power for claims frequency than the driver’s age, which is one of the traditional pricing factors. Furthermore, in Gao

et al. (2019b), they examine the predictive power of telematics variables including driving habit variables, ﬁnding that

driving style variables are much more related to claims frequency than driving habit variables.

2.1 Telematics data beyond GPS

Since AXA published its telematics data, the GPS data format has become the most popular format for telematics analysis in actuarial science. In the previous section, we have mentioned many papers about

telematics in actuarial science utilizing GPS based telematics data for analysis. However, in other ﬁelds such as

engineering, there are many papers which focus on other measurements for telematics analysis. For example, Johnson

and Trivedi (2011) tried to recognize driving events such as turns, swerving, and braking using a combination of IMUs: accelerometer, gyroscope, magnetometer, proximity, ambient light, etc. Aljaafreh et al. (2012) used a two-dimensional accelerometer to cluster the driving style into four categories: below normal, normal, aggressive, and very aggressive. Van Ly et al. (2013) tried to build a driving profile using a front-side radar and CAN signals from the car, including engine speed, brake or acceleration pedal pressure, vehicle speed, angular rotation, head pose analysis, GPS, and front/rear camera views.

One of the popular data formats for telematics analysis is data from the accelerometer, which is one of the inertial measurement units (IMUs) in smartphones. There are two reasons for the popularity of accelerometer data in the literature. The first reason is that IMU sensors have become highly accessible due to the increasing use of smartphones. As smartphones have developed, the number of sensors installed in them has grown. For instance, many smartphone games use IMU sensors such as the accelerometer, gyroscope, and magnetometer as input devices. The second reason is that the accelerometer has a higher update rate than GPS, making it more suitable for telematics analysis. GPS usually records data at a rate of 1 Hz (one data point per second), while the IMU sensors in smartphones can record data at rates from 10 Hz to 200 Hz. Thus, accelerometer data can capture the


vehicle's movement more precisely than GPS data since it can detect even slight changes in movement, such as a lane change or drowsy driving. IMU sensors, however, are not uniformly superior to GPS because they are known to suffer from bias. Thus, proper data processing is essential when using IMU sensor data for telematics analysis.

The data preparation process plays a crucial role because it can affect not only the result of the analysis but also its interpretation. For example, the accelerometer measures the force applied to the body of the smartphone, not the acceleration of the smartphone. This characteristic of the accelerometer could lead to a misinterpretation of the data; for instance, a vehicle stopped on a downhill appears to keep decelerating. In other words, the road topology affects the accelerometer measurement. Hence, a clustering using uncalibrated IMU data could put drivers who share a similar road topology on their commute routes into the same cluster. Considering the importance of interpretation in actuarial science, establishing a formal data processing set-up for telematics data is needed.

The goal of this paper is to establish a data process to obtain calibrated and interpretable telematics data for the analysis of drivers' driving styles, in particular, a vehicle's longitudinal and lateral acceleration data. Our method combines the information from GPS with the data from IMU sensors: using the Kalman filter, the IMU sensor data are calibrated to achieve this goal.

In the second part of this paper, we explain how the data were recorded and the characteristics of the telematics sensors. The third part explains the theory of the Kalman filter, which is used to calibrate the accelerometer data in the following part. Note that we explain the Kalman filter from a Bayesian point of view, which is not the usual way of explaining it in the engineering field. The Bayesian point of view gives a clearer intuition about Kalman filtering, and it should also be easier for readers to follow since the actuarial science community is already familiar with Bayesian statistics. After presenting the theory, we show the results of implementing our method for two individual drivers and discuss its benefits by drawing longitudinal-lateral acceleration plots: one based only on GPS data and another based on the calibrated accelerometer.

3 Sensors in the smartphone

As smartphones have developed over the past decades, they have come to be equipped with many sensors. The two most popular data formats used in telematics analysis are GPS data and IMU-based data. In this section, we explain the pros and cons of using each data format.

3.1 Global Positioning System (GPS)

The Global Positioning System (GPS) data gives us information on the absolute position of the vehicle. The five leading variables of GPS data are time, latitude, longitude, altitude, and speed. Note that the speed information in GPS is not based on the Euclidean distance between two GPS points but is computed at a single GPS point using the Doppler effect, which generates more accurate speed data than the Euclidean-based speed (Chalko, 2007; Bevly, 2004). GPS positions are calculated through the transmission of signals between satellites and the GPS receiver. This means that the data points generated by GPS are independent of each other, so GPS does not suffer from a cumulative bias in speed calculation.

As in Figure 1a, the GPS receiver should be connected with at least four different satellites to determine the position of a smartphone. Since it uses signals from the satellites, it can be disturbed by noise caused by, for example, blocking from trees or reflection from buildings. According to the official U.S. government website, GPS in smartphones is usually accurate to within a 4.9 m (16 ft.) radius under the open sky (The National Coordination Office for Space-Based Positioning and Timing, 2017). Considering that the standard lane width of a U.S. highway is 12 ft. and the refresh rate of GPS, GPS data is less accurate for capturing the lateral movement of a vehicle (Federal Highway Administration, 2019).

3.2 Inertial Measurement Units (IMUs)

There are many sensors in the smartphone which are called inertial measurement units (IMUs). Among them, we mainly use the following three: accelerometer, gyroscope, and barometer. These devices can be used to describe the movement of a smartphone body. Precisely, an accelerometer measures the force applied to the smartphone along three directions: longitudinal, lateral, and vertical. A gyroscope measures the angular speed of the body of the smartphone about the three directions as well: roll rate, pitch rate, and yaw rate (see Figure 4b). Using a gyroscope, we can also obtain information about how much the smartphone is tilted with respect to the plane perpendicular to the direction of gravity. This information from the gyroscope will be used for the road grade estimation and the lateral acceleration estimation in this paper. The altitude of the smartphone body can be captured


(a) The position determination in GPS system. (b) A visualization of GPS data (red dots)

Figure 1: To ﬁgure out the position of the vehicle, at least 4 satellites should be available at the same time. Using the

longitude and latitude from GPS data, a sample route can be visualized on the real map as in Figure 1b.

(a) A box model for understanding accelerometer (b) The three directions of accelerometer

Figure 2: An accelerometer records the force on the wall of the box, not the actual acceleration of the accelerometer.

by a barometer which measures the air pressure around it. Barometers record the relative altitude assuming that the

sea level is zero altitude. We will use the barometer measurements in the Kalman ﬁltering combining the altitude

information from GPS.

One of the challenges of using accelerometer measurements as telematics data is the fact that the accelerometer does not measure the acceleration of the device. Instead, it measures the force applied to the body of the accelerometer. Figure 2a shows how the accelerometer works in general. We can conceive of the accelerometer as a box with a ball inside of it. What the accelerometer records is the force applied to the wall of the box. For instance, when the accelerometer is placed on a flat surface of the earth, the ball pushes the bottom of the box because of gravity. Thus, in that case, the accelerometer measures $9.81\,\mathrm{m/s^2}$ upward from the ground, which differs from the zero acceleration of a static object. Furthermore, if the accelerometer value along a certain axis is zero at a specific time point, we expect that the body of the accelerometer is being accelerated by gravity in that direction; in short, it is in free fall. This characteristic of the accelerometer hinders the estimation of acceleration when the road grade changes dramatically.

3.2.1 The road grade effect

The road grade $\alpha$ can be defined as the angle between the road plane and the ground plane, which is perpendicular to the direction of gravity (Jauch et al., 2017). If the accelerometer is on an uphill as in Figure 3a, the ball in the accelerometer from Figure 2a will be pulled by the force of $g\sin(\alpha)$. Thus, if the accelerometer attached to the vehicle is accelerated on the uphill, it will record the acceleration of the vehicle and the force of $g\sin(\alpha)$ at the same time, which is illustrated in Figure 3b. Note that there is a large deviation between the raw accelerometer data (black line) and the longitudinal acceleration of the vehicle (blue line) around 20 seconds, when the vehicle was in the middle of an uphill. One might think that the road grade can be captured from the pitch angle of the gyroscope if the IMU sensor is attached to the vehicle. However, the pitch angle of the gyroscope is affected by the acceleration of the vehicle, so there is a deviation between the road grade and the measured pitch angle (see Jauch et al., 2017). Thus, to use the accelerometer data for


(a) The road grade effect on accelerometer

(b) The raw y-axis accelerometer measurements (black) vs. the

ground truth longitudinal acceleration from OBD (green)

Figure 3: The black line in Figure 3b represents the raw data from y-axis accelerometer, while the blue line indicates

the ground truth longitudinal acceleration based on wheel speed. The vehicle drove on a sample route which has an

uphill incline during the time points around 20 second. On the uphill incline, the black line is above the blue line since

the accelerometer records both the road effect and the longitudinal acceleration.

telematics analysis, we need to remove the road grade effect from the accelerometer data. We will discuss this topic in Section 4.2.2.
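As a rough illustration of the road grade effect described in this section, the following Python sketch subtracts the gravity component $g\sin(\alpha)$ from a raw longitudinal reading. The function name, the sign convention (uphill grade positive), and the 5-degree example are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2, as used in the text


def remove_grade_effect(acc_raw, grade_rad):
    """Return the longitudinal acceleration after subtracting g*sin(grade).

    Assumes the raw reading equals vehicle acceleration plus g*sin(grade),
    with an uphill grade counted as positive (illustrative convention).
    """
    return acc_raw - G * math.sin(grade_rad)


# Hypothetical example: a vehicle parked on a 5-degree uphill.
# Its true acceleration is zero, yet the raw reading is g*sin(5 deg).
grade = math.radians(5.0)
raw_reading = G * math.sin(grade)
corrected = remove_grade_effect(raw_reading, grade)
print(f"raw = {raw_reading:.3f} m/s^2, corrected = {corrected:.3f} m/s^2")
```

Even a modest grade therefore produces a raw reading of the same order as gentle braking or acceleration, which is why the correction matters for driver profiling.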

4 Methodology

4.1 Data recording

The data used in this paper were recorded by an iPhone application named "sensor play". The device was an iPhone 6, and it was fastened in a vehicle, a 2015 Ford Fusion, during the recording, as shown in Figure 4a. Figure 2b shows the directions of the three axes of the accelerometer. Since the smartphone is installed in the vehicle as in Figure 4a, the y-axis of the accelerometer matches the longitudinal movements of the vehicle, while the x-axis and z-axis correspond to the lateral and vertical movements of the vehicle, respectively (see Figure 4b). Based on the figure, we can infer that the z-axis of the gyroscope will capture the angular velocity of the vehicle when the vehicle makes turns or changes lanes.

4.2 Discrete Kalman ﬁlter

The Kalman filter was introduced more than 50 years ago in the engineering field, and it is still widely used in many engineering problems (Faragher et al., 2012). The term Kalman filter is not familiar to most actuaries; however, the Kalman filter has already been introduced to the actuarial science community by De Jong and Zehnwirth (1983). After

(a) The installation of the device in a vehicle (b) The direction shared between IMUs and the vehicle

Figure 4: The smartphone was attached to the vehicle as in Figure 4a. Based on Figure 2b and Figure 4a, we can infer

each direction of IMUs corresponds to the vehicle movements as in Figure 4b.


the introduction, the Kalman filter has appeared regularly in non-life insurance research, such as Arjas (1989), Kremer (1994), Evans and Schmid (2007), and Taylor (2012).

Although many papers in actuarial science have used the Kalman filter before, they follow the explanation in the original paper of Kalman (1960), which explained the theory in the electrical engineering context. However, Meinhold and Singpurwalla (1983) suggested a more accessible way of understanding the Kalman filter from a Bayesian perspective, with which most actuaries are already familiar. The Kalman filter can be understood as a recursive Bayesian estimation with a Normal-Normal conjugate prior setting. Bayesian inference under the Normal-Normal conjugate assumption is as follows:

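Theorem 1 If $X \mid \mu \sim N(\mu, \sigma^2)$ and $\mu \sim N(\mu_0, \sigma_0^2)$, then the sampling distribution of $X$ given $\mu$ and the prior distribution can be written as follows:

$$p(x \mid \mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad p(\mu) = \frac{1}{\sqrt{2\pi\sigma_0^2}}\, e^{-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}}.$$

The posterior distribution of $\mu$ given that we have an observation $x$ is

$$p(\mu \mid x) = \frac{1}{\sqrt{2\pi\sigma_*^2}}\, e^{-\frac{(\mu-\mu_*)^2}{2\sigma_*^2}}$$

where the mean $\mu_*$ and variance $\sigma_*^2$ are as follows:

$$\mu_* = \frac{\mu_0\sigma^2 + x\sigma_0^2}{\sigma^2+\sigma_0^2}, \qquad \sigma_*^2 = \frac{1}{\sigma^{-2}+\sigma_0^{-2}}.$$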
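The update in Theorem 1 can be sketched in a few lines of Python. The function name and the numbers below are hypothetical and serve only to show how the posterior mean becomes a precision-weighted average of the prior mean and the observation.

```python
def normal_posterior(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance of mu after one observation (Theorem 1).

    prior_mean, prior_var: mu_0 and sigma_0^2 of the prior N(mu_0, sigma_0^2)
    obs, obs_var: the observation x and the sampling variance sigma^2
    """
    post_mean = (prior_mean * obs_var + obs * prior_var) / (obs_var + prior_var)
    post_var = 1.0 / (1.0 / obs_var + 1.0 / prior_var)
    return post_mean, post_var


# Hypothetical numbers: prior N(10, 4) and one observation x = 12 with variance 1.
post_mean, post_var = normal_posterior(10.0, 4.0, 12.0, 1.0)
print(post_mean, post_var)
```

Because the observation is more precise than the prior in this example, the posterior mean lies closer to the observation than to the prior mean, and the posterior variance is smaller than both input variances.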

In Bayesian statistics, setting the prior distribution is subjective, and its influence on the posterior distribution becomes negligible when the sample size is large. However, in the Kalman filter, we can think of the sample size as always fixed at one, namely the current observation from an individual sensor at a time point $t$, and the parameters of the prior distribution at time $t$ are determined by the inference result from the previous time point $t-1$. We will elaborate on this in the next section by providing a more concrete example.

4.2.1 1-Dimensional Kalman ﬁltering

Let us assume that we want to combine the information about vehicle speed at time $t$ from two independent sensors: GPS and accelerometer. To apply the Kalman filter, we model the speed information from GPS and accelerometer at time $t$ as variables $S_t$ and $\Theta_t$ as follows:

$$S_t = \Theta_t + \epsilon_t, \qquad \Theta_t = \Theta_{t-1} + b\,y_t + w_t \tag{1}$$

where $\epsilon_t$ and $w_t$ are noise terms which follow $N(0, r)$ and $N(0, q)$, respectively, and $b$ is a constant. From basic physics,

$$\mathrm{speed}_t = \mathrm{speed}_{t-1} + \Delta t \times a^y_t \tag{2}$$

where $\mathrm{speed}_t$ and $a^y_t$ are the vehicle speed and the longitudinal acceleration of the vehicle at time $t$, respectively. Thus, in Equation (1), we can interpret $\Theta_t$ as the vehicle speed at time $t$ if we take the constant $b$ and $y_t$ to be the time difference between $t$ and $t-1$ and the longitudinal acceleration from the y-axis accelerometer, respectively.

From a Bayesian point of view, we have prior information about the vehicle speed, $\Theta_t$, from the accelerometer by integrating the longitudinal acceleration with the previous speed estimate $\Theta_{t-1}$. Moreover, we can observe a sample of vehicle speed from GPS at time $t$. Since $\epsilon_t$ and $w_t$ are normally distributed, we can re-express the above relationship between $S_t$ and $\Theta_t$ as follows:

$$S_t \mid \Theta_t \sim N(\Theta_t, r), \qquad \Theta_t \sim N(\mu_{t-1} + b\,y_t,\ p_{t-1} + q) = N(\mu_t, p_t)$$

where $\mu_{t-1} := E[\Theta_{t-1}]$, $p_{t-1} := Var(\Theta_{t-1})$, and $r := Var(S_t \mid \Theta_t)$. According to Theorem 1, the posterior distribution of $\Theta_t$ given the observation $s_t$ is


Figure 5: The Kalman ﬁlter as a recursive Bayesian estimation.

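$$\Theta_t \mid S_t \sim N\!\left(\frac{\mu_t r + p_t s_t}{r + p_t},\ \frac{1}{r^{-1} + p_t^{-1}}\right) = N\!\left(\frac{r}{r + p_t}\mu_t + \frac{p_t}{r + p_t}s_t,\ \frac{1}{r^{-1} + p_t^{-1}}\right) = N\big(\mu_t + \kappa(s_t - \mu_t),\ p_t - \kappa p_t\big)$$

where $\kappa = p_t/(r + p_t)$, and $s_t$ is an observation from $S_t$, in other words, an observation from GPS at time $t$. Readers who are already familiar with the Kalman filter may notice that $\kappa$ is called the Kalman gain.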
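For readers who want to check the last equality, the following short derivation (added here as a sketch, using only the definition $\kappa = p_t/(r+p_t)$ so that $r/(r+p_t) = 1-\kappa$) shows how the posterior mean and variance reduce to the Kalman gain form.

```latex
\begin{align*}
\frac{r}{r+p_t}\,\mu_t + \frac{p_t}{r+p_t}\,s_t
  &= (1-\kappa)\,\mu_t + \kappa\, s_t
   = \mu_t + \kappa\,(s_t-\mu_t), \\
\frac{1}{r^{-1}+p_t^{-1}}
  &= \frac{r\,p_t}{r+p_t}
   = (1-\kappa)\,p_t
   = p_t - \kappa\, p_t .
\end{align*}
```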

In Bayesian statistics, after obtaining the posterior distribution, the next step is to predict a future value based on the given information. Since we are in the Normal-Normal conjugate situation, the predictive distribution of $S_t \mid s_t$ is also a Normal distribution whose mean is the same as the posterior mean. However, in the Kalman filter context, instead of predicting the future value of $S_t$ given $s_t$, we estimate the optimal value of $S_t$ based on the given observation $s_t$. Thus, the optimal (or Kalman filtered) speed estimate at time $t$ is as follows:

$$\hat{s}^{opt}_t = E\big[S^{opt}_t \mid s_t\big] = \frac{r}{r + p_t}\mu_t + \frac{p_t}{r + p_t}s_t = \mu_t + \kappa(s_t - \mu_t). \tag{3}$$

Thus, the optimal estimate is simply the posterior mean, which is a weighted average of the prior mean and the observation. In other words, the Kalman filter calculates the optimal estimate of the quantity $S_t$ as the weighted average of the GPS observation $s_t$ and the estimate $\mu_t$ of the same quantity obtained from the previous step's estimate and the accelerometer. It adjusts the prior knowledge about the speed from the accelerometer, $\mu_t$, using the Kalman gain applied to the discrepancy between the GPS observation and the prior mean, $s_t - \mu_t$.

After obtaining the optimal estimate of the speed at time $t$, the Kalman filter uses the information from time $t$ as prior information to estimate the speed at time $t+1$. For convenience of notation, let us define the following two quantities:

$$\mu^\star_t := \mu_t + \kappa(s_t - \mu_t), \qquad p^\star_t := p_t - \kappa p_t$$

These two quantities can be seen as "filtered" quantities through the Kalman filter at time $t$. The Kalman filter can be considered a recursive Bayesian estimation because the estimate $\mu^\star_t$ and the posterior variance $p^\star_t$ are used to form the prior mean and variance of $\Theta$ at time $t+1$, namely $\mu_{t+1} = \mu^\star_t + b\,y_{t+1}$ and $p_{t+1} = p^\star_t + q$. Figure 5 illustrates this Kalman filter process as a series of prior and posterior inferences.
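The recursion of this section can be sketched as a short loop in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `kalman_1d`, its argument layout, and the example values are hypothetical, and the constant $b$ of Equation (1) is taken to be the time step.

```python
def kalman_1d(gps_speeds, accelerations, dt, r, q, mu0, p0):
    """Fuse GPS speed observations with accelerometer-based predictions.

    gps_speeds[i] and accelerations[i] are the observation s_t and the
    longitudinal acceleration y_t for the same time step. r and q are the
    variances of the observation and state noise; mu0 and p0 are the
    initial filtered mean and variance.
    """
    mu_star, p_star = mu0, p0
    filtered = []
    for s_t, y_t in zip(gps_speeds, accelerations):
        # Prior at time t built from the previous filtered values (b = dt).
        mu_t = mu_star + dt * y_t
        p_t = p_star + q
        # Posterior via the Kalman gain, as in Equation (3).
        kappa = p_t / (r + p_t)
        mu_star = mu_t + kappa * (s_t - mu_t)
        p_star = p_t - kappa * p_t
        filtered.append(mu_star)
    return filtered, p_star


# Hypothetical inputs: two time steps, speeds in m/s, accelerations in m/s^2.
speeds, last_var = kalman_1d(
    gps_speeds=[10.0, 11.0], accelerations=[1.0, 1.0],
    dt=1.0, r=1.0, q=1.0, mu0=9.0, p0=1.0,
)
print(speeds, last_var)
```

A large `r` relative to the prior variance pushes the gain toward zero, so the filter trusts the accelerometer-based prediction; a small `r` pushes the gain toward one, so the filter follows the GPS observation.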


4.2.2 Multidimensional Kalman ﬁltering

In the previous section, we discussed how to use the Kalman filter to combine the information from two independent sensors for estimating the vehicle speed. In Equation (1), we assumed that $y_t$ is the longitudinal acceleration at time $t$ from the accelerometer. However, because of the characteristics of the accelerometer, the grade effect should be removed from the accelerometer measurements before feeding them into the Kalman filter (see Section 3.2.1 and Figure 3). Bevly (2004) used a multidimensional Kalman filter to include the road grade effect as one factor in the model for estimating the longitudinal acceleration of the vehicle.

Let us define $\alpha_t$ as the road grade at time $t$; then the longitudinal acceleration at time $t$, $a_t$, can be obtained from the following physics formula:

$$a_t = \mathrm{acc}_t - g \times \sin(\alpha_t) \tag{4}$$

where the constant $g$ is the gravitational acceleration, $9.81\,\mathrm{m/s^2}$, and $\mathrm{acc}_t$ is the raw y-axis accelerometer value at time $t$.

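In the multidimensional Kalman filter, $\Theta_t$, the object we are trying to estimate, is a vector whose components are the speed of the vehicle and the road grade at time $t$; these quantities are observable or can be calculated from GPS and IMU sensors. By using the integration formula, Equation (2), and the road grade removal from Equation (4), it can be expressed by the following formula:

$$\begin{pmatrix} \mathrm{speed}_t \\ \alpha_t \end{pmatrix} = \begin{pmatrix} \mathrm{speed}_{t-1} + \Delta t \times a_t \\ \alpha_t \end{pmatrix} = \begin{pmatrix} \mathrm{speed}_{t-1} + \Delta t \times (\mathrm{acc}_t - g \sin(\alpha_t)) \\ \alpha_t \end{pmatrix} \approx \begin{pmatrix} 1 & -g \times \Delta t \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \mathrm{speed}_{t-1} \\ \alpha_{t-1} \end{pmatrix} + \begin{pmatrix} \Delta t \\ 0 \end{pmatrix} \mathrm{acc}_t \tag{5}$$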

Note that the approximation in Equation (5) holds under the assumptions that the road grade does not change abruptly, $\alpha_{t-1} \approx \alpha_t$, and that the magnitude of the road grade is small, $\sin(\alpha_t) \approx \alpha_t$. Furthermore, to get a better estimate of the road grade, we use the weighted average of the pitch angle and the road grade estimate from the previous step, with weights 0.02 and 0.98, respectively. Readers who want to investigate all the details of the algorithm we used can look at the code of kalmanfilter_withalpha in the R package named ikhyd (https://github.com/issactoast/ikhyd).
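The prediction step of Equation (5) and the 0.02/0.98 weighting can be sketched as follows. This is an illustrative Python sketch only: the function names are hypothetical, and how the smoothed grade enters the authors' R code is not reproduced here.

```python
G = 9.81  # gravitational acceleration in m/s^2


def predict_state(speed_prev, grade_prev, acc_raw, dt):
    """One prediction step of Equation (5) under the small-grade approximation.

    The new speed subtracts the gravity component g*dt*grade from the
    integrated raw accelerometer reading; the grade is carried forward.
    """
    speed = speed_prev - G * dt * grade_prev + dt * acc_raw
    grade = grade_prev
    return speed, grade


def smooth_grade(pitch, grade_prev, w_pitch=0.02):
    """Weighted average of the pitch angle and the previous grade estimate."""
    return w_pitch * pitch + (1.0 - w_pitch) * grade_prev


# Hypothetical example: constant speed on a grade of 0.05 rad, where the raw
# reading contains only the gravity component g*0.05.
speed, grade = predict_state(10.0, 0.05, G * 0.05, dt=0.1)
new_grade = smooth_grade(pitch=0.10, grade_prev=0.05)
print(speed, grade, new_grade)
```

In this example the predicted speed stays at its previous value because the gravity component in the raw reading is cancelled by the grade term, which is the behaviour the matrix form of Equation (5) is designed to capture.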

On the other hand, from GPS, we also have observations of the same quantities. Since the velocity of the vehicle at time $t$, $v_t$, is provided by GPS, the road grade $\alpha_t$ can be calculated as

$$\alpha_t = \tan^{-1}\!\left(\frac{v^{up}_t}{v^{hori}_t}\right)$$

where $v^{up}_t$ is the vertical speed of the vehicle obtained using the altitudes of GPS coordinates and $v^{hori}_t$ is the horizontal speed of the vehicle based on the latitude and longitude of GPS coordinates. Although Bevly (2004) uses only GPS data for estimating the road grade, via the inverse tangent of the vertical velocity over the horizontal velocity, we changed the equation above in order to use the Doppler velocity $v_t$ directly as follows:

$$\alpha_t \approx \sin(\alpha_t) = \frac{v^{up}_t}{v_t}$$

assuming that the road grades are small. Moreover, for the calculation of the vertical velocity of the vehicle, we used the barometer, which is more accurate than GPS altitude data, as in Figure 6a.
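A minimal Python sketch of this grade observation is given below. It assumes that the vertical speed is approximated by a finite difference of two consecutive barometer altitudes; the function name and the low-speed guard `min_speed` are illustrative assumptions, not part of the method described in the paper.

```python
def grade_from_altitude(alt_prev, alt_curr, dt, doppler_speed, min_speed=1.0):
    """Approximate road grade (radians) as vertical speed over Doppler speed.

    alt_prev, alt_curr: consecutive altitude readings in metres
    dt: time between the two readings in seconds
    doppler_speed: GPS Doppler speed in m/s
    min_speed: assumed guard, because the ratio is unstable near standstill
    """
    if doppler_speed < min_speed:
        return None
    v_up = (alt_curr - alt_prev) / dt
    return v_up / doppler_speed


# Hypothetical example: climbing 0.5 m in one second while driving at 10 m/s.
grade_obs = grade_from_altitude(100.0, 100.5, dt=1.0, doppler_speed=10.0)
print(grade_obs)
```

The guard returns `None` at very low speeds, where dividing by a near-zero speed would produce meaningless grades; a real pipeline would need some rule for these samples, and the one used here is only a placeholder.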

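Thus, using the multidimensional Kalman filter modeling equation, $S_t$ and $\Theta_t$ can be written as follows:

$$S_t = \Theta_t + \epsilon_t, \qquad \Theta_t = A\,\Theta_{t-1} + b\,y_t + w_t \tag{6}$$

where

$$\Theta_{t-1} = \begin{pmatrix} \mathrm{speed}_{t-1} \\ \alpha_{t-1} \end{pmatrix}, \quad A := \begin{pmatrix} 1 & -g \times \Delta t \\ 0 & 1 \end{pmatrix}, \quad b := \begin{pmatrix} \Delta t \\ 0 \end{pmatrix}, \quad y_t = \mathrm{acc}^y_t.$$

Also, $\epsilon_t$ and $w_t$ follow the multivariate Normal distributions $N(0, R)$ and $N(0, Q)$, respectively. Note that $S_t$ and $\Theta_t$ are vectors in the multidimensional setting, but $S_t$ still has a linear relationship with $\Theta_t$.

The relationship between $S_t$ and $\Theta_t$ can be written in a Bayesian way as follows: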


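$$S_t \mid \Theta_t \sim N(\Theta_t, \Sigma), \qquad \Theta_t \sim N(\mu_t, \Sigma_0) \tag{7}$$

where $\mu_t = A\mu_{t-1} + b\,y_t$, $\Sigma = R$, and $\Sigma_0 = A P_{t-1} A^T + Q$.

Since the modeling distribution and the prior distribution follow the Normal distribution, the joint distribution is also a Normal distribution with a mean vector and a covariance matrix as follows:

$$\begin{pmatrix} \Theta_t \\ S_t \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_t \\ \mu_t \end{pmatrix},\ \begin{pmatrix} \Sigma_0 & \Sigma_0 \\ \Sigma_0 & \Sigma_0 + \Sigma \end{pmatrix} \right).$$

Thus, the posterior distribution can be easily obtained by using the conditional distribution formula of the multivariate Normal distribution:

$$\Theta_t \mid S_t \sim N\big(\mu_t + K(s_t - \mu_t),\ \Sigma_0 - K\Sigma_0\big) \tag{8}$$

where the Kalman gain $K$ is

$$K = \Sigma_0(\Sigma_0 + \Sigma)^{-1}.$$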

Similar to the 1-dimensional case in Equation (3), the optimal estimate of $\Theta_t$ given the observation $s_t$, $\hat{\theta}^{opt}_t$, can be set as the posterior mean as follows:

$$\hat{\theta}^{opt}_t = \mu^\star_t := \mu_t + K(s_t - \mu_t), \qquad P^\star_t := \Sigma_0 - K\Sigma_0 \tag{9}$$

Note that the optimal estimate of $\theta_t$, $\mu^\star_t$, and the updated posterior variance of $\Theta_t \mid S_t$, $P^\star_t$, will be used as prior knowledge for $\mu_t$ and $P_t$ at $t+1$, respectively.
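One step of the multidimensional filter in Equations (6) to (9) can be sketched in plain Python for the two-dimensional state (speed, road grade). The helper functions and the name `kalman_step` are hypothetical, the matrices are hard-coded to size 2x2 to keep the sketch dependency-free, and the example values of `Q` and `R` are arbitrary; the authors' R package may organise the computation differently.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def mat_sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inv2(a):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def mat_vec(a, v):
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def kalman_step(mu_prev, P_prev, y_t, s_t, A, b, Q, R):
    """One prior-to-posterior update for the state (speed, road grade)."""
    # Prior: mu_t = A mu_{t-1} + b y_t and Sigma_0 = A P_{t-1} A^T + Q.
    a_mu = mat_vec(A, mu_prev)
    mu_t = [a_mu[i] + b[i] * y_t for i in range(2)]
    sigma0 = mat_add(mat_mul(mat_mul(A, P_prev), transpose(A)), Q)
    # Kalman gain: K = Sigma_0 (Sigma_0 + R)^{-1}.
    K = mat_mul(sigma0, inv2(mat_add(sigma0, R)))
    # Posterior mean and covariance, as in Equation (9).
    resid = [s_t[i] - mu_t[i] for i in range(2)]
    k_resid = mat_vec(K, resid)
    mu_star = [mu_t[i] + k_resid[i] for i in range(2)]
    P_star = mat_sub(sigma0, mat_mul(K, sigma0))
    return mu_star, P_star


# Hypothetical example with dt = 0.1 s and arbitrary noise covariances.
dt, g = 0.1, 9.81
A = [[1.0, -g * dt], [0.0, 1.0]]
b = [dt, 0.0]
Q = [[0.01, 0.0], [0.0, 0.0001]]
R = [[0.25, 0.0], [0.0, 0.0004]]
mu, P = kalman_step([10.0, 0.02], [[1.0, 0.0], [0.0, 0.01]],
                    y_t=0.5, s_t=[10.1, 0.03], A=A, b=b, Q=Q, R=R)
print(mu, P)
```

Calling `kalman_step` repeatedly, feeding each returned pair back in as `mu_prev` and `P_prev`, gives the recursive filter.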

5 Kalman ﬁltering Implementation

In this section, the Kalman ﬁlter will be applied to the y-axis accelerometer data to obtain the longitudinal acceleration

of the vehicle during a trip. To evaluate the performance of the Kalman filtering for accelerometer data, we choose a

sample route which has complicated road topology as in Figure 1b. The route consists of a ﬂat road, uphill and downhill,

and turns, which can be inferred from the GPS visualization of the route and the relative altitude of the route (See

Figure 6a). We assume that if the Kalman ﬁlter works well for the sample route, it will work for most of other routes

which can be reproduced by the combination of the elements in the sample routes. Thus, in this section, we will focus

on the calibration of the accelerometer for the sample route.

5.1 Filtering the longitudinal acceleration

The black line in the upper panel of Figure 6c shows the graph of raw y-axis accelerometer data during the sample trip.

As in Figure 6b, it is known that the speed graph based on the accelerometer suffers from a long term drift error. In Figure 6b, the black line indicates the speed obtained by integrating the raw accelerometer data, which is the black line in the upper panel of Figure 6c. On the other hand, the green line in Figure 6b indicates the ground truth speed based on the wheel rotation rate, which came from the OBD device. As we can see in the figure, the two lines deviate from around the time point of 25 seconds to the end of the trip. Note that the road grade effect is the main reason for the deviation between the two lines, which occurs in the following two main events: vehicle stops and driving on hills.

First, if the vehicle stops on an inclined road, the accelerometer readings are affected by gravity. For example, a stop can be inferred from the stability of the raw accelerometer values, such as the horizontal segment around the 50-second mark in Figure 6c. Since the vehicle stopped on a tilted road, the raw y-axis accelerometer (the black line in the upper panel of Figure 6c) records non-zero values during the stop because of gravity. These negative accelerations produce the deviation in the speed graph over the corresponding interval in Figure 6b, since a negative acceleration translates into a decrease in the integrated speed.
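The drift mechanism described above can be illustrated with a small numerical sketch. It assumes a hypothetical 10-second stop on a 3-degree incline, a 25 Hz sampling rate, and a sign convention in which gravity appears as a negative y-axis reading; none of these values are taken from the sample trip.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def integrate_speed(acc, dt, v0=0.0):
    """Cumulatively integrate acceleration samples (m/s^2) into speed (m/s)."""
    speeds = []
    v = v0
    for a in acc:
        v += a * dt
        speeds.append(v)
    return speeds


# Hypothetical scenario: the vehicle is stopped for 10 s on a 3-degree incline.
rate_hz = 25
dt = 1.0 / rate_hz
grade = math.radians(3.0)
raw_acc = [-G * math.sin(grade)] * (10 * rate_hz)  # gravity leaks into the y-axis

drifted = integrate_speed(raw_acc, dt)
print(f"apparent speed after a 10 s stop: {drifted[-1]:.2f} m/s")
```

Although the true speed is zero throughout the stop, the integrated speed keeps decreasing, which is the kind of deviation between the black and green lines discussed for Figure 6b.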

Second, the road grade affects the speed graph not only when the vehicle is stopped but also when it is driving on hills. Note that the black line around the 25-second mark in Figure 6b differs in shape from the green line, unlike the second hump around the 75-second mark. Since the change in road grade around the 75-second mark is smaller than the change around the 25-second mark (see Figure 6a), the shape deviation is more evident around the 25-second mark than around the 75-second mark. The effect of road grade while driving is also confirmed by the significant drop in the speed graph (Figure 6b) around the 140-second mark, when the vehicle drove downhill.

(a) Relative altitude for the sample route (see Figure 1b), which has an uphill at the beginning and a downhill at the end.

(b) The speed graph of the sample route from each source: raw (black), Kalman filtered (blue), and OBD (green).

(c) The longitudinal acceleration (upper) and the lateral acceleration (lower) of the trip: raw (black) vs. Kalman filtered (blue).

Figure 6: Telematics data corresponding to the sample route in Figure 1b. The trip started from the bottom-left corner of the square. The yellow point in the middle of Figure 1b is the vehicle position 28 seconds after the start. Based on the relative altitude of the route in Figure 6a, roughly the first 30 seconds of the trip correspond to the uphill section of the route.

Equation (6) is used for filtering the longitudinal acceleration. Note that the equalities in Equation (6) hold only if all quantities are expressed in consistent units. Since the accelerometer values, $\mathrm{acc}^y_t$, are recorded in m/s$^2$, the speed quantities used in the equation should be converted to m/s, and the road grade $\alpha_t$ should be expressed in radians. The blue line in the upper panel of Figure 6c represents the Kalman filtered longitudinal acceleration for the sample route, which is obtained from the optimal estimates of the vehicle speed at each time point of the Kalman filter process. Note that the filtered acceleration is zero while the vehicle is stopped. Furthermore, the speed graph based on the Kalman filtered accelerometer data (the blue line in Figure 6b) closely follows the green line in Figure 6b, which is the ground truth speed from the OBD device.
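The unit requirement can be made concrete with a few conversion helpers. This is only a sketch: the function names are hypothetical, and it assumes that speed might arrive in miles per hour or kilometres per hour and that road grade might arrive as a percentage (rise over run), which the text does not specify.

```python
import math


def mph_to_ms(speed_mph):
    """Convert miles per hour to metres per second (1 mile = 1609.344 m)."""
    return speed_mph * 1609.344 / 3600.0


def kmh_to_ms(speed_kmh):
    """Convert kilometres per hour to metres per second."""
    return speed_kmh / 3.6


def grade_percent_to_rad(grade_percent):
    """Convert a road grade given as rise/run in percent to an angle in radians."""
    return math.atan(grade_percent / 100.0)


print(mph_to_ms(60.0), kmh_to_ms(36.0), grade_percent_to_rad(5.0))
```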

For readers who want to implement the Kalman filter setting used in this paper, all the code is available as an R package named ikhyd (https://github.com/issactoast/ikhyd). Note that we used the identity matrix for the matrix $R$, whereas for the matrix $Q$ we used 0.001 and 1 for the y-axis accelerometer value and the road grade value at time $t$, respectively. This indicates that, in the Kalman filter process, we give more credit to the accelerometer data than to the GPS observation.
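The role of the noise settings can be illustrated with a generic scalar Kalman filter on speed. This is a minimal sketch and not the state-space model of Equation (6) or the ikhyd implementation: the state is a single speed value, the toy signal is a constant 10 m/s, and the variances q and r only loosely echo the values 0.001 and 1 mentioned above.

```python
def kalman_step(v, p, acc, dt, z, q, r):
    """One predict/update cycle of a scalar Kalman filter on vehicle speed.

    v, p : previous speed estimate (m/s) and its variance
    acc  : accelerometer-based acceleration (m/s^2) driving the prediction
    z    : observed speed (m/s), e.g. from GPS
    q, r : process and observation noise variances (hypothetical values)
    """
    # Predict: propagate speed with the accelerometer; uncertainty grows by q.
    v_pred = v + acc * dt
    p_pred = p + q
    # Update: blend prediction and observation using the Kalman gain.
    gain = p_pred / (p_pred + r)
    v_new = v_pred + gain * (z - v_pred)
    p_new = (1.0 - gain) * p_pred
    return v_new, p_new, gain


def steady_state_gain(q, r, n_steps=500):
    """Run the filter on a constant-speed toy signal and return the final gain."""
    v, p, gain = 10.0, 1.0, 0.0
    for _ in range(n_steps):
        v, p, gain = kalman_step(v, p, acc=0.0, dt=1.0, z=10.0, q=q, r=r)
    return gain


gain_small_q = steady_state_gain(q=0.001, r=1.0)
gain_large_q = steady_state_gain(q=1.0, r=1.0)
print(f"gain with q=0.001: {gain_small_q:.3f}, gain with q=1: {gain_large_q:.3f}")
```

A smaller process variance yields a smaller gain, so the update leans more on the accelerometer-driven prediction and less on the observed speed, which matches the interpretation given in the text.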


(a) Speed comparison: Euclidean distance based speed vs. Doppler effect adjusted speed from GPS.

(b) The lateral acceleration can be calculated by multiplying the vehicle speed by the angular speed $\omega \approx \Delta\theta_t / \Delta t$.

Figure 7: The gray line in Figure 7a represents the speed calculated from two pairs of GPS coordinates (latitude, longitude). Since these GPS coordinates are recorded in the geographic coordinate system, the distances between them are converted into miles using the distm function from the R package geosphere by Hijmans et al. (2019).

5.2 Filtering the lateral acceleration

After obtaining the longitudinal acceleration of the vehicle, the next step is to filter the lateral acceleration. The lateral acceleration, $a^x_t$, can be obtained from the following formula:

$$a^x_t = \mathrm{acc}^x_t - g \times \sin(\phi_t) \tag{10}$$

where $\mathrm{acc}^x_t$ is the raw x-axis accelerometer value at time $t$, $g$ is the constant of gravity, and $\phi_t$ is the roll angle. The last term can be interpreted as the adjustment for the gravitational effect on the lateral accelerometer, similar to the road grade effect on the longitudinal accelerometer (see Equation (4)). However, unlike in the longitudinal case, the roll angle from the gyroscope is not affected by vehicle movements, so the lateral acceleration can be calculated directly from Equation (10). In the lower panel of Figure 6c, the blue line shows the adjusted lateral acceleration, in which the gravitational effect is compensated using the roll angle from the gyroscope. Around the 100-second mark in the figure, the resulting lateral acceleration is also centered at zero while the vehicle was stopped on a tilted road. The two blue lines in Figure 6c represent the finalized longitudinal and lateral accelerations of the sample trip. In the next section, we discuss how to build a driving profile from these two filtered objects.
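Equation (10) can be applied sample by sample, as in the following sketch. The function name, the value of g, and the 2-degree sideways tilt are assumptions made for illustration only.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def lateral_acceleration(acc_x, roll):
    """Equation (10): remove the gravity component from raw x-axis readings.

    acc_x : raw x-axis accelerometer values (m/s^2)
    roll  : roll angles phi_t in radians, one per reading
    """
    if len(acc_x) != len(roll):
        raise ValueError("acc_x and roll must have the same length")
    return [a - G * math.sin(phi) for a, phi in zip(acc_x, roll)]


# Hypothetical readings: vehicle stopped on a road tilted sideways by 2 degrees.
phi = math.radians(2.0)
raw = [G * math.sin(phi)] * 5  # the sensor sees only the gravity component
adjusted = lateral_acceleration(raw, [phi] * 5)
print(adjusted)
```

In this stopped-vehicle scenario the adjusted values are zero, in line with the behaviour described for the lower panel of Figure 6c.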

6 Driving proﬁling with the Lon-Lat plot

Wüthrich (2017) suggests an object called the velocity-acceleration heatmap (v-a heatmap) as a driving profiling object for telematics data. The v-a heatmap can be considered a discrete density plot of speed and acceleration data from GPS. Although the speed in Wüthrich (2017) is calculated from Euclidean distances, the v-a heatmap should be calculated directly from the GPS speed information, since speeds based on Euclidean distance are less accurate than speeds based on the Doppler effect, as shown in Figure 7a. Moreover, the acceleration values in the v-a heatmap are longitudinal accelerations calculated from changes in speed, which means that the v-a heatmap carries no information about the lateral movements of the vehicle. Considering that turns are one of the major factors determining driving style, an object for telematics analysis should be accurate not only for longitudinal movements but also for lateral movements.
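To make the position-based speed concrete, the sketch below derives speed from consecutive GPS fixes. It uses the spherical haversine formula with an assumed mean Earth radius rather than the geosphere function used for Figure 7a, and the coordinates are hypothetical 1 Hz fixes, not data from the sample route.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def position_based_speed(fixes, dt=1.0):
    """Speed in m/s between consecutive (lat, lon) fixes recorded dt seconds apart."""
    return [haversine_m(*p0, *p1) / dt for p0, p1 in zip(fixes, fixes[1:])]


# Hypothetical 1 Hz fixes (not taken from the sample route).
fixes = [(41.6600, -91.5300), (41.6601, -91.5300), (41.6602, -91.5301)]
speeds = position_based_speed(fixes)
print([round(s, 2) for s in speeds])
```

Because each speed value depends on the difference of two noisy positions, small position errors translate directly into speed errors, which is one way to read the gap between the two curves in Figure 7a.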

In this section, we use the extended sample route shown in Figure 8a for the telematics analysis. We asked two drivers to drive the extended sample route using the same vehicle. Both drivers spent around 20 minutes to finish the trip. The recording rate of the IMU was 25 Hz, and that of the GPS was 1 Hz. Using the calibration method from the previous section, we obtained the longitudinal and lateral accelerations for each trip, as in Figure 6c.

(a) GPS based Lon-Lat plot comparison. (b) GPS based Lon-Lat plot comparison.

(c) IMU based Lon-Lat plot: Driver 1 (d) IMU based Lon-Lat plot: Driver 2

Figure 8: The IMU based Lon-Lat density plot elucidates the differences in driving style between the two drivers better than the GPS based Lon-Lat density plot.

Using the longitudinal and lateral acceleration data, the longitudinal-lateral acceleration density plot (lon-lat plot) can be drawn as in Figure 8c or 8d. Since both trips took around 20 minutes, about 30,000 acceleration data points are used in the IMU based lon-lat plot, which corresponds to 1,140 GPS data points. Note that Figure 8b shows the GPS based lon-lat plots; the red dots correspond to driver 1 (Figure 8c) and the blue dots correspond to driver 2 (Figure 8d). To calculate the longitudinal acceleration for the GPS based lon-lat plot, we use the differences between consecutive Doppler based GPS speed values. On the other hand, the lateral acceleration for the GPS based lon-lat plot can be obtained from the following formula:

$$a^x_t = \omega_t \times v_t \tag{11}$$

where $a^x_t$ is the lateral acceleration, $\omega_t$ is the angular velocity, and $v_t$ is the velocity of the vehicle at time $t$. Using three consecutive GPS points, for example the three red dots in Figure 7b, we can approximate the angular velocity. Since the refresh rate of GPS is 1 Hz, the lateral acceleration at time $t$ can be approximated as

$$a^x_t \approx \Delta\theta_t \times \mathrm{speed}_t, \tag{12}$$

where $\Delta\theta_t$ can be calculated from the three GPS points and the vehicle speed at time $t$ is given by the GPS data itself (see Figure 7b).
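Equations (11) and (12) can be sketched as follows. The heading between two nearby fixes is computed with a local flat-earth approximation, and the sign convention (positive for a clockwise turn) and function names are assumptions of this sketch rather than details given in the text.

```python
import math


def heading(p_from, p_to):
    """Approximate heading in radians (clockwise from north) between two nearby
    (lat, lon) points in degrees, using a local flat-earth approximation."""
    lat0, lon0 = p_from
    lat1, lon1 = p_to
    mean_lat = math.radians((lat0 + lat1) / 2)
    north = math.radians(lat1 - lat0)
    east = math.radians(lon1 - lon0) * math.cos(mean_lat)
    return math.atan2(east, north)


def gps_lateral_acceleration(p_prev, p_curr, p_next, speed, dt=1.0):
    """Lateral acceleration as heading change rate times speed (Equations 11-12)."""
    dtheta = heading(p_curr, p_next) - heading(p_prev, p_curr)
    # Wrap the heading change into [-pi, pi) so turns through north stay small.
    dtheta = (dtheta + math.pi) % (2 * math.pi) - math.pi
    return dtheta / dt * speed


# Hypothetical fixes: heading north, then turning east, at 10 m/s.
a_turn = gps_lateral_acceleration((0.0, 0.0), (0.0001, 0.0), (0.0001, 0.0001), 10.0)
a_straight = gps_lateral_acceleration((0.0, 0.0), (0.0001, 0.0), (0.0002, 0.0), 10.0)
print(a_turn, a_straight)
```

With a 1 Hz refresh rate, dt equals one second, so the expression reduces to the heading change multiplied by the speed, as in Equation (12).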

Based on Figure 8c and Figure 8d, we can infer that the IMU based lon-lat plots reveal the difference between the two drivers' driving styles better than the GPS based lon-lat plot in Figure 8b. For example, comparing Figure 8c with Figure 8d, we can see that driver 1 makes sharper turns than driver 2, since driver 1 has more data points spread along the x-axis. Also, the longer positive tail of the longitudinal acceleration (y-axis) for driver 1 implies that driver 1 pushes the gas pedal harder than driver 2. However, it is hard to distinguish these differences using the GPS based lon-lat plot in Figure 8b, because the distributions of the red dots and the blue dots look almost the same. One might think that the number of data points is simply too small for the GPS based lon-lat plot to expose the differences. Note, however, that even with this small sample, the blue dots are spread more widely along the x-axis than the red dots, which implies a larger lateral acceleration for driver 2 than for driver 1. Thus, using the lateral movement information from GPS could lead to an interpretation opposite to the one based on the IMU based lon-lat plots. Note also that the GPS based lon-lat plots for both driver 1 and driver 2 have almost the same range for the longitudinal and lateral accelerations. Considering the mechanics of vehicle movement, this should not be the case, because the range of the longitudinal acceleration should be wider than that of the lateral acceleration. According to Xu et al. (2015), a lateral acceleration of 5 m/s$^2$ is the discomfort limit for the driver, which can occur in a mountain area at a speed of 30 km/h. Therefore, we conclude that the IMU based lon-lat plot has more interpretive power than the GPS based lon-lat plot for telematics analysis.
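A lon-lat density of the kind compared above can be approximated by binning the two acceleration series on a regular grid. The sketch below is one simple way to do this; the bin width of 0.5 m/s$^2$, the function name, and the four sample points are hypothetical and are not the settings used for Figure 8.

```python
import math
from collections import Counter


def lon_lat_density(lateral, longitudinal, bin_width=0.5):
    """Relative frequency of (lateral, longitudinal) acceleration pairs per grid cell.

    Cells are indexed by (floor(a_x / bin_width), floor(a_y / bin_width)).
    """
    if len(lateral) != len(longitudinal):
        raise ValueError("both series must have the same length")
    if not lateral:
        raise ValueError("at least one sample is required")
    counts = Counter(
        (math.floor(ax / bin_width), math.floor(ay / bin_width))
        for ax, ay in zip(lateral, longitudinal)
    )
    n = len(lateral)
    return {cell: c / n for cell, c in counts.items()}


# Hypothetical filtered accelerations in m/s^2.
density = lon_lat_density([0.1, 0.2, -0.1, 1.2], [0.1, 0.3, 0.2, -0.7])
print(density)
```

Comparing two drivers then amounts to comparing two such grids, for example by how much mass lies far from the origin along the lateral axis.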

One benefit of using the Kalman filtered accelerometer data is that it can be easily converted into the v-a heatmap, which is already well studied by Gao et al. (2019b). By integrating the longitudinal acceleration, we can produce the v-a heatmap, since the acceleration is already synchronized with the vehicle speed through the filtering process. Moreover, because of the fine recording rate of the IMU, the v-a heatmap can be produced from a short trip of 20 to 30 minutes, such as the given extended sample route. Note that the number of data points in 20 minutes of IMU recording equals the number of data points in 8 hours and 20 minutes of GPS recording. A second benefit is that, since both accelerations in the lon-lat plot are centered at zero, the plot lends itself to a parametric approach to driving profile analysis. Considering that Gao and Wüthrich (2018) extract low-dimensional features from the v-a heatmap using PCA and a bottleneck neural network, the parametric approach could offer another view of the telematics object.
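The comparison of recording lengths follows from the two sampling rates reported above (25 Hz for the IMU and 1 Hz for GPS). As a short worked check:

$$20\ \text{min} \times 60\ \tfrac{\text{s}}{\text{min}} \times 25\ \text{Hz} = 30{,}000\ \text{samples}, \qquad \frac{30{,}000\ \text{samples}}{1\ \text{Hz}} = 30{,}000\ \text{s} = 500\ \text{min} = 8\ \text{h}\ 20\ \text{min}.$$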

7 Conclusion

In this paper, we suggest a calibration process for accelerometer data that can be used for telematics analysis. We also propose a new type of telematics object, called the lon-lat plot, for driving style profiling. By investigating the lon-lat plot, we shed light on the benefits of using Kalman filtered accelerometer data as a building block for telematics analysis: the information about the lateral movements of the vehicle in IMU data is more accurate than that in GPS telematics data. The Kalman filter is used to combine the speed information from GPS with data from IMU sensors such as the accelerometer, gyroscope, and barometer. The suggested telematics object contains not only the information about speed and acceleration used previously in the literature, but also additional information about the lateral movements that characterize driving style. Furthermore, we explained the Kalman filtering process from a Bayesian point of view, which could help undergraduate students and actuaries understand the concept intuitively.

Notations

| Notation | Interpretation |
| --- | --- |
| $x_t, y_t, z_t$ | The longitude, latitude, altitude at time $t$ from GPS |
| $a^x_t, a^y_t$ | The lateral, longitudinal acceleration at time $t$ |
| $\mathrm{acc}^x_t, \mathrm{acc}^y_t$ | The x-axis, y-axis raw accelerometer value at time $t$ |
| $\phi_t, \theta_t, \psi_t$ | The roll, pitch, yaw angle from gyroscope at time $t$ |
| $v_t, v^{\mathrm{up}}_t, v^{\mathrm{hori.}}_t$ | The longitudinal, vertical, horizontal velocity of a vehicle at time $t$ |
| $\alpha_t$ | The road grade at time $t$ |
| $g$ | The gravitational acceleration (constant of gravity) |

References

Aljaafreh, A., Alshabatat, N., and Al-Din, M. S. N. (2012). Driving style recognition using fuzzy logic. In 2012 IEEE

International Conference on Vehicular Electronics and Safety (ICVES 2012), pages 460–463. IEEE.

Arjas, E. (1989). The claims reserving problem in non-life insurance: Some structural ideas. ASTIN Bulletin: The

Journal of the IAA, 19(2):139–152.

AXA (2014). Axa insurance company driver’s telematics analysis. https://www.kaggle.com/c/axa-driver-telematics-analysis. (Accessed: 07/04/2019).

Ayuso, M., Guillen, M., and Pérez-Marín, A. (2016). Telematics and gender discrimination: some usage-based evidence

on whether men’s risk of accidents differs from women’s. Risks, 4(2):10.

Bevly, D. M. (2004). Global positioning system (gps): A low-cost velocity sensor for correcting inertial sensor errors

on ground vehicles. Journal of dynamic systems, measurement, and control, 126(2):255–264.

Chalko, T. J. (2007). High accuracy speed measurement using gps (global positioning system). NU Journal of Discovery,

4:1–9.


De Jong, P. and Zehnwirth, B. (1983). Claims reserving, state-space models and the kalman ﬁlter. Journal of the

Institute of Actuaries, 110(1):157–181.

Evans, J. P. and Schmid, F. (2007). Forecasting workers compensation severities and frequency using the kalman ﬁlter.

In Casualty Actuarial Society Forum, pages 43–66.

Faragher, R. et al. (2012). Understanding the basis of the kalman ﬁlter via a simple and intuitive derivation. IEEE

Signal processing magazine, 29(5):128–132.

Federal Highway Administration (2019). Lane width. https://safety.fhwa.dot.gov/geometric/pubs/mitigationstrategies/chapter3/3_lanewidth.cfm. (Accessed: 07/04/2019).

Gao, G., Meng, S., and Wüthrich, M. V. (2019a). Claims frequency modeling using telematics car driving data.

Scandinavian Actuarial Journal, 2019(2):143–162.

Gao, G. and Wüthrich, M. V. (2018). Feature extraction from telematics car driving heatmaps. European Actuarial

Journal, 8(2):383–406.

Gao, G., Wüthrich, M. V., and Yang, H. (2019b). Evaluation of driving risk at different speeds. Insurance: Mathematics

and Economics.

Hijmans, R. J., Williams, E., Vennes, C., and Hijmans, M. R. J. (2019). Package ‘geosphere’.

Jauch, J., Masino, J., Staiger, T., and Gauterin, F. (2017). Road grade estimation with vehicle-based inertial measurement

unit and orientation ﬁlter. IEEE Sensors Journal, 18(2):781–789.

Johnson, D. A. and Trivedi, M. M. (2011). Driving style recognition using a smartphone as a sensor platform. In 2011

14th International IEEE Conference on Intelligent Transportation Systems (ITSC), pages 1609–1615. IEEE.

Kalman, R. E. (1960). A new approach to linear ﬁltering and prediction problems. Journal of basic Engineering,

82(1):35–45.

Kremer, E. (1994). Robust credibility via robust kalman filtering. ASTIN Bulletin: The Journal of the IAA, 24(2):221–233.

Meinhold, R. J. and Singpurwalla, N. D. (1983). Understanding the kalman filter. The American Statistician, 37(2):123–127.

National Association of Insurance Commissioners (2018). State insurance regulation: Key facts and market trends.

https://www.naic.org/state_report_cards/report_card_dc.pdf. (Accessed: 07/04/2019).

Nikulin, V. (2016). Driving style identiﬁcation with unsupervised learning. In Machine Learning and Data Mining in

Pattern Recognition, pages 155–169. Springer.

Progressive Casualty Insurance Company (2019). Progressive Firsts. https://www.progressive.com/about/firsts. (Accessed: 07/04/2019).

Taylor, G. (2012). Loss reserving: an actuarial perspective, volume 21. Springer Science & Business Media.

The National Coordination Office for Space-Based Positioning, Navigation, and Timing (2017). How accurate is gps? https://www.gps.gov/systems/gps/performance/accuracy/. (Accessed: 2018-04-15).

Tselentis, D. I., Yannis, G., and Vlahogianni, E. I. (2016). Innovative insurance schemes: pay as/how you drive.

Transportation Research Procedia, 14:362–371.

Van Ly, M., Martin, S., and Trivedi, M. M. (2013). Driver classiﬁcation and driving style recognition using inertial

sensors. In 2013 IEEE Intelligent Vehicles Symposium (IV), pages 1040–1045. IEEE.

Verbelen, R., Antonio, K., and Claeskens, G. (2018). Unravelling the predictive power of telematics data in car insurance

pricing. Journal of the Royal Statistical Society: Series C (Applied Statistics), 67(5):1275–1304.

Weidner, W., Transchel, F. W., and Weidner, R. (2016). Classiﬁcation of scale-sensitive telematic observables for

riskindividual pricing. European Actuarial Journal, 6(1):3–24.

Weidner, W., Transchel, F. W., and Weidner, R. (2017). Telematic driving proﬁle classiﬁcation in car insurance pricing.

Annals of Actuarial Science, 11(2):213–236.

Wüthrich, M. V. (2017). Covariate selection from telematics car driving data. European Actuarial Journal, 7(1):89–108.

Xu, J., Yang, K., Shao, Y., and Lu, G. (2015). An experimental study on lateral acceleration of cars in different

environments in sichuan, southwest china. Discrete Dynamics in nature and Society, 2015.
