Issac Lee
Department of Statistics and Actuarial Science
The University of Iowa
Iowa City, IA 52246
Nariankadu D. Shyamalkumar
Department of Statistics and Actuarial Science
The University of Iowa
Iowa City, IA 52246
July 29, 2019
Accessibility to telematics data has changed auto insurance pricing. Actuarial research has focused
solely on the use of GPS based telematics data. Such data suffer from limited accuracy and slow
updates, which could inaccurately reflect the driving styles of drivers. Our focus is on the use of
complementary data from the inertial measurement unit (IMU) sensors in smartphones that are
relevant to vehicle kinematics. Interestingly, such data require careful modeling as road conditions
and driver behavior can easily bias them. In this paper, we discuss the preparation of IMU data,
including necessary bias-corrections, for telematics analysis. Combining information from the two
independent sensors via Kalman filter, we suggest a longitudinal-lateral acceleration density plot as a
richer and more reliable object for driver profiling.
Keywords: inertial measurement unit (IMU) · global positioning system (GPS) · Kalman filter · accelerometer · smartphone
1 The thriving telematics insurance industry
In the property-casualty (P&C) insurance market, auto insurance is an essential line of business for
insurance companies. In 2017, the total earned premium of the automobile insurance industry was
about $267 billion, which accounts for about 42% of the total premium volume of the U.S. P&C industry
(National Association of Insurance Commissioners, 2018, p. 3). In the auto insurance industry, it is hard to ignore the changes in
the pricing paradigm driven by accessibility to the individual driver’s telematics data. Usage-based pricing
refers to an insurance pricing framework that calculates premiums by considering not only traditional rating
factors such as gender, age, and driving experience in years, but also the individual’s driving habits, such as hard braking and
acceleration, or the time of day of driving.
The first usage-based insurance (UBI) business in North America was started by Progressive Insurance
Company in 2004 (Progressive Casualty Insurance Company, 2019). During the past decade, many different types of
UBI have appeared due to technological development. Among the various UBI products, there are two main categories
in the market now: pay-as-you-drive (PAYD) and pay-how-you-drive (PHYD). The differences between PAYD
and PHYD relate to the information contained in the data used in pricing. PAYD focuses on the driving habits of
the driver, such as the average driven distance, driving experience, and the time of day at which the insured is driving.
PHYD, on the other hand, uses data related to the driving style of the driver, such as hard
braking or hard acceleration, and how the insured driver turns the vehicle in turning events (Verbelen
et al., 2018; Tselentis et al., 2016). An excellent example of the thriving telematics insurance industry is Root Insurance
Company, whose pricing method for auto insurance is solely based on UBI factors. Root uses four categories for
APREPRINT - JU LY 29, 2019
insurance pricing: braking, hours, turns, and consistency. Since 2015, Root Insurance has increased its valuation to $1
billion and has expanded into 25 U.S. states.
2 Actuarial research related to telematics analysis
Along with the development of the telematics industry, there is an abundance of actuarial research related to telematics
analysis. In 2015, the AXA insurance company published its customers’ telematics data on Kaggle, a popular website
among the machine learning community, for a competition (AXA, 2014). This competition was held to develop a model
for finding a "telematic fingerprint," a set of features distinguishing a driver’s driving habits. The competitors
used the published set of global positioning system (GPS) data of random driving trips from 2736 anonymous drivers.
Each anonymous driver’s data consist of 200 trips, and each trip can be thought of as a multi-dimensional
time series with two variables: the two position coordinates of the vehicle, recorded every second. The data were deleted at the request
of AXA after the competition and are no longer available to the public. However, we believe this competition has
had a positive impact on the actuarial community by giving access to this new type of data in auto insurance
and related research topics.
After the AXA competition, a number of actuarial papers on telematics analysis have been produced. Ayuso et al.
(2016) sought to examine gender discrimination in UBI by looking at the effect of the distance traveled on the risk
of accidents among young male and female drivers with PAYD policies. This research can be connected to the work
of Verbelen et al. (2018), who suggest a statistical modelling approach using generalized additive models. These
models use PAYD factors such as total distance driven or total time driven as telematics variables in expected claim
frequency modeling.
On the other hand, there is research related to PHYD insurance products. Nikulin (2016) tried to construct a driving
profile using speed, acceleration, deceleration, and turning speed, which are factors used to detect bad drivers in the
pool of drivers in the AXA data. Weidner et al. (2016) and Weidner et al. (2017) present possible applications of pattern
recognition techniques and Fourier analysis to telematics data to differentiate driving behaviour, and provide a good
summary of the differences between the traditional and telematic pricing frameworks. Wüthrich (2017) suggested a heatmap
of velocity and acceleration, the so-called ‘v-a heatmap,’ as an object for telematics analysis. Research on
v-a heatmap clustering continues in Gao and Wüthrich (2018), which suggests that the v-a heatmap can be
reconstructed from continuous low-dimensional representations using the singular value decomposition (SVD) or a
bottleneck neural network. The reason for finding a lower-dimensional representation of the v-a heatmap is that the
extracted features are easier and more helpful to use for clustering driving styles than the heatmap
itself. Furthermore, Gao et al. (2019a) investigate the predictive power of the features extracted from the heatmap.
The paper shows that the first principal component of the SVD and the bottleneck activation have greater predictive
power for claims frequency than the driver’s age, which is one of the traditional pricing factors. In Gao
et al. (2019b), the authors examine the predictive power of telematics variables including driving habit variables, finding that
driving style variables are much more related to claims frequency than driving habit variables.
2.1 Telematics data beyond GPS
Since AXA published its telematics data, the GPS data format has become the most popular format for
telematics analysis in actuarial science. In the previous section, we mentioned many actuarial papers
that use GPS-based telematics data. However, in other fields such as
engineering, many papers focus on other measurements for telematics analysis. For example, Johnson
and Trivedi (2011) tried to recognize driving events such as turns, swerving, and braking using a combination of smartphone sensors:
accelerometer, gyroscope, magnetometer, proximity, ambient light, etc. Aljaafreh et al. (2012) used a two-dimensional
accelerometer to cluster driving styles into four categories: below normal, normal, aggressive, and very aggressive.
Van Ly et al. (2013) tried to build a driving profile using a front-side radar and CAN signals from the car, including
engine speed, brake or acceleration pedal pressure, vehicle speed, angular rotation, head pose analysis, GPS, and
front/rear camera views.
One of the popular data formats for telematics analysis is accelerometer data; the accelerometer is one of the inertial
measurement unit (IMU) sensors in smartphones. There are two reasons for the popularity of accelerometer data in the
literature. The first is that IMU sensors have become highly accessible due to the increasing use of smartphones.
As smartphone technology has developed, the number of sensors installed in the phone has increased. For instance,
many smartphone games use IMU sensors such as the accelerometer, gyroscope, and magnetometer as input devices. The
second reason is that the accelerometer has a higher update rate than GPS, making it more suitable
for telematics analysis. GPS usually records data at a rate of 1 Hz (one data point per second), while the
IMU sensors in smartphones can record data at rates from 10 Hz to 200 Hz. Thus, accelerometer data can capture the
vehicle’s movement more precisely than GPS data, since they can detect even slight changes in movement such as lane changes
or drowsy driving. The IMU sensors, however, are not strictly superior to GPS because they are known to suffer from bias.
Thus, proper data processing is essential when using IMU sensor data for telematics analysis.
The data preparation process plays a crucial role in the analysis because it can affect not only the result of the
analysis but also its interpretation. For example, the accelerometer measures the force applied to the body of the
smartphone, not the acceleration of the smartphone. This characteristic of the accelerometer can lead to a misinterpretation
of the data; for example, a vehicle appears to keep decelerating when it is stopped on a downhill. In other words, the road
topology affects the accelerometer measurement. Hence, a clustering using uncalibrated IMU data could put drivers
who share a similar road topology in their commute routes into the same cluster. Considering the importance of
interpretation in actuarial science, a formal data processing set-up for telematics data is needed.
The goal of this paper is to establish a data process to obtain calibrated and interpretable telematics data for the analysis
of drivers’ driving styles, in particular a vehicle’s longitudinal and lateral acceleration data. Our method combines the
information from GPS with the data from IMU sensors; the IMU sensor data are calibrated using the Kalman filter.
In the second part of this paper, we explain how the data were recorded and the characteristics of the telematics
sensors. The third part explains the theory of the Kalman filter, which is used to calibrate the
accelerometer data in the next part. Note that we explain the Kalman filter from a Bayesian point of view, which is not the
usual way of explaining it in the engineering field. The Bayesian point of view gives a clearer intuition about
Kalman filtering, and it is also easier for readers to follow, since the actuarial science community is
already familiar with Bayesian statistics. After presenting the theory, we show the results of implementing
our method for two individual drivers, and discuss its benefits by drawing longitudinal-lateral acceleration
plots: one based only on GPS data and another based on the calibrated accelerometer.
3 Sensors in the smartphone
As smartphones have developed over the past decades, they have come to be equipped with many sensors. The
two most popular data formats used in telematics analysis are GPS data and IMU-based data. In this section, we
explain the pros and cons of using each data format.
3.1 Global Positioning System (GPS)
Global Positioning System (GPS) data give us information on the absolute position of the vehicle. The five main
variables of GPS data are time, latitude, longitude, altitude, and speed. Note that the speed information
in GPS is not based on the Euclidean distance between two GPS points but is obtained at a single GPS point using the
Doppler effect, which generates more accurate speed data than the Euclidean-based speed (Chalko, 2007;
Bevly, 2004). GPS positions are calculated through the transmission of signals between satellites and the GPS receiver. This
means that the data points generated by GPS are independent of each other, so GPS does not suffer
from a cumulative bias in speed calculation.
As in Figure 1a, the GPS receiver should be connected with at least four different satellites to determine the position
of a smartphone. Since it relies on signals from the satellites, it can be disturbed by noise caused by, for example,
blocking by trees or reflection from buildings. According to the official U.S. government website, GPS in
smartphones is usually accurate to within a 4.9 m (16 ft.) radius under open sky (The National Coordination
Office for Space-Based Positioning and Timing, 2017). Considering that the standard lane width of a U.S. highway is
12 ft., and given the refresh rate of GPS, GPS data are not accurate enough to capture the lateral movement of a vehicle (Federal
Highway Administration, 2019).
3.2 Inertial Measurement Units (IMUs)
Many sensors in the smartphone are called inertial measurement units (IMUs). Among the smartphone sensors, we mainly
use the following three: accelerometer, gyroscope, and barometer. These devices can be used to describe
the movement of the smartphone body. Specifically, an accelerometer measures the force applied to the smartphone
along three directions: longitudinal, lateral, and vertical. A gyroscope measures the angular speed of the smartphone
body about the same three directions: roll rate, pitch rate, and yaw rate (see Figure 4b). Using a
gyroscope, we can also obtain information about how much the smartphone is tilted with respect to the plane
perpendicular to the direction of gravity. This information from the gyroscope will be used for the road grade
estimation and the lateral acceleration estimation in this paper. The altitude of the smartphone body can be captured
(a) The position determination in GPS system. (b) A visualization of GPS data (red dots)
Figure 1: To figure out the position of the vehicle, at least 4 satellites should be available at the same time. Using the
longitude and latitude from GPS data, a sample route can be visualized on the real map as in Figure 1b.
(a) A box model for understanding accelerometer (b) The three directions of accelerometer
Figure 2: An accelerometer records the force on the wall of the box, not the actual acceleration of the accelerometer.
by a barometer, which measures the air pressure around it. Barometers record the relative altitude, taking
sea level as zero altitude. We will use the barometer measurements in the Kalman filtering, combining them with the altitude
information from GPS.
One of the challenges of using accelerometer measurements as telematics data is that the accelerometer
does not measure the acceleration of the device. Instead, it measures the force applied to the body of the accelerometer.
Figure 2a shows how the accelerometer works in general. We can think of the accelerometer as a box with a ball
inside it. What the accelerometer records is the force applied to the walls of the box. For instance, when the
accelerometer is placed on a flat surface of the earth, the ball pushes on the bottom of the box because of gravity.
Thus, in that case, the accelerometer measures $g \approx 9.81\,\mathrm{m/s^2}$ upward from the ground, which differs from the zero acceleration
of a static object. Furthermore, if the accelerometer reads zero along a certain axis at a specific time point,
the body of the accelerometer is being accelerated by gravity in that direction; in short, it is in free fall.
This characteristic of the accelerometer hinders the estimation of acceleration when the road grade
changes dramatically.
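As a minimal sketch of the box model, the snippet below treats the accelerometer reading along one axis as the true acceleration minus the component of gravity along that axis. The function name `accelerometer_reading` and the sign convention (upward positive on the vertical axis) are illustrative assumptions, not part of the recorded data format.

```python
G = 9.81  # gravitational acceleration in m/s^2

def accelerometer_reading(true_acceleration, gravity_component):
    """Reading along one axis: true acceleration minus gravity along that axis."""
    return true_acceleration - gravity_component

# Vertical axis, upward positive: gravity contributes -G along this axis.
at_rest = accelerometer_reading(0.0, -G)    # device lying on a flat surface
free_fall = accelerometer_reading(-G, -G)   # device accelerating downward at G
```

Under these assumptions the device at rest reads `G` upward even though it is not moving, and the device in free fall reads zero, which matches the two cases described in the text.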
3.2.1 The road grade effect
The road grade $\alpha$ can be defined as the angle between the road plane and the ground plane, which is perpendicular
to the direction of gravity (Jauch et al., 2017). If the accelerometer is on an uphill as in Figure 3a, the ball in the
accelerometer from Figure 2a will be pulled by a force of $g\sin(\alpha)$. Thus, if the accelerometer attached to the vehicle is
accelerated on the uphill, it will record the acceleration of the vehicle and the force of $g\sin(\alpha)$ at the same time, which
is illustrated in Figure 3b. Note that there is a large deviation between the raw accelerometer data (black line) and the
longitudinal acceleration of the vehicle (blue line) around 20 seconds, when the vehicle was in the middle of an uphill. One
might think that the road grade can be captured from the pitch angle of the gyroscope if the IMU sensor is attached to
the vehicle. However, the pitch angle of the gyroscope is affected by the acceleration of the vehicle, so there is a deviation
between the road grade and the measured pitch angle (see Jauch et al., 2017). Thus, to use the accelerometer data for
(a) The road grade effect on accelerometer
(b) The raw y-axis accelerometer measurements (black) vs. the
ground truth longitudinal acceleration from OBD (green)
Figure 3: The black line in Figure 3b represents the raw data from the y-axis accelerometer, while the blue line indicates
the ground truth longitudinal acceleration based on wheel speed. The vehicle drove on a sample route which has an
uphill incline around the 20-second time point. On the uphill incline, the black line is above the blue line since
the accelerometer records both the road grade effect and the longitudinal acceleration.
telematics analysis, we need to remove the road grade effect from the accelerometer data. We will discuss this topic in
Section 4.2.2.
4 Methodology
4.1 Data recording
The data used in this paper were recorded by an iPhone application named "sensor play". The device used was an
iPhone 6, which was fastened in a vehicle, a 2015 Ford Fusion, during the recording, as shown in Figure 4a. Figure 2b
shows the directions of the three axes of the accelerometer. Since the smartphone is installed in the vehicle as in Figure 4a,
the y-axis of the accelerometer matches the longitudinal movements of the vehicle, while the x-axis and z-axis
correspond to the lateral and vertical movements of the vehicle, respectively (see Figure 4b). Based on
the figure, we can infer that the z-axis of the gyroscope captures the angular velocity of the vehicle when the vehicle
makes turns or lane changes.
4.2 Discrete Kalman filter
The Kalman filter was introduced over 50 years ago in the engineering field, and it is still used in many
engineering problems (Faragher et al., 2012). The term Kalman filter is not familiar to most actuaries; however,
the Kalman filter was introduced to the actuarial science community by De Jong and Zehnwirth (1983). After
(a) The installation of the device in a vehicle (b) The direction shared between IMUs and the vehicle
Figure 4: The smartphone was attached to the vehicle as in Figure 4a. Based on Figure 2b and Figure 4a, we can infer
each direction of IMUs corresponds to the vehicle movements as in Figure 4b.
the introduction, the Kalman filter has appeared repeatedly in the non-life insurance literature, for example in Arjas (1989), Kremer
(1994), Evans and Schmid (2007), and Taylor (2012).
Although many papers in actuarial science have used the Kalman filter before, they follow the explanation in the
original paper of Kalman (1960), which explained the theory in an electrical engineering context. However, Meinhold
and Singpurwalla (1983) suggested a more accessible way of understanding the Kalman filter from a Bayesian
perspective, with which most actuaries are already familiar. The Kalman filter can be understood as a recursive Bayesian
estimation with a Normal-Normal conjugate prior setting. Bayesian inference under the Normal-Normal conjugate
assumption is as follows:
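Theorem 1. If $X \mid \mu \sim N(\mu, \sigma^2)$ and $\mu \sim N(\mu_0, \sigma_0^2)$, then the sampling distribution of $X$ and the prior
distribution can be written as follows:
$$p(x \mid \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad p(\mu) = \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left(-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right).$$
The posterior distribution of $\mu$ given that we have an observation $x$ is
$$p(\mu \mid x) = \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{(\mu-\mu_1)^2}{2\sigma_1^2}\right),$$
where the mean $\mu_1$ and variance $\sigma_1^2$ are as follows:
$$\mu_1 = \frac{\sigma^2 \mu_0 + \sigma_0^2 x}{\sigma^2 + \sigma_0^2}, \qquad \sigma_1^2 = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + \sigma_0^2}.$$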
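The Normal-Normal update of Theorem 1 can be sketched in a few lines of Python. The function name `normal_posterior` and the numerical inputs are hypothetical and chosen only for illustration; the formulas are the standard conjugate results for a single observation with known sampling variance.

```python
def normal_posterior(x, sigma2, mu0, sigma0_2):
    """Posterior mean and variance of mu after observing one value x.

    x        : the observation, assumed N(mu, sigma2)
    mu0      : prior mean of mu
    sigma0_2 : prior variance of mu
    """
    mu1 = (sigma2 * mu0 + sigma0_2 * x) / (sigma2 + sigma0_2)
    sigma1_2 = (sigma2 * sigma0_2) / (sigma2 + sigma0_2)
    return mu1, sigma1_2

# Hypothetical numbers: a vague prior around 20 and a precise observation of 22.
mu1, var1 = normal_posterior(x=22.0, sigma2=1.0, mu0=20.0, sigma0_2=4.0)
```

The posterior mean lies between the prior mean and the observation, closer to whichever has the smaller variance, and the posterior variance is smaller than both input variances.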
In Bayesian statistics, setting the prior distribution is subjective, and it has little effect on the posterior distribution when
the sample size is large. However, in the Kalman filter, we can think of the sample size as always fixed at one, namely the
current observation from an individual sensor at a time point $t$, and the parameters of the prior distribution at time $t$ are
determined by the inference result from the previous time point $t-1$. We will elaborate on this in the next section by
providing a more concrete example.
4.2.1 1-Dimensional Kalman filtering
Let us assume that we want to combine the information about vehicle speed at time $t$ from two independent sensors:
GPS and the accelerometer. To apply the Kalman filter, we model the speed information from GPS and the accelerometer at
time $t$ as variables $S_t$ and $\Theta_t$ as follows:
$$S_t = \Theta_t + \epsilon_t$$
$$\Theta_t = \Theta_{t-1} + b\,y_t + w_t \tag{1}$$
where $\epsilon_t$ and $w_t$ are noise terms which follow $N(0, r)$ and $N(0, q)$ respectively, and $b$ is a constant. From
basic physics,
$$\text{speed}_t = \text{speed}_{t-1} + \Delta t \times a^y_t \tag{2}$$
where $\text{speed}_t$ and $a^y_t$ are the vehicle speed and the longitudinal acceleration of the vehicle at time $t$ respectively. Thus,
in Equation (1), we can interpret $\Theta_t$ as the vehicle speed at time $t$, if we take the constant $b$ to be the time
difference between $t$ and $t-1$, and $y_t$ to be the longitudinal acceleration from the y-axis accelerometer.
From a Bayesian point of view, we have prior information about the vehicle speed, $\Theta_t$, from the accelerometer, obtained by integrating
the longitudinal acceleration from the previous speed estimate $\Theta_{t-1}$. Moreover, we observe a sample of vehicle
speed from GPS at time $t$. Since $\epsilon_t$ and $w_t$ are normally distributed, we can re-express the above relationship between
$S_t$ and $\Theta_t$ as follows:
$$S_t \mid \Theta_t \sim N(\Theta_t, r),$$
$$\Theta_t \sim N(\mu_{t-1} + b\,y_t,\; p_{t-1} + q) = N(\mu_t, p_t)$$
where $\mu_{t-1} := E[\Theta_{t-1}]$, $p_{t-1} := Var(\Theta_{t-1})$, and $r := Var(S_t \mid \Theta_t)$. According to Theorem 1, the posterior
distribution of $\Theta_t$ given the observation $s_t$ is
Figure 5: The Kalman filter as a recursive Bayesian estimation.
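$$\Theta_t \mid S_t \sim N\left(\frac{\mu_t r + p_t s_t}{r + p_t},\; \frac{r\,p_t}{r + p_t}\right) = N\big(\mu_t + \kappa (s_t - \mu_t),\; p_t - \kappa p_t\big)$$
where $\kappa = \frac{p_t}{p_t + r}$, and $s_t$ is an observation from $S_t$, in other words, an observation from GPS at time $t$. Readers
who are already familiar with the Kalman filter may notice that $\kappa$ is called the Kalman gain.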
In Bayesian statistics, after obtaining the posterior distribution, the next step is to predict a future value based on
the given information. Since we are in the Normal-Normal conjugate situation, the predictive distribution of a future observation is
also a Normal distribution, with the same mean as the posterior mean. However, in the Kalman filter context, instead of
predicting a future value, we predict the optimal value of $S_t$ based on the given observation $s_t$. Thus,
the optimal (or Kalman filtered) speed estimate at time $t$, $\hat{S}_t$, is as follows:
$$\hat{S}_t = \mu_t + \kappa (s_t - \mu_t) = (1 - \kappa)\,\mu_t + \kappa\, s_t \tag{3}$$
with $\kappa = p_t / (p_t + r)$. The optimal estimate of $S_t$ is therefore simply the posterior mean, which is a weighted average of the prior mean
and the sample mean. In other words, the Kalman filter calculates the optimal estimate of a quantity $S_t$ as the
weighted average of the observation of the quantity from GPS, $s_t$, and the estimate of the same quantity
based on the previous-step estimate and the accelerometer, $\mu_t$. Equivalently, it adjusts the
prior knowledge about the speed from the accelerometer, $\mu_t$, by the Kalman gain times the measurement error from
GPS, $s_t - \mu_t$.
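As a worked illustration with hypothetical numbers (not taken from the recorded trips), suppose the accelerometer-based prior is $\mu_t = 20$ m/s with variance $p_t = 4$, and GPS reports $s_t = 22$ m/s with variance $r = 1$. With the gain defined as $\kappa = p_t / (p_t + r)$, we get
$$\kappa = \frac{4}{4 + 1} = 0.8, \qquad \mu_t + \kappa (s_t - \mu_t) = 20 + 0.8 \times 2 = 21.6, \qquad p_t - \kappa p_t = 4 - 3.2 = 0.8.$$
Because the GPS variance is smaller than the prior variance in this example, the filtered speed is pulled most of the way toward the GPS observation, and the posterior variance is smaller than both $p_t$ and $r$.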
After obtaining the optimal estimate of the speed at $t$, the Kalman filter uses the information from time $t$ as prior
information to estimate the speed at time $t+1$. For notational convenience, let us define the following two quantities:
$$\tilde{\mu}_t := \mu_t + \kappa (s_t - \mu_t)$$
$$\tilde{p}_t := p_t - \kappa p_t$$
These two quantities can be seen as "filtered" quantities through the Kalman filter at time $t$. The Kalman filter can be
considered a recursive Bayesian estimation because the estimate $\tilde{\mu}_t$ and the variance of the posterior distribution of
$\Theta_t$, $\tilde{p}_t$, are used in the prior mean and variance of $\Theta_{t+1}$ respectively at time $t+1$. Figure 5 illustrates this
Kalman filter process as a series of prior and posterior inferences.
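The recursion described in this section can be sketched as a short loop. This is a minimal illustration rather than the implementation used for the results in this paper: the function name `kalman_1d`, the initial values `mu0` and `p0`, and the assumption that one accelerometer value is available per GPS sample are all choices made for this example.

```python
def kalman_1d(gps_speeds, accelerations, dt, r, q, mu0, p0):
    """Fuse GPS speed observations with accelerometer-integrated speed.

    gps_speeds    : GPS speed observations s_t
    accelerations : longitudinal accelerations y_t (same length as gps_speeds)
    dt            : time step between samples (the constant b in the model)
    r, q          : variances of the GPS noise and of the state noise
    mu0, p0       : initial filtered mean and variance (assumed, not estimated)
    Returns the list of filtered speed estimates (posterior means).
    """
    mu_f, p_f = mu0, p0  # filtered mean and variance from the previous step
    filtered = []
    for s_t, y_t in zip(gps_speeds, accelerations):
        # Prior at time t: integrate the acceleration from the previous estimate.
        mu_t = mu_f + dt * y_t
        p_t = p_f + q
        # Posterior at time t: weighted average of prior mean and GPS observation.
        kappa = p_t / (p_t + r)
        mu_f = mu_t + kappa * (s_t - mu_t)
        p_f = p_t - kappa * p_t
        filtered.append(mu_f)
    return filtered

# Hypothetical data: three seconds of gentle acceleration.
estimates = kalman_1d(
    gps_speeds=[10.2, 10.9, 12.1],
    accelerations=[1.0, 1.0, 1.0],
    dt=1.0, r=1.0, q=0.5, mu0=9.0, p0=1.0,
)
```

When `r` is very large relative to the prior variance, the gain is close to zero and the loop reduces to plain integration of the accelerometer; when `r` is very small, the estimate follows GPS.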
4.2.2 Multidimensional Kalman filtering
In the previous section, we discussed how to use the Kalman filter to combine the information from two independent
sensors for estimating the vehicle speed. In Equation (1), we assumed that $y_t$ is the longitudinal acceleration at time $t$
from the accelerometer. However, because of the characteristics of the accelerometer, the grade effect must be removed
from the accelerometer measurements before feeding them into the Kalman filter (see Section 3.2.1 and Figure 3). Bevly (2004)
used a multidimensional Kalman filter to include the road grade effect as one factor in the model for estimating the
longitudinal acceleration of the vehicle.
Let us define $\alpha_t$ as the road grade at time $t$; then the longitudinal acceleration at time $t$, $a_t$, can be obtained from the
following physics formula:
$$a_t = acc_t - g \sin(\alpha_t) \tag{4}$$
where the constant $g$ is the gravitational acceleration, $9.81\,\mathrm{m/s^2}$, and $acc_t$ is the raw y-axis accelerometer value at time $t$.
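A minimal sketch of this road grade removal is shown below. The function name `longitudinal_acceleration` and the example values are hypothetical; the grade is assumed to be given in radians.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def longitudinal_acceleration(acc_y, road_grade_rad):
    """Remove the gravity component from a raw y-axis accelerometer value."""
    return acc_y - G * math.sin(road_grade_rad)

# Hypothetical case: the vehicle accelerates at 1.0 m/s^2 on a 5-degree uphill,
# so the raw reading contains both the acceleration and the grade effect.
grade = math.radians(5.0)
raw_reading = 1.0 + G * math.sin(grade)
corrected = longitudinal_acceleration(raw_reading, grade)
```

On a flat road the correction term vanishes and the raw reading is returned unchanged.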
In the multidimensional Kalman filter, $\Theta_t$, the object we are trying to estimate, is a vector whose components are the speed of
the vehicle and the road grade at time $t$; these quantities are observable or can be calculated from GPS and IMU sensors.
By using the integration formula, Equation (2), and the road grade removal from Equation (4), it can be expressed by
the following formula:
$$\text{speed}_t = \text{speed}_{t-1} + \Delta t \times a_t = \text{speed}_{t-1} + \Delta t \times (acc_t - g \sin(\alpha_t))$$
$$\begin{pmatrix} \text{speed}_t \\ \alpha_t \end{pmatrix} \approx \begin{pmatrix} 1 & -g\,\Delta t \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \text{speed}_{t-1} \\ \alpha_{t-1} \end{pmatrix} + \begin{pmatrix} \Delta t \\ 0 \end{pmatrix} acc_t \tag{5}$$
Note that the approximation in Equation (5) holds under the assumption that the road grade does not change abruptly, $\alpha_t \approx \alpha_{t-1}$,
and that the magnitude of the road grade is small, $\sin(\alpha_t) \approx \alpha_t$. Furthermore, to get a better estimate
of the road grade, we use the weighted average of the pitch angle and the estimate of the road grade from
the previous step, with weights 0.02 and 0.98 respectively. Readers who want to investigate all the details
of the algorithm we used can consult the code of the
R package.
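The weighted average of the pitch angle and the previous road grade estimate can be sketched as follows. The function name `update_road_grade` and the constant-pitch input are hypothetical, and how this step is interleaved with the Kalman update is not specified here, so the snippet shows the averaging step in isolation.

```python
def update_road_grade(previous_grade, pitch_angle, pitch_weight=0.02):
    """Blend the measured pitch angle into the previous road grade estimate."""
    return pitch_weight * pitch_angle + (1.0 - pitch_weight) * previous_grade

# Hypothetical input: the pitch angle stays at 0.05 rad for 200 samples.
grade = 0.0
for pitch in [0.05] * 200:
    grade = update_road_grade(grade, pitch)
```

With a weight of 0.02 on the pitch angle, a single noisy pitch reading moves the estimate by only 2% of the discrepancy, while a sustained change in pitch is tracked gradually over many samples.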
On the other hand, from GPS, we also have an observation of the same quantities. Since the velocity of the vehicle
at time $t$, $v_t$, is provided by GPS, the road grade $\alpha_t$ can be calculated as
$$\alpha_t = \tan^{-1}\left(\frac{v^{up}_t}{v^{hor}_t}\right)$$
where $v^{up}_t$ is the vertical speed of the vehicle obtained from the altitudes of the GPS coordinates and $v^{hor}_t$ is the horizontal
speed of the vehicle based on the latitude and longitude of the GPS coordinates. Although Bevly (2004) only uses GPS data for
estimating the road grade, using the inverse tangent of the vertical velocity over the horizontal velocity, we changed
the equation above in order to use the Doppler velocity $v_t$ directly as follows:
$$\alpha_t \approx \sin(\alpha_t) = \frac{v^{up}_t}{v_t}$$
assuming that the road grades are small. Moreover, for the calculation of the vertical velocity of the vehicle, we used
the barometer, which is more accurate than GPS altitude data, as in Figure 6a.
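A minimal sketch of this grade observation is given below, assuming altitude samples (for example from the barometer) and Doppler speeds recorded at the same time points. The function name `road_grade_from_speeds`, the `min_speed` threshold, and the choice to hold the last grade when the vehicle is nearly stopped are assumptions made for this example, since the ratio is undefined at zero speed.

```python
def road_grade_from_speeds(altitudes, speeds, dt, min_speed=1.0):
    """Approximate road grade (radians) as vertical speed over Doppler speed."""
    grades = []
    for i in range(1, len(altitudes)):
        v_up = (altitudes[i] - altitudes[i - 1]) / dt  # vertical speed in m/s
        v = speeds[i]                                  # Doppler speed in m/s
        if v < min_speed:
            # Grade is undefined when (nearly) stopped: hold the last value.
            grades.append(grades[-1] if grades else 0.0)
        else:
            grades.append(v_up / v)
    return grades

# Hypothetical samples: climbing 0.5 m per second at 10 m/s, then stopping.
grades = road_grade_from_speeds(
    altitudes=[100.0, 100.5, 101.0, 101.0],
    speeds=[10.0, 10.0, 10.0, 0.0],
    dt=1.0,
)
```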
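Thus, $\Theta_t$ can be written using the following multidimensional Kalman filter model:
$$S_t = \Theta_t + \epsilon_t$$
$$\Theta_t = A\,\Theta_{t-1} + b\,y_t + w_t \tag{6}$$
where
$$\Theta_{t-1} := \begin{pmatrix} \text{speed}_{t-1} \\ \alpha_{t-1} \end{pmatrix}, \quad A := \begin{pmatrix} 1 & -g \times \Delta t \\ 0 & 1 \end{pmatrix}, \quad b := \begin{pmatrix} \Delta t \\ 0 \end{pmatrix}, \quad y_t = acc^y_t,$$
and $\epsilon_t$ and $w_t$ follow the multivariate Normal distributions $N(0, R)$ and $N(0, Q)$ respectively. Note that
$S_t$ and $\Theta_t$ are vectors in the multidimensional setting, but $S_t$ still has a linear relationship with $\Theta_t$.
The relationship between $S_t$ and $\Theta_t$ can be written in a Bayesian way as follows: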
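$$S_t \mid \Theta_t \sim N(\Theta_t, \Sigma)$$
$$\Theta_t \sim N(\mu_t, \Sigma_0) \tag{7}$$
where $\mu_t = A\mu_{t-1} + b\,y_t$, $\Sigma = R$, and $\Sigma_0 = A P_{t-1} A^\top + Q$, with $P_{t-1} := Var(\Theta_{t-1})$.
Since the modeling distribution and the prior distribution are both Normal, the joint distribution is also
Normal, with mean vector and covariance matrix as follows:
$$\begin{pmatrix} \Theta_t \\ S_t \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_t \\ \mu_t \end{pmatrix}, \begin{pmatrix} \Sigma_0 & \Sigma_0 \\ \Sigma_0 & \Sigma_0 + \Sigma \end{pmatrix} \right).$$
Thus, the posterior distribution can be easily obtained by using the conditional distribution formula for the multivariate Normal
distribution:
$$\Theta_t \mid S_t \sim N\big(\mu_t + K(s_t - \mu_t),\; \Sigma_0 - K\Sigma_0\big) \tag{8}$$
where $K$ is
$$K = \Sigma_0 (\Sigma_0 + \Sigma)^{-1}.$$
Similar to the 1-dimensional case in Equation (3), the optimal estimate of $S_t$ given the observation $s_t$ can be set as
the posterior mean:
$$\tilde{\mu}_t := \mu_t + K(s_t - \mu_t)$$
$$\tilde{P}_t := \Sigma_0 - K\Sigma_0$$
Note that the optimal estimate of $S_t$, $\tilde{\mu}_t$, and the updated posterior variance of $\Theta_t$, $\tilde{P}_t$, will be used as prior
knowledge for $\mu_t$ and $P_t$ at $t+1$ respectively.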
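One cycle of the two-dimensional filter can be sketched as below, with the state taken to be (speed, road grade). This is an illustration under stated assumptions, not the code behind the results in this paper: the name `kalman_step`, the noise matrices, and the example inputs are hypothetical, the prior covariance is taken to be A P Aᵀ + Q as in the standard Kalman filter, and the 2x2 matrix helpers are written by hand only to keep the example free of external libraries (a numerical library would normally be used).

```python
G = 9.81  # gravitational acceleration in m/s^2

def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(a, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]

def mat_add(a, b, sign=1.0):
    """Elementwise a + sign * b for 2x2 matrices."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]

def transpose(a):
    """Transpose of a 2x2 matrix."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def inverse(a):
    """Inverse of a 2x2 matrix via the closed-form adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def kalman_step(mu_prev, P_prev, acc_y, s_t, dt, Q, R):
    """One predict/update cycle for the state (speed, road grade)."""
    A = [[1.0, -G * dt], [0.0, 1.0]]
    b = [dt, 0.0]
    # Prior at time t: mean A mu + b y, covariance A P A^T + Q.
    pred = mat_vec(A, mu_prev)
    mu_t = [pred[0] + b[0] * acc_y, pred[1] + b[1] * acc_y]
    sigma0 = mat_add(mat_mul(mat_mul(A, P_prev), transpose(A)), Q)
    # Gain K = Sigma0 (Sigma0 + R)^{-1}.
    K = mat_mul(sigma0, inverse(mat_add(sigma0, R)))
    # Posterior mean and covariance.
    correction = mat_vec(K, [s_t[0] - mu_t[0], s_t[1] - mu_t[1]])
    mu_post = [mu_t[0] + correction[0], mu_t[1] + correction[1]]
    P_post = mat_add(sigma0, mat_mul(K, sigma0), sign=-1.0)
    return mu_post, P_post

# Hypothetical check: vehicle stopped on a 0.05 rad uphill.
mu_post, P_post = kalman_step(
    mu_prev=[0.0, 0.05],
    P_prev=[[0.1, 0.0], [0.0, 0.1]],
    acc_y=G * 0.05,   # raw reading contains only the (small-angle) grade effect
    s_t=[0.0, 0.05],  # observed speed and observed grade
    dt=1.0,
    Q=[[0.01, 0.0], [0.0, 0.0001]],
    R=[[1.0, 0.0], [0.0, 0.01]],
)
```

In this example the grade term in the first row of `A` cancels the gravity component in the raw reading, so the filtered speed stays at zero while the vehicle is stopped on the incline.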
5 Kalman filtering Implementation
In this section, the Kalman filter is applied to the y-axis accelerometer data to obtain the longitudinal acceleration
of the vehicle during a trip. To evaluate the performance of Kalman filtering for accelerometer data, we chose a
sample route with a complicated road topology, as in Figure 1b. The route consists of a flat road, an uphill and a downhill,
and turns, which can be inferred from the GPS visualization of the route and its relative altitude (see
Figure 6a). We assume that if the Kalman filter works well for the sample route, it will work for most other routes,
which can be reproduced by combining the elements of the sample route. Thus, in this section, we focus
on the calibration of the accelerometer for the sample route.
5.1 Filtering the longitudinal acceleration
The black line in the upper panel of Figure 6c shows the raw y-axis accelerometer data during the sample trip.
As in Figure 6b, the speed graph based on the accelerometer is known to suffer from a long-term drift error. In
Figure 6b, the black line indicates the speed obtained by integrating the raw accelerometer data, which is
the black line in the upper panel of Figure 6c. On the other hand, the green line in Figure 6b indicates the ground truth
speed based on the wheel rotation rate, which came from an OBD device. As we can see in the figure, the two lines
deviate from around the 25-second time point to the end of the trip. Note that the road grade effect is the main reason for
this deviation between the two lines, which arises in the following two main events: vehicle stops and driving on
inclined roads.
First, if the vehicle is stopped on an inclined road, the accelerometer records are affected by gravity. For example, we can
infer the stop period during the trip from the stability of the raw accelerometer values, such as the horizontal line around the
50-second time point in Figure 6c. Since the vehicle stopped on a tilted road, the raw y-axis accelerometer
(the black line in the upper panel of Figure 6c) records non-zero values during the stop because of gravity. These
negative accelerations yield the deviation in the speed graph during the corresponding section in Figure 6b; a negative
acceleration means a decrease in speed.
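The drift caused by a stop on an incline can be reproduced with a small numerical sketch. The numbers are hypothetical (a 3-degree downhill, a 10-second stop, 10 Hz sampling), the raw reading is assumed to equal the true acceleration plus g times the sine of the grade, and `integrate_speed` is an illustrative helper, not part of the method in this paper.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def integrate_speed(accelerations, dt, initial_speed=0.0):
    """Cumulative speed obtained by summing acceleration times the time step."""
    speeds = [initial_speed]
    for a in accelerations:
        speeds.append(speeds[-1] + dt * a)
    return speeds

grade = math.radians(-3.0)            # downhill, so the grade is negative
raw = [G * math.sin(grade)] * 100     # vehicle stopped: reading is gravity only
corrected = [a - G * math.sin(grade) for a in raw]

raw_speed = integrate_speed(raw, dt=0.1)[-1]
corrected_speed = integrate_speed(corrected, dt=0.1)[-1]
```

Integrating the raw readings produces a spurious speed change of roughly 5 m/s over the 10-second stop, while the grade-corrected readings integrate to zero, which is why the uncorrected speed graph drifts away from the ground truth.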
Second, the road grade affects the speed graph not only when the vehicle is stopped but also when it is driving on
inclined roads. Note that the black line around the 25-second time point in Figure 6b differs from the green line in the
(a) Relative altitude for the sample route (See Figure 1b), which
has an uphill at the beginning and a downhill at the end.
(b) The speed graph of the sample route from each source; the raw
(black), Kalman filtered (blue), and OBD (green).
(c) The longitudinal acc. (upper) and the lateral acc. (lower) of the
trip; The raw (black) vs. the Kalman filtered (blue).
Figure 6: Telematics data corresponds to the sample route in Figure 1b. The trip started from the left bottom side of
the square. The yellow point in the middle of Figure 1b is the vehicle position at 28 second from the beginning. By
considering the relative altitude of the route in Figure 6a, about the first 30 seconds of the trip corresponds to uphill
section of the route.
figure with respect to its shape, unlike the second hump around the time points of 75 second. Since the changes in road
grade around the time points of 75 second is relatively smaller than the one during the time points of 25 second (See
Figure 6a), the shape deviation around the time points of 25 second is more evident than around the one of 75 second.
The effect of road grade during the driving can also be confirmed from the significant drop in the speed graph (Figure
6b) around the time points of 140 second, when the vehicle drove on the downhill.
Equation (6) will be used for filtering the longitudinal acceleration. Note that the equalities in Equation (6) hold
only when all quantities are on the same scale. Since the accelerometer values are recorded in units of acceleration,
the speed quantities used in the equation should be converted into the matching unit of speed, and the road grade
$\alpha_t$ should be expressed in radians. The blue line in the upper panel of Figure 6c represents the Kalman filtered
longitudinal acceleration of the sample route, which is obtained from the optimal estimates of the vehicle speed at
each time point in the Kalman filter process. Note that the filtered acceleration is zero while the vehicle was
stopped. Furthermore, the speed graph based on the Kalman filtered accelerometer data (the blue line in Figure 6b) is
synchronized with the green line in Figure 6b, which is the ground-truth speed from the OBD device.
For readers who want to implement the Kalman filter setting used in this paper, all the code is available as an
R package. Note that we used the identity matrix for one of the matrices in the filter, whereas for the other we used
separate values for the y-axis accelerometer value and the road grade value, respectively. This indicates that we give
more credit to the accelerometer data than to the GPS observation in the Kalman filter process.
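To make the weighting between the accelerometer and the GPS observation concrete, the following is a minimal sketch of a generic scalar Kalman filter for the vehicle speed. It is not the R package mentioned above and not necessarily the state-space form of Equation (6): the function name `kalman_speed`, the noise variances `q` and `r`, the value of `G`, and the sign of the grade compensation are all assumptions made for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def kalman_speed(accel, grade, gps_speed, dt, q=0.01, r=1.0, v0=0.0, p0=1.0):
    """Scalar Kalman filter for vehicle speed (illustrative sketch).

    accel     : raw longitudinal accelerometer samples (m/s^2)
    grade     : road grade per sample (radians)
    gps_speed : GPS speed per sample (m/s), or None when no GPS fix arrived
    q, r      : process and observation noise variances (hypothetical values)
    """
    v, p = v0, p0
    estimates = []
    for a, alpha, z in zip(accel, grade, gps_speed):
        # Predict: integrate the grade-compensated acceleration (assumed sign).
        v = v + (a - G * math.sin(alpha)) * dt
        p = p + q
        # Update: blend in the GPS speed whenever one is available.
        if z is not None:
            k = p / (p + r)  # Kalman gain
            v = v + k * (z - v)
            p = (1.0 - k) * p
        estimates.append(v)
    return estimates

# Hypothetical example: a stationary vehicle, an accelerometer with a constant
# bias of 0.5 m/s^2, a 25 Hz IMU, and a 1 Hz GPS that reports zero speed.
n, dt = 250, 1.0 / 25
accel = [0.5] * n
grade = [0.0] * n
gps = [0.0 if i % 25 == 24 else None for i in range(n)]

filtered = kalman_speed(accel, grade, gps, dt)
unfiltered = sum(a * dt for a in accel)  # plain integration of the biased signal
```

In this sketch a small `q` relative to `r` plays the role of giving more credit to the accelerometer than to GPS; even so, the once-per-second GPS update keeps the filtered speed bounded, while plain integration of the biased signal keeps drifting.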
(a) Comparison of the Euclidean-distance-based speed and the Doppler-effect-adjusted speed from GPS.
(b) The lateral acceleration can be calculated by multiplying the vehicle speed by the angular speed
$\omega \approx \Delta\theta_t / \Delta t$.
Figure 7: The gray line in Figure 7a represents the speed calculated from two pairs of GPS coordinates (latitude,
longitude). Since these GPS coordinates are recorded in the geographic coordinate system, the distances between
them are converted into miles using the distm function from the R package geosphere, written by Hijmans
et al. (2019).
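The distance-based speed in Figure 7a can be sketched as follows. This Python snippet is only a stand-in for the geosphere computation: it uses the haversine formula on a spherical Earth, works in meters rather than miles, and the function names, the Earth radius constant, and the sample coordinates are assumptions introduced for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (common approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def speeds_from_fixes(fixes, dt=1.0):
    """Speed (m/s) between consecutive (lat, lon) fixes sampled every dt seconds."""
    return [haversine_m(*p, *q) / dt for p, q in zip(fixes, fixes[1:])]

# Hypothetical 1 Hz fixes: the vehicle moves 0.0001 degrees north each second.
fixes = [(41.6600, -91.5300), (41.6601, -91.5300), (41.6602, -91.5300)]
speeds = speeds_from_fixes(fixes)
```

Because each speed value is a difference of two noisy positions, small position errors translate directly into speed errors, which is consistent with the observation that the distance-based speed is less accurate than the Doppler-based speed.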
5.2 Filtering the lateral acceleration
After obtaining the longitudinal acceleration of the vehicle, the next step is to filter the lateral acceleration. The
lateral acceleration, $a^x_t$, can be obtained by the following formula:

$$a^x_t = \big(\text{raw } x\text{-axis accelerometer value at time } t\big) - g \times \sin(\phi_t) \qquad (10)$$

where $g$ is the constant of gravity and $\phi_t$ is the roll angle. The last term can be interpreted as the adjustment
for the gravitational effect on the lateral accelerometer, similar to the road grade effect on the longitudinal
accelerometer (see Equation (4)). However, unlike the case of the longitudinal acceleration, the roll angle from the
gyroscope is not affected by vehicle movements, so the lateral acceleration can be calculated directly from Equation
(10). In the lower panel of Figure 6c, the blue line indicates the adjusted lateral acceleration, whose gravitational
effect is compensated using the roll angle from the gyroscope. Around the 100-second mark in the figure, the obtained
lateral acceleration is also centered at zero while the vehicle was stopped on the tilted road. The two blue lines
in Figure 6c represent the finalized longitudinal and lateral accelerations of the sample trip. In the next section,
we discuss how to build a driving profile from these two filtered objects.
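The gravity compensation of Equation (10) can be sketched in a few lines. The sketch assumes that the gravitational term enters with a minus sign, that the roll angle is given in radians, and that `G` is 9.81 m/s^2; the function name and the sample values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def lateral_acceleration(raw_x, roll_rad):
    """Remove the gravity component from raw x-axis accelerometer samples.

    raw_x    : raw x-axis accelerometer values (m/s^2)
    roll_rad : roll angles from the gyroscope (radians), one per sample
    """
    return [a - G * math.sin(phi) for a, phi in zip(raw_x, roll_rad)]

# Hypothetical check: a vehicle standing still on a road tilted by 2 degrees
# reads a constant non-zero lateral value that the correction should remove.
roll = [math.radians(2.0)] * 5
raw = [G * math.sin(phi) for phi in roll]
corrected = lateral_acceleration(raw, roll)
```

For the stationary vehicle in this example the corrected values are zero, mirroring the behavior of the blue line during the stop on the tilted road.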
6 Driving profiling with the Lon-Lat plot
Wüthrich (2017) suggests an object called the velocity-acceleration heatmap (v-a heatmap) as a driving profiling
object for telematics data. The v-a heatmap can be considered a discrete density plot of speed and acceleration data
from GPS. Although the speed in Wüthrich (2017) is calculated from Euclidean distances, the v-a heatmap should be
calculated directly from GPS speed information, since speeds based on Euclidean distance are less accurate than
speeds based on the Doppler effect, as shown in Figure 7a. Moreover, the acceleration values in the v-a heatmap are
longitudinal accelerations calculated from changes in speed, which means that the v-a heatmap carries no information
about the vehicle's lateral movements. Considering that turns are one of the major factors determining driving style,
the object used for telematics analysis should be accurate for both the longitudinal and the lateral movements.
In this section, we use the extended sample route in Figure 8a for the telematics analysis. We asked two drivers to
drive the extended sample route using the same vehicle. Both drivers spent around 20 minutes to finish the trip. The
recording rate of the IMU was 25 Hz, and that of the GPS was 1 Hz. Using the calibration method in the previous
section, we obtained the longitudinal and lateral accelerations for the trips, as in Figure 6c.
Using the longitudinal and lateral acceleration data, the longitudinal-lateral acceleration density plot (lon-lat plot)
can be drawn as in Figure 8c or 8d. Since both trips took around 20 minutes, about 30,000 acceleration data points
are used in each IMU based lon-lat plot, which corresponds to 1,140 GPS data points. Note that Figure 8b shows the GPS
based lon-lat plots; the red dots in the figure correspond to driver 1 (Figure 8c) and the blue dots correspond to
driver 2 (Figure 8d). To calculate the longitudinal acceleration for the GPS based lon-lat plot, we use the differences
between consecutive Doppler based GPS speed values. On the other hand, the lateral acceleration for the GPS based
lon-lat plot can be obtained by the following formula:

$$a^x_t = v_t \times \omega_t$$

where $a^x_t$ is the lateral acceleration, $\omega_t$ is the angular velocity, and $v_t$ is the velocity of the
vehicle at time $t$. Using three consecutive GPS points, for example the three red dots in Figure 7b, we can
approximate the angular velocity. Since the refresh rate of GPS is 1 Hz, the lateral acceleration at time $t$ can be
approximated as

$$a^x_t \approx v_t \times \Delta\theta_t$$

where $\Delta\theta_t$ can be calculated from the three GPS points and the vehicle speed at time $t$ is given by the
GPS data itself (see Figure 7b).
(a) The extended sample route. (b) GPS based Lon-Lat plot comparison.
(c) IMU based Lon-Lat plot: Driver 1 (d) IMU based Lon-Lat plot: Driver 2
Figure 8: The IMU based lon-lat density plot elucidates the differences in driving style between the two drivers
better than the GPS based lon-lat density plot.
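The three-point approximation of the GPS based lateral acceleration can be sketched as below. The bearing formula, the wrapping of the heading change, the function names, and the sample coordinates are assumptions added for illustration; a positive result corresponds to a clockwise (right) turn under the compass-bearing convention used here.

```python
import math

def bearing_rad(p, q):
    """Initial compass bearing from p to q, both (lat, lon) in degrees, in radians."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dlmb = math.radians(q[1] - p[1])
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return math.atan2(y, x)

def gps_lateral_acceleration(p0, p1, p2, speed, dt=1.0):
    """Approximate lateral acceleration at the middle fix p1.

    The heading change between segments p0->p1 and p1->p2, divided by dt,
    approximates the angular velocity, which is multiplied by the speed.
    """
    dtheta = bearing_rad(p1, p2) - bearing_rad(p0, p1)
    # Wrap the heading change into [-pi, pi) so a turn through north is handled.
    dtheta = (dtheta + math.pi) % (2 * math.pi) - math.pi
    return (dtheta / dt) * speed

# Hypothetical fixes: driving straight north, then a 90-degree right turn.
straight = gps_lateral_acceleration(
    (41.0000, -91.0), (41.0001, -91.0), (41.0002, -91.0), speed=10.0)
right_turn = gps_lateral_acceleration(
    (0.0, 0.0), (0.0001, 0.0), (0.0001, 0.0001), speed=10.0)
```

With only one heading change per second, a sharp turn is compressed into very few values, which gives a feel for why the GPS based lateral acceleration is coarse compared with the 25 Hz IMU signal.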
Based on Figure 8c and Figure 8d, we can infer that the IMU based lon-lat plots reveal the difference between the two
drivers' driving styles better than the GPS based lon-lat plot in Figure 8b. For example, comparing Figure 8c with
Figure 8d, we can see that driver 1 takes sharper turns than driver 2, since driver 1 has more data points spread along
the x-axis. Also, the longer positive tail of the longitudinal acceleration (y-axis) for driver 1 implies that driver 1
pushes the gas pedal harder than driver 2. However, it is hard to distinguish these differences using the GPS based
lon-lat plots in Figure 8b, because the distributions of the red dots and the blue dots look almost the same. One
might think that the number of data points in the GPS based lon-lat plot is simply too small to expose the
differences. Yet even with this small sample, the points for driver 2 are more widely spread along the x-axis than
those for driver 1, which implies larger lateral acceleration for driver 2 than for driver 1. Thus, using the lateral
movement information from GPS could lead to an interpretation opposite to the one based on the IMU based lon-lat
plots. Note also that in the GPS based lon-lat plot, the ranges of the longitudinal and lateral accelerations are
almost the same for both drivers. Considering the mechanics of vehicle movement, this should not be the case, because
the range of the longitudinal acceleration should be wider than that of the lateral acceleration. According to Xu et
al. (2015), a lateral acceleration of 5 $\mathrm{m/s^2}$ is the discomfort limit for the driver, which can occur in a
mountain area at a speed of 30 km/h. Therefore, we can be confident that the IMU based lon-lat plot has more
interpretive power than the GPS based lon-lat plot for telematics analysis.
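As a rough illustration of how a lon-lat plot can be turned into a discrete density, the sketch below bins pairs of lateral and longitudinal acceleration into square cells and normalizes the counts. The bin width of 0.5, the function name, and the sample values are assumptions; the plots in Figure 8 may have been produced with a different binning or with a smooth density estimate.

```python
import math
from collections import Counter

def lon_lat_density(lateral, longitudinal, bin_width=0.5):
    """Discrete density of (lateral, longitudinal) acceleration pairs.

    Returns a dict mapping (x_bin, y_bin) to a relative frequency, where bin
    index i covers the interval [i * bin_width, (i + 1) * bin_width).
    """
    counts = Counter(
        (math.floor(ax / bin_width), math.floor(ay / bin_width))
        for ax, ay in zip(lateral, longitudinal)
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

# Hypothetical accelerations (m/s^2) for a handful of samples.
lateral = [0.1, 0.2, -0.1, 1.2]
longitudinal = [0.1, 0.3, 0.2, -0.6]
density = lon_lat_density(lateral, longitudinal)
```

Two drivers can then be compared cell by cell, for instance by looking at how much mass lies far from the origin along the lateral axis.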
One of the benefits of using the Kalman filtered accelerometer data is that it can easily be converted into the v-a
heatmap, which is already well studied by Gao et al. (2019b). By integrating the longitudinal acceleration, we can
produce the v-a heatmap, since the accelerations are already synchronized with the vehicle speed through the filtering
process. Moreover, because of the fine recording rate of the IMU, a v-a heatmap can be produced from a short trip of
20 to 30 minutes, such as the extended sample route. Note that the number of data points in 20 minutes of IMU
recording equals the number of data points in 8 hours and 20 minutes of GPS recording. A second benefit is that, since
both accelerations in the lon-lat plot are centered at zero, the plot lends itself to a parametric approach to driving
profile analysis. Considering that Gao and Wüthrich (2018) extracted low-dimensional features from the v-a heatmap
using PCA and a bottleneck neural network, the parametric approach could offer another view of the telematics object.
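The data-point counts quoted in this section follow from the recording rates alone. As a quick check, assuming exactly 20 minutes of recording at 25 Hz for the IMU and 1 Hz for GPS:

```latex
\begin{align*}
N_{\mathrm{IMU}} &= 20~\text{min} \times 60~\tfrac{\text{s}}{\text{min}} \times 25~\text{Hz} = 30{,}000, \\
T_{\mathrm{GPS}} &= \frac{30{,}000}{1~\text{Hz}} = 30{,}000~\text{s} = 500~\text{min} = 8~\text{h}~20~\text{min}.
\end{align*}
```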
7 Conclusion
In this paper, we suggest a calibration process for accelerometer data that can be used for telematics analysis. We
also suggest a new type of telematics object, called the lon-lat plot, for driving style profiling. By investigating
the lon-lat plot, we shed light on the benefits of using Kalman filtered accelerometer data as a building block for
telematics analysis: the information about the lateral movements of the vehicle in IMU data is more accurate than
that in GPS telematics data. The Kalman filter is used to combine the speed information from GPS with data from IMU
sensors such as the accelerometer, gyroscope, and barometer. The suggested telematics object contains not only the
information about speed and acceleration used previously in the literature, but also additional information about the
lateral movements that characterize driving style. Furthermore, we explained the Kalman filtering process from a
Bayesian point of view, which could help undergraduate students and actuaries understand the concept intuitively.
Notation                      Interpretation
$x_t, y_t, z_t$               The longitude, latitude, and altitude at time $t$ from GPS
$a^x_t, a^y_t$                The lateral and longitudinal acceleration at time $t$
…                             The x-axis and y-axis raw accelerometer values at time $t$
$\phi_t, \theta_t, \psi_t$    The roll, pitch, and yaw angles from the gyroscope at time $t$
…                             The longitudinal, vertical, and horizontal velocity of a vehicle at time $t$
$\alpha_t$                    The road grade at time $t$
$g$                           The constant of gravity (gravitational acceleration)
References
Aljaafreh, A., Alshabatat, N., and Al-Din, M. S. N. (2012). Driving style recognition using fuzzy logic. In 2012 IEEE
International Conference on Vehicular Electronics and Safety (ICVES 2012), pages 460–463. IEEE.
Arjas, E. (1989). The claims reserving problem in non-life insurance: Some structural ideas. ASTIN Bulletin: The
Journal of the IAA, 19(2):139–152.
AXA (2014). Axa insurance company driver’s telematics analysis.
axa-driver-telematics-analysis. (Accessed: 07/04/2019).
Ayuso, M., Guillen, M., and Pérez-Marín, A. (2016). Telematics and gender discrimination: some usage-based evidence
on whether men’s risk of accidents differs from women’s. Risks, 4(2):10.
Bevly, D. M. (2004). Global positioning system (GPS): A low-cost velocity sensor for correcting inertial sensor errors
on ground vehicles. Journal of Dynamic Systems, Measurement, and Control, 126(2):255–264.
Chalko, T. J. (2007). High accuracy speed measurement using GPS (global positioning system). NU Journal of Discovery,
De Jong, P. and Zehnwirth, B. (1983). Claims reserving, state-space models and the Kalman filter. Journal of the
Institute of Actuaries, 110(1):157–181.
Evans, J. P. and Schmid, F. (2007). Forecasting workers compensation severities and frequency using the Kalman filter.
In Casualty Actuarial Society Forum, pages 43–66.
Faragher, R. et al. (2012). Understanding the basis of the Kalman filter via a simple and intuitive derivation. IEEE
Signal Processing Magazine, 29(5):128–132.
Federal Highway Administration (2019). Lane width.
mitigationstrategies/chapter3/3_lanewidth.cfm. (Accessed: 07/04/2019).
Gao, G., Meng, S., and Wüthrich, M. V. (2019a). Claims frequency modeling using telematics car driving data.
Scandinavian Actuarial Journal, 2019(2):143–162.
Gao, G. and Wüthrich, M. V. (2018). Feature extraction from telematics car driving heatmaps. European Actuarial
Journal, 8(2):383–406.
Gao, G., Wüthrich, M. V., and Yang, H. (2019b). Evaluation of driving risk at different speeds. Insurance: Mathematics
and Economics.
Hijmans, R. J., Williams, E., Vennes, C., and Hijmans, M. R. J. (2019). Package ‘geosphere’.
Jauch, J., Masino, J., Staiger, T., and Gauterin, F. (2017). Road grade estimation with vehicle-based inertial measurement
unit and orientation filter. IEEE Sensors Journal, 18(2):781–789.
Johnson, D. A. and Trivedi, M. M. (2011). Driving style recognition using a smartphone as a sensor platform. In 2011
14th International IEEE Conference on Intelligent Transportation Systems (ITSC), pages 1609–1615. IEEE.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering,
Kremer, E. (1994). Robust credibility via robust Kalman filtering. ASTIN Bulletin: The Journal of the IAA, 24(2):221–
Meinhold, R. J. and Singpurwalla, N. D. (1983). Understanding the Kalman filter. The American Statistician, 37(2):123–
National Association of Insurance Commissioners (2018). State insurance regulation: Key facts and market trends. (Accessed: 07/04/2019).
Nikulin, V. (2016). Driving style identification with unsupervised learning. In Machine Learning and Data Mining in
Pattern Recognition, pages 155–169. Springer.
Progressive Casualty Insurance Company (2019). Progressive Firsts.
firsts. (Accessed: 07/04/2019).
Taylor, G. (2012). Loss reserving: an actuarial perspective, volume 21. Springer Science & Business Media.
National Coordination Office for Space-Based Positioning, Navigation, and Timing (2017). How accurate is GPS?
(Accessed: 2018-04-15).
Tselentis, D. I., Yannis, G., and Vlahogianni, E. I. (2016). Innovative insurance schemes: pay as/how you drive.
Transportation Research Procedia, 14:362–371.
Van Ly, M., Martin, S., and Trivedi, M. M. (2013). Driver classification and driving style recognition using inertial
sensors. In 2013 IEEE Intelligent Vehicles Symposium (IV), pages 1040–1045. IEEE.
Verbelen, R., Antonio, K., and Claeskens, G. (2018). Unravelling the predictive power of telematics data in car insurance
pricing. Journal of the Royal Statistical Society: Series C (Applied Statistics), 67(5):1275–1304.
Weidner, W., Transchel, F. W., and Weidner, R. (2016). Classification of scale-sensitive telematic observables for
riskindividual pricing. European Actuarial Journal, 6(1):3–24.
Weidner, W., Transchel, F. W., and Weidner, R. (2017). Telematic driving profile classification in car insurance pricing.
Annals of Actuarial Science, 11(2):213–236.
Wüthrich, M. V. (2017). Covariate selection from telematics car driving data. European Actuarial Journal, 7(1):89–108.
Xu, J., Yang, K., Shao, Y., and Lu, G. (2015). An experimental study on lateral acceleration of cars in different
environments in Sichuan, Southwest China. Discrete Dynamics in Nature and Society, 2015.