All content in this area was uploaded by Artjom Lind on Oct 17, 2017
A new approach for mobile positioning using the
CDR data of cellular networks
(Preprint Version Submitted to MT-ITS17’ Conference)
Artjom Lind
Institute of Computer Science,
University of Tartu
Ulikooli 17
Tartu, Estonia
Email: artjom.lind@ut.ee
Amnir Hadachi
Institute of Computer Science,
University of Tartu
Ulikooli 17
Tartu, Estonia
Email: amnir.hadachi@ut.ee
Oleg Batrashev
Institute of Computer Science,
University of Tartu
Ulikooli 17
Tartu, Estonia
Email: olegus@ut.ee
Abstract—Nowadays, mobile devices are equipped with a
number of radio transceivers which are active every day and
everywhere. As a result, vast amounts of data and technical logs
are collected by mobile operators. For this reason, mobile phones
have a great potential for sensing urban and rural mobility
and population displacement. In this article, we propose
a new approach for estimating the location of mobile
subscribers within the coverage area of a mobile network. The
method is based on an enhanced Kalman filter with
integrated mobility models. The algorithm estimates the
location of mobile subscribers knowing only the network
coverage cell to which they are connected. The results are
encouraging and can benefit applications in intelligent
transportation systems and location-based services that
rely on Call Detail Records (CDR) data.
I. INTRODUCTION
Over the last years, the availability of mobile data has
increased [19], and new mobile patterns with a noticeable
impact on other fields have been revealed. Call Detail Records
(CDRs) are records of telecom transactions that operators use to
generate bills. A single CDR contains a number of attributes
[24], such as information about the phone itself, the subscriber,
the timestamps, the coverage area ID, etc. Currently, 5% of
mobile data traffic is created by Machine-to-Machine (M2M)
connections, driven by the fast-growing Internet of Things (IoT)
and Intelligent Transportation Systems (ITS) applications [20].
The ITS field demonstrates good potential for research and
applications. It was estimated [20] that the amount of M2M
traffic will grow up to 29% by 2021. However, this type of
data raises a lot of challenges and one of them is the accuracy
of localizing mobile users in mobile networks. A number of
previous studies have shown clear success in mining CDRs for
human mobility aspects [21] or base stations' characteristics
[23].
However, the precision of localization is rarely set as a
primary goal. Instead, the focus is on other mobility aspects
clearly derivable from CDRs. There was an attempt to roughly
estimate the travel route [22], assuming that a significant
amount of historical CDR data has been collected.
Besides, the use of cellular data, more precisely Call Detail
Records (CDRs), can be very beneficial in saving energy when
using location-based routing protocols. In addition, this type of
data can be the key to urban mobility sensing, since almost
everyone owns a mobile phone and uses it every day
and everywhere. Therefore, CDR data can be very useful for
understanding the dynamics of human mobility patterns [10] and
people's means of transportation and displacement.
It is clear, based on the literature [11], that mobile cellular
networks and their data can be used as ubiquitous sensors
for real mobility in space and time. In [11], the authors
demonstrated the use of CDR data by creating an algorithm to
infer vehicle travel time on highways and to detect traffic jams
in real time. Their approach focused on the macroscopic
level of traffic.
Another use of CDR data for macroscopic analysis of mobility
is illustrated in [25], where the authors present an interesting
software that is capable of estimating multiple aspects of
travel demands based on CDR data in a flexible and efficient
manner. The software system includes old and new algorithms
for generating origin-destination matrices and route trips. In
addition, the system contains an interactive graphical web
interface for visualizing the results of the algorithms analysis
and estimations.
Following the same theme of human mobility analysis,
it is well established that estimating and measuring population
displacement can help a lot in understanding many urban
phenomena. However, the use of CDR data raises a lot of issues
and challenges, which are clearly stated in [26]. The authors
pointed to the existing problems with mobile phone based
measures of mobility patterns and localization. In addition,
they described new alternative approaches that can help in
fixing these issues. Furthermore, they provided a variety of
useful cases that can help in understanding human mobility
patterns by using mobile phone data at different levels:
microscopic and macroscopic.
At this moment, it is important to point to the fact that
positioning mobile users at a microscopic level still constitutes
a big challenge. The localization in mobile networks has been
investigated for a while now [15], [16], [18]. One of the
techniques that has been used is the centralized localization
algorithm that runs on a base station, where all the partici-
pating cells must forward their measurement data to the base
station [1]. The accuracy of this method is acceptable when
the users are not moving and it needs deployment at the base
station. Other derivatives of this technique exist [2], but they
are not fast enough regarding the positioning processes.
Some other methods use the signal strength, triangulation
and belief propagation algorithm for self-positioning [3]. In
addition, there is also the use of various distributed recursive
estimation approaches, such as particle filters [5], Kalman filter
[4] or sequential Monte Carlo [6], since they are suitable for
non-linear cases. In general, all the existing techniques use the
signal information from neighboring cells for self-positioning.
From the same family of approaches, we can also mention the use
of a Bayesian inference approach for locating mobiles in cellular
networks [9]. The approach presented in that article uses the
network layout information and shows how to integrate
supplementary knowledge, such as interference ratio measurements
and round-trip time, to localize mobiles. The authors presented
results showing that their method reduces the localization
error by 20%.
Moreover, in [12] the authors gave a performance comparison
of the self-positioning techniques using fixing methods, such
as Received Signal Strength (RSS) Statistics, Least Squares
(LS) [14], Weighted Least Squares (WLS) and Constrained
Weighted Least Squares (CWLS). Their results show that
CWLS is the most adequate for mobile positioning in urban
areas. The CWLS method performs well in dense urban
locations, where there are multiple multi-storied structures.
In addition, there is also the use of neural networks [13],
[17]. For example in [13], the article presents an application
of an artificial neural network to increase the accuracy of
cellular mobile subscribers’ positioning. Their technique relies
on neural networks in fusing radio location measurements and
confidence of measurements to refine the positioning accuracy.
However, the approach was oriented on locating the mobile
station, not the users. Moreover, their experiment was carried
out based on simulated data and no real field experiment was
done.
From this perspective, in our study we focus on localization
within the coverage area, relying on historical CDR data or
dynamic Visitor Location Registry (VLR) feed. The latter al-
lows us to use our algorithm for real-time dynamic positioning
of mobile subscribers on operator’s side. Location awareness
is important for the cellular network, since many applications
depend on its data for developing environment monitoring,
road traffic management, control and tracking in emergency
situations, etc. Hence, in this article we are proposing an
algorithm for estimating the exact location of mobile devices
in the cellular network by using enhanced Kalman Filter.
The proposed method also includes mobility models that the
Kalman Filter uses for estimating the exact location of the
mobile users, plus a coverage optimization technique.
II. PROBLEM STATEMENT
A CDR is generated at the moment of explicit
phone usage (calling, sending an SMS or packet data). This
means that while the phone is not in use, no
information is collected. Hence, Call Detail Records are sparsely
sampled spatio-temporal data [7]. The collected records
contain the following information (Table I):
TABLE I
TYPE OF INFORMATION CONTAINED IN CDR DATA
Information Type   Description
IMSI               The International Mobile Subscriber Identity
IMEI               The International Mobile Equipment Identity
Cell ID            Mobile coverage cell area identification number
Timestamp          Time or moment when the event was recorded
Event type         The nature of the action performed that triggered the event
Therefore, the trajectories extracted are a set of cell IDs,
which reflect a coverage zone location of the mobile network
in a chronological order (Fig. 1).
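As an illustration of how such a trajectory can be assembled, the following minimal sketch groups records by subscriber and orders them by time. The record layout follows Table I, but the field names (`imsi`, `cell_id`, and so on) and the function `extract_trajectories` are hypothetical and are not taken from the operator's actual data format.

```python
from collections import defaultdict
from typing import NamedTuple


class CdrRecord(NamedTuple):
    """One CDR row with the fields of Table I (field names are hypothetical)."""
    imsi: str
    imei: str
    cell_id: int
    timestamp: float  # assumed to be seconds since some epoch
    event_type: str


def extract_trajectories(records):
    """Return, per subscriber, the chronological list of (timestamp, cell_id)."""
    by_subscriber = defaultdict(list)
    for rec in records:
        by_subscriber[rec.imsi].append(rec)
    return {
        imsi: [(r.timestamp, r.cell_id)
               for r in sorted(recs, key=lambda r: r.timestamp)]
        for imsi, recs in by_subscriber.items()
    }
```

Each resulting list is a sequence of coverage zones only; it carries no position inside a zone, which is the gap the rest of the method addresses.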
The Cell ID number is used to identify the base transceiver
stations (BTS) or sectors within a BTS. Each base transceiver
sector covers an area, hence the name "coverage area". In
our study we identify coverage areas by the ID of the
corresponding BTS sectors.
As a result, it is very challenging to know whether the user
is moving or not, and also to estimate their location within the
coverage zone. Hence, our main objective is to position
the mobile user within the mobile network coverage area by
using CDR data and pre-defined mobility models.
Fig. 1. Illustration of Extracted Trajectory from CDR data - Polygons are the
CDR Trajectory - dots are real GPS trajectory
III. METHODOLOGY
Determining the exact location of a mobile subscriber within
the mobile network coverage area is very challenging when
only CDR data is used. In addition, the literature
discussed in the previous section shows a lack of work on the
use of CDR data to extract the exact location in the mobile
network.
Before describing the methodology adopted, we describe
the data used and its purpose. We used three
sources of data for three purposes. First, we
collected GPS data using a customized mobile application capable
of simultaneously recording the GPS location and the CellID to
which the phone is connected. The information collected
indicates the location of the phone based on GPS
as well as the CellID, or coverage area, that the phone is using.
This data was used in the process of optimizing the coverage
area (described in Section III-A), since we noticed that in some
cases the phone was located outside of the coverage area and
still connected to it. The second source is the CDR data, which was
used with our proposed enhanced Kalman filter algorithm for
estimating the location of the phone. Third, in order to evaluate the
outcome of the algorithm, we created a mobile application to
collect GPS data from the mobile phones, and this data was
used as ground truth to evaluate our estimations.
A. Coverage Optimization
The coverage area information in the CDR data
provided by the mobile operators does not reflect the real
coverage zone. A subscriber can still be connected to the
transceiver (BTS) with a certain Cell ID while being
outside of the coverage area declared by the
operator for this BTS. This clearly indicates a need for
updating the coverage area information for this BTS. Therefore, we
propose a solution to enhance the coverage representation by
coupling the GPS data with the cell events.
For this purpose, we defined a function $f(u)$ that penalizes
large distances $d(x, r, x_g)$ between the cell coverage circle
$(x, r)$, which is defined by its center $x$ and radius $r$, and the
GPS coordinates $x_g$ at the time of the cell event's occurrence.
In general, the function created is performing a mathematical
optimization by minimizing a penalty function $f(u)$. Moreover,
we took into account that the coverage area should
not exceed the area that is defined by the GPS. Hence,
the function also penalizes a large coverage radius $u_i$. The
penalizing function is defined as follows:
Let $C$ be the set of all pairs $(j, x_g)$ of a cell index and its
event GPS coordinates.
$$f(u) = \sum_i u_i^2 + w \sum_{(j, x_g) \in C} \left[\min\left(0,\, d(x_j, r_j, x_g)\right)\right]^2, \tag{1}$$
where $u_i$ is the radius extension, $d(x, r, x_g) = r - |x - x_g|$, and
we defined $w = 10$ as the weight for the non-coverage penalty. We
used the implementation in scipy of the L-BFGS-B algorithm
[8] to minimize the coverage function.
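The minimization of Eq. (1) can be sketched with `scipy.optimize.minimize` as follows. This is only an illustrative sketch under stated assumptions: the effective radius is taken as the declared radius plus the extension ($r_j = r_j^0 + u_j$), the extensions are bounded below by zero, and the coordinates are planar. None of these details are specified in the text, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def coverage_penalty(u, centers, radii, cell_idx, gps, w=10.0):
    """Penalty f(u) of Eq. (1), assuming r_j = radii[j] + u[j]."""
    r = radii[cell_idx] + u[cell_idx]
    dist = np.linalg.norm(centers[cell_idx] - gps, axis=1)
    d = r - dist  # negative when the GPS fix lies outside the circle
    return np.sum(u ** 2) + w * np.sum(np.minimum(0.0, d) ** 2)


def optimize_coverage(centers, radii, cell_idx, gps, w=10.0):
    """Minimize the penalty with L-BFGS-B; returns the radius extensions u."""
    u0 = np.zeros(len(radii))
    bounds = [(0.0, None)] * len(radii)  # assumption: radii may only grow
    result = minimize(coverage_penalty, u0,
                      args=(centers, radii, cell_idx, gps, w),
                      method="L-BFGS-B", bounds=bounds)
    return result.x
```

Here `cell_idx[k]` is the cell index of the k-th event and `gps[k]` its GPS fix. Because the radius term $u_i^2$ competes with the non-coverage term, the optimum may stop short of enclosing every outlying fix, which matches the intent that the coverage area should not grow without limit.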
B. Localization estimation in the mobile network
In general, the Kalman Filter (KF) permits only a single
transition matrix at each step $t$. Therefore, the classic Kalman
filter is good at predicting within the scope of one behavior model,
for example steady directed movement or standing still. However, in
reality there are many types of behavior to select from, and
the behavior can change multiple times within one trajectory.
Thus, in our case, in order to capture this missing mobility
behavior aspect, we add discrete random variables to the
Kalman Filter. This results in the adaptive Kalman Filter [4].
As a result, our discrete random variable $S_t$ defines the model
used for the transition to step $t$. Then, KF calculates the
probability of each model $M$ at time $t$, given all the evidence
or up-to-date evidence as well as a probability distribution of
the hidden state variable associated with each model:
$$M_{t|t}(i) = P(S_t = i \mid x_{g,1:t}) \tag{2}$$
$$M_{t|T}(i) = P(S_t = i \mid x_{g,1:T}) \tag{3}$$
The consolidated belief state of the hidden variable $X_t$ at time $t$
is represented as the mixture of Gaussians of all models scaled
by the probabilities of the models:
$$P(X_t \mid x_{g,1:\tau}) = \sum_i M_{t|\tau}(i) \cdot P(X_t \mid S_t = i, x_{g,1:\tau}) \tag{4}$$
and
$$P(X_t \mid S_t = i, x_{g,1:\tau}) = \mathcal{N}\left(\mu^i_{t|\tau}, \Sigma^i_{t|\tau}\right), \quad \tau \in \{t, T\} \tag{5}$$
Finally, it is necessary to define the model transition
probability matrix $T_z(i, j) = P(S_t = j \mid S_{t-1} = i)$. We define the
transition probability from $S_{t-1} = i$ to $S_t = j$ with the higher
chance to stay in the same model:
$$T_z(i, j) = \begin{cases} 0.8 & \text{if } i = j \\ \dfrac{0.2}{Num - 1} & \text{otherwise} \end{cases} \tag{6}$$
where $Num$ is the number of models. Besides, one of the
issues with KF is the exponential growth of the belief state
due to the multiplication between the number of Gaussians and
the number of models at each step. For this reason, filtering
and smoothing are a necessity in the process and they are
computed using predefined models of behavior.
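To make Eqs. (4) and (6) concrete, the sketch below builds the model transition matrix and evaluates the mean of the Gaussian mixture. It is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the Markov prediction step `predict_model_probs` is a standard construction that the text does not spell out.

```python
import numpy as np


def model_transition_matrix(num_models, p_stay=0.8):
    """T_z of Eq. (6): p_stay on the diagonal, the remainder split evenly."""
    off_diagonal = (1.0 - p_stay) / (num_models - 1)
    tz = np.full((num_models, num_models), off_diagonal)
    np.fill_diagonal(tz, p_stay)
    return tz


def predict_model_probs(prev_probs, tz):
    """Prior over S_t given the distribution over S_{t-1} (Markov step)."""
    return prev_probs @ tz


def mixture_mean(model_probs, means):
    """Mean of the mixture in Eq. (4): sum_i M(i) * mu_i."""
    return model_probs @ means
```

With two models (Stay and Move), `model_transition_matrix(2)` gives 0.8 on the diagonal and 0.2 off it, so each row sums to one.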
C. Mobility Models
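We take the location $\bar{x}_t$ and velocity $\vartheta_t$ at time $t$ as the hidden
variables $x_t = \begin{bmatrix} \bar{x}_t \\ \vartheta_t \end{bmatrix}$ and specify that a moving user's
coordinates and velocities satisfy the following equations:
$$\bar{x}_t = \bar{x}_{t-1} + \vartheta_{t-1}\,\delta t + \bar{q}_t \tag{7}$$
$$\vartheta_t = \vartheta_{t-1} + \dot{q}_t \tag{8}$$
where $\delta t$ is the time difference from the previous event
resulting from the Bayes Network approach and $\bar{q}_t$, $\dot{q}_t$ are
noise terms. In general, the KF equations are as follows:
$$x_t = F x_{t-1} + Q_t \tag{9}$$
$$x_{g,t} = H x_t + R_t, \tag{10}$$
hence, for each model we have to define a transition matrix
$F$ and a noise term $Q_t \sim \mathcal{N}(0, Q^{(M)}_t)$ with covariance matrix $Q^{(M)}_t$. For
instance, in the case of a user moving on a plain 2D map, the
transition matrix (Move model) is:
$$F^{(M)} = \begin{bmatrix} 1 & 0 & \delta t & 0 \\ 0 & 1 & 0 & \delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{11}$$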
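Whereas a user staying at the same location can be
characterized by an identity matrix $F^{(S)} = I$ (Stay model):
$$F^{(S)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{12}$$
Meanwhile, the observation model ($H$ and $R_t$) should
reflect the location of the antenna that the user is connected to,
and the coverage zone of the antenna expresses the observation
error. During testing, all the "antennas" will have the same
model:
$$H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \qquad R_t = \begin{bmatrix} 1.2^2 & 0 \\ 0 & 1.2^2 \end{bmatrix}. \tag{13}$$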
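The matrices of Eqs. (11) to (13) can be plugged into a standard linear Kalman predict/update cycle, sketched below for a single model. This is a textbook step rather than the authors' full multi-model filter: the process noise covariance `q` is a free parameter whose values are not given in the text, and the helper names are hypothetical.

```python
import numpy as np


def move_matrix(dt):
    """Transition matrix F^(M) of Eq. (11) for the state [x, y, vx, vy]."""
    f = np.eye(4)
    f[0, 2] = dt
    f[1, 3] = dt
    return f


F_STAY = np.eye(4)                        # Eq. (12)
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])      # Eq. (13)
R = np.diag([1.2 ** 2, 1.2 ** 2])         # Eq. (13)


def kalman_step(mean, cov, f, q, z):
    """One predict/update cycle of a standard linear Kalman filter."""
    # Predict the state and its covariance with the chosen mobility model.
    mean_pred = f @ mean
    cov_pred = f @ cov @ f.T + q
    # Update with the observation z (here: the position of the connected cell).
    innovation = z - H @ mean_pred
    s = H @ cov_pred @ H.T + R
    gain = cov_pred @ H.T @ np.linalg.inv(s)
    mean_new = mean_pred + gain @ innovation
    cov_new = (np.eye(4) - gain @ H) @ cov_pred
    return mean_new, cov_new
```

In the multi-model setting described above, a step of this kind would be run once per model (with `move_matrix(dt)` or `F_STAY`), and the per-model results would then be weighted by the model probabilities.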
As a consequence, the algorithm computes the probabilities
$P(S_t = k \mid x_{g,1:t})$ of each model $k$ at time $t$ and the probability
distribution of the coordinate and the velocity
$P(X_t \mid S_t = k, x_{g,1:t})$ for the given up-to-date evidence $x_{g,1:t}$. Then, the same
process is applied to all the evidence $x_{g,1:T}$, which gives
smoothed results that are more accurate and are therefore used
in actual testing and validation (Fig. 2).
Fig. 2. Coordinates and probabilities for Stay and Move models
IV. EXPERIMENTATION AND TESTING
A. Data
During the testing process, we use the GPS data
described in Section III as ground truth for evaluating
the algorithm. The Kalman filter uses the CDR data for
estimating the subscribers' locations.
In addition, we made sure that the CDR data corresponds
exactly to the time period and users from whom we collected
the GPS data (ground truth) in order to be able to compare
the estimated location using our algorithm and evaluate its
performance.
B. Results and discussion
The testing of our method was carried out using GPS data
as reference and 271 CDR records from different users. Then,
we compared the estimated positions given by our algorithm
to the real GPS data. For example, Fig. 3 shows a
case where the mobile user is traveling by train. The triangles
represent his/her GPS data during the trip and the circles are
the positions estimated using our method. In addition, every
circle is linked to its corresponding ground truth GPS point
by a segment. The illustration demonstrates interesting results,
considering that the only input to the proposed algorithm was
the sequence of mobile network coverage areas generated by the
mobile user. The second example, in Fig. 4,
is a user walking around and stopping from time to time in
the city center. We can clearly see that the algorithm is
capable of getting close to the area where the user is located.
Fig. 3. The Algorithm output ”case of a user traveling by train”; - the
polygons are the optimized mobile network coverage; - the circle dots are
the algorithm’s estimations (Estimated Locations) and the triangles are the
GPS data (Real Positions).
As a first analysis, we checked the estimation error by
comparing the estimated locations with the real locations
provided by the GPS data. Table II shows the results
obtained without applying any coverage optimization.
The algorithm produces an average error of 0.9 kilometers in
the stay model and 1.9 kilometers in the move model.
However, when the coverage optimization is applied, the
results improve (Table III). The algorithm
produces an average error of 0.4 kilometers in the stay model
and 1.3 kilometers in the move model. Overall, the coverage
optimization step has a positive impact on the algorithm's
estimation. Moreover, the algorithm is better at
estimating the location when the subjects are staying in a
specific location than when they are moving.
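The error statistics reported in this section (minimum, maximum and average distance in kilometers) can be computed along the following lines. This is a sketch under an assumption: the text does not state which distance measure was used, so the great-circle (haversine) distance on a spherical Earth of radius 6371 km is used here, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def error_summary(estimated, truth):
    """Min, max and mean error between paired estimated and GPS positions."""
    errors = [haversine_km(e[0], e[1], g[0], g[1])
              for e, g in zip(estimated, truth)]
    return {"min": min(errors), "max": max(errors),
            "avg": sum(errors) / len(errors), "n": len(errors)}
```

Applying `error_summary` separately to the records labeled as stay and move yields one row per model in the same layout as the result tables.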
Fig. 4. The Algorithm output ”case of a user walking and stopping from time
to time”; - the circle dots are the algorithm’s estimations (Estimated Locations)
and the triangles are the GPS data (Real Positions- Red stay location, Green
walking).
TABLE II
PERFORMANCE OF THE PROPOSED ALGORITHM WITHOUT COVERAGE OPTIMIZATION
Model Type          Estimation Error (km)         Average (km)   Number of CDR Records
Stay Model          [Min: 0.002; Max: 13.543]     0.931          147
Move Model          [Min: 0.027; Max: 13.061]     1.946          124
Total Performance   [Min: 0.002; Max: 13.543]     1.438          271
TABLE III
PERFORMANCE OF THE PROPOSED ALGORITHM WITH COVERAGE OPTIMIZATION
Model Type          Estimation Error (km)         Average (km)   Number of CDR Records
Stay Model          [Min: 0.001; Max: 4.958]      0.432          147
Move Model          [Min: 0.008; Max: 10.749]     1.279          124
Total Performance   [Min: 0.001; Max: 10.749]     0.892          271
V. CONCLUSION
In this paper, we have presented a method capable of
estimating the location of mobile users within the cell coverage
area by using only CDR data.
The algorithm uses the centroid of the coverage areas and
mobility models in order to estimate the positions of the
mobile users. These models allow the algorithm to simulate
the movement of the users by using the observed CDR data.
The results are very encouraging for more investigations aimed
at enhancing the algorithm. In addition, the actual results can
be very useful for applications in intelligent transportation
systems or location based services.
ACKNOWLEDGMENT
The authors gratefully acknowledge the contribution of The
Software Technology and Applications Competence Centre
(STACC) through Large-scale Mobile Positioning Data Mining
(Demograft) project and all the partners in Archimedes project
”The Real-time Location-based Big Data Algorithms” for their
help in providing the data.
This research was supported by IUT34-4 ”Data Science Meth-
ods and Applications” (DSMA) project and the European
Regional Development Fund through the Estonian Centre of
Excellence in IT (EXCITE).
REFERENCES
[1] Mao, G., Fidan, B., Anderson, B.D.O.,”Wireless sensor network local-
ization techniques”, in the Computer Networks journal, Vol. 51(10), pp.
2529-2553, 2007.
[2] Kusý, B., Sallai, J., Balogh, G., Lédeczi, A., Protopopescu, V., Tolliver,
J., DeNap, F., Parang, M., ”Radio interferometric tracking of mobile
wireless nodes”, in Proc. of MobiSys, 2007, pp. 139-151.
[3] F. Meyer, O. Hlinka, and F. Hlawatsch, ”Sigma point belief propagation,”
in the IEEE Signal Process. Lett., vol. 21, pp. 145-149, Feb. 2014.
[4] Feng Xiao, Mingyu Song, Xin Guo, et al, ”Adaptive Kalman filtering for
target tracking”, in the China Ocean Acoustics (COA) Conference, 2016.
[5] Carsten Fritsche and Anja Klein ”On the performance of mobile terminal
tracking in urban GSM networks using particle filters”, in The 17th
European Signal Processing Conference, 2009.
[6] Salke Hartung; Ansgar Kellner; Konrad Rieck; Dieter Hogrefe, ”Monte
Carlo Localization for path-based mobility in mobile wireless sensor
networks”, in The IEEE Wireless Communications and Networking
Conference (WCNC), 2016.
[7] Ficek M.; Kencl L, ”Inter-call mobility model: A spatio-temporal re-
finement of call data records using a gaussian mixture model”, in the
Proceedings IEEE INFOCOM, 2012, pp. 469-477.
[8] J.L. Morales and J. Nocedal, ”L-bfgs-b: Remark on algorithm 778: L-
bfgs-b, fortran routines for large scale bound constrained optimization”,
in the ACM Transactions on Mathematical Software Journal, Vol. 38(1),
2011.
[9] H. Zang; F. Baccelli; J. Bolot, ”Bayesian Inference for Localization in
Cellular Networks”, in the IEEE Proceedings INFOCOM, 2010.
[10] Shan Jiang, Joseph Ferreira, Marta C. Gonzalez, ”Activity-Based Human
Mobility Patterns Inferred from Mobile Phone Data: A Case Study of
Singapore”, in the IEEE Transactions on Big Data Journal, Vol. PP, Issue
99, 2016.
[11] Andreas Janecek, Danilo Valerio, Karin Anna Hummel, Fabio Ricciato,
Helmut Hlavacs, ”The Cellular Network as a Sensor: From Mobile Phone
Data to Real-Time Road Traffic Monitoring”, in the IEEE Transactions
on Intelligent Transportation Systems Journal, Vol: 16, Issue: 5, 2015.
[12] D. Krishna Reddy, A.D. Sarma and V. Satya Srinivas, ”Mobile Position
Estimation with RSS Based Techniques in an Urban City with Multiple
Multi-storied Structures”, in the Radio Science Meeting (Joint with AP-S
Symposium), 2014.
[13] S. Merigeault ; M. Batariere ; J.N. Patillon, ”Data fusion based on
neural network for the mobile subscriber location”, in the IEEE Vehicular
Technology Conference, 2000.
[14] Chin-Liang Wang; Dong-Shing Wu; Shih-Cheng Chen; Kai-Jie Yang,
”A Decentralized Positioning Scheme Based on Recursive Weighted
Least Squares Optimization for Wireless Sensor Networks”, in the IEEE
Transactions on Vehicular Technology, 2015.
[15] Isaac Amundson and Xenofon D. Koutsoukos, ”A Survey on Local-
ization for Mobile Wireless Sensor Networks ”, in Proc of the Second
International Workshop, 2009, pp 235-254.
[16] Lyudmila Mihaylova, Donka Angelova, David R. Bull, and Nishan
Canagarajah, ”Localization of Mobile Nodes in Wireless Networks with
Correlated in Time Measurement Noise”, in the IEEE Transactions on
Mobile Computing, Volume 10, No. 1, JANUARY 2011.
[17] Mohammad Shaifur Rahman, Youngil Park, and Ki-Doo
Kim,”Localization of Wireless Sensor Network using artificial neural
network”, in the 9th International Symposium on Communications and
Information Technology, ISCIT, 2009.
[18] Fumio Teraoka, Tetsuya Arita, ”PNEMO: a Network-Based Localized
Mobility Management Protocol for Mobile Networks”, in the Third
International Conference on Ubiquitous and Future Networks (ICUFN),
2011.
[19] Jin, Yu and Duffield, Nick and Gerber, Alexandre and Haffner, Patrick
and Hsu, Wen-Ling and Jacobson, Guy and Sen, Subhabrata and
Venkataraman, Shobha and Zhang, Zhi-Li, ”Characterizing Data Usage
Patterns in a Large Cellular Network”, in the Proc. of the ACM SIG-
COMM Workshop on Cellular Networks: Operations, Challenges, and
Future Design, 2012.
[20] Cisco and/or its affiliates, ”Cisco Visual Networking Index (tm), Global
Mobile Data Traffic Forecast (2016 to 2021)”, White paper, Cisco Public,
2017.
[21] Furletti, Barbara and Gabrielli, Lorenzo and Renso, Chiara and
Rinzivillo, Salvatore, ”Identifying Users Profiles from Mobile Calls
Habits”, in the Proceedings of the ACM SIGKDD International Workshop
on Urban Computing, 2012.
[22] H. Kanasugi and Y. Sekimoto and M. Kurokawa and T. Watanabe
and S. Muramatsu and R. Shibasaki, ”Spatiotemporal route estimation
consistent with human mobility using cellular network data”, in the IEEE
International Conference on Pervasive Computing and Communications
Workshops (PERCOM Workshops), 2013.
[23] S. Zhang and D. Yin and Y. Zhang and W. Zhou, ”Computing on Base
Station Behavior Using Erlang Measurement and Call Detail Record”, in
the IEEE Transactions on Emerging Topics in Computing Journal, 2015.
[24] International Telecommunication Union, ”Specification of TMN
applications at the Q3 interface: Call detail recording”, in the ITU-T
Recommendation Q.825, 1998.
[25] J. L. Toole, S. Colak, B. Sturt, P. A. Lauren, A. Evsukoff, M.C.
Gonzalez, ”The path most traveled: Travel demand estimation using big
data resources”, in Transportation Research Part C Journal, Vol. 58, pp.
162-177, 2015.
[26] N.E. Williams, T.A. Thomas, M. Dunbar, N. Eagle, A. Dobra, ”Measures
of Human Mobility Using Mobile Phone Records Enhanced with GIS
Data”, in PLoS ONE Journal, Vol. 10(7), 2015.