An Online Adaptive Model for Location Prediction
Theodoros Anagnostopoulos, Christos Anagnostopoulos, and Stathes Hadjiefthymiades
Pervasive Computing Research Group, Communication Networks Laboratory, Department of
Informatics and Telecommunications, University of Athens, Panepistimiopolis, Ilissia, Athens
15784, Greece, tel: +302107275127
{thanag,bleu,shadj}@di.uoa.gr
Abstract. Context-awareness is viewed as one of the most important aspects in
the emerging pervasive computing paradigm. Mobile context-aware
applications are required to sense and react to changing environment
conditions. Such applications, usually, need to recognize, classify and predict
context in order to act efficiently, beforehand, for the benefit of the user. In this
paper, we propose a mobility prediction model, which deals with context
representation and location prediction of moving users. Machine Learning
(ML) techniques are used for trajectory classification. Spatial and temporal
online clustering is adopted. We rely on Adaptive Resonance Theory (ART) for
location prediction. Location prediction is treated as a context classification
problem. We introduce a novel classifier that applies a Hausdorff-like distance
over the extracted trajectories handling location prediction. Since our approach
is time-sensitive, the Hausdorff distance is considered more advantageous than
a simple Euclidean norm. A learning method is presented and evaluated. We
compare ART with Offline k-Means and Online k-Means algorithms. Our
findings are very promising for the use of the proposed model in mobile
context-aware applications.
Keywords: Context-awareness, location prediction, Machine Learning, online
clustering, classification, Adaptive Resonance Theory.
1 Introduction
In order to render mobile context-aware applications intelligent enough to support users
everywhere / anytime and materialize the so-called ambient intelligence, information on
the present context of the user has to be captured and processed accordingly. A well-known
definition of context is the following: “context is any information that can be used
to characterize the situation of an entity. An entity is a person, place or object that is
considered relevant to the interaction between a user and an application, including the
user and the application themselves” [1]. Context refers to the current values of specific
ingredients that represent the activity of an entity / situation and environmental state (e.g.,
attendance of a meeting, location, temperature).
One of the more intuitive capabilities of mobile context-aware applications is their
proactivity. Predicting user actions and contextual ingredients enables a new class of
applications to be developed along with the improvement of existing ones. One very
important ingredient is location. Estimating and predicting the future location of a mobile
user enables the development of innovative, location-based services/applications [2], [12].
For instance, location prediction can be used to improve resource reservation in wireless
networks and facilitate the provision of location-based services by preparing and feeding
them with the appropriate information well in advance. The accurate determination of the
context of users and devices is the basis for context-aware applications. In order to adapt to
changing demands, such applications need to reason based on basic context ingredients
(e.g., time, location) to derive knowledge of higher-level situations.
Prediction of context is quite similar to information classification / prediction (offline
and online). In this paper, we adopt ML techniques for predicting location through an
adaptive model. ML is the study of algorithms that improve automatically through
experience. ML provides algorithms for training a system to cluster pre-existing
knowledge, classify observations, predict unknown situations based on a history of
patterns and adapt to situation changes. Therefore, ML can provide solutions that are
suitable for the location prediction problem. Context-aware applications have a set of
pivotal requirements (e.g., flexibility and adaptation), which would strongly benefit if the
learning and prediction process could be performed in real time. We argue that the most
appropriate solutions for location prediction are offline and online clustering and
classification. Offline clustering is performed through the Offline k-Means algorithm while
online clustering is accomplished through the Online k-Means and Adaptive Resonance
Theory (ART). Offline learners typically perform complete model building, which can be
very costly as the number of samples rises. Online learning algorithms are able to detect
changes and adapt / update only parts of the model, thus providing for fast adaptation of the
model. Both forms of algorithms extract a subset of patterns / clusters (i.e., a knowledge
base) from an initial dataset (i.e., a database of user itineraries). Moreover, online learning
is better suited to the task of classification / prediction of the user mobility behavior
because, in real life, user movement data often needs to be processed in an online manner,
each time a new portion of the data arrives. This is because such data is
organized in the form of a data stream (e.g., a sequence of timestamped visited locations)
rather than a static data repository, reflecting the natural flow of data. Classification
involves the matching of an unseen pattern with existing clusters in the knowledge base.
We rely on a Hausdorff-like distance [5] for matching unseen itineraries to clusters (such a
metric applies to convex patterns and is considered ideal for user itineraries). Finally,
location prediction boils down to location classification w.r.t. the Hausdorff-like distance.
We assess two training methods for training an algorithm: (i) the “nearly” zero-knowledge
method, in which an algorithm is incrementally trained starting with little
knowledge of the user mobility behavior, and (ii) the supervised method, in which sets of
known itineraries are fed to the classifier. Moreover, we assess a learning method for the
online algorithms regarding the success of location prediction, in which a misclassified
instance is introduced into the knowledge base, updating the model appropriately.
We evaluate the performance of our models against the movement of mobile users. Our
objective is to predict the users’ future location (their next move) through an online
adaptive classifier. We establish some important metrics for the performance assessment
process, taking into account low system requirements (storage capacity) and effort for
model building (processing power). Specifically, besides the prediction accuracy, i.e., the
precision of location predictions, we are also interested in the size of the derived
knowledge base; that is the produced clusters out of the volume of the training patterns,
and the capability of the classifier to adapt the derived model to unseen patterns. Surely,
we need to keep storage capacity as low as possible while maintaining good prediction
accuracy. Lastly, our objective is to assess the adaptivity of the proposed schemes, i.e., the
capability of the predictor to detect and update appropriately the specific part of the trained
model. The classifier (through the location prediction process) should rapidly detect
changes in the behavior of the mobile user and adapt accordingly through model updates,
however, often at the expense of classification accuracy (note that an ambient environment
implies high dynamicity). We show that increased adaptivity leads to high accuracy and
dependability.
The rest of the paper is structured as follows. In Section 2 we present the considered
ML models by introducing the Offline k-Means, Online k-Means and ART algorithms. In
Section 3 we elaborate on the proposed model with context representation. Section 4
presents the proposed mobility prediction model based on the ART algorithm. The
performance assessment of the considered model is presented in Section 5, where different
versions of that model are evaluated. Moreover, in Section 6, we compare the ART
models with the Offline / Online k-Means algorithms. Prior work is discussed in Section
7 and we conclude the paper in Section 8.
2 Machine Learning Models
In this section we briefly discuss the clustering algorithms used throughout the paper.
Specifically, we distinguish between offline and online clustering and elaborate on the
Offline/Online k-Means and ART.
2.1 Offline k-Means
In Offline k-Means [3] we assume that there are k > 1 initial clusters (groups) of data. The
objective of this algorithm is to minimize the reconstruction error, which is the total
Euclidean distance between the instances (patterns), ut, and their representation, i.e., the
cluster centers (clusters), ci. We define the reconstruction error as follows:
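$$E\left(\{c_i\}_{i=1}^{k} \mid U\right) = \frac{1}{2} \sum_{t} \sum_{i} b_{i,t} \, \| u_t - c_i \|^2 \qquad (1)$$
where
$$b_{i,t} = \begin{cases} 1, & \text{if } \| u_t - c_i \| = \min_{l} \| u_t - c_l \| \\ 0, & \text{otherwise,} \end{cases}$$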
U = {ut} is the total set of patterns and C = {ci}, i = 1,…, k is the set of clusters. bi,t is 1 if
ci is the closest center to ut in Euclidean distance. For each incoming ut each ci is updated
as follows:
$$c_i = \frac{\sum_{t} b_{i,t} \, u_t}{\sum_{t} b_{i,t}} \qquad (2)$$
Since the algorithm operates in offline mode, the initial clusters can be set during the
training phase and cannot be changed (increased or relocated) during the testing phase.
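The batch procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the random choice of initial centers, the iteration cap and the handling of empty clusters are all assumptions made for the example.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest(u, centers):
    """Index i of the center closest to u, i.e. the i for which b_{i,t} = 1."""
    return min(range(len(centers)), key=lambda i: dist(u, centers[i]))

def offline_kmeans(patterns, k, n_iter=100, seed=0):
    """Batch k-means: alternate assignment and the mean update of Eq. (2)."""
    rng = random.Random(seed)
    # Assumption: the k initial centers are k randomly chosen patterns.
    centers = [list(u) for u in rng.sample(patterns, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for u in patterns:
            groups[closest(u, centers)].append(u)
        # Eq. (2): each center becomes the mean of its assigned patterns.
        # Assumption: a center with no assigned pattern stays where it is.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return centers

def reconstruction_error(patterns, centers):
    """Eq. (1): half the sum of squared distances to the closest center."""
    return 0.5 * sum(dist(u, centers[closest(u, centers)]) ** 2 for u in patterns)
```

Note that the whole pattern set is revisited on every iteration, which is why this form of model building becomes costly as the number of samples grows.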
2.2 Online k-Means
In Online k-Means [3] we assume that there are k > 1 initial clusters that split the data. This
algorithm processes unseen patterns one by one and performs small updates in the position
of the appropriate cluster (ci) at each step. The algorithm does not require a training phase.
The update for each new (unseen) pattern ut is the following:
$$c_i = c_i + \eta \cdot b_{i,t} \cdot (u_t - c_i)$$
This update moves the closest cluster (for which bi,t = 1) toward the input pattern ut by a
factor of η. The other clusters (found at greater distances from the considered pattern) are
not updated. The semantics of bi,t, η and (ut – ci) are:
• bi,t ∈ {0, 1} denotes which cluster is being modified,
• η ∈ [0, 1] denotes how far the cluster is shifted toward the new pattern, and,
• (ut – ci) denotes the distance to be learned.
Since the algorithm is online, the initial clusters should be known beforehand¹ and can
only be relocated during the testing phase. The number of clusters remains constant.
Therefore, the algorithm exhibits limited flexibility.
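A single online update can be sketched as below. The function name and the in-place list representation are assumptions made for illustration; only the update rule itself comes from the text.

```python
import math

def online_kmeans_step(centers, u, eta=0.5):
    """Move only the center closest to u toward u by a fraction eta.

    centers is modified in place; the index of the updated center is returned.
    """
    i = min(range(len(centers)), key=lambda j: math.dist(centers[j], u))
    centers[i] = [c + eta * (x - c) for c, x in zip(centers[i], u)]
    return i

# Hypothetical 2D example with k = 2 fixed clusters.
centers = [[0.0, 0.0], [10.0, 10.0]]
online_kmeans_step(centers, [2.0, 0.0])   # first center moves halfway to (2, 0)
online_kmeans_step(centers, [9.0, 10.0])  # second center moves halfway to (9, 10)
```

The number of centers never changes here, which is the limited flexibility noted above.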
2.3 Adaptive Resonance Theory
The ART approach [4] is an online learning scheme in which the set of patterns U is not
available during training. Instead patterns are received one by one and the model is
updated progressively. The term competitive learning is used for ART denoting that the
(local) clusters compete among themselves to assume the “responsibility” for representing
an unseen pattern. The model is also called winner-takes-all because one cluster “wins the
competition” and gets updated, and the others are not updated at all.
¹ One possible approach to determine the initial k clusters is to select the first k distinct instances of the input
sample U.
The ART approach is incremental, meaning that one starts with one cluster and adds a
new one, if needed. Given an input ut, the distance bt is calculated for all clusters ci, i = 1,
…, k, and the closest (e.g., minimum Euclidean distance) to ut is updated. Specifically, if the
minimum distance bt is smaller than a certain threshold value, named the vigilance, ρ, the
update is performed as in Online k-Means (see Eq.(3)). Otherwise, a new center ck+1
representing the corresponding input ut is added to the model (see Eq.(3)). It is worth
noting that the vigilance threshold is the criterion for considering two patterns
equivalent or not during the learning phase of a classifier. As will be shown, the value of
vigilance is essential in obtaining a high number of correctly classified patterns.
The following equations are adopted in each update step of ART:
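$$b_t = \| c_i - u_t \| = \min_{l = 1, \dots, k} \| c_l - u_t \|$$
$$\begin{cases} c_{k+1} \leftarrow u_t, & \text{if } b_t > \rho \\ c_i = c_i + \eta \, (u_t - c_i), & \text{otherwise} \end{cases} \qquad (3)$$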
3 Context Representation
Several approaches have been proposed in order to represent the movement history (or
history) of a mobile user [15]. We adopt a spatiotemporal history model in which the
movement history is represented as the sequence of 3D points (3DPs) visited by the
moving user, i.e., timestamped trajectory points in a 2D surface. The spatial attributes in
that model denote latitude and longitude.
Let e = (x, y, t) be a 3DP. The user trajectory u consists of several time-ordered 3DPs, u
= [ei] = [e1, …, eN], i = 1, …, N and is stored in the system’s database. It holds that t(e1) <
t(e2) < … < t(eN), i.e., timestamped coordinates. The x and y dimensions denote the
latitude and the longitude while t denotes the time dimension (and t(⋅) returns the time
coordinate of e). Time assumes values between 00:00 and 23:59. To avoid state
information explosion, trajectories contain timestamped points sampled at specific time
instances. Specifically, we sample the movement of each user at 1.66⋅10^-3 Hertz (i.e., once
every 10 minutes). Sampling at very high rates (e.g., in the order of a Hertz) is
meaningless, as the derived points will be highly correlated. In our model, u is a finite
sequence of N 3DPs, i.e., u is a 3·N-dimensional vector. We have adopted a value of N = 6
for our experiments, meaning that we estimate the future position of a mobile terminal from
a movement history of 50 minutes (i.e., 5 samples). Specifically, we aim to query the
system with a sequence of N-1 3DPs so that our classifier / predictor returns a 3DP, which is
the predicted location of the mobile terminal.
A cluster trajectory c consists of a finite number of 3DPs, c = [ei], i = 1, …, N stored in
the knowledge base. Note that a cluster trajectory c and a user trajectory u are vectors of
the same length N. This is because c, which is created by ART based on unseen user
trajectories, is a representative itinerary of the user movements. In addition, the query
trajectory q consists of a number of 3DPs, q = [ej], j = 1, …, N-1. It is worth noting that q
is a sequence of N-1 3DPs. Given a q with a history of N-1 3DPs, we predict the eN of the
closest c as the next user movement.
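The representation above maps naturally onto plain tuples and lists. The sketch below is illustrative only: the type aliases, the helper names, the choice of storing t as minutes since 00:00 and the sample coordinates are assumptions, not details given in the paper.

```python
from typing import List, Tuple

# A 3DP e = (x, y, t): latitude, longitude and time of day.
# Assumption: t is stored as minutes since 00:00 (0..1439).
Point3D = Tuple[float, float, float]
Trajectory = List[Point3D]

def is_time_ordered(u: Trajectory) -> bool:
    """Check that t(e1) < t(e2) < ... < t(eN)."""
    return all(a[2] < b[2] for a, b in zip(u, u[1:]))

def split_query(u: Trajectory) -> Tuple[Trajectory, Point3D]:
    """Split u into the query q (first N-1 points) and the point eN to predict."""
    return u[:-1], u[-1]

def as_vector(u: Trajectory) -> List[float]:
    """Flatten a trajectory of N 3DPs into a 3*N-dimensional vector."""
    return [value for point in u for value in point]

# Hypothetical trajectory with N = 6, sampled every 10 minutes from 08:00.
u = [(55.70 + 0.01 * i, 12.50, 480 + 10 * i) for i in range(6)]
q, e_n = split_query(u)
```

With N = 6 the query q holds five points and the flattened vector has 18 components.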
4 Mobility Prediction Model
From the ML perspective, the discussed location prediction problem refers to an m+l model
[13]. In m+l models we have m steps of user movement history and we want to predict the
future user movement after l steps (the steps have timestamped coordinates). In our case,
m = N-1, i.e., the query trajectory q, while l = 1, i.e., the predicted eN. We develop a new
spatiotemporal classifier (C) which, given q, can predict eN. Specifically, q and c are
trajectories of different length, thus we use a Hausdorff-like measure for calculating the
distance between q and c. Given query q, the proposed classifier C attempts to find the nearest cluster c
in the knowledge base and then takes its eN as the predicted 3DP. For evaluating C, we
compute the Euclidean distance between the predicted 3DP and the actual 3DP (i.e., the
real user movement). If this distance is greater than a preset error threshold θ, the
prediction is not successful. After predicting the future location of a mobile terminal, the C
classifier receives feedback from the environment on whether the prediction was
successful or not, and reorganizes the knowledge base accordingly [14]. In our case, the
feedback is the actual 3DP observed in the terminal’s movement. Thus the C classifier
interacts with the environment and learns new patterns once an unsuccessful prediction takes
place.
Specifically,
• in the case of an unsuccessful prediction, C appends the actual 3DP to q and updates
the model with (i.e., learns) the extended sequence, treating it as new knowledge, i.e.,
an unseen user movement behavior.
• in the case of a successful prediction, C does not need to learn. A successful
prediction refers to a well-established prediction model for handling unseen user
trajectories.
The heart of the proposed C classifier is the ART algorithm. ART assigns unseen user
trajectories to existing cluster trajectories or creates new cluster trajectories depending on
the vigilance value. ART takes the u1 pattern from the incoming set U of patterns and
stores it as the c1 cluster in the knowledge base. For the t-th unseen user trajectory the
following procedure is followed (see Table I): The algorithm computes the Euclidean
distance bt between ut and the closest ci. If bt is smaller than the vigilance ρ then ci is
updated from ut by the η factor. Otherwise, a new cluster cj ≡ ut is inserted into the
knowledge base. The ART algorithm is presented in Table I.
Table I. The ART Algorithm for the C classifier
j ← 1                               /*j counts the clusters in the knowledge base*/
c1 ← u1
For (ut ∈ U) Do
    i ← arg min l=1,…,j ||cl – ut||   /*index of the closest cluster*/
    bt ← ||ci – ut||
    If (bt > ρ) Then                /*expand knowledge*/
        j ← j + 1
        cj ← ut
    Else
        ci ← ci + η(ut – ci)        /*update model locally*/
    End If
End For
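The procedure of Table I can be sketched in a few lines. This is an illustrative reading rather than the authors' code: the function name is hypothetical, patterns are assumed to be flat numeric vectors, and a single scalar vigilance ρ on the Euclidean distance is assumed (the paper's experiments use separate spatial and temporal coefficients, whose combination is not modelled here).

```python
import math

def art_cluster(patterns, rho, eta=0.5):
    """Incremental ART-style clustering with vigilance rho and learning rate eta."""
    centers = []
    for u in patterns:
        if not centers:
            centers.append(list(u))  # c1 <- u1
            continue
        # Winner-takes-all: find the closest existing cluster.
        i = min(range(len(centers)), key=lambda l: math.dist(centers[l], u))
        if math.dist(centers[i], u) > rho:
            centers.append(list(u))  # expand knowledge with a new cluster
        else:
            # Update the model locally: shift the winner toward u.
            centers[i] = [c + eta * (x - c) for c, x in zip(centers[i], u)]
    return centers

# Hypothetical 2D patterns forming two groups that are far apart.
clusters = art_cluster([[0, 0], [1, 0], [10, 0], [11, 0]], rho=3)
```

Unlike the k-Means variants, the number of clusters is not fixed in advance: it grows whenever a pattern lies farther than ρ from every existing cluster.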
Let T, P be subsets of U for which it holds that T ⊆ P ⊆ U. The T set of patterns is used
for training the C classifier, that is, C develops a knowledge base corresponding to the
supervised training method. The P set is used for performing online predictions. We
introduce the C-T classifier version, which is the C classifier trained with the T set. In
addition, when the T set is empty, the C classifier is not trained beforehand, which corresponds
to the zero-knowledge training method, and performs online prediction with the set P. In
this case, we get the C-nT classifier, i.e., the C classifier when no training
phase is foreseen.
Moreover, in order for the C classifier to achieve prediction, an approximate Hausdorff-like
metric [5] is adopted to estimate the distance between q and c. Specifically, the
adopted formula calculates the point-to-vector distance between ej ∈ q and c, δ’(ej, c), as
follows:
$$\delta'(e_j, c) = \| f_i - e_j \|, \quad f_i = \arg\min_{f \in c} \left| t(f) - t(e_j) \right|$$
where ||·|| is the Euclidean norm for fi ∈ c and ej. The δ’(ej, c) value indicates the
minimum distance between ej and fi w.r.t. the timestamped information of the user
itinerary, that is, the Euclidean distance of the closest 3DPs in time. Hence, the overall
distance between q (of length N-1) and c (of length N) is calculated as
$$\delta(q, c) = \frac{1}{N-1} \sum_{e_j \in q} \delta'(e_j, c) \qquad (4)$$
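The distance of Eq. (4) can be sketched as follows. The function names are hypothetical, and the sketch assumes that the Euclidean norm is taken over the full (x, y, t) triple with all three coordinates expressed in comparable units.

```python
import math

def point_to_trajectory(e, c):
    """delta'(e, c): distance from e to the point of c that is closest to e in time."""
    f = min(c, key=lambda p: abs(p[2] - e[2]))
    return math.dist(f, e)

def trajectory_distance(q, c):
    """delta(q, c): mean of delta'(e_j, c) over the N-1 points of the query q."""
    return sum(point_to_trajectory(e, c) for e in q) / len(q)

# Hypothetical cluster trajectory (N = 3) and a query running parallel to it.
c = [(0, 0, 0), (1, 0, 10), (2, 0, 20)]
q = [(0, 1, 0), (1, 1, 10)]
d = trajectory_distance(q, c)  # each query point is 1 unit from its time-matched point
```

Matching points by time first is what makes the measure time-sensitive: two itineraries that cover the same places at different times of day are kept apart.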
Figure 1 depicts the process of predicting the next user movement with the
proposed C classifier. Specifically, once a query trajectory q arrives, C attempts to
classify q into a known ci in the knowledge base w.r.t. the Hausdorff-like metric. The C classifier
returns the predicted eN ∈ ci of the ci closest to q. If this result is an
unsuccessful prediction w.r.t. a preset error threshold θ, the C-T (or the C-nT) classifier extends
the q vector with the actual 3DP and inserts q into the knowledge base for further learning
according to the algorithm in Table I (feedback).
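[Figure 1: block diagram. At runtime a query q (N-1 steps) is classified by C (C-T or C-nT) against the Knowledge Base (C), which ART builds from the Data Base (U). The prediction eN is checked against the threshold θ: on «success» eN is returned; on «failure» q ← q + actual 3DP is fed back to ART.]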
Fig. 1. The proposed adaptive classifier for location prediction.
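The loop of Fig. 1 can be sketched end to end as below. All names are hypothetical, and several points are assumptions rather than details from the paper: success is judged on the spatial coordinates only (following the "circle of radius" wording of the precision threshold), ART compares trajectories as flattened vectors under a single scalar vigilance ρ, and the knowledge base is assumed to be non-empty.

```python
import math

def delta(q, c):
    """Hausdorff-like distance: mean distance from each point of q to the
    point of c that is closest to it in time."""
    return sum(
        math.dist(min(c, key=lambda f: abs(f[2] - e[2])), e) for e in q
    ) / len(q)

def flat(u):
    """Flatten a trajectory of 3DPs into one vector."""
    return [v for p in u for v in p]

def art_update(kb, u, rho, eta):
    """One ART step: merge u into its closest cluster or add it as a new one."""
    i = min(range(len(kb)), key=lambda l: math.dist(flat(kb[l]), flat(u)))
    if math.dist(flat(kb[i]), flat(u)) > rho:
        kb.append([tuple(p) for p in u])
    else:
        kb[i] = [
            tuple(f + eta * (e - f) for f, e in zip(fp, ep))
            for fp, ep in zip(kb[i], u)
        ]

def predict_and_learn(kb, q, actual, theta, rho, eta=0.5):
    """Predict eN for query q, then learn from the observed 3DP on failure."""
    closest = min(kb, key=lambda c: delta(q, c))
    predicted = closest[-1]  # eN of the closest cluster trajectory
    success = math.dist(predicted[:2], actual[:2]) <= theta
    if not success:
        art_update(kb, list(q) + [actual], rho, eta)  # q <- q + actual 3DP
    return predicted, success
```

A failed prediction is the only event that changes the knowledge base, so a classifier started from a single trajectory grows only as far as the observed movements require.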
5 Prediction Evaluation
We evaluated our adaptive model in order to assess its performance. In our experiments,
the overall user movement space has a surface of 540 km². This space derives from real
GPS traces captured in Denmark [6]. The GPS trace was fed into our model and the
performance of the C system w.r.t. predefined metrics was monitored. Table II indicates
the parameters used in our experiments.
Table II. Experimental Parameters
Parameter                                      Value     Comment
Learning rate (η)                              0.5       In case of a new pattern ut, the closest cluster ci is moved toward ut by half the spatial and temporal distance.
Spatial coefficient of vigilance (ρs)          100 m     Two 2D points are considered different if their spatial distance exceeds 100 meters.
Temporal coefficient of vigilance (ρt)         10 min    Two timestamps are considered different if their temporal distance exceeds 10 minutes.
Precision threshold / location accuracy (θ)    10 m      The predicted location falls within a circle of radius 10 meters from the actual location.²
The GPS traces including 1200 patterns were preprocessed and we produced two
training files and two test files as depicted in Figure 2. The first training file, TrainA, is
produced from the first half of the GPS trace records. The second training file, TrainB,
consists of a single trace record. The first test file, TestA, is produced from the entire set of
trace records, including in ascending order the first half of the GPS traces and the other
half of unseen traces. Finally the second test file, TestB, is produced from the entire set of
the GPS trace records, including in ascending order the second half of unseen traces and
the first half of the GPS traces. During the generation of the training/test files, white noise
was artificially induced into the trace records.
[Figure 2: the GPS pattern instances (i.e., the U set), u1, …, u600, …, u1200, split into the following files.]
TrainA = {u1, …, u600}
TrainB = {u1}
TestA = {u1, …, u600, u601, …, u1200}
TestB = {u601, …, u1200, u1, …, u600}
Fig. 2. The generated GPS trace files for experimentation.
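The four files of Fig. 2 amount to simple slices of the ordered pattern list. The helper below is a hypothetical sketch; it does not model the preprocessing or the white noise added when the files were generated.

```python
def make_splits(patterns):
    """Build the TrainA/TrainB/TestA/TestB orderings from a time-ordered list."""
    half = len(patterns) // 2
    first, second = patterns[:half], patterns[half:]
    return {
        "TrainA": first,           # first half of the trace
        "TrainB": patterns[:1],    # a single pattern (zero-knowledge start)
        "TestA": first + second,   # known half, then unseen half
        "TestB": second + first,   # unseen half, then known half
    }

# Stand-in for u1..u1200: the integers 1..1200 play the role of the patterns.
splits = make_splits(list(range(1, 1201)))
```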
We have to quantitatively and qualitatively evaluate the proposed model. For that
reason, we introduce the following quantitative and qualitative parameters: (a) the
precision achieved by the prediction scheme –the higher the precision the more accurate
the decisions on the future user location (b) the size of the underlying knowledge base –
we should adopt solutions with the lowest possible knowledge base size (such solutions are
far more efficient and feasible in terms of implementation) and (c) the capability of the
model to rapidly react to changes in the movement pattern of the user/mobile terminal and
readapt. We define precision, p, as the fraction of the correctly predicted locations, p+,
against the total number of predictions made by the C system, ptotal, that is,
$$p = \frac{p_{+}}{p_{\text{total}}}$$
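As a small illustration of this definition, precision can be computed from a sequence of per-prediction outcomes. The function name and the boolean encoding of outcomes are assumptions made for the example.

```python
def precision(outcomes):
    """p = p+ / p_total, where outcomes holds True for each correct prediction."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("precision is undefined when no predictions were made")
    return sum(outcomes) / len(outcomes)

# Three correct predictions out of four.
p = precision([True, True, False, True])
```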
In the following subsections, we evaluate the diverse versions of the C classifier w.r.t.
training methods by examining the classifier convergence (speed of learning and
adaptation) and the derived precision in predicting future locations.
5.1 Convergence of C-T and C-nT
The C classifier converges once the knowledge base does not expand with unseen patterns,
i.e., the set C does not evolve. In Figure 3, we plot the number of clusters, |C|, that are
generated by the C-T/nT models during the testing phase. The horizontal axis denotes
the incoming (time-ordered) GPS patterns. The point (.) marked line depicts the behavior
of the C-T1 model trained with TrainA and tested with TestA. In the training phase, the
first 600 patterns of TrainA have gradually generated 70 clusters in C. In the testing phase,
the first 600 patterns are known to the classifier so there is no new cluster creation. On the
other hand, for the remaining 600 unseen patterns, the number of clusters grows to 110,
indicating that the ART algorithm learns such new patterns.
² Such an accuracy level is considered appropriate for the kind of applications where location prediction can be
applied (see the Introduction section or [12]).
The circle (o) marked line depicts the C-T2 model, which is trained with TrainA and
tested with TestB. Since the training file is the same as in the C-T1 model, the first generated
clusters are the same in number (|C| = 70). In the testing phase, we observe a significant
difference. ART does not know the second 600 unseen patterns, thus, it learns new patterns
up to 110 clusters. In the next 600 known patterns, C-T2 does not need to learn additional
clusters, thus it settles at 110 clusters.
[Figure 3: plot of the number of generated clusters, |C|, against time-ordered patterns for C-T1, C-T2 and C-nT.]
Fig. 3. Convergence of C-T/nT.
We now examine the behavior of the C-nT model corresponding to the zero-knowledge
training method. The asterisk (*) marked line depicts the training phase (with TrainB)
followed by the testing phase (with TestA) of C-nT. In this case, we have an incremental
ART that does not need to be trained. For technical consistency reasons, it only requires a
single pattern, which is the unique cluster in the knowledge base at the beginning. In the
testing phase, for the first 600 unseen patterns of TestA we observe a progressive cluster
creation (up to 45 clusters). For the next 600 unseen patterns, we also observe a gradual
cluster creation (up to 85 clusters) followed by convergence. Comparing the C-T1/2 and
C-nT models, the latter achieves the minimum number of clusters (22.72% less storage
cost). This is due to the fact that C-nT starts learning only from unsuccessful predictions in
an incremental way by adapting the pre-existing knowledge base to new instances.
Nevertheless, we also have to take into account the prediction accuracy in order to reach
safe conclusions about the efficiency and effectiveness of the proposed models.
5.2 Precision of C-T and C-nT
In Figure 4 we examine the precision achieved by the algorithms. The vertical axis depicts
the precision value p achieved during the testing phase. The point (.) marked line depicts
the precision of the C-T1 model trained with TrainA and tested with TestA. During the
test phase, for the first 600 known patterns C-T1 achieves a precision value ranging from
97% to 100%. In the next 600 unseen patterns, we observe that for the first instances the
precision drops smoothly to 95% and, as C-T1 learns, i.e., learns new clusters and optimizes
the old ones, the precision converges to 96%.
[Figure 4: plot of the precision value, p, against time-ordered patterns (0 to 1200) for C-T1, C-T2 and C-nT.]
Fig. 4. Precision of C-T/nT.
The circle (o) marked line depicts the precision behavior for the C-T2 model tested with
TestB and trained with TrainA. With the first 600 totally unseen patterns during the test
phase, C-T2 achieves precision from 26% to 96%. This indicates that the model is still
learning during the test phase, increasing the precision value. In the next 600 known
patterns, the model has nothing to learn and the precision value converges to 96%.
The asterisk (*) marked line depicts the precision behavior of the C-nT model tested
with TestA and trained with TrainB. In this case, C-nT is trained with only one pattern
instance, i.e., the algorithm is fully incremental, thus, all the instances are treated as
unseen. In the test phase, for the first 600 patterns, the model achieves precision which
ranges from 25% to 91%. In the next 600 patterns, we can notice that for the first instances
the precision drops smoothly to 88% and, as the model learns, precision gradually
converges to 93%.
Evidently, the adoption of the training method, i.e., the C-T1/2 models, yields better
precision. However, if we correlate our findings with the results shown in Figure 3, we
infer that a small improvement in precision has an obvious storage cost. Specifically, we
need to store 110 clusters in the case of C-T, compared to 85 clusters in the case of C-nT
(22.72% less storage cost). Furthermore, the user movement patterns can change
repeatedly over time. Hence, by adopting the training method, one has to regularly train
and rebuild the model. If the mobile context-aware application aims at maximizing the
supported quality of service w.r.t. precision, while keeping the storage cost stable, the
C-nT model should be adopted.
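As a quick check of the storage figure quoted above, using only the two converged cluster counts (110 with prior training, 85 without):

$$\frac{110 - 85}{110} = \frac{25}{110} \approx 0.227$$

that is, roughly 22.7% fewer clusters to store; the quoted 22.72% is this ratio truncated to two decimals. Over the same test run the converged precision differs by three percentage points (96% against 93%).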
6 Comparison with other Models
We compare the C-nT model with other known models that can be used for location
prediction. Such models implement the Offline k-Means and Online k-Means algorithms
and require a predefined number of k > 1 initial clusters for constructing the
corresponding knowledge base. We should stress here that the greater the k, the greater the
precision value achieved by Offline/Online k-Means. In our case, we set k = 110,
which is the cluster count at convergence for the C models (Section 5). For C-nT, we use
TrainB for the training and TestA for the testing phase (this model adopts the zero-knowledge
training method). Moreover, for the Offline/Online k-Means models we use
TrainA for the training and TestA for the testing phase because both models require k > 1
initial clusters.
Figure 5 depicts the precision achieved by the CnT (the point (.) marked line), Offline
k-Means (the asterisk (*) marked line) and Online k-Means (the circle (o) marked line)
models. The horizontal axis represents the time-ordered instances and the vertical axis
represents the achieved precision. In the first 600 patterns, CnT achieves
precision levels ranging from 25% to 91%, indicating adaptation to new knowledge. This is
attributed to the learning mechanism (CnT recognizes and learns new user movements).
In the next 600 patterns, the precision first drops smoothly
to 88% and then, as the knowledge base adapts to new movements and optimizes the existing
ones, converges to 93%.
In the case of Offline k-Means, the first 600 patterns yield
precision levels ranging from 96% to 98% once the initial clusters are set to k = 110. In the
next 600 patterns, the precision drops sharply and converges to 57%, since the
knowledge base is not updated with unseen user movements. Online k-Means
achieves, in the testing phase (the first 600 patterns), precision levels
ranging from 94% to 97% given the training file TrainA. In the next 600 patterns,
the precision first drops rather smoothly to 86% and, as the
knowledge base incrementally adapts to new patterns, converges to
65%. Comparing the three models, the most suitable model for location
prediction is CnT, since (i) it achieves greater precision through model adaptation and
(ii) it requires a smaller knowledge base (i.e., fewer clusters) than the
Offline/Online k-Means models.
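The contrast between a frozen (offline) and an incrementally updated (online) knowledge base can be illustrated with a minimal sketch. It is not the implementation evaluated above: it assumes plain 2-D points rather than trajectories, a Euclidean distance rather than the Hausdorff-like distance, and the standard sequential k-means rule with a 1/n step size. The names `nearest` and `online_kmeans_update` and all data values are hypothetical.

```python
import math

def nearest(centroids, x):
    """Return the index of the centroid closest to point x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], x))

def online_kmeans_update(centroids, counts, x):
    """Sequential k-means step: move the winning centroid toward x by 1/n."""
    i = nearest(centroids, x)
    counts[i] += 1
    eta = 1.0 / counts[i]  # step size shrinks as the cluster absorbs more points
    centroids[i] = tuple(c + eta * (xi - c) for c, xi in zip(centroids[i], x))
    return i

# Toy knowledge base (assumed values): two clusters, each seen once in training.
centroids = [(0.0, 0.0), (10.0, 10.0)]
counts = [1, 1]
frozen = list(centroids)  # offline variant: never updated after training

# Three unseen observations near (4, 4) arrive during the testing phase.
for x in [(4.0, 4.0), (4.0, 4.0), (4.0, 4.0)]:
    online_kmeans_update(centroids, counts, x)

print("offline:", frozen[0], "online:", centroids[0])
```

In this sketch the online centroid drifts toward the new observations while the offline one stays where training left it, which mirrors the qualitative behavior described above; neither variant creates new clusters, since k is fixed in advance.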
Fig. 5. Comparison of CnT with the Offline/Online k-Means models. [Plot: precision value p (vertical axis, 0.1 to 1) against time-ordered patterns (horizontal axis, 0 to 1200); one curve each for CnT, Online k-Means, and Offline k-Means.]
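Fig. 6. The behavior of the γ parameter vs. the temporal and spatial coefficients of the vigilance
threshold. [Plot: value of γ (vertical axis, 0.7 to 1) against the value of ρs in meters (horizontal axis, 100 to 1000); one curve each for ρt = 10, 20, 30, and 60 min.]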
Up to this point, we have concluded that the CnT model achieves good precision with
limited memory requirements, both of which are very important for mobile context-aware
systems. However, we still need to determine
the best value for the spatiotemporal vigilance parameter ρ. In other words, we aim to
determine the values of the spatial (ρs) and temporal (ρt) vigilance coefficients that
yield the highest precision with low memory requirements. We introduce the weighted
sum γ as follows:
\[ \gamma = w \cdot p + (1 - w) \cdot (1 - a) \]
where p is the precision and a is the proportion of clusters generated by the classifier (i.e., the size of the
knowledge base in clusters) out of the total movement patterns (i.e., the size of the
database in patterns), that is, a = |C|/|U|, where |C| is the cardinality of the set C of
clusters and |U| is the cardinality of the set U of movement patterns. The weight value
w ∈ [0, 1] indicates the relative importance of precision and memory requirements; a value of w =
0.5 assigns equal importance to a and p. In our assessment, we set w = 0.7. We require that
a assumes low values, minimizing the storage cost of the classifier. A low value of a
indicates that the classifier appropriately adapts to and learns the user movements
without retaining redundant information. The value of γ indicates which values of ρs and ρt
maximize the precision while, at the same time, minimizing the memory requirements.
Hence, our aim is to achieve a high value of γ, indicating an adaptive classifier with high
precision and low storage cost. As illustrated in Figure 6, we obtain the
global maximum value of γ for ρs = 100 m and ρt = 10 min (which are the values used
during the experiments; see Table II).
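As a worked illustration of the weighted sum γ, the following sketch evaluates it for a few candidate vigilance settings and picks the one with the highest score. The function name `gamma`, the dictionary layout, and every precision and cluster-count value are hypothetical placeholders chosen for easy arithmetic; they are not measurements from the experiments and say nothing about which setting is actually best.

```python
def gamma(p, n_clusters, n_patterns, w=0.7):
    """Weighted sum of precision p and memory saving (1 - a), with a = |C| / |U|."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    a = n_clusters / n_patterns  # share of patterns that became clusters
    return w * p + (1.0 - w) * (1.0 - a)

# Hypothetical outcomes keyed by (rho_s in meters, rho_t in minutes):
# each value is (precision, number of clusters). Made-up numbers.
candidates = {
    (100, 10): (0.90, 100),
    (500, 30): (0.80, 50),
    (1000, 60): (0.60, 10),
}
n_patterns = 1000  # assumed size of the pattern database |U|

scores = {cfg: gamma(p, c, n_patterns) for cfg, (p, c) in candidates.items()}
best = max(scores, key=scores.get)

for cfg, score in scores.items():
    print(cfg, round(score, 3))
print("best setting:", best)
```

For the first toy setting, a = 100/1000 = 0.1, so γ = 0.7 ⋅ 0.90 + 0.3 ⋅ 0.9 = 0.90. With w = 0.7, a loss in precision weighs more heavily than an equal gain in memory saving, which is why the coarsest setting scores lowest here despite producing the fewest clusters.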
7 Prior Work
Previous work in the area of mobility prediction includes the model in [7], which uses
Naïve Bayes classification over the user movement history. This model does not deal with
fully or semi-random mobility patterns and assumes a normal density distribution for the
underlying data. Such assumptions are not adopted in our model, since our
mobility patterns refer to real human traces with unknown distribution. Moreover, the
learning automaton in [8] follows a linear reward-penalty reinforcement learning method
for location prediction. However, this model does not provide satisfactory prediction
accuracy, as reported in [8]. The authors in [9] apply evidential reasoning to mobility
prediction when knowledge of the mobility patterns is not available (i.e., similarly to this
paper). However, this model incurs high computational complexity (due to the adopted
Dempster-Shafer algorithm) once the number of possible user locations increases, and it
requires detailed user information (e.g., daily profile, preferences, favorite meeting places).
Other methods for trajectory prediction have also been proposed in the literature [10], but
these are generally limited in scope since they consider only rectilinear movement
patterns (e.g., highways) and not unknown patterns. Work closely related to ours
is reported in [11], where a GPS system is used to collect location information. The
proposed system automatically clusters the collected GPS data into meaningful locations at
multiple scales. These locations are then incorporated into a Markov model to
predict the user's future location. The authors in [16] adopt a data mining approach (i.e., rule
extraction) for predicting user locations in mobile environments. This approach achieves
prediction accuracy lower than ours (on the order of 80% for deterministic movement). In
[17], the authors adopt a clustering method for the location prediction problem. Prediction
accuracy is still low (on the order of 66% for deterministic movement). The authors in [18]
introduce a framework in which an individual function is computed for each user in order to
capture that user's movement. This approach achieves prediction accuracy lower than ours (on the
order of 70% for deterministic movement). In [19], the authors apply movement rules to
mobility prediction given the user's past movement patterns. Prediction accuracy is still low
(on the order of 65% for deterministic movement). The authors in [20] introduce a
prediction model that uses grey theory (i.e., a theory used to study uncertainty). This approach
achieves prediction accuracy lower than ours (on the order of 82% for deterministic
movement).
8 Conclusions
We presented how ML techniques can be applied to the engineering of mobile context-aware
applications for location prediction. Specifically, we use ART (a special Neural
Network Local Model) and introduce a learning method. Furthermore, we deal with two
training methods for each learning method: in the supervised method, the model uses
training data in order to perform classification, and in the zero-knowledge method, the model
incrementally learns from unsuccessful predictions. We evaluated our models with
different spatial and temporal parameters, examining the storage cost of the knowledge base
(i.e., emerged clusters) and the precision (prediction accuracy). Our findings
indicate that the CnT model is better suited to context-aware systems. The advantages of the CnT
model are that (1) it does not require pre-existing knowledge of the user movement behavior
in order to predict future movements, (2) it adapts its online knowledge base to unseen
patterns, and (3) it does not consume much memory to store the emerged clusters. For these
reasons, CnT is quite useful in context-aware applications where no prior knowledge about
the user context is available. Furthermore, through experiments, we determine which
vigilance value achieves the appropriate precision w.r.t. memory limitations and prediction
error. Finally, the Neural Network Local Models literature includes other models (e.g.,
Self-Organizing Maps) that we have not examined in this paper. We intend to implement
them and compare them with CnT in terms of knowledge base requirements, precision of the
location prediction and adaptation.
References
1. A. Dey, Understanding and using context, Personal and Ubiquitous Computing, 5(1), pp. 4-7, 2001.
2. J. Hightower, G. Borriello, Location Systems for Ubiquitous Computing, IEEE Computer, 34(8), August 2001.
3. E. Alpaydin, Introduction to Machine Learning, The MIT Press, 2004.
4. R. Duda, P. Hart, D. Stork, Pattern Classification, Wiley-Interscience, 2001.
5. E. Belogay, C. Cabrelli, U. Molter, and R. Shonkwiler, Calculating the Hausdorff Distance between Curves, Information Processing Letters, vol. 64, no. 1, pp. 17-22, 1997.
6. Site: http://www.openstreetmap.org/traces/tag/Denmark
7. S. Choi, K. G. Shin, Predictive and adaptive bandwidth reservation for handoffs in QoS-sensitive cellular networks, ACM SIGCOMM, 1998.
8. S. Hadjiefthymiades, L. Merakos, Proxies+Path Prediction: Improving Web Service Provision in Wireless Mobile Communications, ACM/Kluwer Mobile Networks and Applications, Special Issue on Mobile and Wireless Data Management, 8(4), 2003.
9. A. Karmouch, N. Samaan, A Mobility Prediction Architecture Based on Contextual Knowledge and Spatial Conceptual Maps, IEEE Trans. on Mobile Computing, 4(6), 2005.
10. R. Vijayan, J. Holtzman, A model for analyzing handoff algorithms, IEEE Trans. on Veh. Technol., 42(3), Aug. 1993.
11. D. Ashbrook and T. Starner, Learning Significant Locations and Predicting User Movement with GPS, Proc. Sixth Int'l Symp. Wearable Computers (ISWC 2002), pp. 101-108, Oct. 2002.
12. I. Priggouris, E. Zervas, and S. Hadjiefthymiades, "Location Based Network Resource Management", in "Handbook of Research on Mobile Multimedia" (editor: Ismail Khalil Ibrahim), Idea Group Inc., May 2006.
13. K. M. Curewitz, P. Krishnan, and J. S. Vitter, Practical Prefetching via Data Compression, Proceedings of ACM SIGMOD, 1993, pp. 257-266.