All content in this area was uploaded by Amilcar Soares on Apr 30, 2018

A semi-supervised approach for the semantic segmentation of trajectories

Amilcar Soares Júnior∗, Valéria Times†, Chiara Renso‡, Stan Matwin∗ and Lucídio A. F. Cabral§

∗Institute for Big Data Analytics, Dalhousie University, Canada

†Centro de Informática (CIn), Federal University of Pernambuco, Brazil

‡ISTI-CNR, Italy

§Centro de Informática (CI), Federal University of Paraíba, Brazil

¶Polish Academy of Sciences, Warsaw, Poland

Email: amilcar.soares@dal.ca, vct@cin.ufpe.br, chiara.renso@isti.cnr.it, stan@cs.dal.ca, lucidio@di.ufpb.br

Abstract—A ﬁrst fundamental step in the process of analyzing

movement data is trajectory segmentation, i.e., splitting trajectories into homogeneous segments based on some criteria. Although several trajectory segmentation approaches have been proposed in the last decade, to the best of our knowledge none of them is semi-supervised. In a semi-supervised approach, a user manually labels a small set of trajectories with meaningful segments and, from this set, the method infers in an unsupervised way the segments of the remaining trajectories. The main advantage of this method over purely supervised ones is that it reduces the human effort needed to label trajectories. In this work, we propose the use of the Minimum Description Length (MDL) principle to measure homogeneity inside segments. We also introduce the Reactive Greedy Randomized Adaptive Search Procedure for semantic Semi-supervised Trajectory Segmentation (RGRASP-SemTS) algorithm, which segments trajectories by combining a limited user labeling phase with a low number of input parameters and no predefined segmenting criteria. The approach and the algorithm are presented in detail throughout the paper, and the experiments are carried out on two real-world datasets. The evaluation shows that our approach outperforms state-of-the-art competitors when compared to ground truth.

Keywords-Trajectory segmentation; Semantic annotation; Semantic trajectory; Semi-supervised learning

I. INTRODUCTION

Research¹ on trajectory management and analysis is a broad and mature area [17], since positioning devices are now commonly used to track people [5,22], vessels [12], and animals [9]. These devices produce trajectory samples that represent the movement of an object as a discrete collection of spatiotemporal points. A step that is a prerequisite to several analysis tasks on these tracks is trajectory segmentation [15,19]. Segmenting a trajectory means splitting the spatiotemporal sequence of points into segments based on properties or criteria that identify a similar behavior within each segment. Segment splitting criteria can be based, for example, on the temporal component, such as the day of the week, or on whether the object is moving or not, thus separating the stop segments from the move segments [19]. Segmenting a trajectory with clear criteria is a first step toward semantic enrichment (or semantic annotation), the process of enriching trajectory parts with meaningful contextual information [13,15].

¹ This work has been submitted to the IEEE for publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

The segmentation task is therefore based on methods ca-

pable of distinguishing the homogeneous or similar parts of

a trajectory based on some criteria. We can distinguish two

cases: supervised and unsupervised segmentation. In super-

vised segmentation, the criteria are already known a priori.

This can be implemented with algorithms based on simple

thresholds (e.g., speed) or machine learning techniques that

learn the correct segmentation from a set of labeled segments.

When the segmentation criteria are unknown, the unsupervised

algorithm derives the homogeneity of segments based on some

cost function. Both supervised and unsupervised methods have

complementary beneﬁts and drawbacks.

The supervised methods rely on user-deﬁned rules, labels

or thresholds; therefore the segmentation is user-driven. This

kind of segmentation is particularly suitable for semantic

annotation, thanks to the human labeling phase that can

associate complex semantic labels to trajectory parts (e.g.

activity performed or transportation means). The drawback is that, in some cases, these criteria are not clear: they may depend on the characteristics of the trajectory dataset and on the expertise the domain specialist needs to correctly label a set of trajectories and/or configure the thresholds. Also, obtaining a high-quality labeled trajectory dataset is difficult, as it requires a huge effort from domain experts; this is one reason why supervised methods are not widespread in this field.

Unsupervised algorithms, on the contrary, avoid any control

from the user and automatically detect segments using a cost

function which represents the homogeneity of the segments.

Although these algorithms may produce segments with high

homogeneity, they lack semantics and any connection to the

speciﬁc application, making the interpretation task difﬁcult.

Despite the broad spectrum of trajectory segmentation ap-

proaches already proposed in the literature (see for example

[15,17]), there is still a lack of methods attempting to combine

the beneﬁts of both supervised and unsupervised strategies. As

a possible solution to this, a semi-supervised approach to the

segmentation task is proposed in this work.

Semi-supervised essentially means that a user manually labels a small set of trajectories based on some criteria, thus giving the semantics of the segmentation, and, from that, the method infers, in an unsupervised way, the segmentation of the remaining part of the trajectory dataset. Such an approach offers a balance between entirely supervised methods, where the user precisely defines the splitting criteria, and unsupervised ones, where the method infers a good splitting based on a cost function. We observe that when the segmentation is semantic-based (e.g., representing the activity of the moving object), in contrast to geometric-based segmentation (e.g., based on the speed of the object), manually annotated trajectories are essential; minimizing the number of these human-labeled trajectories, as stated in [18], is fundamental to keep this task feasible.

This paper advances and extends a previous work [8] based on an unsupervised method named GRASP-UTS (Greedy Randomized Adaptive Search Procedure for Unsupervised Trajectory Segmentation). Whereas that paper introduces an unsupervised algorithm for trajectory segmentation, here we propose a new semi-supervised segmentation algorithm called RGRASP-SemTS (Reactive Greedy Randomized Adaptive Search Procedure for Semantic semi-supervised Trajectory Segmentation).

We summarize below the original contributions of this

paper:

• Proposal of RGRASP-SemTS, a semi-supervised algorithm that uses a small set of labeled trajectory data to drive the unsupervised segmentation of unlabeled trajectory data.
• Unlike previous related works, RGRASP-SemTS performs semantic trajectory segmentation by combining feature evaluation, non-monotone criteria, semantic annotation, a cost function, and a meta-heuristic.
• Description of a feature evaluation step that aims to find the set of features that maximizes the performance of RGRASP-SemTS.
• Evidence that using labeled data speeds up RGRASP-SemTS when compared to our previous unsupervised algorithm, GRASP-UTS.

The remainder of the paper is organized as follows. Sec-

tion II surveys the related work. Section III shows con-

cept deﬁnitions, terminologies, and theories used in the

proposed solution. Section IV presents the novel semantic

semi-supervised algorithm for trajectory segmentation named

RGRASP-SemTS. Section V presents the metrics and the

results obtained by the novel approach when applied to real

datasets. Finally, Section VI concludes the paper.

II. RELATED WORKS

As interest in the topic increases, new methods to segment trajectories are being proposed. A pioneering work is the stop and move definition given by [19], where segmentation is used to identify the parts of a trajectory where the object stays still and to separate them from the moving parts. A broader definition of semantic trajectory followed, in which segments may be annotated not only as stops and moves, but also with more meaningful and context-aware labels such as transportation means or activities [2,13,15].

The need to identify segments based on some semantics

fostered the developments of different segmentation methods.

A possible classiﬁcation of these methods is based on the

characteristics of the algorithm: supervised or unsupervised,

application-oriented or general purpose, monotone criteria

or non-monotone criteria and with a predeﬁned number of

segments versus a non-predeﬁned number of segments.

Supervised means that the segmentation criteria are based

on ad-hoc standards and predeﬁned rules. This is the case

when the rules are clear and predeﬁned by experts of the

domain as in works [12,14,22]. The second line of approaches

follows an unsupervised methodology, where no predeter-

mined criteria are imposed in the segmentation process, and

the segment split is based on data properties as in works

[8,10,20]. To the best of our knowledge, no works found in

the trajectory segmentation literature tried to combine both

supervised and unsupervised criteria as we are doing in the

present paper.

Another possible classiﬁcation of segmentation algorithms

is to distinguish between application-oriented and general

purpose methods [10]. Application-oriented algorithms for tra-

jectory segmentation are designed for a speciﬁc purpose and,

consequently, they are difﬁcult to reuse in different domains.

Examples of application-oriented algorithms for trajectory

segmentation are described in [20]–[22]. On the other hand,

the general purpose algorithms for trajectory segmentation are

easily reused in many different domains. Examples of general

purpose algorithms for trajectory segmentation are explained

in [4,8,10,12,14].

Trajectory segmentation algorithms can also be monotone or non-monotone, and this affects the results of the splitting task [4]. A criterion is monotone if, whenever a segment $S$ fulfills it, every sub-segment $S'$ of $S$ fulfills it as well. Monotone criteria arise when the values of the features must fall within a range or ratio. On the other hand, criteria computed from means and standard deviations are non-monotone. Trajectory segmentation algorithms with monotone criteria include [1,4], while the approaches with non-monotone criteria include [8,10,12,14,22].
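The distinction between monotone and non-monotone criteria can be illustrated with a small sketch. The two criteria, the helper names, and the speed values below are hypothetical and chosen only for illustration; they are not taken from any of the cited algorithms.

```python
from statistics import mean

def within_range(values, lo, hi):
    """Range criterion (monotone): every value lies in [lo, hi]."""
    return all(lo <= v <= hi for v in values)

def mean_at_most(values, limit):
    """Mean-based criterion (non-monotone): the mean is at most `limit`."""
    return mean(values) <= limit

def holds_for_all_subsegments(criterion, values):
    """Check the criterion on every contiguous, non-empty sub-segment."""
    n = len(values)
    return all(criterion(values[i:j])
               for i in range(n) for j in range(i + 1, n + 1))

speeds = [2.0, 3.0, 9.0, 2.0]  # hypothetical speed samples of one segment

# The range criterion holds on the segment and on all of its sub-segments.
range_ok = within_range(speeds, 0.0, 10.0)
range_sub_ok = holds_for_all_subsegments(
    lambda v: within_range(v, 0.0, 10.0), speeds)

# The mean criterion holds on the segment (mean 4.0) but fails on the
# sub-segment [9.0], so it is not monotone.
mean_ok = mean_at_most(speeds, 5.0)
mean_sub_ok = holds_for_all_subsegments(
    lambda v: mean_at_most(v, 5.0), speeds)
```

Because a mean-based criterion can hold for a segment while failing for one of its parts, algorithms that rely on such criteria cannot assume that a valid segment stays valid when it is shortened.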

Another issue related to the segmentation of trajectories

is the number of segments that must be found. In [10], the

number of segments is given as input to the algorithm, so it

is already predeﬁned, whereas, in [8,12,14,22], the number of

segments is found automatically by the algorithm during its

execution.

The RGRASP-SemTS algorithm proposed in this paper is classified as semi-supervised, general purpose, non-monotone, and without a predefined number of segments. To the best of our knowledge, none of the segmentation algorithms proposed in the literature has this combination of characteristics.

III. BASIC CONCEPTS

This section addresses concepts and terminologies used in

this work. A trajectory is a representation of the spatiotemporal

movement of an object. Trajectories are usually collected by

tracking devices into discrete samples and represented as a

sequence of spatiotemporal points [8]:

Definition 1: A trajectory sample is a list of spatiotemporal points $\tau_N = \{tp_0, tp_1, \ldots, tp_N\}$, where $tp_i = (x_i, y_i, t_i, \omega_i)$.

A point feature is any numeric information that can be extracted from a trajectory sample and associated with a spatiotemporal point. In Definition 1, $\omega_i = \{pf_0, pf_1, \ldots, pf_A\}$ is the set of point features of $tp_i$. A point feature can be acquired by a geolocation device (e.g., the instantaneous speed) or calculated from the trajectory sample (e.g., the direction variation between two consecutive points), and is assigned to a single point of the trajectory.
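A trajectory sample with point features could be represented as in the following minimal sketch. The class name `TrajectoryPoint`, the `features` dictionary, the helper `add_speed_feature`, and the assumption of planar coordinates are all hypothetical choices made for illustration; the paper does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from math import hypot

@dataclass
class TrajectoryPoint:
    """One spatiotemporal point tp_i = (x_i, y_i, t_i, omega_i)."""
    x: float
    y: float
    t: float
    features: dict = field(default_factory=dict)  # omega_i: point features

def add_speed_feature(points):
    """Attach a derived 'speed' point feature computed from the previous point.

    Assumes planar coordinates and strictly increasing timestamps; the first
    point has no predecessor, so its speed is set to 0.0 by convention here.
    """
    for prev, cur in zip(points, points[1:]):
        dt = cur.t - prev.t
        dist = hypot(cur.x - prev.x, cur.y - prev.y)
        cur.features["speed"] = dist / dt if dt > 0 else 0.0
    if points:
        points[0].features.setdefault("speed", 0.0)
    return points

tau = add_speed_feature([
    TrajectoryPoint(0.0, 0.0, 0.0),
    TrajectoryPoint(3.0, 4.0, 1.0),  # 5 units travelled in 1 time unit
    TrajectoryPoint(3.0, 4.0, 3.0),  # no displacement
])
speeds = [p.features["speed"] for p in tau]
```

Once computed, each value stays attached to its point and does not depend on how the trajectory is later split.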

A segment feature is any numeric information computed from the trajectory sample and associated with a segment (e.g., average or maximal speed). The difference between a point feature and a segment feature is that the former is static while the latter is dynamic: once a point feature is collected or computed, it does not change over time, whereas a segment feature depends on the segment definition and has to be recomputed whenever points are added to or removed from the segment.
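The dynamic nature of segment features can be seen in a short sketch. The function `segment_features`, the feature names, and the speed values are assumptions made for illustration only.

```python
from statistics import mean

def segment_features(point_speeds):
    """Compute segment features (zeta) from the speeds of the segment's points."""
    return {"avg_speed": mean(point_speeds), "max_speed": max(point_speeds)}

speeds = [1.0, 2.0, 3.0, 10.0]  # hypothetical point feature values

# Segment covering the first three points.
before = segment_features(speeds[0:3])

# Extending the segment by one point changes its features, so they must be
# recomputed, while the point features themselves stay the same.
after = segment_features(speeds[0:4])
```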

A semantic label (or semantic annotation) is any additional

semantic and/or contextual information that can be added to a

trajectory segment [15]. Such information can be, for example,

an activity (e.g., walking, studying or driving) or a behavioral

pattern (e.g., foraging or running from a predator). Henceforth,

the term label refers to a semantic label. A trajectory dataset

is called a labeled dataset when its trajectories’ segments have

been annotated with semantic labels. More formally:

Definition 2: A labeled trajectory segment $s$ is a sublist of $\tau_N$ with $s = (tid, sid, \{tp_u, \ldots, tp_v\}, semantic\_label, \zeta_{sid})$, where: (i) $tid$ is the trajectory identifier; (ii) $sid$ is the segment identifier; (iii) $tp_u, \ldots, tp_v$ is a sublist of $\tau_N$ starting at $tp_u$ and ending at $tp_v$ ($1 \le u \le v \le N$); (iv) $semantic\_label$ is the additional information that characterizes the segment; and (v) $\zeta_{sid}$ is a list of segment features, with $\zeta_{sid} = \{sf_0, sf_1, \ldots, sf_B\}$.

Two more concepts are used in the segmentation algorithm

deﬁnition to refer to the representative points inside a seg-

ment and inside a labeled dataset: the segment landmark in

Deﬁnition 3, already introduced in [8] and semantic landmark

in Deﬁnition 4, introduced here for the ﬁrst time:

Definition 3: A segment landmark $lm_r$ is a representative point of a trajectory sample $\tau_N$, where $lm_r = tp_s$ and $1 \le r \le s \le N$, used to represent a trajectory segment in terms of its point features.

A segment landmark is a point inside the segment that is chosen to represent the whole segment: it is used as a reference point to characterize the behavior of a part of the trajectory. After defining a set of segment landmarks, it is possible to create segments by partitioning the points in the neighborhood of each segment landmark (i.e., the consecutive points of the trajectory that respect a time constraint). The choice of which point to use as a segment landmark depends on a cost function that should be optimized.

Definition 4: A semantic landmark $sem\_lm = (semantic\_label, \omega_A, \zeta_B)$ is a tuple that represents a pattern extracted from a labeled dataset, consisting of: (i) a semantic label; (ii) a list $\omega_A$ of point feature values; and (iii) a list $\zeta_B$ of segment feature values.

Unlike segment landmarks, which are chosen by the cost function, semantic landmarks are computed from a set of examples given by the user. Examples of semantic landmarks are segments labeled as fishing or not fishing for vessels, or foraging and traveling for animals.
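A semantic landmark per label could be obtained by averaging the features of the labeled examples, which is how the extraction step of Algorithm 1 (Line 2) is described later in the paper. The sketch below is one possible reading of that step: the function name, the input layout, and the choice to pool all points of a label before averaging are assumptions, and the feature values are invented.

```python
from collections import defaultdict
from statistics import mean

def extract_semantic_landmarks(examples):
    """Build one semantic landmark (omega_A, zeta_B) per semantic label.

    `examples` is a list of (label, point_feature_vectors, segment_feature_vector).
    Point features are pooled over all points carrying the label and averaged
    column by column; segment features are averaged over the labeled segments.
    """
    grouped = defaultdict(lambda: {"points": [], "segments": []})
    for label, point_vectors, segment_vector in examples:
        grouped[label]["points"].extend(point_vectors)
        grouped[label]["segments"].append(segment_vector)

    landmarks = {}
    for label, data in grouped.items():
        omega = [mean(column) for column in zip(*data["points"])]
        zeta = [mean(column) for column in zip(*data["segments"])]
        landmarks[label] = (omega, zeta)
    return landmarks

# Hypothetical labeled examples: point features are [speed, turning angle],
# the single segment feature is the average speed.
examples = [
    ("fishing", [[1.0, 30.0], [3.0, 50.0]], [2.0]),
    ("fishing", [[2.0, 40.0]], [2.0]),
    ("not_fishing", [[10.0, 5.0], [12.0, 3.0]], [11.0]),
]
landmarks = extract_semantic_landmarks(examples)
```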

IV. SEMANTIC AND SEMISUPERVISED TRAJECTORY SEGMENTATION

This section introduces the novel RGRASP-SemTS process for semantic and semi-supervised trajectory segmentation. The process is summarized in Figure 1, which shows its tasks together with their inputs and outputs. Starting with a set of trajectories, the domain expert labels a subset of them using some criteria (e.g., the activity performed). The first step of the process is feature evaluation, where the features that will be used in the learning phase are generated and selected for a particular domain. This step is detailed in Section IV-A. The second task is the actual segmentation performed by RGRASP-SemTS. The input parameters of RGRASP-SemTS are: (i) a set of labeled segment examples; (ii) a reactive proportion ($rp$), which controls how often the internal lists of parameter values for $minTime$ and $\alpha$ are updated; and (iii) the maximal number of iterations ($max\_it$) to execute RGRASP-SemTS over a trajectory sample. The algorithm and its input parameters are detailed in Section IV-B. Finally, the output of RGRASP-SemTS is a set of semantically segmented trajectories produced in a semi-supervised way by considering both the examples provided by the user (supervised phase) and the similarities computed by the algorithm in the neighborhood of the segments (unsupervised phase).

A. Feature evaluation

The feature evaluation follows two steps: (i) the feature

generation and (ii) the selection of a subset of features enabling

the algorithm to achieve its best performance. In the features

generation step, the objective is to create as many features as

possible to characterize the behavior of the moving object for

each dataset. The features created for each dataset used in the

experiments of this work are detailed in Section V-B.

As the number of features can grow very fast, it is necessary to select the most representative point and segment features to perform the segmentation; this is the second step of feature evaluation. In this work, the Weka package and the $\chi^2$ filtering algorithm [11] were used to select the features. The $\chi^2$ feature selection algorithm evaluates the worth of a feature by computing the value of the $\chi^2$ statistic with respect to the label.

[Figure 1. Diagram elements: Trajectories Dataset; Trajectory Samples; Labelled Segments (Travelling, Foraging); Features Evaluation (Feature Generation, Feature Selection); parameters rp, max_it; RGRASP-SemTS; Semantically Segmented Trajectories.]

Fig. 1. The semantic and semi-supervised trajectory segmentation process of RGRASP-SemTS

The advantages of this method are that it is fast, scalable and

independent from the chosen segmentation algorithm.
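The idea behind $\chi^2$ feature ranking can be sketched in plain Python. This is not the Weka implementation used in the paper: it assumes the features have already been discretized into bins, and the function name, bin names, and data are hypothetical.

```python
from collections import Counter

def chi_squared(feature_bins, labels):
    """Pearson chi-squared statistic between a discretized feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature_bins, labels))
    bin_totals = Counter(feature_bins)
    label_totals = Counter(labels)
    stat = 0.0
    for b, b_count in bin_totals.items():
        for lab, lab_count in label_totals.items():
            expected = b_count * lab_count / n
            stat += (observed[(b, lab)] - expected) ** 2 / expected
    return stat

labels = ["foraging", "foraging", "travelling", "travelling"]
speed_bin = ["low", "low", "high", "high"]  # aligned with the label
noise_bin = ["a", "b", "a", "b"]            # independent of the label

scores = {
    "speed": chi_squared(speed_bin, labels),
    "noise": chi_squared(noise_bin, labels),
}
# Features are ranked by decreasing statistic; the top ones are kept.
ranking = sorted(scores, key=scores.get, reverse=True)
```

A feature whose bins line up with the labels receives a high statistic, while a feature that is independent of the labels receives a statistic of zero. Since the score depends only on the data and the labels, the ranking can be computed once and reused with any segmentation algorithm.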

B. The RGRASP-SemTS Segmentation Algorithm

Semi-supervised strategies take advantage of both unsuper-

vised and supervised strategies, thus exploiting labeled and

unlabeled data. In fact, we want to achieve homogeneity

inside a segment using an unsupervised strategy (segment

landmarks), while obtaining a degree of similarity between the

segments using a supervised strategy on a labeled dataset (se-

mantic landmarks). The properties we want to optimize in the

segments (e.g., minimal distortion and maximal compression),

previously presented in [8], and the extended cost functions

we use in this work are discussed below.

1) Desirable properties and Cost function deﬁnition: In this

work, achieving minimal distortion in a trajectory segmen-

tation task is to achieve as much homogeneity as possible

inside the trajectory segments regarding its point and segment

features. On the other hand, achieving maximal compression

for the trajectory segments means that the resulting segments

(i.e. number of segment landmarks) should be as few as

possible and as similar as possible to a semantic landmark

extracted from the labeled dataset.

The goals of maximal compression and minimal distortion conflict: improving one worsens the other. For example, selecting all the spatiotemporal points as segment landmarks minimizes the distortion, since every point is its own representative, but yields the worst compression, since as many segment landmarks as possible are chosen. Conversely, choosing one point as the segment landmark for the entire trajectory yields the best compression, but increases the distortion produced by the segmentation, since a single segment landmark is compared to all the points of the trajectory. It is therefore necessary to define a function that represents the trade-off between them. We propose to use the MDL principle to compute this trade-off, as detailed below.

To achieve homogeneity inside a trajectory segment we use Equation 1, which represents the Euclidean distance between the point features ($\omega_A$) of two points $tp_1$ and $tp_2$. In Equation 1, $\omega_{1i}$ represents the $i$-th point feature value of $tp_1$ and $\omega_{2i}$ the $i$-th point feature value of $tp_2$.

$$sim_{tf}(\omega_1, \omega_2) = \sqrt{\sum_{i=1}^{A} (\omega_{1i} - \omega_{2i})^2} \tag{1}$$

The segment cohesiveness, shown in Equation 2, measures the similarity between the point features of a chosen segment landmark ($\omega_{lm}$) and those of all the points between $tp_u$ and $tp_v$ ($\omega_k$).

$$SCohe(\omega_{lm}, \{\omega_u, \ldots, \omega_v\}) = \sum_{k=u}^{v} sim_{tf}(\omega_{lm}, \omega_k) \tag{2}$$
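Equations 1 and 2 translate directly into code. The sketch below assumes that point features are stored as plain lists of floats; the function names mirror the paper's notation, and the feature values are invented.

```python
from math import sqrt

def sim_tf(omega_1, omega_2):
    """Equation 1: Euclidean distance between two point feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(omega_1, omega_2)))

def s_cohe(omega_lm, segment_omegas):
    """Equation 2: sum of distances from the landmark to every segment point."""
    return sum(sim_tf(omega_lm, omega_k) for omega_k in segment_omegas)

# Hypothetical segment with three points and two point features each.
segment = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
landmark = segment[1]  # the middle point represents the segment

# Distances are 5.0, 0.0 (the landmark itself) and 5.0.
cohesion = s_cohe(landmark, segment)
```

Although the paper calls these functions similarities, they are distances: smaller values mean more homogeneous segments.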

As segment feature is a new concept defined in this work, and as it is necessary to compare the segment features of different segments, Equation 3 defines a similarity between the segment features ($\zeta$) of two segments $s_1$ and $s_2$, where $\zeta_{1i}$ represents the $i$-th segment feature of $s_1$ and $\zeta_{2i}$ represents the $i$-th segment feature of $s_2$.

$$sim_{sf}(\zeta_1, \zeta_2) = \sqrt{\sum_{i=1}^{B} (\zeta_{1i} - \zeta_{2i})^2} \tag{3}$$

In MDL theory, the best model is the one that minimizes $L(H) + L(D|H)$ [6]. In this work, the hypothesis $H$ consists of choosing an optimal set of segment landmarks that are the most similar to the semantic landmarks included in the labeled dataset and that also have high homogeneity in their neighborhood. Finding the optimal set of segment landmarks corresponds to finding the best hypothesis according to the MDL principle. It is also crucial to consider both unsupervised and supervised data in both $L(H)$ and $L(D|H)$.

Given a trajectory sample $\tau_N = (tp_1, tp_2, \ldots, tp_N)$, a set of segment landmarks $\phi_T = \{lm_1, lm_2, \ldots, lm_T\}$, a set of semantic landmarks $\lambda_V = \{s\_lm_1, s\_lm_2, \ldots, s\_lm_V\}$, and a set of trajectory segments $\theta_T = \{s_1, s_2, \ldots, s_T\}$, the cost function is formally defined as follows.

The cost of the hypothesis ($L(H)$) is computed by comparing, in terms of their point features: (i) consecutive chosen segment landmarks (unsupervised measure); and (ii) each segment landmark to the most similar semantic landmark found in the labeled dataset (supervised measure). Equation 4a represents the cost of encoding a hypothesis for a trajectory sample $\tau_N$ when a set $\phi_T$ of segment landmarks is chosen. If $T$ is equal to 1, $L(H) = 0$; otherwise, Equation 4a is used. The term $max$ represents the maximum possible similarity between two trajectory features. The first part of Equation 4a is the unsupervised measure (Equation 4b), which takes into account the unlabeled data. This value decreases when consecutive segment landmarks are dissimilar; hence, it identifies less similar movement behaviors by comparing the current segment to the next one. The second part of $L(H)$ (Equation 4c), the supervised measure, computes the similarity between each chosen segment landmark and the closest semantic landmark in terms of their point features.

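$$L(H) = L(H)_{Uns}(\phi_T) + L(H)_{Sup}(\phi_T, \lambda_V) \tag{4a}$$

$$L(H)_{Uns} = \log_2\left(1 + \sum_{j=1}^{T-1} \left(max - sim_{tf}(\omega_{lm_j}, \omega_{lm_{j+1}})\right)\right) \tag{4b}$$

$$L(H)_{Sup} = \log_2\left(1 + \sum_{j=1}^{T} \arg\min_{i \in [1,V]} sim_{tf}(\omega_{lm_j}, \omega_{s\_lm_i})\right) \tag{4c}$$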
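The hypothesis cost of Equation 4 can be sketched as follows. The sketch reads the arg min of Equation 4c as the minimum distance to any semantic landmark, which is an interpretation, and it takes the constant $max$ as a caller-supplied parameter `max_sim`, since the paper does not state how it is obtained. All feature values are invented.

```python
from math import log2, sqrt

def sim_tf(omega_1, omega_2):
    """Equation 1: Euclidean distance between two point feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(omega_1, omega_2)))

def l_h(landmarks, semantic_landmarks, max_sim):
    """Equation 4a: cost of a hypothesis given its segment landmarks.

    `landmarks` holds the point features of lm_1..lm_T in trajectory order,
    `semantic_landmarks` the point features of s_lm_1..s_lm_V.
    """
    T = len(landmarks)
    if T == 1:
        return 0.0  # stated in the paper for a single landmark
    # Equation 4b: consecutive landmarks that differ more cost less.
    uns = log2(1 + sum(max_sim - sim_tf(landmarks[j], landmarks[j + 1])
                       for j in range(T - 1)))
    # Equation 4c: distance of each landmark to its closest semantic landmark.
    sup = log2(1 + sum(min(sim_tf(lm, s) for s in semantic_landmarks)
                       for lm in landmarks))
    return uns + sup

# One point feature per point, two segment landmarks, two semantic landmarks.
cost = l_h(landmarks=[[0.0], [4.0]],
           semantic_landmarks=[[1.0], [4.0]],
           max_sim=7.0)
# uns = log2(1 + (7 - 4)) = 2, sup = log2(1 + (1 + 0)) = 1
```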

The $L(D|H)$ cost function, which represents the cost in bits of encoding a dataset $D$ when testing a hypothesis $H$, is defined in Equation 5a. It is computed from: (i) the segment cohesiveness (Equation 2) between all the points of each segment ($\{\omega_u, \ldots, \omega_v\}_k$) and its respective segment landmark ($\omega_{lm_k}$), in terms of point feature values; and (ii) the segment feature similarity (Equation 3) between each segment found and the most similar semantic landmark. The first part of Equation 5a is the unsupervised measure (Equation 5b), which compares the chosen segment landmark of each segment with the point feature values of all points inside this segment. This value increases when fewer segments are found and decreases when more segments are found. The supervised measure of $L(D|H)$ is shown in Equation 5c. Each segment of $\theta_T$ is compared to all semantic landmarks of the set $\lambda_V$ and the closest one is selected. This is the part of the cost function where semantic enrichment is performed. Since the unsupervised measure of $L(D|H)$ (Equation 5b) compares the point feature values of every point of a segment with the respective segment landmark, it is necessary to multiply the segment feature similarity of Equation 5c by the number of points located inside this segment ($|s_k|$) and subtract one, since the segment landmark is itself a point of the segment and its similarity to itself is equal to 0. If such an approach is not adopted, the unsupervised measure weighs more heavily in $L(D|H)$ than the supervised measure, since it accumulates greater similarity costs.

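$$L(D|H) = L(D|H)_{Uns}(\theta_T) + L(D|H)_{Sup}(\phi_T, \lambda_V) \tag{5a}$$

$$L(D|H)_{Uns} = \log_2\left(1 + \sum_{k=1}^{T} SCohe(\omega_{lm_k}, \{\omega_u, \ldots, \omega_v\}_k)\right) \tag{5b}$$

$$L(D|H)_{Sup} = \log_2\left(1 + \sum_{k=1}^{T} \arg\min_{i \in [1,V]} \left(sim_{sf}(\zeta_{s\_lm_i}, \zeta_{s_k}) \cdot |s_k|\right)\right) \tag{5c}$$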
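A possible reading of Equation 5 in code is given below. It follows Equation 5c as printed, scaling the segment feature distance by the number of points $|s_k|$, and reads the arg min as the minimum distance; whether one should be subtracted from $|s_k|$, as the surrounding text suggests, is left open here. The dictionary layout of a segment is an assumption made for illustration.

```python
from math import log2, sqrt

def euclidean(a, b):
    """Shared form of Equations 1 and 3 (sim_tf and sim_sf)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l_d_given_h(segments, semantic_zetas):
    """Equation 5a: cost of encoding the data given the chosen segments.

    Each segment is a dict with the point features of its landmark
    ('landmark'), of all its points ('points'), and its segment features
    ('zeta'). `semantic_zetas` holds the segment features of the semantic
    landmarks.
    """
    # Equation 5b: cohesiveness of each segment around its landmark.
    uns = log2(1 + sum(
        sum(euclidean(seg["landmark"], omega) for omega in seg["points"])
        for seg in segments))
    # Equation 5c: distance to the closest semantic landmark, scaled by |s_k|.
    sup = log2(1 + sum(
        min(euclidean(z, seg["zeta"]) for z in semantic_zetas)
        * len(seg["points"])
        for seg in segments))
    return uns + sup

segments = [{
    "landmark": [1.0],
    "points": [[0.0], [1.0], [2.0]],
    "zeta": [2.0],
}]
cost = l_d_given_h(segments, semantic_zetas=[[1.0], [5.0]])
# uns = log2(1 + (1 + 0 + 1)), sup = log2(1 + 1 * 3)
```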

2) The Algorithm: In this section, we present the algorithm RGRASP-SemTS. Two issues must be considered: (i) the number of segments that minimizes the cost function is not known a priori; (ii) changing the segment landmarks also changes the segment configuration, so the cost function value must be recomputed every time the set of segment landmarks is modified. To manage these issues, we adopt the Reactive Greedy Randomized Adaptive Search Procedure (RGRASP) meta-heuristic [16], which aims at determining the number of segments and the boundaries between consecutive segments.

Algorithm 1 RGRASP-SemTS
Input: A set of segment examples ψ_E = (ex_1, ex_2, ..., ex_T)
       A reactive proportion value rp
       A number of iterations max_it
Output: A set of semantically enriched segments θ_T = (s_1, s_2, ..., s_T)
 1: τ_N ⇐ read trajectory data;
 2: λ_V ⇐ extract all semantic landmarks from examples ψ_E;
 3: minTime_list ⇐ initialize the minTime values;
 4: α_list ⇐ initialize the α values;
 5: for k = 1 → max_it do
 6:     minTime ⇐ randomly select from minTime_list;
 7:     α ⇐ randomly select from α_list;
 8:     θ_T ⇐ Greedy Randomized Construction Procedure(τ, minTime, α, λ_V);
 9:     θ_T ⇐ Local Search Procedure(θ_T, minTime, λ_V);
10:     Update Best Solution(θ_T, Best_θ_T);
11:     if mod(k, rp) == 0 then
12:         Update minTime_list and α_list probabilities;
13:     end if
14: end for
return Best_θ_T;

RGRASP-SemTS is detailed in Algorithm 1. The trajectory $\tau_N$ to be segmented is read in Line 1. In Line 2, the algorithm extracts one semantic landmark for each type of semantic label by averaging each point and segment feature over the segment examples that carry that label. Lines 3 and 4 initialize $minTime_{list}$ and $\alpha_{list}$ with the possible values for $minTime$ and $\alpha$ using equal-width binning. For $\alpha_{list}$, the minimum and maximum values are predetermined and range from 0.1 to 0.6. These bounds were chosen because, for values below 0.1, the algorithm chooses the same segment landmarks in each iteration, while for values above 0.6 the segment landmarks chosen by RGRASP-SemTS were completely random. From Lines 5 to 14, $max\_it$ iterations are executed, each building and evaluating a different segmentation. Values for $minTime$ and $\alpha$ are randomly selected from the lists $minTime_{list}$ and $\alpha_{list}$ (Lines 6 and 7). Then, a first feasible solution ($\theta_T$ segments) is built by executing the procedure shown in Algorithm 2 (Line 8). After building the first set of feasible segments $\theta_T$, the Local Search Procedure (Algorithm 3) is applied to locally optimize the segments (Line 9). RGRASP-SemTS (Algorithm 1) then checks whether the new set of optimized segments $\theta_T$ is the best one found so far by evaluating its cost according to the MDL function (Line 10). If the cost of these segments is lower, it updates the set of best segments ($Best\_\theta_T$).
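The iteration structure of Lines 5 to 14 can be sketched as a generic loop. The construction, local search, cost, and probability update steps are passed in as callables, the parameter values are drawn uniformly, and the toy stand-ins at the bottom exist only to make the sketch runnable; none of this is the authors' implementation. The helper `equal_width_values` is one possible reading of "equal-width binning".

```python
import random

def equal_width_values(lo, hi, count):
    """Return `count` evenly spaced candidate values from lo to hi (inclusive)."""
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]

def rgrasp_loop(tau, sem_landmarks, rp, max_it, alpha_list, min_time_list,
                construct, local_search, cost, update_probabilities, rng):
    """Skeleton of Algorithm 1, Lines 5 to 14, with pluggable procedures."""
    best, best_cost = None, float("inf")
    for k in range(1, max_it + 1):
        min_time = rng.choice(min_time_list)                      # Line 6
        alpha = rng.choice(alpha_list)                            # Line 7
        segments = construct(tau, min_time, alpha, sem_landmarks)  # Line 8
        segments = local_search(segments, min_time, sem_landmarks)  # Line 9
        current = cost(segments)
        if current < best_cost:                                   # Line 10
            best, best_cost = segments, current
        if k % rp == 0:                                           # Lines 11-13
            update_probabilities(k)
    return best, best_cost

# Toy stand-ins: fixed-size chunks as "segments", segment count as "cost".
seen_costs, reactive_steps = [], []

def toy_cost(segments):
    value = float(len(segments))
    seen_costs.append(value)
    return value

best, best_cost = rgrasp_loop(
    tau=list(range(10)), sem_landmarks=[], rp=2, max_it=6,
    alpha_list=equal_width_values(0.1, 0.6, 6), min_time_list=[2, 5],
    construct=lambda tau, min_time, alpha, lm:
        [tau[i:i + min_time] for i in range(0, len(tau), min_time)],
    local_search=lambda segments, min_time, lm: segments,
    cost=toy_cost, update_probabilities=reactive_steps.append,
    rng=random.Random(0),
)
```

In a reactive variant the two `rng.choice` calls would draw according to the probabilities maintained by the update step rather than uniformly.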

The reactive part of the algorithm consists of updating the selection probabilities of the values in $minTime_{list}$ and $\alpha_{list}$ (Lines 11 to 13). If the iteration counter $k$ modulo $rp$ is equal to 0, the probabilities of $\alpha_{list}$ and $minTime_{list}$ are updated using Equation 6 [16]. Equation 6a computes a score $q_i$ for the $i$-th value of $\alpha_{list}$ or $minTime_{list}$. Once all values of $q_i$ are established, Equation 6b computes $p_i$ (i.e., the probability of selecting the $i$-th element of $\alpha_{list}$ or $minTime_{list}$) by dividing each $q_i$ by the sum of all $q_i$ values.

$$q_i = \left(\frac{\text{best MDL value found}}{\text{average MDL value for the } i\text{-th element of the list}}\right)^{10} \tag{6a}$$

$$p_i = \frac{q_i}{\sum_{j=1}^{m} q_j} \tag{6b}$$
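Equation 6 can be sketched in a few lines. The function name, the `delta` parameter (set to the exponent 10 of Equation 6a), and the cost values are assumptions made for illustration.

```python
def reactive_probabilities(best_cost, avg_costs, delta=10):
    """Equation 6: selection probabilities for the values of a parameter list.

    `best_cost` is the best MDL value found so far and `avg_costs[i]` is the
    average MDL value obtained with the i-th value of the list.
    """
    q = [(best_cost / avg) ** delta for avg in avg_costs]  # Equation 6a
    total = sum(q)
    return [q_i / total for q_i in q]                      # Equation 6b

# Two candidate values: the first one reached the best cost on average,
# the second one produced solutions twice as costly.
probs = reactive_probabilities(best_cost=8.0, avg_costs=[8.0, 16.0])
```

Because of the high exponent, values whose average cost is far from the best cost found receive a very small probability, which concentrates later iterations on the more promising parameter values.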

Finally, Algorithm 1 returns the best set of semantically enriched segments ($Best\_\theta_T$) found in $max\_it$ iterations. The procedure for building the initial solutions (Algorithm 2) of RGRASP-SemTS is explained as follows.

Algorithm 2 Greedy Randomized Construction Procedure
Input: A set of points ordered by time τ_N = (tp_1, tp_2, ..., tp_N)
       A minimum time threshold minTime
       An α threshold to define the amount of greediness of the construction algorithm
       A set λ_V of semantic landmarks
Output: A set of semantically enriched segments θ_T = (s_1, s_2, ..., s_T)
 1: while candidate_list is not empty do
 2:     RCL_list ⇐ add points of candidate_list from index 0 to RCL_size;
 3:     candidate ⇐ randomly select a point from RCL_list;
 4:     semanticLandmark ⇐ get the most similar semantic landmark from λ_V in terms of point features;
 5:     segment ⇐ add candidate;
 6:     while minTime threshold condition not satisfied do
 7:         best_neighbor ⇐ evaluate left and right neighbor;
 8:         segment ⇐ best_neighbor;
 9:     end while
10:     θ_T ⇐ add segment;
11:     remove from candidate_list unfeasible points;
12:     sort points of candidate_list according to the closer semantic landmark in terms of point features;
13: end while
return θ_T;

Algorithm 2 starts in Line 1 by considering all points as candidate segment landmarks ($candidate_{list}$). As in all GRASP-based algorithms, the construction phase requires a restricted candidate list (RCL). This list is used to manage the amount of greediness of the construction method, which is determined by the parameter $\alpha$. The procedure is executed until all points in $candidate_{list}$ have been considered (Lines 1 to 13). In this work, the RCL is built by sorting all points of $\tau$ according to their distance (Equation 1) to the closest semantic landmark in terms of point feature values (Line 1). The size of the RCL ($RCL_{size}$) is determined by multiplying the size of $candidate_{list}$ by the value of $\alpha$, thereby creating a list $RCL_{list}$ with all points spanning from the first element of $candidate_{list}$ to the position given by $RCL_{size}$. Afterward, a point from $RCL_{list}$ is randomly chosen as the candidate segment landmark (Line 3), and the closest $semanticLandmark$ for this point is chosen by computing the point feature distance (Equation 1) between this candidate and all semantic landmarks of the set $\lambda_V$ (Line 4).

A new segment is created in Line 5, with the candidate as its initial point and the semantic label given by the semanticLandmark (Line 8). From Lines 6 to 9, the segment grows by adding points from its neighborhood (respecting the chronological order) until the time threshold (minTime) is reached. This is done by determining the most suitable neighbor of the segment in terms of point feature values (i.e., segment_{u−1} or segment_{v+1}) (Line 7) and adding this bestNeighbor to the segment (Line 8). When the segment spans at least the minTime threshold, it is added to the set θT (Line 10). Afterward, all points of the segment (Line 10) and the neighboring points that could not be used to build a feasible segment under the time constraint (Line 11) are removed from the candidatelist. After these points are removed, candidatePointslist is re-sorted (Line 12), and the same procedure is applied until there are no candidate points left in this list. Subsequently, every point that was not assigned to a segment is placed in a neighboring segment (the segment to the point's right or left). To do so, the algorithm determines which position, within a run of consecutive unassigned points, is the best one, that is, the one that minimizes the cost function. The points on each side of this position are then added to the respective segment (i.e., the segment on the left or on the right). Finally, the feasible and semantically enriched segments θT are returned.
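The segment-growing step can be illustrated with the sketch below. It is written under simplifying assumptions that are not in the paper: a single numeric point feature, timestamps in hours, a landmark summarized by one feature value, and a hypothetical function name grow_segment.

```python
def grow_segment(times, features, start, landmark_value, min_time, free):
    """Grow the index range [u, v] around `start` until it spans `min_time`.

    At each step the better of the two chronological neighbors (u - 1 or
    v + 1) is added, i.e. the one whose feature value is closer to the
    landmark value. free[i] is True while point i is still unassigned.
    Returns (u, v), or None when no feasible segment can be built.
    """
    u = v = start
    while times[v] - times[u] < min_time:
        options = []
        if u - 1 >= 0 and free[u - 1]:
            options.append(u - 1)
        if v + 1 < len(times) and free[v + 1]:
            options.append(v + 1)
        if not options:
            return None  # time constraint cannot be met
        best = min(options, key=lambda i: abs(features[i] - landmark_value))
        if best < u:
            u = best
        else:
            v = best
    return u, v


# Toy usage: six-hour sampling, a 12-hour minimum segment duration.
times = [0, 6, 12, 18, 24]
features = [10, 50, 55, 60, 20]
print(grow_segment(times, features, start=2, landmark_value=55,
                   min_time=12, free=[True] * 5))
```

Returning None for an infeasible candidate mirrors the removal of points that cannot satisfy the time constraint; how ties between the two neighbors are broken is an arbitrary choice in this sketch.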

The procedure that locally optimizes the initial solution of RGRASP-SemTS (Algorithm 3) works as follows. Line 1 initializes a list of optimized segments, named optimized_θR, that will be the output of this procedure. Lines 2 and 3 initialize the current segment to be analyzed (c_segment) with all the points contained in θ0 and store this segment's semantic label in p_sem_label.

The objective of Lines 4 to 13 is to merge consecutive segments that have the same semantic label. For each remaining segment θi in θT, if the consecutive labels (i.e., the labels of the previous and the current segment) are not equal (Line 5), the algorithm updates the segment landmark of c_segment (Line 6), adds this segment to the output list optimized_θ (Line 7), sets the previous semantic label to that of the current segment (Line 8), and finally sets c_segment to θi (Line 9). If p_sem_label is equal to the semantic label of θi, all points from θi are added to c_segment (Line 11).
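The merging step can be sketched as below. This is an illustrative simplification, not the listing of Algorithm 3: segments are assumed to be (label, points) pairs in chronological order, the segment landmark update is omitted, and the last open segment is flushed implicitly because every segment is appended to the output list as soon as it is opened.

```python
def merge_same_label(segments):
    """Merge consecutive segments that share the same semantic label.

    `segments` is a chronologically ordered list of (label, points) pairs.
    """
    merged = []
    for label, points in segments:
        if merged and merged[-1][0] == label:
            # Same label as the previous segment: absorb its points.
            merged[-1][1].extend(points)
        else:
            # Label changed: open a new segment (copy to avoid aliasing).
            merged.append((label, list(points)))
    return merged


# Toy usage with point indices standing in for trajectory points.
initial = [("low", [0, 1]), ("low", [2]), ("high", [3, 4]), ("low", [5])]
print(merge_same_label(initial))
```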

Algorithm 3 Local Search Procedure
Input: A set of semantically enriched segments θT = (s1, s2, ..., sT)
       A minimum time threshold minTime
       A set ψV of semantic landmarks
Output: A set of optimized semantically enriched segments optimized_θR = (s1, s2, ..., sR)
1: optimized_θR ⇐ {};
2: c_segment ⇐ θ0;
3: p_sem_label ⇐ label of c_segment;
4: for i = 1 → T do
5:   if p_sem_label != θi's semantic label then
6:     update segment landmark of c_segment;
7:     optimized_θ ⇐ add c_segment;
8:     p_sem_label ⇐ label of c_segment;
9:     c_segment ⇐ θi;
10:  else
11:    c_segment ⇐ insert all points from θi;
12:  end if
13: end for
14: for i = 0 → R − 1 do
15:   bpp ⇐ find the best position to partition points between index_lm1 and index_lm2;
16:   optimized_θi ⇐ create segment from optimized_θi's first index position to bpp;
17:   optimized_θi+1 ⇐ create segment from bpp + 1 to optimized_θi+1's last index;
18:   update segment landmark of optimized_θi and optimized_θi+1;
19: end for
return optimized_θR;

From Lines 14 to 19, Algorithm 3 optimizes the MDL-based cost function L(H) + L(D|H). The optimization is carried out by finding the best partitioning position (bpp) between consecutive segments of the set optimized_θ (Line 15). For all points between two consecutive segment landmarks, this method checks which position yields the largest decrease in the MDL-based cost function, and takes this position as the local optimum for these two successive segments. The boundaries of optimized_θi and optimized_θi+1 are updated in Lines 16 and 17, and their segment landmarks in Line 18. Finally, the set optimized_θR is returned by this procedure.
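The search for the best partitioning position can be sketched as an exhaustive scan over the positions between two consecutive landmarks. The cost used here is a stand-in chosen only for illustration (the sum of squared deviations from each side's mean); it is not the MDL-based cost L(H) + L(D|H) defined in the paper, and the function names are hypothetical.

```python
def sse(values):
    """Stand-in cost: sum of squared deviations from the mean (not MDL)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def best_partition_position(features, lm1, lm2, cost=sse):
    """Return the split index bpp with lm1 <= bpp < lm2 of lowest total cost.

    `features` holds the point feature values of two consecutive segments.
    The left segment becomes features[:bpp + 1] and the right segment
    features[bpp + 1:], so each side keeps its own landmark.
    """
    best_bpp, best_cost = None, float("inf")
    for bpp in range(lm1, lm2):
        total = cost(features[:bpp + 1]) + cost(features[bpp + 1:])
        if total < best_cost:
            best_bpp, best_cost = bpp, total
    return best_bpp


# Toy usage: the landmarks sit at indices 0 and 4.
values = [1.0, 1.0, 1.0, 9.0, 9.0]
print(best_partition_position(values, lm1=0, lm2=4))
```

Because every position between the two landmarks is evaluated once, the scan is linear in the number of points between them, which is in line with the O(N) bound discussed for Algorithm 3 (assuming the cost of a split can be evaluated or updated in constant time, which this naive sketch does not do).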

Since RGRASP-SemTS works with distances, standardizing the data is a crucial step because features can have different variances. Indeed, the distances computed for a feature with a very high variance will be greater than the distances computed for features with smaller variances. This difference would bias RGRASP-SemTS by raising the cost contribution of the features with higher variances and lowering that of the features with lower variances. In this work, each feature was normalized using the standard score, a well-known statistical method. It produces a dimensionless number obtained by subtracting the mean from the raw value and then dividing this difference by the standard deviation.
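As a small worked example, the standard score of a feature column can be computed as follows. Whether the population or the sample standard deviation was used is not stated in the paper; the population version is assumed here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def standard_score(values):
    """Return z = (x - mean) / std for every value of one feature."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Toy feature column with mean 5 and standard deviation 2.
wind = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(standard_score(wind))
```

After this transformation every feature has zero mean and unit variance, so no single feature dominates the distance computation merely because of its scale.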

At this point, the complexity of RGRASP-SemTS is analyzed. The complexity of the construction procedure (Algorithm 2) is defined by the while structure from Lines 2 to 17. This structure has a complexity of O(N), since it evaluates, for each point of the trajectory τN, the possibility of it being selected as a segment landmark. Every time a new segment is created during the execution of these lines, points are removed from the list of candidate points (candidatePointslist) in Lines 14 or 15. In the worst case, all the points of trajectory τN are selected as segment landmarks to generate segments. The complexity of the local optimization procedure (Algorithm 3) is defined by the for structure from Lines 14 to 19. Observe that the θR segments evaluated by Algorithm 3 contain all the points of τN, and that all possibilities for partitioning the N points between consecutive segment landmarks are analyzed, which gives a complexity of O(N) for Algorithm 3. Finally, the complexity of RGRASP-SemTS (Algorithm 1) is O(N ∗ max_it), which results from multiplying the number of iterations executed by the algorithm (max_it) by the O(N) cost of Algorithm 2.

V. EXPERIMENTS

This section details the experiments and is organized as follows. Section V-A presents the datasets and the evaluation metric, while Section V-B details the feature evaluation. Finally, Section V-C compares the semantically enriched segments generated by RGRASP-SemTS with those of other state-of-the-art supervised and unsupervised algorithms.

A. Datasets and evaluation metric

We used two real-world datasets: (i) the Atlantic hurricane track dataset and (ii) a dataset of tracked grey seals.

The Atlantic hurricane track dataset² contains information on hurricanes collected from 2000 to 2012. It was divided into segments labeled low intensity (i.e., surface wind ≤ 63 knots) and high intensity (i.e., surface wind ≥ 64 knots). Trajectories with fewer than 20 points were discarded to avoid segmenting very small trajectories and because most of them only contained the semantic label low intensity.

The grey seals dataset contains information regarding seals’

trajectories collected from Argos satellite tags deployed from

Sable Island, Nova Scotia, Canada. This dataset contains

segments with labels foraging and traveling, assigned by

domain specialists [3].

In the experiments, we evaluate the segments generated by the segmentation algorithms using the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) analysis, because the data are imbalanced. In both datasets, every trajectory point carries a ground-truth semantic label. The ground truth is the classification stored in the database by domain specialists, and it is used to validate the segmentation results. This validation verifies whether the semantic label assigned by the segmentation algorithm to each point of a segment is correct. With this information, it is possible to build the confusion matrix of the ROC analysis and compute the AUC.
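The point-wise validation can be sketched as follows. The paper does not state how the AUC is derived from the confusion matrix; the sketch assumes hard (non-probabilistic) labels, in which case the ROC curve has a single operating point and the area under the polyline (0, 0) → (FPR, TPR) → (1, 1) equals (1 + TPR − FPR) / 2. Function names are hypothetical.

```python
def confusion_matrix(truth, predicted, positive):
    """Count (tp, fp, tn, fn) over per-point labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def auc_single_point(truth, predicted, positive):
    """AUC of the one-point ROC curve obtained from hard labels."""
    tp, fp, tn, fn = confusion_matrix(truth, predicted, positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return (1.0 + tpr - fpr) / 2.0


# Toy usage: ground truth versus labels assigned by a segmentation.
truth = ["high", "high", "low", "low", "low", "low"]
predicted = ["high", "low", "low", "low", "low", "high"]
print(auc_single_point(truth, predicted, positive="high"))
```

Unlike plain accuracy, this quantity weights the true positive rate and the true negative rate equally, which is why it remains informative when one label (such as low intensity) dominates the data.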

² http://weather.unisys.com/hurricane/atlantic/

B. Features evaluation tests

The first step of the feature evaluation is feature generation. This step is very important for RGRASP-SemTS because many point and segment features can be generated from raw trajectory data, and the relevant features for a given domain are unknown a priori. The key idea is to generate a large set of point and segment features and to verify in a subsequent step which of them best characterize a semantic label.

For the hurricanes dataset, we generated the following point features: the maximum sustained surface wind at six-hour intervals, the estimated speed in meters per second, and the direction variation between points, from 0 to 180 degrees. As segment features, we computed the average, minimum, and maximum values of surface wind, estimated speed, and direction; the ground distance between the first and the last point of the segment; and the elapsed time of each segment. For the grey seals dataset, the point features were the depth, the distance from the shore in km, a binary column indicating whether the seal was near the shore (using a distance threshold of 15 km), the estimated speed, and the direction variation. As segment features, we computed the average, minimum, and maximum values of all five point features, the ground distance between the first and the last point of the segment, and the elapsed time of each segment.
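The segment features listed above can be derived from the point features along the lines of the sketch below. The point layout (dictionaries with keys t, lat, lon), the use of the haversine formula for the ground distance, and the Earth radius of 6371 km are assumptions made for illustration; the paper does not specify them.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def segment_features(points, name):
    """Aggregate one point feature over a segment.

    `points` is a chronological list of dicts with the keys 't' (seconds),
    'lat', 'lon', and the point feature `name`.
    """
    values = [p[name] for p in points]
    first, last = points[0], points[-1]
    return {
        f"avg_{name}": sum(values) / len(values),
        f"min_{name}": min(values),
        f"max_{name}": max(values),
        "ground_distance_km": haversine_km(first["lat"], first["lon"],
                                           last["lat"], last["lon"]),
        "elapsed_time_s": last["t"] - first["t"],
    }


# Toy segment: three points sampled every six hours along the equator.
segment = [
    {"t": 0, "lat": 0.0, "lon": 0.0, "wind": 30},
    {"t": 21600, "lat": 0.0, "lon": 0.5, "wind": 50},
    {"t": 43200, "lat": 0.0, "lon": 1.0, "wind": 70},
]
print(segment_features(segment, "wind"))
```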

Finally, for feature selection we used the χ2 feature ranking method [11] implemented in the Weka package [7]. We selected the best set of features by analyzing the performance of RGRASP-SemTS in terms of AUC. In particular, five trajectories were randomly selected, and their segment features were extracted. Subsequently, an ARFF file (the Weka input format) containing this information was generated. Note that only segment features were used here because of a limitation of the Weka package, which only allows each labeled segment to be represented as a single example. The χ2 algorithm was executed, and the rank of each segment feature was stored. We then executed RGRASP-SemTS with the full set of features and stored the AUC value. Next, the lowest-ranked segment feature according to χ2 was removed, and the AUC value was measured again. This procedure was repeated until only one segment feature remained.
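The elimination loop described above can be sketched as follows. The ranking is taken as given (best feature first), and evaluate is a hypothetical callable that would run the segmentation with the given features and return its AUC; the toy scores below are invented only to exercise the loop and are not results from the paper.

```python
def backward_elimination(ranked_features, evaluate):
    """Drop the lowest-ranked feature one at a time and record the AUC.

    `ranked_features` is ordered from best to worst rank. Returns the best
    feature subset, its AUC, and the full (subset, AUC) history.
    """
    current = list(ranked_features)
    history = []
    while current:
        history.append((list(current), evaluate(current)))
        if len(current) == 1:
            break  # stop once a single feature remains
        current.pop()  # remove the lowest-ranked feature
    best_subset, best_auc = max(history, key=lambda item: item[1])
    return best_subset, best_auc, history


# Toy usage with made-up AUC values indexed by subset size.
toy_auc = {3: 0.90, 2: 0.95, 1: 0.80}


def toy_evaluate(features):
    return toy_auc[len(features)]


print(backward_elimination(["avg", "max", "min"], toy_evaluate)[:2])
```

The recorded history corresponds to the learning curves of Fig. 2, where each point is the AUC obtained with one more low-ranked feature removed.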

Figures 2(a) and 2(b) show the AUC performance of RGRASP-SemTS using this approach. For the hurricane dataset, three segment features (i.e., average, minimum, and maximum surface wind) produced the best AUC value (0.9545). For the grey seals dataset, eight segment features (e.g., average, maximum, and minimum direction variation, maximum and minimum speed, and the distance between the first and last point of the segment) produced the best AUC value, which was 0.9101. Based on these results, we decided to use these three segment features and the point feature surface wind for the hurricane dataset. For the grey seals dataset, we decided to use the eight segment features, with estimated speed and direction variation as point features.

Fig. 2. Learning curve for the features selection via χ2method.

(a) Learning curve for the hurricanes dataset.

(b) Learning curve for the grey seals dataset

C. Evaluating RGRASP-SemTS

This section evaluates the performance of RGRASP-SemTS against other approaches from the literature. Section V-C1 compares the execution time of RGRASP-SemTS with that of the unsupervised approach GRASP-UTS. Section V-C2 then compares the performance of RGRASP-SemTS with other state-of-the-art algorithms.

1) Runtime analysis of RGRASP-SemTS: In this section, we evaluate the execution time of RGRASP-SemTS by comparing it with the unsupervised version, GRASP-UTS. The objective is to show how the information provided by the user improves the execution time. In this experiment, one trajectory of each dataset was randomly selected, and both RGRASP-SemTS and GRASP-UTS were executed one hundred times (100 iterations). The same minTime value was used for both algorithms (12 hours for both datasets), and a full search (the partitioning factor input parameter of GRASP-UTS) was ensured for both algorithms.

Figures 3(a) and 3(b) summarize the results. For the hurricanes dataset, RGRASP-SemTS was on average 0.064 seconds faster than GRASP-UTS, while it was 6.7 seconds faster for the grey seals dataset. After one hundred iterations, 1 second was saved for the executions on the hurricane dataset and 600 seconds were saved for the grey seals dataset. This difference between the datasets is probably because the hurricane trajectories are shorter (between 80 and 140 points), while the grey seal trajectories contain more points to be evaluated (each trajectory contains more than 1000 points).

Although RGRASP-SemTS and GRASP-UTS have the same complexity, O(N ∗ max_it), the runtime difference between them results from the time GRASP-UTS needs to rebuild the entire solution whenever landmarks are modified (i.e., inserted, removed, or switched in position). Since RGRASP-SemTS uses information provided by the user, fewer modifications are made to the segments in each iteration.

Fig. 3. Execution time analysis of RGRASP-SemTS

(a) Time analysis for the hurricanes dataset.

(b) Time analysis for the grey seals dataset

2) Comparison of RGRASP-SemTS with state-of-the-art algorithms: This section presents a performance comparison, in terms of AUC, between RGRASP-SemTS and other state-of-the-art unsupervised (GRASP-UTS [8] and WK-Means [10]) and supervised (CB-SMoT [12] and SPD [22]) segmentation algorithms.

The objective was to evaluate the performance of the algorithms when only a small amount of data is available for training; therefore, we limited the training data to a single sub-dataset. We divided both the hurricanes and the grey seals datasets into 10 subsets.

RGRASP-SemTS was executed on the testing set, using the labeled examples contained in the training set as input for the segment landmarks. For the other algorithms, the parameter values estimated on the training set were used on the testing set. This procedure was repeated using each sub-dataset in turn as the training set and testing on the remaining data. We computed the AUC values using each single sub-dataset as training data and executing the algorithms with the best set of input parameter values found for each method in the test dataset. We then computed an average AUC value (avg. AUC) over all combinations of training and testing datasets.

Tables I and II show the results obtained by all methods, where one sub-dataset was used to train the algorithms and the remaining 9 sub-datasets were used to compute the average AUC. We verified whether any substantial difference existed between the means obtained by RGRASP-SemTS and those obtained by the other algorithms using a paired t-test with a significance level of 5% and 9 degrees of freedom. If the t-value computed between RGRASP-SemTS and another algorithm is greater than 2.82, the hypothesis that the means are equal is rejected, which leads to the conclusion that there is substantial evidence of a significant difference in the performance of the two algorithms.
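For reference, the paired t statistic used in this kind of comparison can be computed as in the sketch below: the mean of the per-fold AUC differences divided by its standard error, with n − 1 degrees of freedom (9 for ten folds). The function name is hypothetical and the numbers in the usage example are arbitrary, not values from the experiments.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Toy usage with arbitrary paired measurements.
t_value, dof = paired_t_statistic([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0])
print(round(t_value, 4), dof)
```

The resulting t value is then compared against the critical value for the chosen significance level and degrees of freedom, as done with the 2.82 threshold in the text.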

TABLE I
COMPARISON OF UNSUPERVISED, SUPERVISED AND SEMI-SUPERVISED ALGORITHMS FOR THE HURRICANES DATASET.

Algorithm       avg. AUC   t-score   mean difference
RGRASP-SemTS    0.94       -         -
GRASP-UTS       0.81       22.51     0.13
WK-Means        0.87       14.04     0.06
CB-SMoT         0.47       136.13    0.47
SPD             0.45       130.39    0.48

For the hurricane dataset, RGRASP-SemTS had the best average AUC, achieving 0.94. Compared to the unsupervised algorithms GRASP-UTS and WK-Means, which achieved an average AUC of 0.81 and 0.87 in testing, respectively, RGRASP-SemTS offers an improvement of at least 0.06 in average AUC. The differences in mean AUC are significant, since the t-scores were higher than 2.82: 22.51 when compared to GRASP-UTS and 14.04 when compared to WK-Means. It is important to point out that the WK-Means algorithm was given the exact number of segments to be found in each trajectory, while RGRASP-SemTS was not. Compared to the supervised methods CB-SMoT and SPD, the gains were at least 0.47 in avg. AUC.

For the grey seal dataset, RGRASP-SemTS also achieved the best average AUC, as shown in Table II. Compared to GRASP-UTS, RGRASP-SemTS obtained a gain of 0.08 in average AUC. This difference is significant, since the t-score was 14.08 (higher than 2.82). The difference between RGRASP-SemTS and WK-Means was 0.07 in average AUC in testing, which is also significant, as the t-score was 6.23. Compared to the supervised methods CB-SMoT and SPD, gains of at least 0.35 in average AUC were obtained (the mean difference reported for SPD in Table II).

TABLE II
COMPARISON OF UNSUPERVISED, SUPERVISED AND SEMI-SUPERVISED ALGORITHMS FOR THE GREY SEALS DATASET.

Algorithm       avg. AUC   t-score   mean difference
RGRASP-SemTS    0.88       -         -
GRASP-UTS       0.80       14.08     0.08
WK-Means        0.81       6.23      0.07
CB-SMoT         0.19       128.69    0.69
SPD             0.53       74.38     0.35

VI. CONCLUSIONS

The research field of trajectory segmentation, although well studied in the literature, has not deeply explored the concept of semi-supervised learning: the use of a small set of segments labeled by the user combined with an unsupervised segmentation driven by this training set. The objective is to achieve high accuracy even when few labeled examples are available. This is particularly useful for segmenting trajectories based on semantics.

This paper contributes in this direction by proposing RGRASP-SemTS, a reactive and semi-supervised algorithm for semantically segmenting trajectory data. The algorithm exploits labeled and unlabeled data to find an optimal segmentation of a trajectory by modifying segment landmarks to achieve homogeneity within the segments, using a cost function based on the MDL principle. Its main advantage is that few examples can be used to target the segmentation task to specific domains. Furthermore, RGRASP-SemTS is reactive in the sense that the values of the input parameters (α and minTime) are automatically determined by the solutions produced during the algorithm's iterations. We performed experiments with two real-world datasets to assess the effectiveness of our approach. The results show that the proposed algorithm outperforms the state-of-the-art competitors. We intend to extend this work to improve the overall performance by generating better sets of semantic landmarks, instead of just computing averages.

ACKNOWLEDGMENTS

This paper is supported by the MASTER project, which has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 777695. The authors would also like to thank NSERC (Natural Sciences and Engineering Research Council of Canada) for financial support.

REFERENCES

[1] S. P. A. Alewijnse, K. Buchin, M. Buchin, A. Kölzsch, H. Kruckenberg, and M. A. Westenberg. A framework for trajectory segmentation by stable criteria. In 22nd ACM SIGSPATIAL Conference, pages 351–360, New York, NY, USA, 2014. ACM.

[2] V. Bogorny, C. Renso, A. R. de Aquino, F. de Lucca Siqueira, and L. O.

Alvares. CONSTAnT a conceptual data model for semantic trajectories

of moving objects. Trans. in GIS, 18(1):66–68, 2014.

[3] G. A. Breed, I. Jonsen, R. A. M, W. D. Bowen, and M. L. Leonard.

Sex-speciﬁc, seasonal foraging tactics of adult grey seals (Halichoerus

grypus) revealed by state-space analysis. Ecology, 90(11), 2009.

[4] M. Buchin, A. Driemel, M. Van Kreveld, and V. Sacristan. Segmenting

trajectories: A framework and algorithms using spatiotemporal criteria.

Journal of Spatial Information Science, 3(3):33–63, 2011.

[5] M. Etemad, A. Soares Júnior, and S. Matwin. Predicting transportation modes of GPS trajectories using feature engineering and noise removal. In Advances in Artificial Intelligence, pages 259–264. Springer International Publishing, 2018.

[6] P. D. Grunwald, I. J. Myung, and M. Pitt. Advances in Minimum Description Length. MIT Press, 2005.

[7] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H.

Witten. The weka data mining software: An update. SIGKDD Explor.

Newsl., 11(1):10–18, 2009.

[8] A. Soares Júnior, B. N. Moreno, V. C. Times, S. Matwin, and L. A. F. Cabral. GRASP-UTS: an algorithm for unsupervised trajectory segmentation. Int. J. of Geographical Information Science, 29(1):46–68, 2015.

[9] Jae-Gil Lee, Jiawei Han, and Kyu-Young Whang. Trajectory clustering:

a partition-and-group framework. In Proceedings of the 2007 ACM

SIGMOD international conference on Management of data, pages 593–

604, New York, NY, USA, 2007. ACM.

[10] Luis A. Leiva and Enrique Vidal. Warped k-means: An algorithm to

cluster sequentially-distributed data. Information Sciences, 237(0):196

– 210, 2013.

[11] H. Liu and R. Setiono. Chi2: Feature selection and discretization of numeric attributes. In Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, pages 388–391, 1995.

[12] J. A. Manso, V. C. Times, G. Oliveira, L. O. Alvares, and V. Bogorny.

Db-smot: A direction-based spatio-temporal clustering method. In IEEE

International Conference on Intelligent Systems (IS), pages 114–119,

2010.

[13] B. N. Moreno, A. Soares Júnior, V. C. Times, P. Tedesco, and S. Matwin. Weka-SAT: A hierarchical context-based inference engine to enrich trajectories with semantics. In Advances in Artificial Intelligence, pages 333–338, Cham, 2014. Springer International Publishing.

[14] A. T. Palma, V. Bogorny, B. Kuijpers, and L. O. Alvares. A clustering-

based approach for discovering interesting places in trajectories. In

ACMSAC, pages 863–868, 2008.

[15] C. Parent, S. Spaccapietra, C. Renso, G. L. Andrienko, N. V. Andrienko, V. Bogorny, M. L. Damiani, A. Gkoulalas-Divanis, J. A. F. de Macêdo, N. Pelekis, Y. Theodoridis, and Z. Yan. Semantic trajectories modeling and analysis. ACM Comput. Surv., 45(4):42, 2013.

[16] M. Prais and C. C. Ribeiro. Reactive grasp: An application to a matrix

decomposition problem in TDMA trafﬁc assignment. INFORMS J. on

Computing, 12(3):164–176, 2000.

[17] Chiara Renso, Stefano Spaccapietra, and Esteban Zimanyi, editors.

Mobility Data: Modeling, Management, and Understanding. Cambridge

Press, 2013.

[18] A. Soares Júnior, C. Renso, and S. Matwin. ANALYTiC: An active learning system for trajectory classification. IEEE Computer Graphics and Applications, 37(5):28–39, 2017.

[19] S. Spaccapietra, C. Parent, M. L. Damiani, J. A. Macedo, F. Porto, and

C. Vangenot. A conceptual view on trajectories. DKE, 65(1):126–146,

2008.

[20] Z. Yan, N. Giatrakos, V. Katsikaros, N. Pelekis, and Y. Theodor-

idis. Setrastream: Semantic-aware trajectory construction over streaming

movement data. In 12th Int. Conf. on Advances in Spatial and Temporal

Databases, SSTD’11, pages 367–385. Springer, 2011.

[21] H. Yoon and C. Shahabi. Robust time-referenced segmentation of mov-

ing object trajectories. In 2008 Eighth IEEE International Conference

on Data Mining, pages 1121–1126. IEEE, December 2008.

[22] Y. Zheng, L. Zhang, Z. Ma, X. Xie, and W. Ma. Recommending friends

and locations based on individual location history. ACM Trans. Web,

5(1):5:1–5:44, 2011.