VISTA: A visual analytics platform for semantic annotation of trajectories
Amílcar Soares
Institute for Big Data Analytics
Halifax, Canada
amilcar.soares@dal.ca
Jordan Rose
Institute for Big Data Analytics
Halifax, Canada
jrose@dal.ca
Mohammad Etemad
Institute for Big Data Analytics
Halifax, Canada
etemad@dal.ca
Chiara Renso
ISTI-CNR
Pisa, Italy
chiara.renso@isti.cnr.it
Stan Matwin
Institute for Big Data Analytics
Halifax, Canada
stan@dal.ca
ABSTRACT
Most trajectory datasets record only the spatio-temporal positions of the moving object and therefore lack semantics. The reason is that semantic information depends mainly on labeling by domain experts, a time-consuming and complex process. This paper contributes to facilitating and supporting the manual annotation of trajectory data through a visual-analytics-based platform named VISTA. VISTA is designed to assist the user in the trajectory annotation process in a multi-role user environment. A session manager creates a tagging session by selecting the trajectory data and the semantic contextual information. The platform also supports the creation of several features that help the tagging users identify the trajectory segments to be annotated. A distinctive feature of VISTA is its visual analytics functionality, which supports users in exploring and processing the trajectory data, the associated features, and the semantic information, so that they can understand how to label trajectories properly.
1 INTRODUCTION
The increasing availability of positioning technologies, such as smartphones, GPS-enabled cameras, and sensors, has resulted in vast volumes of mobility data being collected, stored, and made available for analysis. Such mobility data are typically modeled as streams of spatio-temporal points, called trajectories. There is growing research interest in analysis methods for semantically enriched trajectories [6, 7]. However, only a few datasets containing semantically labeled trajectory data are available. This is primarily because semantic labels tend to indicate a specific behavior whose identification depends on human interpretation: understanding the behavior of a moving object depends on multiple factors and, in most cases, cannot be inferred automatically. Manual annotation may therefore be complex and time-consuming, even for domain experts, who might be uncertain about the correct interpretation of the data or might disagree with one another. The labeling process is complex for a number of reasons: (1) the annotator needs an immediate view and understanding not only of the spatio-temporal trajectory details and the numerical features derived from them, but also of the contextual semantic information; (2) different annotators might have different roles and different interpretations of how to label trajectories (e.g., "is the ship fishing or not?"); (3) a whole trajectory does not always express a single behavior: very often the parts of a trajectory have to be labeled individually, since different behaviors may co-exist during the same journey (e.g., a vessel is first fishing and then traveling to harbor). How to correctly identify the switch points from one behavior to another is a further critical issue. We believe that these issues can be alleviated by a trajectory annotation platform that effectively assists the user in coping with all these aspects during the semantic labeling process.
© 2019 Copyright held by the owner/author(s). Published in Proceedings of the 22nd International Conference on Extending Database Technology (EDBT), March 26-29, 2019, ISBN 978-3-89318-081-3 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.
A promising approach is to build a platform that supports annotation by exploiting concepts from Visual Analytics. The idea of Visual Analytics is to create systems that enable analytical reasoning about complex problems, with the goal of making the data processing and the inferred information clear and evident to the analyst [4]. This is accomplished by integrating data processing methods with visualizations designed to assist users in making efficient decisions. The human mind generally grasps information more effectively through visualization than through textual and numerical data alone. Combining visualization with user interaction lets the user explore the data from different perspectives, linking and combining distinct pieces of information to derive new insights from the data.
Pioneering works on semantic trajectory annotation [5, 10] used a predefined set of rules, established by domain experts, to automatically assign semantic labels to whole trajectories or to trajectory segments. Although these methods work well in many scenarios, they are not suitable when there are no clear criteria for identifying the object behavior and/or the segment to be labeled. In such cases, human intervention is needed to establish the most appropriate label for each trajectory (or trajectory segment) based on both numerical and semantic features. More recently, some works have proposed methods to simplify the manual annotation task, as in [8], where a web interface lets users upload their personal trajectories and annotate each segment with the activity they performed. However, in this case all trajectory segments need to be annotated directly by the traveling user, which becomes infeasible when the number of trajectory segments is very high and/or the annotator is not the entity that performed the movement, as in the case of vehicles, vessels, or animals. To cope with this problem, other methods use machine learning approaches (e.g., semi-supervised or active learning) to automatically classify trajectories into semantic labels, starting from a small set of manually annotated traces [2, 3]. All these approaches rely on an accurately labeled dataset to reach good classification accuracy. Therefore, there is a strong need to support the manual annotation process, which can lead to good-quality annotated datasets and therefore to reliable analysis findings. To the best of our knowledge, no existing approach combines visual analytics with trajectory processing functions to assist the user in manually tagging trajectory data.
The challenge for a visual analytics trajectory annotation system is to provide efficient and effective support for user interaction: helping the user focus on each specific annotation task, highlighting the feature values for each trajectory point or segment, and visually combining contextual knowledge in an appropriate way. Specifically, the annotators need to use their domain knowledge for thinking, creating associations, and generating insights from the trajectory data. The system, in turn, needs capabilities for processing and aggregating trajectory data. Deciding where to set a segmentation point in a trajectory is very challenging because it directly affects the values of the segments' features and therefore the overall labeling of the trajectory dataset. We propose VISTA, an interactive visual user interface that integrates spatio-temporal processing capabilities, to play an essential role in semantic trajectory annotation. With VISTA, we provide full support for manual trajectory annotation by tailoring a visual analytics platform that guides the user through this process. We strongly believe that VISTA can contribute significantly to the scientific community by supporting domain experts in the annotation process and by producing more reliable semantically labeled trajectory datasets for analysis.
2 SYSTEM ARCHITECTURE
The architecture of VISTA has been designed to address the three issues introduced in the previous section, namely immediate understanding of the data, the different user roles, and support for identifying the switch points between segments.
A distinctive feature of VISTA is its ability to handle a multi-user annotation process in which users may have different roles. VISTA supports both the creation of an annotation session by the session manager (e.g., uploading trajectories and contextual information, creating features) and the annotation process itself, carried out by the annotator (e.g., analyzing a trajectory, providing partitioning positions, comparing the segments, annotating a trajectory).
Figure 1: VISTA architecture and workflow.
An overview of the architecture and workflow is illustrated in Figure 1. It is organized into three main components: (i) Data Collection, which handles raw trajectories and contextual geographical information such as Points of Interest (POIs) and Regions of Interest (ROIs); (ii) Data Processing, which deals with trajectory and annotation data; and (iii) Data Visualization, which interacts with the users while they create a tagging session, detect the switch points, and annotate trajectories. The VISTA workflow for trajectory annotation consists of six steps, shown as the numbered arrows in Figure 1. In the first step, the session manager sets the stage for the annotation process by uploading the raw trajectories and the POIs and ROIs that are relevant to the studied domain. In Step 2, the data processing engine automatically creates numerical features for each trajectory point, called point features, such as the distance traveled, the estimated speed, the bearing, the bearing rate, and the acceleration. The engine also calculates the shortest distance to a POI and detects when trajectory points intersect ROIs. The Shapely library¹ was used to calculate the shortest distances and the intersections. At this point, each trajectory uploaded by the user is stored as a document in a MongoDB² collection. In Step 3, VISTA displays the trajectories together with these new features to the session manager, whose tasks are: (1) exploring the data to select the features that are relevant for the tagging session, (2) creating the annotation classes (i.e., labels) to be used in the tagging session, and (3) inviting the annotators to participate in the tagging session. Once a tagging session has been created, the invited annotators can start tagging the trajectory data. In Step 4, the annotators explore the trajectory data and create the trajectory segments, which must reflect the annotation classes available for tagging. Several visualization functionalities are available for the tagging process; they are detailed in Section 3. After the annotators have gone through all trajectories and tagged their segments with the labels (Step 5), a dataset of annotated trajectories is available for all users to download (Step 6).
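The POI and ROI features of Step 2 can be illustrated with a small sketch. The platform itself computes them with Shapely; the version below is a standard-library stand-in (planar Euclidean distance and a ray-casting point-in-polygon test) so that it runs without dependencies. The function names, the planar (x, y) coordinates, and the sample data are assumptions made for illustration and are not taken from VISTA.

```python
import math

def distance_to_nearest_poi(point, pois):
    """Shortest planar distance from a trajectory point to any POI of one layer."""
    return min(math.dist(point, poi) for poi in pois)

def point_in_roi(point, polygon):
    """Ray-casting test: True if the point lies inside the polygon (one ROI)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical data: one POI layer (harbours) and one ROI (a fishing zone).
harbours = [(0.0, 0.0), (10.0, 0.0)]
fishing_zone = [(2.0, 2.0), (6.0, 2.0), (6.0, 6.0), (2.0, 6.0)]
trajectory = [(1.0, 1.0), (3.0, 3.0), (7.0, 4.0)]

poi_features = [distance_to_nearest_poi(p, harbours) for p in trajectory]
roi_features = [point_in_roi(p, fishing_zone) for p in trajectory]
```

With real latitude/longitude data, a geodesic distance (or a projected coordinate system) would be needed instead of the planar distance used here.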
As discussed above, a crucial step in annotating trajectory data is to determine the parts of the trajectory where the behavior of the moving object changes. Detecting such switches in movement behavior is challenging for an annotator, who may need to explore a large number of features to determine precisely where and when the behavior changed. This is done through a process usually called segmentation. Segmenting a trajectory means finding the partitioning points that are used to create trajectory segments, or sub-trajectories, such that each segment has uniform behavior with respect to some criteria [9]. Once these segments have been identified, additional numerical features such as averages, medians, standard deviations, and percentiles may be computed to better characterize the behavior of the moving object in each segment (the so-called segment features).
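As a rough illustration of segment features, the sketch below cuts one point-feature series at given partitioning positions and summarizes each resulting segment. The helper names, the choice of the 25th and 75th percentiles, and the speed values are assumptions for the example; the paper does not specify which percentiles VISTA reports.

```python
import statistics

def split_at(values, partition_indices):
    """Cut a per-point feature series at the given partitioning positions."""
    bounds = [0, *sorted(partition_indices), len(values)]
    return [values[a:b] for a, b in zip(bounds, bounds[1:])]

def segment_features(values):
    """Summary statistics of one point feature over one segment (>= 2 points)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "p25": q1,
        "p75": q3,
    }

# Hypothetical speeds (m/s): a slow stretch followed by a fast one.
speeds = [1.0, 1.2, 0.8, 1.0, 9.0, 10.0, 11.0, 10.0]
segments = split_at(speeds, [4])
features = [segment_features(s) for s in segments]
```

Moving the partitioning position changes every statistic of the two neighbouring segments, which is why the placement of switch points matters so much for the final labels.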
3 DEMONSTRATION
The objective of our demonstration is to involve the user in the VISTA tagging experience by providing a trajectory dataset to be annotated together with semantic contextual information. We have selected two datasets: (1) 10 vessel trajectories to be annotated with the activities "fishing" and "not fishing"; (2) 20 trajectories of people to be annotated with the means of transportation "walking", "bike", "train", "bus", and "car". During the demo, the authors will guide attendees through the whole process, in which the user experiences both roles, the session manager and the annotator, whose tasks and interfaces are detailed in the next sections.
¹ https://pypi.org/project/Shapely/
² https://www.mongodb.com/, version 3.6.2
3.1 Session Manager
The role of this user is to set the stage for the actual annotation process. This is done through four screens. In the first screen, the user creates and names a new tagging session or selects a previously created one. In the second screen, the user uploads raw trajectories in a delimiter-separated file format (e.g., CSV or TSV) and maps the file fields to the columns representing the raw trajectory data: trajectory_id, time, latitude, longitude. In the third screen, the user is asked to upload the POIs and/or ROIs that are relevant to the domain. VISTA then runs a background process that creates the following point features for each raw trajectory point: time difference from the previous point, distance traveled from the previous point, speed, acceleration, bearing, jerk, bearing rate, and rate of bearing rate. The corresponding formulas and details can be found in [1]. The platform also creates two further kinds of point features from the POIs and ROIs: (1) for every POI layer, the shortest distance from the trajectory point to a POI on that layer; (2) for every ROI layer, whether the trajectory point intersects a ROI on that layer. Finally, in the fourth screen, the user creates the labels to be used by the annotators and invites annotators to the tagging session by providing their email addresses. This sequence of actions corresponds to steps 1, 2, and 3 of Figure 1. After this step, the annotators receive a notification inviting them to enter the tagging session.
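To make the list of point features concrete, here is a minimal sketch of how they could be derived from (time, latitude, longitude) triples. The exact formulas used by VISTA are those of [1]; the haversine distance, the initial-bearing formula, and the convention of assigning 0.0 to the first point are common choices assumed here, and all function and key names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees [0, 360) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def rate(values, dts):
    """Per-second change of a series between consecutive points (0.0 for the first)."""
    out = [0.0]
    for i in range(1, len(values)):
        out.append((values[i] - values[i - 1]) / dts[i] if dts[i] > 0 else 0.0)
    return out

def point_features(points):
    """points: non-empty list of (time_s, latitude, longitude) sorted by time."""
    # The first point has no predecessor, so its features are 0.0 by convention.
    dt, dist, speed, bearing = [0.0], [0.0], [0.0], [0.0]
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        step = t1 - t0
        d = haversine_m(la0, lo0, la1, lo1)
        dt.append(step)
        dist.append(d)
        speed.append(d / step if step > 0 else 0.0)
        bearing.append(bearing_deg(la0, lo0, la1, lo1))
    acceleration = rate(speed, dt)
    bearing_rate = rate(bearing, dt)  # plain difference; ignores the 360-degree wrap-around
    return {
        "time_diff_s": dt,
        "distance_m": dist,
        "speed_mps": speed,
        "acceleration": acceleration,
        "jerk": rate(acceleration, dt),
        "bearing_deg": bearing,
        "bearing_rate": bearing_rate,
        "rate_of_bearing_rate": rate(bearing_rate, dt),
    }

# Hypothetical track heading east along the equator, sampled every 10 s.
features = point_features([(0, 0.0, 0.0), (10, 0.0, 0.001), (20, 0.0, 0.003)])
```

Each derived series has one value per trajectory point, so it can be attached to the points as an additional column and plotted or color-coded directly.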
3.2 Annotator
When users receive an invitation to a tagging session, they can start tagging trajectories. Annotation is iterative and interactive: in each iteration, a single trajectory is presented to the user and explored further with the system. We recall that a tagging session has a two-fold objective: (1) to identify the segments of the trajectory that show uniform behavior or, in other words, to identify the change points, and (2) to tag each segment with the appropriate label. In VISTA, the annotator has access to a dashboard, depicted in Figure 2, that provides tools for understanding the behavior of the moving object. The main screen is divided into two interactive panels: (i) a map on the left and (ii) summary statistics on the right. The map panel needs to show the actual movement along the trajectory effectively, together with the point and trajectory features. For this reason, we implemented two visualization solutions that help the annotator understand the movement data. First, the movement of the object can be played back, showing dynamically how the object moved in a particular region. Second, the trajectory line is drawn in grades of red saturation that reflect the value of the point feature selected on the right side of the screen: low values are drawn in a light red and higher values in a more intense red. The two panels are linked automatically: the user can select a segment and/or point features in the right panel, and the corresponding parts are highlighted in the map on the left. Conversely, the annotator can select a trajectory point in the map, and the corresponding segment and point feature are highlighted in the right panel. At the bottom of the map, the red color legend gives the annotator an immediate perception of what is happening along the sequence of points. Inside the map, at the top left, VISTA provides some typical Geographical Information System (GIS) visualization options, such as zooming in and out, displaying or hiding POIs and ROIs, and changing the colors of the annotation classes (e.g., fishing or not fishing).
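The red saturation encoding can be sketched as a linear mapping from a feature value to a hex color. The concrete color range (from #ffe6e6 to #ff0000) and the function name are assumptions for illustration; the paper only states that low values are light red and high values are intense red.

```python
def red_shade(value, vmin, vmax):
    """Map a feature value to a hex colour from light red (low) to intense red (high)."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values outside the legend range
    # Keep the red channel at full strength and lower green/blue as the value grows.
    other = round(230 * (1.0 - t))
    return f"#ff{other:02x}{other:02x}"

# Hypothetical speeds for four consecutive trajectory points.
speeds = [0.0, 2.5, 5.0, 10.0]
colours = [red_shade(s, min(speeds), max(speeds)) for s in speeds]
```

The same function can generate the legend shown under the map by evaluating it on evenly spaced values between the minimum and maximum of the selected feature.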
In more detail, the right panel provides the following options and statistics: (i) at the top, it displays a summary computed from the trajectory data, with the total number of points, the total distance traveled in meters, and the average sampling rate between the trajectory points; (ii) at the bottom, the user may choose to visualize the data either as a line chart that shows the values of the point features in temporal order, or as a scatter plot in which the user can look for correlations between two point features.
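The summary at the top of the right panel could be computed along the following lines. This is a sketch that assumes the per-step distances are already available as a point feature and that "average sampling rate" means the mean time gap between consecutive points; the function and key names are hypothetical.

```python
def trajectory_summary(times_s, step_distances_m):
    """Number of points, total distance, and mean time gap between points."""
    n = len(times_s)
    avg_interval = (times_s[-1] - times_s[0]) / (n - 1) if n > 1 else 0.0
    return {
        "points": n,
        "total_distance_m": sum(step_distances_m),
        "avg_sampling_interval_s": avg_interval,
    }

# Hypothetical trajectory with four points; the first step distance is 0.
summary = trajectory_summary([0, 10, 20, 40], [0.0, 100.0, 120.0, 80.0])
```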
When a trajectory is shown on the screen, the annotator is asked to provide the partitioning positions that split the trajectory into segments of uniform behavior, each of which is assigned a class label. This was the most challenging functionality to develop, since correct segmentation is fundamental to a good-quality annotation process. Identifying the switch points is particularly difficult because many different aspects have to be considered at the same time. It is therefore challenging to support the user in this "multi-dimensional view", in which the most crucial aspects have to be considered jointly. For this reason, we created a drag-and-drop tool on the map with which the user can add a new partitioning point to the screen, assign it a class label, and pin the exact switch point at which to split the trajectory. When a new partitioning position is added, all the statistics regarding the classes of the trajectory are automatically recomputed, since new labeled segments are available. This is reflected in the two panels described above. First, on the map panel, the colors of the trajectory segments change according to the class provided. Second, the colors of both the scatter plot and the line chart change to reflect the new information provided by the user. We also provide statistical measures regarding the trajectory segments and their classes. Trajectories are sent to the user sequentially, one per interaction, and the user's objective is to segment each of them using the drag-and-drop pins provided on the left side of the panel. When a trajectory is completely annotated, the user can press the Next button at the bottom of the screen to receive a new trajectory to annotate.
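The recomputation triggered by a new pin can be sketched as follows. The sketch assumes that each pin carries a point index and a class label, and that the label applies from the pin up to the next pin; this reading of the pin semantics, as well as the function names and data, are assumptions for illustration.

```python
from collections import defaultdict

def label_points(n_points, pins, default_label=None):
    """pins: list of (start_index, label); a pin labels points up to the next pin."""
    labels = [default_label] * n_points
    ordered = sorted(pins)
    for k, (start, label) in enumerate(ordered):
        end = ordered[k + 1][0] if k + 1 < len(ordered) else n_points
        for i in range(start, end):
            labels[i] = label
    return labels

def class_statistics(labels, values):
    """Per-class point count and mean of one point feature."""
    grouped = defaultdict(list)
    for label, value in zip(labels, values):
        grouped[label].append(value)
    return {
        label: {"points": len(vs), "mean": sum(vs) / len(vs)}
        for label, vs in grouped.items()
    }

# Hypothetical speeds (m/s) of a vessel and two initial pins.
speeds = [9.0, 10.0, 11.0, 1.0, 2.0, 0.0, 10.0, 12.0]
pins = [(0, "not fishing"), (3, "fishing")]
stats_before = class_statistics(label_points(len(speeds), pins), speeds)

# Dropping one more pin changes the segments, so everything is recomputed.
pins.append((6, "not fishing"))
stats_after = class_statistics(label_points(len(speeds), pins), speeds)
```

In this example the third pin moves the two fast points at the end out of the "fishing" class, which lowers the mean fishing speed from 5.0 to 1.0 m/s.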
3.3 Summarizing annotations from users
When all users have concluded the annotation process, a summary screen (Figure 3) is created for exploring how the different users tagged the data. In VISTA, users can compare their tagging with that of other annotators by exploring how the others annotated the dataset and how similar the results of their own tagging session are to those of the other users. The tagging session results can be compared for both point and segment features. Figure 3 shows the main screen with two panels: (1) on the left, the user can analyze statistics (e.g., minimum, maximum, average, standard deviation, and percentiles) of all point features available on the platform; (2) on the right, the user can analyze statistics of the segment features. In both charts, the average behavior per user and class is plotted, so that it becomes clear whether the users tend to agree or disagree with respect to a given feature.
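The per-user, per-class averages plotted on the summary screen can be sketched as a simple group-by. The tuple layout, the user names, and the feature values below are hypothetical; a large gap between two users' averages for the same class hints at disagreement on how that class was tagged.

```python
from collections import defaultdict
from statistics import mean

def average_per_user_and_class(annotations):
    """annotations: iterable of (user, label, feature_value) tuples."""
    grouped = defaultdict(list)
    for user, label, value in annotations:
        grouped[(user, label)].append(value)
    return {key: mean(values) for key, values in grouped.items()}

# Hypothetical speed values (m/s) of points tagged by two annotators.
annotations = [
    ("alice", "fishing", 1.0), ("alice", "fishing", 2.0),
    ("alice", "not fishing", 10.0),
    ("bob", "fishing", 1.5),
    ("bob", "not fishing", 9.0), ("bob", "not fishing", 5.0),
]
averages = average_per_user_and_class(annotations)

# Absolute gap between the two users' averages for each class.
gaps = {
    label: abs(averages[("alice", label)] - averages[("bob", label)])
    for label in ("fishing", "not fishing")
}
```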
4 CONCLUSIONS AND FUTURE WORK
We have proposed VISTA, an interactive tool based on visual analytics principles that supports users in semantically annotating trajectory data. A distinctive feature of VISTA is its support for identifying trajectory segments and assigning them the corresponding semantic label. We intend to expand this platform in two directions. First, we want to create a module that automatically suggests how to segment a trajectory by learning from previous interactions with the platform. Second, we intend to improve the comparison of how labels have been assigned by different users, highlighting when users agree or disagree in identifying a specific behavior of a moving object. The tool is available for testing at the URL https://bigdata.cs.dal.ca/resources.
Figure 2: Elements of the annotator user dashboard with the vessel trajectories dataset.
Figure 3: Summary results with the users' annotations.
ACKNOWLEDGEMENTS
This work has been supported by the MASTER project, which has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement N. 777695. The authors would also like to thank NSERC (Natural Sciences and Engineering Research Council of Canada) for financial support.
REFERENCES
[1] Mohammad Etemad, Amílcar Soares Júnior, and Stan Matwin. 2018. Predicting Transportation Modes of GPS Trajectories using Feature Engineering and Noise Removal. In 31st Canadian Conference on Artificial Intelligence, Canadian AI 2018, Toronto, Canada, May 8–11. Springer, 259–264.
[2] Amílcar Soares Júnior, Chiara Renso, and Stan Matwin. 2017. ANALYTiC: An Active Learning System for Trajectory Classification. IEEE Computer Graphics and Applications 37, 5 (2017), 28–39.
[3] Amilcar Soares Junior, Valeria Cesario Times, Chiara Renso, Stan Matwin, and Lucidio A. F. Cabral. 2018. A Semi-Supervised Approach for the Semantic Segmentation of Trajectories. In 2018 19th IEEE International Conference on Mobile Data Management (MDM). 145–154.
[4] Daniel Keim, Gennady Andrienko, Jean-Daniel Fekete, Carsten Görg, Jörn Kohlhammer, and Guy Melançon. 2008. Visual analytics: Definition, process, and challenges. In Information Visualization. Springer, 154–175.
[5] Christine Parent, Stefano Spaccapietra, Chiara Renso, Gennady Andrienko, Natalia Andrienko, Vania Bogorny, Maria Luisa Damiani, Aris Gkoulalas-Divanis, Jose Macedo, Nikos Pelekis, et al. 2013. Semantic trajectories modeling and analysis. ACM Computing Surveys (CSUR) 45, 4 (2013), 42.
[6] Christine Parent, Stefano Spaccapietra, Chiara Renso, Gennady L. Andrienko, Natalia V. Andrienko, Vania Bogorny, Maria Luisa Damiani, Aris Gkoulalas-Divanis, José Antônio Fernandes de Macêdo, Nikos Pelekis, Yannis Theodoridis, and Zhixian Yan. 2013. Semantic trajectories modeling and analysis. ACM Comput. Surv. 45, 4 (2013), 42:1–42:32. https://doi.org/10.1145/2501654.2501656
[7] Chiara Renso, Stefano Spaccapietra, and Esteban Zimanyi (Eds.). 2013. Mobility Data: Modeling, Management, and Understanding. Cambridge University Press.
[8] Salvatore Rinzivillo, Fernando de Lucca Siqueira, Lorenzo Gabrielli, Chiara Renso, and Vania Bogorny. 2013. Where Have You Been Today? Annotating Trajectories with DayTag. In Advances in Spatial and Temporal Databases - 13th International Symposium, SSTD 2013, Munich, Germany, August 21-23, 2013. Proceedings. 467–471. https://doi.org/10.1007/978-3-642-40235-7_30
[9] A. Soares Júnior, B. N. Moreno, V. C. Times, S. Matwin, and L. A. F. Cabral. 2015. GRASP-UTS: an algorithm for unsupervised trajectory segmentation. International Journal of Geographical Information Science 29, 1 (2015), 46–68.
[10] Zhixian Yan, Dipanjan Chakraborty, Christine Parent, Stefano Spaccapietra, and Karl Aberer. 2011. SeMiTri: a framework for semantic annotation of heterogeneous trajectories. In Proceedings of EDBT. ACM, 259–270.
... There is a lack of platforms which helps the user to do annotation process. The platform called VISTA [25] was the first approach to provide a platform to annotate trajectory data by providing visual insights of data aiming at assisting the user in such task. Research has shown that graphic details affect the human decision-making process more than numerical values and methods. ...
... Therefore, Visual Analytics is an appropriate choice for handling trajectory datasets. VISTA [25] provides a platform for labeling trajectory data and creates trajectory segments manually. Annotating raw trajectories is a very tedious task that involves great human interaction and time. ...
... The dataset objective is to predict the transportation mode of people moving in a city. The labels for the dataset contains a value such as a bike, walk, run, train and car [25]. ...
Thesis
Full-text available
A trajectory is a time-ordered sequence of geolocations of a moving object. With the advancement in geolocation technologies and their availability on new devices, it became easier to record any moving object's trajectory. However, the majority of tra-jectory datasets have no labels. A visual analytics platform for semantic annotation of trajectories (VISTA) was proposed to address this issue. The primary issue with the existing VISTA platform is that the manual annotation process is burdensome, i.e., the manual labeling process needs expert user involvement, and the labeling process is time-consuming and intensive. This project aims to implement a semi-automatic mechanism that overcomes the labor-intensive task of the manual trajectory annotation process and reduces the human involvement and effort with the platform. For the implementation of this new feature, we used two machine learning models. The first is the Wise Sliding Window Segmentation (WSII) algorithm, a segmentation strategy to partition trajectories into more similar subparts. The second is a labeling model that provides appropriate labels for the partitions suggested by WSII. The new semi-automated process works as follows. The user performs the manual annotation for the first trajectory. Onwards, the semi-automation pipeline handles the process and provides partitions and labels for the new trajectories until the user completes the annotation process for all trajectories. Implementing the new feature in the existing platform has significantly reduced human effort, interaction, and time with the platform.
... This resource-intensive labeling process can cause delays, impede the development of timely decision-making systems [10,11,12,13,14], and raise operating costs in real-world applications where time is frequently of the essence. ...
... A sliding window of 3 seconds corresponds to 25% exploration, while 6 seconds corresponds to 50% exploration. In the experiments, we aim to capture distinct dimensions of exploration within these time slices, and extending the exploration beyond 6 seconds would lead the module to revisit dimensions already explored in the initial 3 seconds, resulting in redundant data collection For each window, the pipeline generates a scaled instance (i.e., a MinMax scaler with values ranging from 0 to 1) comprising 11 statistical features, including mean, median, variance, skewness, standard deviation, quantiles(10,25,76,90), min, and max. In this way, a resultant statistical features data frame is generated for each time window on which the machine learning models are trained.This generation of features uses the window strategy depicted inFigure 3.5 and described below. ...
Preprint
Training machine learning models for classification tasks often requires labeling numerous samples, which is costly and time-consuming, especially in time series analysis. This research investigates Active Learning (AL) strategies to reduce the amount of labeled data needed for effective time series classification. Traditional AL techniques cannot control the selection of instances per class for labeling, leading to potential bias in classification performance and instance selection, particularly in imbalanced time series datasets. To address this, we propose a novel class-balancing instance selection algorithm integrated with standard AL strategies. Our approach aims to select more instances from classes with fewer labeled examples, thereby addressing imbalance in time series datasets. We demonstrate the effectiveness of our AL framework in selecting informative data samples for two distinct domains of tactile texture recognition and industrial fault detection. In robotics, our method achieves high-performance texture categorization while significantly reducing labeled training data requirements to 70%. We also evaluate the impact of different sliding window time intervals on robotic texture classification using AL strategies. In synthetic fiber manufacturing, we adapt AL techniques to address the challenge of fault classification, aiming to minimize data annotation cost and time for industries. We also address real-life class imbalances in the multiclass industrial anomalous dataset using our class-balancing instance algorithm integrated with AL strategies. Overall, this thesis highlights the potential of our AL framework across these two distinct domains.
... We can categorize them into different groups: ER models, object-oriented models, ontology models, systems. In [14], the author represents semantic trajectories by adding an annotation to the trajectory data. The model [15] considers that a semantic trajectory is built on different semantic subtrajectories, which introduces the new concept. ...
Chapter
Full-text available
The study of tourist movements and their behavior becomes an important issue. This paper presents a tourist trajectory model based on Event Time of Interest, Region of Interest and Place of Interest. This model enables the generation of trajectory data that will be stored in a trajectory data warehouse. This article develops and constructs an ontology modeling framework that acts as the foundational model platform for the creation of a semantic trajectory data warehouse. In addition, the modeling approach describes thematic dimensions that provide a design platform for analysis and knowledge discovery in the trajectory and processing data for moving objects.
... On the other hand, existing VA systems use these taxonomies as a theoretical or conceptual background while using the user-tracking data only for specific domain use-cases. For instance, user's Tacit Knowledge, as defined by Federico et al. [20], is tracked in VA by many different feedback methods, such as manual feedback systems [6,33], manual annotations over visualizations [43], and inference methods that attempt to discover the user's insights by analyzing their interactivity patterns [5]. Nevertheless, they do not provide the user-tracking data as a solution to the bigger knowledge provenance problem, only to their specific domain's goals. ...
Preprint
Full-text available
The importance of knowledge generation drives much of Visual Analytics (VA). User-tracking and behavior graphs have shown the value of understanding users' knowledge generation while performing VA workflows. Works in theoretical models, ontologies, and provenance analysis have greatly described means to structure and understand the connection between knowledge generation and VA workflows. Yet, two concepts are typically intermixed: the temporal aspect, which indicates sequences of events, and the atemporal aspect, which indicates the workflow state space. In works where these concepts are separated, they do not discuss how to analyze the recorded user's knowledge gathering process when compared to the VA workflow itself. This paper presents Visual Analytic Knowledge Graph (VAKG), a conceptual framework that generalizes existing knowledge models and ontologies by focusing on how humans relate to computer processes temporally and how it relates to the workflow's state space. Our proposal structures this relationship as a 4-way temporal knowledge graph with specific emphasis on modeling the human and computer aspect of VA as separate but interconnected graphs for, among others, analytical purposes. We compare VAKG with relevant literature to show that VAKG's contribution allows VA applications to use it as a provenance model and a state space graph, allowing for analytics of domain-specific processes, usage patterns, and users' knowledge gain performance. We also interviewed two domain experts to check, in the wild, whether real practice and our contributions are aligned.
... Our objective is to allow researchers to use our library to easily handle these steps using the maximum of their resources and easily answer higher-level research questions without the need of coding and validating preprocessing steps. The PTRAIL functionalities and its core have been used in several projects within our group, including feature engineering in the context of transportation mode detection [4], trajectory classification [5], trajectory annotation [6] and anomaly detection [7]. However, the idea of parallelizing and vectorizing the processes is the main novelty of the current version of our library. ...
Preprint
Trajectory data represent a trace of an object that changes its position in space over time. This kind of data is complex to handle and analyze, since it is generally produced in huge quantities, often prone to errors generated by the geolocation device, human mishandling, or area coverage limitation. Therefore, there is a need for software specifically tailored to preprocess trajectory data. In this work we propose PTRAIL, a python package offering several trajectory preprocessing steps, including filtering, feature extraction, and interpolation. PTRAIL uses parallel computation and vectorization, being suitable for large datasets and fast compared to other python libraries.
... Second, the vast majority of algorithms proposed to identify anomalies automatically may not work for local anomalies [22], or they require labeled data to train a model [23,24]. This means that deviations from normality that happen just in a small portion of a vessel trajectory may be left out when considering the trajectory as a whole, especially when analyzing works in the maritime domain. ...
Article
With the recent increase in the use of sea transportation, the importance of maritime surveillance for detecting unusual vessel behavior related to several illegal activities has also risen. Unfortunately, the data collected by surveillance systems are often incomplete, creating a need for the data gaps to be filled using techniques such as interpolation methods. However, such approaches do not decrease the uncertainty of ship activities. Depending on the frequency of the data generated, they may even confuse operators, inducing errors when evaluating ship activities and tagging them as unusual. Using domain knowledge to classify activities as anomalous is essential in the maritime navigation environment since there is a well-known lack of labeled data in this domain. In an area where identifying anomalous trips is a challenging task using solely automatic approaches, we use visual analytics to bridge this gap by utilizing users’ reasoning and perception abilities. In this work, we propose a visual analytics tool that uses spatial segmentation to divide trips into subtrajectories and score them. These scores are displayed in a tabular visualization where users can rank trips by segment to find local anomalies. The amount of interpolation in subtrajectories is displayed together with scores so that users can use both their insight and the trip displayed on the map to determine if the score is reliable.
Preprint
Visual Analytics (VA) tools provide ways for users to harness insights and knowledge from datasets. Recalling and retelling user experiences while utilizing VA tools has attracted significant interest. Nevertheless, each user session is unique. Even when different users have the same intention when using a VA tool, they may follow different paths and uncover different insights. Current methods of manually processing such data to recall and retell users' knowledge discovery paths may also be time-consuming, especially when there is the need to present users' findings to third parties. This paper presents a novel system that collects user intentions, behavior, and insights during knowledge discovery sessions, automatically structures the data, and extracts narrations of knowledge discovery as PowerPoint slide decks. The system is powered by a Knowledge Graph designed based on a formal and reproducible modeling process. To evaluate our system, we have attached it to two existing VA tools where users were asked to perform pre-defined tasks. Several slide decks and other analysis metrics were extracted from the generated Knowledge Graph. Experts scrutinized and confirmed the usefulness of our automated process for using the slide decks to disclose knowledge discovery paths to others and to verify whether the VA tools themselves were effective.
Article
The term Semantic Trajectories of Moving Objects (STMO) corresponds to a sequence of spatial-temporal points with associated semantic information (for example, annotations about locations visited by the user or types of transportation used). However, the growth of Big Data generated by users, such as data produced by social networks or collected by electronic equipment with embedded sensors, causes the STMO to require services and standards for enabling data documentation and ensuring the quality of STMOs. Spatial Data Infrastructures (SDI), on the other hand, provide a shared interoperable and integrated environment for data documentation. The main challenge is how to lead traditional SDIs to evolve to an STMO document due to the lack of specific metadata standards and services for semantic annotation. This paper presents a new concept of SDI for STMO, named SDI4Trajectory, which supports the documentation of different types of STMO, such as holistic trajectories. The SDI4Trajectory allows us to propose semi-automatic and manual semantic enrichment processes, which are efficient in supporting semantic annotations and STMO documentation as well. These processes are hardly found in traditional SDIs and have been developed through Web and semantic micro-services. To validate the SDI4Trajectory, we used a dataset collected by voluntary users through the MyTracks application for the following purposes: (i) comparing the semi-automatic and manual semantic enrichment processes in the SDI4Trajectory; (ii) investigating the viability of the documentation processes carried out by the SDI4Trajectory, which was able to document all the collected trajectories.
Conference Paper
A first fundamental step in the process of analyzing movement data is trajectory segmentation, i.e., splitting trajectories into homogeneous segments based on some criteria. Although trajectory segmentation has been the object of several approaches in the last decade, a proposal based on a semi-supervised approach is still missing. A semi-supervised approach means that a user manually labels a small set of trajectories with meaningful segments and, from this set, the method infers in an unsupervised way the segments of the remaining trajectories. The main advantage of this method compared to purely supervised ones is that it reduces the human effort required to label trajectories. In this work, we propose the use of the Minimum Description Length (MDL) principle to measure homogeneity inside segments. We also introduce the Reactive Greedy Randomized Adaptive Search Procedure for semantic Semi-supervised Trajectory Segmentation (RGRASP-SemTS) algorithm that segments trajectories by combining a limited user labeling phase with a low number of input parameters and no predefined segmenting criteria. The approach and the algorithm are presented in detail throughout the paper, and the experiments are carried out on two real-world datasets. The evaluation tests show that our approach outperforms state-of-the-art competitors when compared to ground truth.
Article
Understanding transportation mode from GPS (Global Positioning System) traces is an essential topic in the data mobility domain. In this paper, a framework is proposed to predict transportation modes. This framework follows a sequence of five steps: (i) data preparation, where GPS points are grouped in trajectory samples; (ii) point features generation; (iii) trajectory features extraction; (iv) noise removal; (v) normalization. We show that extracting the new point features (bearing rate and the rate of rate of change of the bearing rate) together with global and local trajectory features, such as medians and percentiles, enables many classifiers to achieve high accuracy (96.5%) and f1 (96.3%) scores. We also show that the noise removal task affects the performance of all the models tested. Finally, the empirical tests where we compare this work against state-of-the-art transportation mode prediction strategies show that our framework is competitive and outperforms most of them.
Article
The increasing availability and use of positioning devices has resulted in large volumes of trajectory data. However, semantic annotations for such data are typically added by domain experts, which is a time-consuming task. Machine-learning algorithms can help infer semantic annotations from trajectory data by learning from sets of labeled data. Specifically, active learning approaches can minimize the set of trajectories to be annotated while preserving good performance measures. The ANALYTiC web-based interactive tool visually guides users through this annotation process.
Article
An important problem in the knowledge discovery of trajectories is segmentation in subparts (subtrajectories). Existing algorithms for trajectory segmentation generally use explicit criteria to create segments. In this article, we propose segmenting trajectories using a novel, unsupervised approach, in which no explicit criteria are predetermined. To achieve this, we apply the Minimum Description Length (MDL) principle, which can measure homogeneity in the trajectory data by computing the similarities between landmarks (i.e. representative points of the trajectory) and the points in their neighborhood. Based on the homogeneity measurements, we propose an algorithm named Greedy Randomized Adaptive Search Procedure for Unsupervised Trajectory Segmentation (GRASP-UTS), which is a meta-heuristic that builds segments by modifying the number and positions of landmarks. We perform experiments with GRASP-UTS in two real-world datasets, using segment purity and coverage metrics to evaluate its efficiency. Experimental results demonstrate that GRASP-UTS correctly segmented sample trajectories without predetermined criteria, by computing similarities between landmarks and other trajectory points.
Article
Focus on movement data has increased as a consequence of the larger availability of such data due to current GPS, GSM, RFID, and sensor techniques. In parallel, interest in movement has shifted from raw movement data analysis to more application-oriented ways of analyzing segments of movement suitable for the specific purposes of the application. This trend has promoted semantically rich trajectories, rather than raw movement, as the core object of interest in mobility studies. This survey provides the definitions of the basic concepts about mobility data, an analysis of the issues in mobility data management, and a survey of the approaches and techniques for: (i) constructing trajectories from movement tracks, (ii) enriching trajectories with semantic information to enable the desired interpretations of movements, and (iii) using data mining to analyze semantic trajectories and extract knowledge about their characteristics, in particular the behavioral patterns of the moving objects. Last but not least, the article surveys the new privacy issues that arise due to the semantic aspects of trajectories.
Article
On a grand scale, visual analytics solutions provide technology that combines the strengths of human and electronic data processing. Visualization becomes the medium of a semi-automated analytical process, where humans and machines cooperate using their respective distinct capabilities for the most effective results. The diversity of these tasks cannot be tackled with a single theory. Visual analytics research is highly interdisciplinary and combines various related research areas such as visualization, data mining, data management, data fusion, statistics and cognition science (among others).
Conference Paper
Traditionally, the information about human mobility behavior, called a diary, is acquired from volunteers by means of paper-and-pencil surveys. These diaries, representing the mobile activities of individuals, are semantically rich, but lack spatial and temporal precision. An alternative way is collecting diaries by annotating the GPS tracks of individuals with activities. This is more accurate from a spatio-temporal point of view, but the manual annotation becomes burdensome work for the user. The tool we propose, called DayTag, is designed as a personal assistant to help an individual reconstruct her/his diary from the GPS tracks collected by a smartphone. The user interacts through the software to visualize and annotate the trajectories, thus resulting in a simple way to get user diaries.
Amílcar Soares Júnior, Bruno Neiva Moreno, Valéria Cesário Times, Stan Matwin, and Lucídio dos Anjos Formiga Cabral. 2015. GRASP-UTS: an algorithm for unsupervised trajectory segmentation. International Journal of Geographical Information Science 29, 1 (2015), 46-68.
Zhixian Yan, Dipanjan Chakraborty, Christine Parent, Stefano Spaccapietra, and Karl Aberer. 2011. SeMiTri: a framework for semantic annotation of heterogeneous trajectories. In Proceedings of EDBT. ACM, 259-270.