Article

Abstract and Figures

An important problem in the knowledge discovery of trajectories is their segmentation into subparts (subtrajectories). Existing algorithms for trajectory segmentation generally use explicit criteria to create segments. In this article, we propose segmenting trajectories using a novel, unsupervised approach, in which no explicit criteria are predetermined. To achieve this, we apply the Minimum Description Length (MDL) principle, which can measure homogeneity in the trajectory data by computing the similarities between landmarks (i.e. representative points of the trajectory) and the points in their neighborhood. Based on the homogeneity measurements, we propose an algorithm named Greedy Randomized Adaptive Search Procedure for Unsupervised Trajectory Segmentation (GRASP-UTS), which is a meta-heuristic that builds segments by modifying the number and positions of landmarks. We perform experiments with GRASP-UTS on two real-world datasets, using segment purity and coverage metrics to evaluate its efficiency. Experimental results demonstrate that GRASP-UTS correctly segmented sample trajectories without predetermined criteria, by computing similarities between landmarks and other trajectory points.
... if t = |T_i| then subtraj ← s p, ..., p_t; M.append((s_t, a_t, r_t, s_{t+1})); Uniformly sample a minibatch from M; Perform a stochastic gradient descent algorithm on the loss function; ...
... There are two main categories of work on the sub-trajectory clustering problem. The first line of work segments trajectories into sub-trajectories according to specific criteria such as location, direction, speed, and shape [3,6,10,30], ensuring homogeneity within each sub-trajectory. Subsequently, a clustering algorithm is used to group these sub-trajectories. ...
Article
Sub-trajectory clustering is a fundamental problem in many trajectory applications. Existing approaches usually divide the clustering procedure into two phases: segmenting trajectories into sub-trajectories and then clustering these sub-trajectories. However, researchers need to develop complex human-crafted segmentation rules for specific applications, making the clustering results sensitive to the segmentation rules and lacking in generality. To solve this problem, we propose a novel algorithm, based on reinforcement learning (RL), that uses the clustering results to guide the segmentation. The novelty is that the segmentation and clustering components cooperate closely and improve each other continuously to yield better clustering results. To devise our RL-based algorithm, we model the procedure of trajectory segmentation as a Markov decision process (MDP). We apply Deep-Q-Network (DQN) learning to train an RL model for the segmentation and achieve excellent clustering results. Experimental results on real datasets demonstrate the superior performance of the proposed RL-based approach over state-of-the-art methods.
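DQN training of the kind mentioned in this abstract usually relies on an experience replay memory: transitions (state, action, reward, next state) are stored, and minibatches are sampled uniformly for the gradient updates. The following is a minimal sketch of that mechanism only. The class name `ReplayMemory`, the capacity, and the toy transitions (including the reading of action 0/1 as "extend"/"cut") are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def append(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, capped at the current size
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=5)
for t in range(8):
    # Hypothetical transition: state t, action 0/1 (extend/cut), reward, next state
    memory.append((t, t % 2, -float(t), t + 1))

minibatch = memory.sample(3, rng=random.Random(0))
```

Only the five most recent transitions survive in `memory`, and `minibatch` holds three of them; a real agent would feed such a minibatch to the loss computation.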
... Applying traditional supervised learning techniques to classify time series data encounters significant limitations regarding access to labeled data. The lack of annotated data for training is a significant obstacle that prevents the creation of precise classification models [3,4,5,6,7,8]. In contrast to structured datasets, time series data necessitate complex and frequently expensive labeling procedures, which makes it impossible to gather a sufficiently extensive and varied annotated dataset [9]. ...
Preprint
Training machine learning models for classification tasks often requires labeling numerous samples, which is costly and time-consuming, especially in time series analysis. This research investigates Active Learning (AL) strategies to reduce the amount of labeled data needed for effective time series classification. Traditional AL techniques cannot control the selection of instances per class for labeling, leading to potential bias in classification performance and instance selection, particularly in imbalanced time series datasets. To address this, we propose a novel class-balancing instance selection algorithm integrated with standard AL strategies. Our approach aims to select more instances from classes with fewer labeled examples, thereby addressing imbalance in time series datasets. We demonstrate the effectiveness of our AL framework in selecting informative data samples for two distinct domains of tactile texture recognition and industrial fault detection. In robotics, our method achieves high-performance texture categorization while significantly reducing labeled training data requirements to 70%. We also evaluate the impact of different sliding window time intervals on robotic texture classification using AL strategies. In synthetic fiber manufacturing, we adapt AL techniques to address the challenge of fault classification, aiming to minimize data annotation cost and time for industries. We also address real-life class imbalances in the multiclass industrial anomalous dataset using our class-balancing instance algorithm integrated with AL strategies. Overall, this thesis highlights the potential of our AL framework across these two distinct domains.
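The idea of selecting more instances from classes with fewer labeled examples can be sketched as follows. This is an illustrative reading of the abstract, not the thesis's algorithm: it assumes each unlabeled candidate comes with an uncertainty score and a predicted class (true labels being unknown at selection time), and the function name `select_balanced` and the sample data are hypothetical.

```python
from collections import defaultdict


def select_balanced(candidates, labeled_counts, batch_size):
    """Pick instances to label, favouring classes with fewer labeled examples.

    candidates: list of (instance_id, uncertainty, predicted_class)
    labeled_counts: dict mapping class -> number of labeled examples so far
    """
    counts = dict(labeled_counts)
    # Group candidates by predicted class, most uncertain first
    pools = defaultdict(list)
    for inst_id, uncertainty, cls in candidates:
        pools[cls].append((uncertainty, inst_id))
        counts.setdefault(cls, 0)
    for cls in pools:
        pools[cls].sort(reverse=True)

    selected = []
    while len(selected) < batch_size:
        available = [c for c in pools if pools[c]]
        if not available:
            break
        # Class with the fewest labeled examples; ties broken by class name
        target = min(available, key=lambda c: (counts[c], c))
        _, inst_id = pools[target].pop(0)
        selected.append(inst_id)
        counts[target] += 1
    return selected


candidates = [
    ("a", 0.9, "smooth"), ("b", 0.8, "smooth"), ("e", 0.7, "smooth"),
    ("c", 0.4, "rough"), ("d", 0.3, "rough"),
]
picked = select_balanced(candidates, {"smooth": 5, "rough": 1}, batch_size=3)
```

A purely uncertainty-driven strategy would pick three "smooth" instances here; the balanced variant first exhausts the under-represented "rough" candidates.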
... The most common sources of collection of such data are devices such as AIS sensors, GPS sensors, and mobile devices [18], which rely heavily on internet connectivity. Several tasks are necessary to properly work with trajectory data in a data mining setup, including: (i) data fusion [33,21]; (ii) compression [10,22]; (iii) segmentation [34,19,6]; (iv) classification [12,5]; (v) clustering [10,42]; and (vi) outlier detection [11,4,2] to name a few. Furthermore, the collection of trajectory data has also raised some privacy concerns regarding tracking human subjects and their movements [40]. ...
Thesis
Data augmentation has emerged as a powerful technique in machine learning, strengthening model robustness while mitigating overfitting and underfitting issues by generating diverse synthetic data. Nevertheless, despite its success in other domains, data augmentation's potential remains largely untapped in mobility data analysis, primarily due to the intricate nature and unique format of trajectory data. Additionally, there is a lack of frameworks capable of point-wise data augmentation, which can reliably generate synthetic trajectories while preserving the inherent characteristics of the original data. To address these challenges, this research introduces AugmenTRAJ, an open-source Python3 framework designed explicitly for trajectory data augmentation. AugmenTRAJ offers a reliable and well-controlled approach to generating synthetic trajectories, thereby enabling the harnessing of data augmentation benefits in mobility analysis. This thesis presents a comprehensive overview of the methodologies employed in developing AugmenTRAJ and showcases the various data augmentation techniques available within the framework. AugmenTRAJ opens new possibilities for enhancing mobility data analysis models' performance and generalization capabilities by providing researchers with a practical and versatile tool for augmenting trajectory data. Its user-friendly implementation in Python3 facilitates easy integration into existing workflows, offering the community an accessible resource to leverage the full potential of data augmentation in trajectory-based applications.
... Trajectory data records the change in the position of an object relative to time. There are many tasks necessary to properly work with trajectory data in a data mining setup, including: (i) data fusion [31,18]; (ii) compression [7,21]; (iii) segmentation [33,17,6]; (iv) classification [9,5]; (v) clustering [7,37]; and (vi) outlier detection [8,4,1]. ...
Thesis
The advent of compact, handheld devices has given us a pool of tracked movement data that could be used to infer useful trends and patterns. With this flood of trajectory data from animals, humans, vehicles, etc., the idea of ANALYTiC originated: using active learning to infer semantic annotations from trajectories by learning from sets of labeled data. This study explores the application of dimensionality reduction and decision boundaries in combination with the already present active learning, highlighting patterns and clusters in data. We test these features with three different trajectory datasets, with the objective of exploiting the already labeled data and enhancing their interpretability. Our experimental analysis exemplifies the potential of these combined methodologies in improving the efficiency and accuracy of trajectory labeling. This study serves as a stepping-stone towards the broader integration of machine learning and visual methods in the context of movement data analysis.
Chapter
Vessel trajectory data, usually derived from AIS data, serves as a robust foundation for extensive research on vessel movements and behaviors. The task of vessel trajectory segmentation identifies the typical sub-trajectory segments in the vessel trajectories, which are recognized as natural, interpretable, and meaningful basic vessel behaviors. Consequently, vessel trajectory segmentation is essential for machine learning and recommendation in the shipping field. However, a systematic literature review on vessel trajectory segmentation is still absent. In this survey, we provide an overview of vessel trajectory segmentation, covering data description, fundamental concepts, typical methods (both supervised and unsupervised), as well as applications. Furthermore, we discuss the challenges and future directions of vessel trajectory segmentation.
Article
Mobility data of a moving object, called trajectory data, are continuously generated by vessel navigation systems, wearable devices, and drones, to name a few. Trajectory data consist of samples that include temporal, spatial, and other descriptive features of object movements. One of the main challenges in trajectory data analysis is to divide trajectory data into meaningful segments based on certain criteria. Most of the available segmentation algorithms are limited to processing data offline, i.e., they cannot segment a stream of trajectory samples. In this work, we propose an approach called Reactive Buffering Window - Trajectory Segmentation (RBW-TS), which partitions trajectory data into segments while receiving a stream of trajectory samples. Another novelty compared to existing work is that the proposed algorithm is based on multidimensional features of trajectories, and it can incorporate as many relevant features of the underlying trajectory as needed. This makes RBW-TS general and applicable to numerous domains by simply selecting trajectory features relevant for segmentation purposes. The proposed online algorithm incurs lower computational and memory requirements. Furthermore, it is robust to noisy samples and outliers. We validate RBW-TS on three use cases: (a) segmenting human-movement trajectories in different modes of transportation, (b) segmenting trajectories generated by vessels in the maritime domain, and (c) segmenting human-movement trajectories in a commercial shopping center. 
The numerical results detailed in the paper demonstrate that (i) RBW-TS is capable of detecting the true breakpoints of segments in all three use cases while processing a stream of trajectory points; (ii) despite low memory and computational requirements, the performance in terms of the harmonic mean of purity and coverage is comparable to that of state-of-the-art batch and online algorithms; (iii) RBW-TS achieves different levels of accuracy depending on the various internal parameter estimation methods used; and (iv) RBW-TS can tackle real-world trajectory data for segmentation purposes.
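The harmonic mean of purity and coverage mentioned above can be illustrated with a short sketch. The abstract does not define the two metrics, so the definitions below are assumptions: purity is taken as the average, over predicted segments, of the share of points carrying the dominant true label, and coverage as the average, over true segments, of the share of points falling in the dominant predicted segment. All function names are illustrative.

```python
from collections import Counter, defaultdict


def _mean_dominant_share(group_ids, other_ids):
    """Average, over groups, of the share taken by the most common other id."""
    groups = defaultdict(list)
    for g, o in zip(group_ids, other_ids):
        groups[g].append(o)
    shares = [Counter(members).most_common(1)[0][1] / len(members)
              for members in groups.values()]
    return sum(shares) / len(shares)


def purity(true_ids, pred_ids):
    # How homogeneous each predicted segment is (assumed definition)
    return _mean_dominant_share(pred_ids, true_ids)


def coverage(true_ids, pred_ids):
    # How much of each true segment one predicted segment captures (assumed)
    return _mean_dominant_share(true_ids, pred_ids)


def harmonic_mean(p, c):
    return 0.0 if p + c == 0 else 2 * p * c / (p + c)


# Per-point segment ids for a toy trajectory of eight samples
true_ids = [0, 0, 0, 0, 1, 1, 1, 1]
pred_ids = [0, 0, 0, 1, 1, 1, 2, 2]
p, c = purity(true_ids, pred_ids), coverage(true_ids, pred_ids)
h = harmonic_mean(p, c)
```

Over-segmentation tends to raise purity and lower coverage, while under-segmentation does the opposite, which is why the two are combined into a single score.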
Article
Detecting waypoints where vessels change their behavior (e.g., maneuvers and speed changes) is essential for optimizing vessel trajectories to increase the efficiency and safety of sailing. However, accurately detecting waypoints is challenging due to potential AIS data quality issues (i.e., missing or inaccurate messages). In this paper, we propose a five-step learning approach (SafeWay) to estimate waypoints on a given AIS trajectory. First, we interpolate trajectories to tackle AIS data quality issues. Then, we annotate historical trajectories by using an existing waypoint library that contains historical waypoints. As the historical waypoints are passage plans manually created by port operators considering sailing conditions at that time, they are not specific to other historical trajectories between the same ports. We, therefore, use a similarity metric to determine overlapping segments of historical trajectories with the historical waypoints from the waypoint library. Then, we build a transformer model to capture vessel movement patterns based on speed- and location-related features. We do not process location features directly to avoid learning location-specific context, but take into account tailored delta features. We test our approach on a real-world AIS dataset collected from the Norwegian Sea between Ålesund and Måløy and show its effectiveness in terms of a harmonic mean of purity and coverage, mean absolute error and detection rate on the task of detecting trajectory waypoints compared to a state-of-the-art approach. We also show the effectiveness of the trained model on the trajectories obtained from two other regions, the North Sea (London and Rotterdam) and the North Atlantic Ocean (Setubal and Gibraltar), on which the model has not been trained. The experiments indicate that our interpolation-enabled transformer design provides improvements in the safety of the estimated waypoints.
Chapter
Recent improvements in positioning technology have led to massive amounts of moving object data. A crucial task is to find the moving objects that travel together. Usually, they are called spatio-temporal patterns. Due to the emergence of many different kinds of spatio-temporal patterns in recent years, different approaches have been proposed to extract them. However, each approach only focuses on mining a specific kind of pattern. In addition to the fact that it is a painstaking task due to the large number of algorithms used to mine and manage patterns, it is also time consuming. Additionally, we have to execute these algorithms again whenever new data are added to the existing database. To address these issues, we first redefine spatio-temporal patterns in the itemset context. Secondly, we propose a unifying approach, named GeT_Move, using a frequent closed itemset-based spatio-temporal pattern-mining algorithm to mine and manage different spatio-temporal patterns. GeT_Move is implemented in two versions which are GeT_Move and Incremental GeT_Move. Experiments are performed on real and synthetic datasets and the results show that our approaches are very effective and outperform existing algorithms in terms of efficiency.
Article
Mobility and spatial interaction data have become increasingly available due to the wide adoption of location‐aware technologies. Examples of mobility data include human daily activities, vehicle trajectories, and animal movements, among others. In this article we focus on a special type of mobility data, i.e. origin‐destination pairs, and present a new approach to the discovery and understanding of spatio‐temporal patterns in the movements. Specifically, to extract information from complex connections among a large number of point locations, the approach involves two steps: (1) spatial clustering of massive GPS points to recognize potentially meaningful places; and (2) extraction and mapping of the flow measures of clusters to understand the spatial distribution and temporal trends of movements. We present a case study with a large dataset of taxi trajectories in Shenzhen, China to demonstrate and evaluate the methodology. The contribution of the research is two‐fold. First, it presents a new methodology for detecting location patterns and spatial structures embedded in origin‐destination movements. Second, the approach is scalable to large data sets and can summarize massive data to facilitate pattern extraction and understanding.
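The two steps described above (grouping points into places, then measuring flows between places over time) can be sketched in a few lines. This is not the article's clustering method: a uniform grid is swapped in as a crude stand-in for spatial clustering, and the function names, the cell size, and the sample trips are hypothetical.

```python
from collections import Counter


def grid_cell(lon, lat, cell_size):
    """Map a coordinate to a grid cell index (a crude stand-in for a cluster id)."""
    return (int(lon // cell_size), int(lat // cell_size))


def od_flows(trips, cell_size=0.01):
    """Count trips between cells, keyed by (origin_cell, destination_cell, hour)."""
    flows = Counter()
    for (o_lon, o_lat), (d_lon, d_lat), hour in trips:
        key = (grid_cell(o_lon, o_lat, cell_size),
               grid_cell(d_lon, d_lat, cell_size),
               hour)
        flows[key] += 1
    return flows


# Hypothetical origin-destination pairs: ((lon, lat), (lon, lat), hour of day)
trips = [
    ((114.053, 22.541), (114.108, 22.547), 8),
    ((114.057, 22.543), (114.102, 22.549), 8),
    ((114.108, 22.547), (114.053, 22.541), 18),
]
flows = od_flows(trips)
busiest, count = flows.most_common(1)[0]
```

Here the two morning trips fall into the same origin and destination cells and are aggregated into one flow, while the evening return trip forms a separate flow.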
Article
The knowledge of the transportation mode used by humans (e.g. bicycle, on foot, car and train) is critical for travel behaviour research, transport planning and traffic management. Nowadays, new technologies such as the Global Positioning System have replaced traditional survey methods (paper diaries, telephone) because they are more accurate and problems such as under-reporting are avoided. However, although the movement data collected (timestamped positions in digital form) have generally high accuracy, they do not contain the transportation mode. We present in this article a new method for segmenting movement data into single-mode segments and for classifying them according to the transportation mode used. Our fully automatic method differs from previous attempts for five reasons: (1) it relies on fuzzy concepts found in expert systems, that is membership functions and certainty factors; (2) it uses OpenStreetMap data to help the segmentation and classification process; (3) it distinguishes between 10 transportation modes (including between tram, bus and car) and proposes a hierarchy; (4) it handles data with signal shortages and noise, and other real-life situations; (5) in our implementation, there is a separation between the reasoning and the knowledge, so that users can easily modify the parameters used and add new transportation modes. We have implemented the method and tested it with a 17-million point data set collected in the Netherlands and elsewhere in Europe. The accuracy of the classification with the developed prototype, determined with the comparison of the classified results with the reference data derived from manual classification, is 91.6%.
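To make the notion of a fuzzy membership function concrete, here is a minimal sketch using trapezoidal memberships over speed. It covers membership functions only, not certainty factors or the OpenStreetMap input, and it is not the article's rule base: the mode names, the speed thresholds, and the function names are illustrative assumptions.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical speed profiles in km/h; thresholds are illustrative only
MODES = {
    "walk": (0.0, 1.0, 6.0, 10.0),
    "bicycle": (5.0, 10.0, 25.0, 35.0),
    "car": (15.0, 40.0, 120.0, 160.0),
}


def mode_memberships(speed_kmh):
    """Degree to which a speed is compatible with each mode, in [0, 1]."""
    return {mode: trapezoid(speed_kmh, *params) for mode, params in MODES.items()}


memberships = mode_memberships(8.0)
```

A speed of 8 km/h is partially compatible with both walking and cycling, which is the kind of ambiguity a fuzzy approach keeps explicit instead of forcing an early hard decision.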
Article
Place-oriented analysis of movement data, i.e., recorded tracks of moving objects, includes finding places of interest in which certain types of movement events occur repeatedly and investigating the temporal distribution of event occurrences in these places and, possibly, other characteristics of the places and links between them. For this class of problems, we propose a visual analytics procedure consisting of four major steps: 1) event extraction from trajectories; 2) extraction of relevant places based on event clustering; 3) spatiotemporal aggregation of events or trajectories; 4) analysis of the aggregated data. All steps can be fulfilled in a scalable way with respect to the amount of the data under analysis; therefore, the procedure is not limited by the size of the computer's RAM and can be applied to very large data sets. We demonstrate the use of the procedure by example of two real-world problems requiring analysis at different spatial scales.
Chapter
An important problem in the study of moving objects is the identification of stops. This problem becomes more difficult due to error-prone recording devices. We propose a method that discovers stops in a trajectory that contains artifacts, namely movements that did not actually take place but correspond to recording errors. Our method is an interactive density-based clustering algorithm, for which we define density on the basis of both the spatial and the temporal properties of a trajectory. The interactive setting allows the user to tune the algorithm and to study the stability of the anticipated stops.
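The idea of judging stops by both spatial and temporal properties can be illustrated with a simple threshold-based detector. This is a deliberately simpler technique than the interactive density-based clustering the abstract describes, and it does not handle recording artifacts. It assumes planar coordinates in metres and timestamps in seconds; the function name and parameters are hypothetical.

```python
import math


def detect_stops(points, radius, min_duration):
    """Return (start_index, end_index) pairs of candidate stops.

    points: list of (x, y, t) with x, y in metres and t in seconds, sorted by t.
    A stop is a run of consecutive points that stay within `radius` of the
    run's first point and that spans at least `min_duration` seconds.
    """
    stops = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the run while points remain close to the anchor point i
        while j < n and math.dist(points[i][:2], points[j][:2]) <= radius:
            j += 1
        if points[j - 1][2] - points[i][2] >= min_duration:
            stops.append((i, j - 1))
            i = j  # continue after the detected stop
        else:
            i += 1
    return stops


track = [
    (0, 0, 0), (50, 0, 10), (100, 0, 20), (102, 1, 30),
    (101, -1, 90), (103, 0, 150), (150, 0, 160), (200, 0, 170),
]
stops = detect_stops(track, radius=10, min_duration=60)
```

On this toy track the object lingers near x = 100 for 130 seconds, so points 2 to 5 are reported as a single stop. Varying `radius` and `min_duration` shows how sensitive the result is to the chosen thresholds.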
Article
Development in techniques of spatial data acquisition enables us to easily record the trajectories of moving objects. Movement of human beings, animals, and birds can be captured by GPS loggers. The obtained data are analyzed by visualization, clustering, and classification to detect patterns frequently or rarely found in trajectories. To extract a wider variety of patterns in analysis, this article proposes a new method for analyzing trajectories on a network space. The method first extracts primary routes as subparts of trajectories. The topological relations among primary routes and trajectories are visualized as both a map and a graph‐based diagram. They permit us to understand the spatial and topological relations among the primary routes and trajectories at both global and local scales. The graph‐based diagram also permits us to classify trajectories. The representativeness of primary routes is evaluated by two numerical measures. The method is applied to the analysis of daily travel behavior of one of the authors. Technical soundness of the method is discussed as well as empirical findings.
Article
Many devices generate large amounts of data that follow some sort of sequentiality, e.g., motion sensors, e-pens, eye trackers, etc. and often these data need to be compressed for classification, storage, and/or retrieval tasks. Traditional clustering algorithms can be used for this purpose, but unfortunately they do not cope with the sequential information implicitly embedded in such data. Thus, we revisit the well-known K-means algorithm and provide a general method to properly cluster sequentially-distributed data. We present Warped K-Means (WKM), a multi-purpose partitional clustering procedure that minimizes the sum of squared error criterion, while imposing a hard sequentiality constraint in the classification step. We illustrate the properties of WKM in three applications, one being the segmentation and classification of human activity. WKM outperformed five state-of-the-art clustering techniques to simplify data trajectories, achieving a recognition accuracy of nearly 97%, which is an improvement of around 66% over its peers. Moreover, such an improvement came with a reduction in the computational cost of more than one order of magnitude.
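The combination of a sum-of-squared-error criterion with a hard sequentiality constraint can be illustrated with a small example. The sketch below is not WKM: it swaps in a plain dynamic program that finds the contiguous partition of a 1-D sequence into k segments with minimum total squared error. The function name and the sample data are hypothetical.

```python
def segment_sse(values, k):
    """Split a 1-D sequence into k contiguous segments minimising total squared error.

    Returns a list of (start, end) index pairs, end exclusive.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    # Prefix sums give the squared error of any slice in O(1)
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):  # squared error of values[i:j] around its mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best error for the first j values split into m segments
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j], cut[m][j] = c, i
    # Walk back through the stored cut points
    bounds, j = [], n
    for m in range(k, 0, -1):
        bounds.append((cut[m][j], j))
        j = cut[m][j]
    return bounds[::-1]


boundaries = segment_sse([1, 1, 1, 5, 5, 5, 9, 9], k=3)
```

Because every segment is a contiguous slice, samples that are similar in value but far apart in the sequence can never share a cluster, which is the effect a sequentiality constraint is meant to enforce. The dynamic program costs O(k n^2) time, so it suits short sequences only.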
Article
A new method for encoding a videoconference image sequence, termed adaptive neural net vector quantisation (ANNVQ), has been derived. It is based on Kohonen's self-organised feature maps, a neural network type clustering algorithm. The new method differs from it, in that after training the initial codebook, a modified form of adaptation resumes, in order to respond to scene changes and motion. The main advantages are high image quality with modest bit rates and effective adaptation to motion and scene changes, with the capability to quickly adjust the instantaneous bit rate in order to keep the image quality constant. This is a good match to packet switched networks where variable bit rate and uniform image quality are highly desirable. Simulation experiments have been carried out with 4 × 4 blocks of pixels from an image sequence consisting of 20 frames of size 112 × 96 pixels each. With a codebook size of 512, ANNVQ results in high image quality upon image reconstruction, with peak signal-to-noise ratio (PSNR) of about 36 to 37 dB, at coding bit rates of about 0.50 bit/pixel. This compares quite favourably with classical vector quantisation at a similar bit rate. Moreover, this value of PSNR remains approximately constant, even when encoding image frames with considerable motion.
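For reference, the peak signal-to-noise ratio quoted above is conventionally computed as PSNR = 10 log10(peak^2 / MSE), where MSE is the mean squared error between the original and reconstructed pixels. A minimal sketch follows, assuming 8-bit pixels (peak value 255), which the abstract does not state; the function name and the sample pixel values are illustrative.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: every pixel is off by 4 grey levels, so MSE = 16
value = psnr([100, 100, 100, 100], [104, 96, 104, 96])
```

With a mean squared error of 16 the result is roughly 36.1 dB, so a PSNR of 36 to 37 dB corresponds to a typical per-pixel error of about 4 grey levels on an 8-bit scale.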