Article

Trajectory Data Mining: An Overview

Authors:
To read the full-text of this research, you can request a copy directly from the author.

Abstract

The advances in location-acquisition and mobile computing techniques have generated massive spatial trajectory data, which represent the mobility of a diversity of moving objects, such as people, vehicles, and animals. Many techniques have been proposed for processing, managing, and mining trajectory data in the past decade, fostering a broad range of applications. In this article, we conduct a systematic survey on the major research into trajectory data mining, providing a panorama of the field as well as the scope of its research topics. Following a road map from the derivation of trajectory data, to trajectory data preprocessing, to trajectory data management, and to a variety of mining tasks (such as trajectory pattern mining, outlier detection, and trajectory classification), the survey explores the connections, correlations, and differences among these existing techniques. This survey also introduces the methods that transform trajectories into other data formats, such as graphs, matrices, and tensors, to which more data mining and machine learning techniques can be applied. Finally, some public trajectory datasets are presented. This survey can help shape the field of trajectory data mining, providing a quick understanding of this field to the community.


... Existing edge bundling algorithms assess edge similarities through compatibility metrics, which consider factors such as topology [42] and importance [39] but often fail to capture trajectory similarities. To better capture trajectory similarities, we design our compatibility metric based on dynamic time warping (DTW) [36], a widely accepted metric for assessing trajectory similarity [54]. DTW calculates the distance between two trajectories by finding the optimal alignment between points on them, thereby capturing the overall similarity between the entire trajectories [36]. ...
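As a rough illustration of the alignment idea described in this excerpt, the following is a minimal sketch of the textbook dynamic-programming form of DTW. It assumes trajectories are plain lists of 2D points and uses Euclidean point distance; the function name `dtw_distance` is hypothetical, and this is not the compatibility metric of the cited work.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two trajectories (lists of (x, y) points).

    Assumption: Euclidean distance between points; no warping-window constraint.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    inf = math.inf
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because one point may be matched to several points on the other trajectory, two paths that follow the same route at different sampling rates can still obtain a small distance.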
... Supporting more trajectory patterns. While our current method effectively highlights the global trend and local hotspots, trajectory data often contains other patterns that can provide valuable insights [54]. For instance, periodic patterns represent movements that repeat at regular intervals, such as daily commutes or seasonal migrations [5]. ...
... Since the global trend can be depicted by representative trajectories [54], we generate a smooth trajectory using B-splines as the ground truth. We include one or two bends in these B-splines, leading to 2 types of the global trend in our dataset. ...
Preprint
Animating objects' movements is widely used to facilitate tracking changes and observing both the global trend and local hotspots where objects converge or diverge. Existing methods, however, often obscure critical local hotspots by only considering the start and end positions of objects' trajectories. To address this gap, we propose RouteFlow, a trajectory-aware animated transition method that effectively balances the global trend and local hotspots while minimizing occlusion. RouteFlow is inspired by a real-world bus route analogy: objects are regarded as passengers traveling together, with local hotspots representing bus stops where these passengers get on and off. Based on this analogy, animation paths are generated like bus routes, with the object layout generated similarly to seat allocation according to their destinations. Compared with state-of-the-art methods, RouteFlow better facilitates identifying the global trend and locating local hotspots while performing comparably in tracking objects' movements.
... A unique quality of ST data that differentiates it from other data studied in classical data mining literature (e.g., see [Tan et al. 2017]) is the presence of dependencies among measurements induced presented in Mamoulis 2009;Zheng 2015]. A survey on STDM by [Shekhar et al. 2015] provides a semantic categorization of ST data types and pattern families from a database-centric perspective. ...
... For example, given a collection of crime events, we may be interested in finding regions in space and time with similar crime activities. This clustering objective has been studied in the context of crime data [Eftelioglu et al. 2014], twitter data [Abdelhaq et al. 2013;Chierichetti et al. 2014;Ihler et al. 2006;Walther and Kaisser 2013;Weng and Lee 2011], geo-tagged photos [Zheng et al. 2012], traffic accidents [Zheng et al. 2012], and epidemiological data [Glatman-Freedman et al. 2016]. A number of techniques for clustering ST points are based on the DBSCAN algorithm [Ester et al. 1996], which is a widely used method for finding arbitrarily shaped clusters of spatial points based on the density of points. ...
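To make the density-based idea in this excerpt concrete, here is a minimal sketch of DBSCAN on purely spatial 2D points. It is a simplified, quadratic-time illustration under stated assumptions (Euclidean distance, a point counts as its own neighbor); the spatio-temporal variants mentioned in the excerpt typically add a temporal threshold, which is not shown. The function and parameter names are illustrative.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Assumptions: 2D points, Euclidean distance, brute-force neighbor search.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i (includes i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:
                queue.extend(nj)  # j is a core point, so keep expanding
    return labels
```

Clusters grow from core points through chains of nearby neighbors, which is why the resulting clusters can take arbitrary shapes.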
Preprint
Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains, including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differs from relational data, for which computational approaches have been developed in the data mining community for multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data mining community. In this article we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data mining problem studied, we classify literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data mining problems in each of these categories.
... Recent years have witnessed a dramatic increase in the use of large amounts of data available thanks to Information and Communication Technologies. These new sources make it possible to monitor and map the dynamical properties of many complex systems at an unprecedented scale [1], and we now have access to a vast number of spatial trajectories representing movements of objects in geographical space [2]. In particular, such datasets have opened the opportunity to better understand human movements [3][4][5][6][7] and the impact of mobility on important processes such as epidemic spreading [8]. ...
... Real trajectories always exhibit a large variety of intertwined static and dynamic behaviours [20]: slow versus fast movement behaviour for animals [19], fixation versus saccade in eyetracking [21], or activities versus trips in human mobility [22]. Isolating and identifying these behaviours from a series of chronologically ordered points is an important statistical challenge [23] and a growing array of methods based on spatio-temporal characteristics of the trajectories have been developed to perform this task automatically [2,19,21]. These methods are however often tailored for the specific dataset in question [20]. ...
Preprint
In empirical studies of random walks, continuous trajectories of animals or individuals are usually sampled over a finite number of points in space and time. It is however unclear how this partial observation affects the measured statistical properties of the walk, and we use here analytical and numerical methods of statistical physics to study the effects of sampling in movements alternating rests and moves of random durations. We evaluate how the statistical properties estimated are affected by the way trajectories are measured and we identify an optimal sampling frequency leading to the best possible measure. We solve analytically the simplest scenario of a constant sampling interval and short-tailed distributions of rest and move durations, which allows us to show that the measured displacement statistics can be significantly different from the original ones and also to determine the optimal sampling time. The corresponding optimal fraction of correctly sampled movements, analytically predicted for this short-tail scenario, is an upper bound for the quality of a trajectory's sampling. Indeed, we show with numerical simulations that this fraction is dramatically reduced in any real-world case where we observe long-tailed distributions of rest duration. We test our results with high resolution GPS human trajectories, where a constant sampling interval allows us to recover at best 18% of the movements, while over-evaluating the average trip length by a factor of 2. If we use a sampling interval extracted from real communication data, we recover only 11% of moves, a value that cannot be increased above 16% even with ideal algorithms. These figures call for a more cautious use of data in all quantitative studies of individuals' trajectories, casting in particular serious doubts on the results of previous studies on human mobility based on mobile phone data.
... The multitude of methods for the several DM tasks related to spatio-temporal and mobility data has been described in numerous dedicated reviews [16]-[18]. A noteworthy example of such a study can be found in the work by Zheng in [19], where the different steps of processing trajectory records are presented in great detail. Together with the major DM tasks found in mobility data, this work delineates the transformation of such data into several formats and provides a list of some open datasets for experimentation. ...
... Following the structure of DM, works related to trajectory data are categorized into larger types [19]. First, classifying the trajectories in a set of predetermined groups, can be semantically translated as determining the type of movement; which can be further interpreted as labeling the transportation mode of a specific segment of the trajectory or recognizing the purpose of the trip [190]. ...
Article
Full-text available
Recent advancements in sensor and tracking technologies have facilitated the real-time tracking of marine vessels as they traverse the oceans. As a result, there is an increasing demand to analyze these datasets to derive insights into vessel movement patterns and to investigate activities occurring within specific spatial and temporal contexts. This survey offers a comprehensive review of contemporary research in trajectory data mining, with a particular focus on maritime applications. The article collects and evaluates state-of-the-art algorithmic approaches and key techniques pertinent to various use case scenarios within this domain. Furthermore, this study provides an in-depth analysis of recent developments in trajectory data mining as applied to the maritime sector, identifying available data sources and conducting a detailed examination of significant applications, including trajectory forecasting, activity recognition, and trajectory clustering.
... problems in the area include trajectory classification, clustering, prediction, simplification, and anomaly detection (see [5,30,87] for comprehensive surveys). In this research, we focus on the trajectory prediction problem, which refers to the task of predicting the future path or trajectory of an object (or individual) based on its current state and historical data. ...
... It enhances safety in autonomous driving [33] and maritime navigation [17,80] by reducing collision risks, improves resource management and efficiency in logistics and urban planning, and aids in public transport and real-time traffic management for more efficient systems and reduced congestion [78]. Additionally, it enables geospatial analysis in urban environments for optimized traffic infrastructure planning and location-based services and recommendations [87]. In the social sciences, our model offers valuable insights into human crowd behavior [66][67][68], while in epidemiology, it supports the development of mobility-based models for understanding the spread of infectious diseases [1,10,11,57,58,84]. ...
Preprint
Full-text available
Trajectory prediction aims to estimate an entity's future path using its current position and historical movement data, benefiting fields like autonomous navigation, robotics, and human movement analytics. Deep learning approaches have become key in this area, utilizing large-scale trajectory datasets to model movement patterns, but face challenges in managing complex spatial dependencies and adapting to dynamic environments. To address these challenges, we introduce TrajLearn, a novel model for trajectory prediction that leverages generative modeling of higher-order mobility flows based on hexagonal spatial representation. TrajLearn predicts the next k steps by integrating a customized beam search for exploring multiple potential paths while maintaining spatial continuity. We conducted a rigorous evaluation of TrajLearn, benchmarking it against leading state-of-the-art approaches and meaningful baselines. The results indicate that TrajLearn achieves significant performance gains, with improvements of up to ~40% across multiple real-world trajectory datasets. In addition, we evaluated different prediction horizons (i.e., various values of k), conducted resolution sensitivity analysis, and performed ablation studies to assess the impact of key model components. Furthermore, we developed a novel algorithm to generate mixed-resolution maps by hierarchically subdividing hexagonal regions into finer segments within a specified observation area. This approach supports selective detailing, applying finer resolution to areas of interest or high activity (e.g., urban centers) while using coarser resolution for less significant regions (e.g., rural areas), effectively reducing data storage requirements and computational overhead. We promote reproducibility and adaptability by offering complete code, data, and detailed documentation with flexible configuration options for various applications.
... Many techniques have been proposed for processing, managing, and mining the trajectory data in the past decade [55]. Several other studies try to leverage the spatial data in recommender systems [23]. ...
... These works include trajectory pattern mining to find the next location of an individual [5,28,39,53], anomaly detection to detect unexpected movement patterns [25,33], and trajectory classification to differentiate between trajectories of different states, such as motions, transportation modes, and human activities [48]. A comprehensive review of these methods can be found in the recent survey [55]. We also discriminate our work from location recommendation and trajectory mining methods, because our goal is to model the check-ins of users, not to recommend a location or to find the trajectory patterns of users with the position data of their routes. ...
Preprint
Social networks are getting closer to our real physical world. People share the exact location and time of their check-ins and are influenced by their friends. Modeling the spatio-temporal behavior of users in social networks is of great importance for predicting the future behavior of users, controlling the users' movements, and finding the latent influence network. It is observed that users have periodic patterns in their movements. Also, they are influenced by the locations that their close friends recently visited. Leveraging these two observations, we propose a probabilistic model based on a doubly stochastic point process with a periodic decaying kernel for the time of check-ins and a time-varying multinomial distribution for the location of check-ins of users in the location-based social networks. We learn the model parameters using an efficient EM algorithm, which distributes over the users. Experiments on synthetic and real data gathered from Foursquare show that the proposed inference algorithm learns the parameters efficiently and our model outperforms the other alternatives in the prediction of time and location of check-ins.
... An example is Telschow et al. (2016), who considered the extrinsic mean function and warping for functional data lying on SO(3). Examples of data lying on a Euclidean sphere include geographical data (Zheng 2015) on S 2 , directional data on S 1 (Mardia and Jupp 2009), and square-root compositional data (Huckemann and Eltzner 2016), for which we will study longitudinal/functional versions in Section 4. Sphere-valued functional data naturally arise when data on a sphere have a time component, such as in recordings of airplane flight paths or animal migration trajectories. Our main goal is to extend and study the dimension reduction that is afforded by the popular functional principal component analysis (FPCA) in Euclidean spaces to the case of samples of smooth curves that lie on a smooth Riemannian manifold, taking into account the underlying geometry. ...
... To explain 95% of total variation, 14 components are needed for SFPCA, but 18 for L 2 FPCA. Trajectory data of this kind on geographical spaces corresponding to the surface of the earth that may be approximated by the sphere S 2 have drawn extensive interest in computer science and machine learning communities (Zheng 2015;Anirudh et al. 2017). The preprocessed flight trajectories are visualized in Figure 4, indicating that the flight trajectories from the three airlines overlap and are thus not easy to discriminate. ...
Preprint
Functional data analysis on nonlinear manifolds has drawn recent interest. Sphere-valued functional data, which are encountered for example as movement trajectories on the surface of the earth, are an important special case. We consider an intrinsic principal component analysis for smooth Riemannian manifold-valued functional data and study its asymptotic properties. Riemannian functional principal component analysis (RFPCA) is carried out by first mapping the manifold-valued data through Riemannian logarithm maps to tangent spaces around the time-varying Fréchet mean function, and then performing a classical multivariate functional principal component analysis on the linear tangent spaces. Representations of the Riemannian manifold-valued functions and the eigenfunctions on the original manifold are then obtained with exponential maps. The tangent-space approximation through functional principal component analysis is shown to be well-behaved in terms of controlling the residual variation if the Riemannian manifold has nonnegative curvature. Specifically, we derive a central limit theorem for the mean function, as well as root-n uniform convergence rates for other model components, including the covariance function, eigenfunctions, and functional principal component scores. Our applications include a novel framework for the analysis of longitudinal compositional data, achieved by mapping longitudinal compositional data to trajectories on the sphere, illustrated with longitudinal fruit fly behavior patterns. RFPCA is shown to be superior in terms of trajectory recovery in comparison to an unrestricted functional principal component analysis in applications and simulations and is also found to produce principal component scores that are better predictors for classification compared to traditional functional principal component scores.
... A few comprehensive surveys on the topic can be found in Zheng [62], Alturi et al. [6], and Hamdi et al. [16]. In this research, we focus on the problem of constructing a small set of basic building blocks that can represent a wide range of trajectories, known as a (trajectory) pathlet dictionary (PD). ...
... Trajectory data mining has been an active research direction for a long time [6,16,62]. This high interest can largely be attributed to the rapid development and prominence of geospatial technologies [11], location-based smart devices [40], abundance of GPS-based applications [37], and generation of massive trajectory datasets [13]. ...
Preprint
Full-text available
Advances in tracking technologies have spurred the rapid growth of large-scale trajectory data. Building a compact collection of pathlets, referred to as a trajectory pathlet dictionary, is essential for supporting mobility-related applications. Existing methods typically adopt a top-down approach, generating numerous candidate pathlets and selecting a subset, leading to high memory usage and redundant storage from overlapping pathlets. To overcome these limitations, we propose a bottom-up strategy that incrementally merges basic pathlets to build the dictionary, reducing memory requirements by up to 24,000 times compared to baseline methods. The approach begins with unit-length pathlets and iteratively merges them while optimizing utility, which is defined using newly introduced metrics of trajectory loss and representability. We develop a deep reinforcement learning framework, PathletRL, which utilizes Deep Q-Networks (DQN) to approximate the utility function, resulting in a compact and efficient pathlet dictionary. Experiments on both synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art techniques, reducing the size of the constructed dictionary by up to 65.8%. Additionally, our results show that only half of the dictionary pathlets are needed to reconstruct 85% of the original trajectory data. Building on PathletRL, we introduce PathletRL++, which extends the original model by incorporating a richer state representation and an improved reward function to optimize decision-making during pathlet merging. These enhancements enable the agent to gain a more nuanced understanding of the environment, leading to higher-quality pathlet dictionaries. PathletRL++ achieves even greater dictionary size reduction, surpassing the performance of PathletRL, while maintaining high trajectory representability.
... With the rapid development of Global Positioning Systems (GPS) and Geographic Information Systems (GIS), the number of human mobility trajectories has soared, significantly advancing research in spatio-temporal data mining, such as urban planning (Bao et al. 2017; Wang et al. 2023, 2024b), business location selection (Li et al. 2016), and travel time estimation (Reich et al. 2019; Wen et al. 2024). However, due to obstacles including privacy issues (Cao and Li 2021), government regulations (Chen et al. 2024a), and data processing costs (Zheng 2015), it is not easy for researchers to obtain high-quality real-world trajectory data. A promising solution to these challenges is trajectory generation, which not only meets privacy requirements but also allows for the creation of diverse high-fidelity trajectories. ...
Article
Trajectory generation has garnered significant attention from researchers in the field of spatio-temporal analysis, as it can generate substantial synthesized human mobility trajectories that enhance user privacy and alleviate data scarcity. However, existing trajectory generation methods often focus on improving trajectory generation quality from a singular perspective, lacking a comprehensive semantic understanding across various scales. Consequently, we are inspired to develop a HOlistic SEmantic Representation (HOSER) framework for navigational trajectory generation. Given an origin-and-destination (OD) pair and the starting time point of a latent trajectory, we first propose a Road Network Encoder to expand the receptive field of road- and zone-level semantics. Second, we design a Multi-Granularity Trajectory Encoder to integrate the spatio-temporal semantics of the generated trajectory at both the point and trajectory levels. Finally, we employ a Destination-Oriented Navigator to seamlessly integrate destination-oriented guidance. Extensive experiments on three real-world datasets demonstrate that HOSER outperforms state-of-the-art baselines by a significant margin. Moreover, the model's performance in few-shot learning and zero-shot learning scenarios further verifies the effectiveness of our holistic semantic representation.
... For example, by incorporating a time dimension into the Moran scatterplot, a LISA trajectory can be obtained by chronologically connecting all Moran points. Trajectory mining methods (Zheng 2015) can then be applied to analyze S-T LISA trajectories. Furthermore, S-T LISA trajectories can be enriched with type semantics of S-T autocorrelation recorded in LISA sequences, leading to the generation of a LISA semantic trajectory (Parent et al. 2013). ...
Article
Full-text available
The Local Indicators of Spatial Association (LISA) is one of the most widely used methods for identifying local patterns of spatial association in geographical elements. However, the dynamic trends of spatial-temporal (S-T) autocorrelation remain poorly understood, yet capturing these patterns is essential for analyzing the evolution of spatial processes. To fill the gap, we propose a novel S-T LISA methodology to automatically discover co-occurring LISA subsequences over time by incorporating sequence analysis techniques. First, we extend the classical LISA to a dynamic context, and clarify the definition, properties, and classification of S-T LISA sequences. Second, we adopt an enhanced Hamming distance to quantify the similarity of LISA sequences, followed by hierarchical clustering to group similar LISA sequences. Next, an improved FP-Growth algorithm is applied to identify frequent patterns. Finally, we conduct experiments using grid-scale social media check-in records and city-scale carbon emission data to discover significant evolutionary patterns. The results verified the applicability of the proposed method in both human and physical geography. The proposed approach outperforms traditional S-T cube methods in its ability to automatically capture dynamic, complex, and transient S-T association trends as well as irregular outliers. The integration of sequence analysis with LISA statistics presented in this article provides an effective framework for identifying evolutionary patterns of S-T association.
... Typically, one type of application, such as trajectory prediction, location-based recommendation [1] and epidemic control, requires the mining of trajectory co-movement patterns [2]. In essence, a co-movement pattern describes a group of objects that move together during a certain period of time [3]. This is exemplified in Fig. 1, where O1 to O6 represent the spatial movements of six objects during timestamps T1 to T4. ...
Article
Full-text available
The ubiquity of GPS-equipped devices has been generating massive trajectory data that record the movements of pedestrians, vehicles, and other moving objects. As one of the fundamental trajectory applications, the mining of co-movement patterns has been used extensively in location-based services (LBS), such as future motion forecasting and social recommendations. These applications require massive real-time data processing capabilities. However, the majority of existing research primarily focuses on historical data, while the remaining minority targets the stream scenario but only optimizes efficiency rather than scalability. In this paper, we focus on distributed real-time co-movement pattern detection on trajectory streams. First, we propose a distributed streaming data processing framework based on Apache Flink. This framework consists of two stages: clustering and pattern mining. To accelerate clustering, we developed partitioning methods such as the QGrid structure to achieve effective parallel processing of the spatial aspect of the clustering process. To efficiently perform pattern enumeration, we fully utilize the spatial information carried by clusters to narrow down the search space and perform verification effectively. Compared with the existing research on co-movement, we use spatial information to effectively integrate the two stages of data processing, reduce the search space of the pattern mining process, and improve the overall efficiency of the framework. Extensive experiments compared to existing methods have fully confirmed the efficiency of the proposed framework and its constituent technologies.
... (2) Alternatively, we directly modify longitude (x_i) and latitude (y_i) values within certain original GPS points (P_i = (x_i, y_i, t_i)), transforming them into unfamiliar routes that exhibit unusual patterns. The Beijing dataset consists of the trajectory data of 12,000 taxis in the city of Beijing from November 12 to November 25, 2012, containing 200 million trajectory points. After filtering out non-passenger-carrying data, there are 2.46 million trajectory points, which are then extracted to obtain 4928 trajectories. ...
Article
Full-text available
Anomalous trajectory detection within urban road traffic networks is crucial for identifying operational vehicle fraud in intelligent transportation systems. However, most existing approaches are limited to detecting anomalous trajectories solely based on the same original point, neglecting the extraction of spatiotemporal features and contextual information embedded in trajectory data. To address these limitations, a Parallel Recurrent Neural Network with Transformer (PRNNT) model is proposed for anomalous trajectory detection. Specifically, the position embedding and a transformer encoder module are utilized to train trajectory embeddings, allowing the model to learn sequential features and contextual information of trajectories. Moreover, a parallel recurrent neural network is employed to extract hidden trajectory features, capturing the differences between normal and anomalous trajectories. Finally, a linear layer is applied to fuse the spatiotemporal features and output the probability of an anomalous trajectory, enhancing the detection of vehicle trajectory anomalies. Experimental results on Beijing and Porto datasets demonstrate that the proposed PRNNT model significantly outperforms the iBAT (Isolation-Based Anomalous Trajectory), ATDC (Anomalous Trajectory Detection and Classification), ATD-RNN (Anomalous Trajectory Detection using Recurrent Neural Network), XGBoost (Extreme Gradient Boosting), GM-VSAE (Gaussian Mixture Variational Sequence AutoEncoder), and UA-OATD (Deep Unified Attention-based Sequence Modeling for Online Anomalous Trajectory Detection) models, achieving at least a 3.8%, 22.7%, 3.8%, 22.7%, 15%, and 16.7% improvement in F1-score, respectively.
... SAInf uses data recorded by surveillance cameras to design a three-stage method to detect stop events and identify stop areas, thereby overcoming the uncertainty caused by sparse trajectories, making trajectory mining more accurate and comprehensive, and improving the reliability and quality of data sources for prediction models. The first stage of the three-stage method is data preparation, which uses GPS trajectory data to construct ground truth by performing trajectory noise filtering and stop area detection algorithms [12]. Then, the SCR pairs are matched with stop events in chronological order; the second stage is stop event detection, which determines whether a stop event occurs by analyzing the driving speed between surveillance camera records and uses data aggregation methods to build a unified detection model to improve the detection effect of stop events; the third stage is stop area identification, which uses the spatial distribution characteristics of vehicles in surveillance records to generate potential stop areas. ...
Article
Full-text available
As an important application direction of urban computing, traffic flow prediction plays an important role in modern traffic management, urban planning and sustainable development. In recent years, many cutting-edge studies in the field of traffic flow prediction have had a significant impact and promoted the development of practical applications in this field. This paper mainly focuses on the research results of various traffic prediction directions. According to the actual environment and the functional characteristics of the research results, the research is classified into three aspects: data acquisition, feature engineering, and prediction model optimization. It also summarizes the optimization effects of research on traffic flow prediction in sensor data acquisition, data outlier processing, neural network prediction technology, etc. This paper first proposes three important aspects that affect traffic flow prediction and classifies recent research results. Then, the functions and impacts are analyzed from various aspects, and the advantages and progress of the research results are analyzed by comparing most mainstream methods. Then, the problems and limitations of the research are analyzed and discussed in combination with the actual road environment. Finally, the future research direction and development trend of this field are prospected, and the full text is summarized.
... For Porto and Geolife, we conducted the same data preprocessing as in [39,41]: trajectories that are too long (> 200 points), too short (< 10 points), or located too far from the central city area are removed. For T-Drive and AIS, which primarily consist of long-term, continuously tracked trajectories that have not been segmented into trips, we first segmented them according to stay points [48]. Then, the segmented trajectories were processed following the same procedures as Geolife and Porto. Moreover, the spatial span of the AIS data nearly covers all U.S. waters. ...
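The length-based part of the filtering described in this excerpt can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `filter_by_length` and its parameters are hypothetical, and the distance-from-city-centre filter is omitted because the excerpt does not specify its threshold.

```python
def filter_by_length(trajectories, min_points=10, max_points=200):
    """Keep trajectories whose point count lies in [min_points, max_points].

    The bounds mirror the excerpt: trajectories with more than 200 points
    or fewer than 10 points are removed. Names are illustrative only.
    """
    return [t for t in trajectories if min_points <= len(t) <= max_points]


if __name__ == "__main__":
    # Four dummy trajectories with 5, 10, 200, and 201 points.
    sample = [[(0.0, 0.0)] * n for n in (5, 10, 200, 201)]
    print([len(t) for t in filter_by_length(sample)])
```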
Article
Full-text available
Free-space trajectory similarity calculations, e.g., DTW, Hausdorff, and Fréchet, often incur quadratic time complexity; thus, learning-based methods have been proposed to accelerate the computation. The core idea is to train an encoder to transform trajectories into representation vectors and then compute vector similarity to approximate the ground truth. However, existing methods face dual challenges of effectiveness and efficiency: 1) they all utilize Euclidean distance to compute representation similarity, which leads to a severe curse-of-dimensionality issue, reducing the distinguishability among representations and significantly affecting the accuracy of subsequent similarity search tasks; 2) most of them are trained in a triplet manner and often require additional information, which reduces efficiency; 3) previous studies, while emphasizing scalability in terms of efficiency, overlooked the deterioration of effectiveness as the dataset size grows. To cope with these issues, we propose a simple yet accurate, fast, and scalable model that uses only a single-layer vanilla transformer encoder as the feature extractor and employs tailored representation similarity functions to approximate various ground-truth similarity measures. Extensive experiments demonstrate that our model significantly mitigates the curse-of-dimensionality issue and outperforms state-of-the-art methods in effectiveness, efficiency, and scalability.
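To make the quadratic cost mentioned in this abstract concrete, here is a minimal sketch of the textbook dynamic-programming form of DTW for two point sequences. It is the standard formulation, not the model proposed in the paper; the function name `dtw_distance` and the use of Euclidean point distance are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of points.

    Fills an (n+1) x (m+1) table, so time and memory are O(n * m),
    which is the quadratic cost that learned encoders try to avoid.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the DTW distance between a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] aligned with an already-used b point
                cost[i][j - 1],      # b[j-1] aligned with an already-used a point
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    print(dtw_distance([(0, 0), (1, 0)], [(0, 0), (0, 0), (1, 0)]))
```

Because every cell of the table is visited once, doubling the length of both trajectories roughly quadruples the work.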
... Trajectory modelling is becoming increasingly common in order to explore spatio-temporal patterns in mobility and the movement of multiple objects. A spatial-temporal trajectory can be defined as the observation of a moving object in geographical spaces recorded chronologically in ordered points [252], [116], [244]. In other words, trajectory data is described as the collection of geo-tagged data points that are ordered by a timestamp [49], [233]. ...
Thesis
In recent years, the field of environmental sciences has gained considerable attention, driven by increases in global population and rapid urbanisation. The issues have been widely recognised, as has the need for solutions to address them. Previous work exploring this relationship has shown that poor air quality is harmful not only to health, mental health, and wellbeing but, in recent years, can be as serious as death, with the first landmark case of ‘air pollution’ recorded as a cause of death. DigitalExposome, a novel conceptual framework, is introduced to quantify the impact of environment and mental wellbeing. The investigation uses real-time air quality with the approach of making inferences based on an individual's personal characteristics, behaviour and momentary wellbeing within urban spaces. This work uses a multimodal sensor-fusion approach, utilising miniaturised sensing and smartphone technologies, to acquire environmental, human on-body physiological and mental wellbeing data, specifically labelled at the point of collection. This has entailed the creation of an affordable, sensor-based environmental monitoring station incorporating Internet of Things (IoT) technologies. To address this, a practical three-stage approach is explored to unravel and understand the impact of the environment on wellbeing. Firstly, to observe a more human-based personalised approach, the use of trajectories was studied alongside the addition of semantics to collect environmental air quality and on-body physiological data. As a result, semantic-enriched trajectories combined with episodes support the limitation to quantifying the impact at the point of exposure. Secondly, a study involving 40 participants in the real-world is conducted in a novel multimodal sensor fusion approach involving real-time data collection using self-labelled wellbeing, air quality characteristics and on-body physiological data.
The study extends previous literature by quantifying multiple sensors and self-labelled wellbeing using a more digital approach through low-cost, affordable sensors and mobile technology. The aggregated approach supports a higher accuracy level and produces a more comprehensive relationship impact between the environment, human physiology, behaviour and wellbeing. Thirdly, this work explores data analysis used to quantify the impact between air quality factors and wellbeing. To observe variable importance, statistical approaches such as Principal Component Analysis and Multiple Variant Regression are applied, showing that Particulate Matter and Nitrogen Dioxide have a considerable negative impact on human wellbeing. Various models such as Dynamic Time Warping (DTW), Deep Belief Network (DBN) and Convolutional Neural Networks (CNN) have created new opportunities for real-world inference of mental wellbeing using environmental and on-body physiological sensor data. A personalised approach using DTW is proposed as a way to observe changes in wellbeing at a personal human-interaction level, which in this work demonstrates a high level of accuracy, achieving an F1-Score of 0.88 using a DTW network classifying on a 5-point wellbeing scale. Leveraging the concept of quantifying an individual's exposure to the environment using technology combined with artificial intelligence (AI), this thesis gains a deeper understanding of the negative impact air quality exposures can have on mental wellbeing. This thesis offers a first attempt towards assessing the relationship of air quality and mental wellbeing incorporating innovative methods of digital technology and artificial intelligence. This work has the potential to shed light on how individuals breathe, feel and interact with their environment in different surroundings.
... The most probable path is the order of road segments (i.e., states) with most likely transitions from one segment to another [20], where the optimal path is calculated based on both probabilities using different algorithms such as Viterbi [21]. Once the corrected trajectory has been reconstructed, GPS trajectory analysis can be divided into methods which aim to categorise the trajectories based on their properties [22], [23], and data mining methods that are aimed towards uncovering and describing the hidden movement patterns in trajectories and predicting the future behaviour of moving entities [23], [24]. ...
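The Viterbi step mentioned in this excerpt can be sketched with a generic hidden Markov model decoder, where road segments play the role of states and GPS fixes the role of observations. This is a minimal sketch under stated assumptions: the function name, the dictionary-based probability tables, and the toy segment names "A" and "B" are all hypothetical, and a real map matcher would derive the emission and transition probabilities from distances on the road network.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a non-empty observation list.

    start_p[s], trans_p[r][s], and emit_p[s][o] are plain probabilities;
    log-probabilities are used internally to avoid numeric underflow.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Best log-probability of any path ending in each state after the first fix.
    best = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            r = max(states, key=lambda q: prev[q] + log(trans_p[q][s]))
            best[s] = prev[r] + log(trans_p[r][s]) + log(emit_p[s][obs])
            pointers[s] = r
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy example with made-up probabilities: two road segments, four GPS fixes.
    states = ["A", "B"]
    start_p = {"A": 0.5, "B": 0.5}
    trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
    emit_p = {
        "A": {"near_A": 0.8, "near_B": 0.2},
        "B": {"near_A": 0.2, "near_B": 0.8},
    }
    fixes = ["near_A", "near_A", "near_B", "near_B"]
    print(viterbi(fixes, states, start_p, trans_p, emit_p))
```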
... Current state-of-the-art models are trained and evaluated using estimation data provided by publicly available autonomous driving datasets [3]-[5]. However, these datasets introduce significant limitations: estimation data is often derived using varying sensor modalities and post-processing methods [6]. This variability leads to inconsistencies across datasets, as similarly formatted data may have subtle but impactful differences in underlying attributes. ...
Preprint
Full-text available
This paper presents a framework capable of accurately and smoothly estimating position, heading, and velocity. Using this high-quality input, we propose a system based on Trajectron++, able to consistently generate precise trajectory predictions. Unlike conventional models that require ground-truth data for training, our approach eliminates this dependency. Our analysis demonstrates that poor quality input leads to noisy and unreliable predictions, which can be detrimental to navigation modules. We evaluate both input data quality and model output to illustrate the impact of input noise. Furthermore, we show that our estimation system enables effective training of trajectory prediction models even with limited data, producing robust predictions across different environments. Accurate estimations are crucial for deploying trajectory prediction models in real-world scenarios, and our system ensures meaningful and reliable results across various application contexts.
... Indeed, several methods are developed in this direction to discover road anomalies, however, these methods have some challenges that make these methods incomplete to be utilized. These challenges include (1) real-time detection of outlier trajectories and (2) inaccurate alarm rate for detecting anomalies [1,2,6,8,10,16,21,26,27,30,42,43,50-53]. ...
... Hence, the formulation captures context dynamics across trajectories that may have been observed at different sampling intervals. This kind of approach would normalize contextual attributes to consider temporal variability in trajectory analysis, as discussed in similar works [29,30]. The numerator, (c i − c j ), stands for the difference in the contextual dimension, which can be environmental, like temperature or wind speed; semantic, such as points of interest or activities; or internal states, like energy consumption. ...
Article
Full-text available
Most traditional trajectory compression methods, such as the Douglas–Peucker (DP) method, consider only spatial characteristics and disregard contextual factors, including environmental context. This paper proposes a new way of trajectory formulation by considering all spatial, internal, environmental, and semantic contexts to capture all contextual aspects of moving objects. Then, we propose the Context-Aware Douglas–Peucker (CADP) method for trajectory compression. These facts are confirmed by experiments with real AIS data showing that, while CADP preserves the same computational efficiency of DP (i.e., at O(n²)), it outperforms DP and two-stage Context-Aware Piecewise Linear Segmentation (two-stage CPLS) methods in preserving agent movement behavior, obtaining compressed trajectories that are closer to the original ones and that are much more useful in base analyses such as trajectory prediction. Specifically, the LSTM-based models trained on CADP-compressed trajectories have relatively lower RMSEs than others compressed by either DP or two-stage CPLS. Therefore, CADP is more scalable and efficient, thus making it more practical for large-scale engineering applications; with the improvement in trajectory analysis accuracy achieved by the suggested method, a wide range of critical engineering applications can be potentially improved, such as collision avoidance and route planning. Future work will focus on spatial auto-correlation and uncertainty to extend the robustness and applicability of the approach.
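For reference, the classic Douglas–Peucker baseline named in this abstract can be sketched as below. This shows only the purely spatial algorithm, not the context-aware CADP variant the paper proposes; the function names are illustrative, and point-to-segment distance is used here as a common stand-in for the perpendicular distance to the chord.

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord.

    Each call scans its span once and may split it unevenly, which gives
    the O(n^2) worst-case running time mentioned in the abstract.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


if __name__ == "__main__":
    track = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
    print(douglas_peucker(track, 1.0))
```

A context-aware variant would presumably replace or extend the distance function so that contextual attributes also influence which points are kept; the details are in the paper, not in this sketch.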
... The feature extraction step requires segmentation, analysis, and aggregation of the original trajectory, which increases the difficulty of data processing on the one hand and ignores the dynamic characteristics of the time dimension in AIS data on the other hand. It is difficult to fully utilize the dynamic time series information of AIS data [8], resulting in low recognition accuracy. ...
Article
Full-text available
Achieving accurate and efficient ship-type recognition is crucial for the development and management of modern maritime traffic systems. To overcome the limitations of existing methods that rely solely on AIS time series data or navigation trajectory images as single-modal approaches, this study introduces TrackAISNet, a multimodal ship classification model that seamlessly integrates ship trajectory images with AIS time series data for improved performance. The model employs a parallel structure, utilizing a lightweight neural network to extract features from trajectory images, and a specially designed TCN-GA (Temporal Convolutional Network with Global Attention) to capture the temporal dependencies and long-range relationships in the AIS time series data. The extracted image features and temporal features are then fused, and the combined features are fed into a classification network for final classification. We conducted experiments on a self-constructed dataset of variable-length AIS time series data comprising four types of ships. The results show that the proposed model achieved an accuracy of 81.38%, recall of 81.11%, precision of 80.95%, and an F1 score of 81.38%, outperforming the benchmark single-modal algorithms. Additionally, on a publicly available dataset containing three types of fishing vessel operations, the model demonstrated improvements in accuracy, recall, and F1 scores by 5.5%, 4.88%, and 5.88%, respectively.
... In this study, we introduce a framework for mining collectively-behaving bots, one of the most prevalent forms of abuse in MMORPGs inspired by moving-together patterns in the real world [2,3,7,13,21,23,24,31]. However, the collectively-behaving groups we aim to identify differ from real-world patterns due to the unique behaviors of bots, such as automatic and sporadic actions needed for purchasing potions, strategic hunting, and returning from dying. ...
Preprint
In MMORPGs (Massively Multiplayer Online Role-Playing Games), abnormal players (bots) using unauthorized automated programs to carry out pre-defined behaviors systematically and repeatedly are commonly observed. Bots usually engage in these activities to gain in-game money, which they eventually trade for real money outside the game. Such abusive activities negatively impact the in-game experiences of legitimate users since bots monopolize specific hunting areas and obtain valuable items. Thus, detecting abnormal players is a significant task for game companies. Motivated by the fact that bots tend to behave collectively with similar in-game trajectories due to the auto-programs, we developed BotTRep, a framework that comprises trajectory representation learning followed by clustering using a completely unlabeled in-game trajectory dataset. Our model aims to learn representations for in-game trajectory sequences so that players with contextually similar trajectories have closer embeddings. Then, by applying DBSCAN to these representations and visualizing the corresponding moving patterns, our framework ultimately assists game masters in identifying and banning bots.
... With the rapid development of Global Positioning Systems (GPS) and Geographic Information Systems (GIS), the number of human mobility trajectories has soared, significantly advancing research in spatio-temporal data mining, such as urban planning (Bao et al. 2017; Wang et al. 2023, 2024b), business location selection (Li et al. 2016), and travel time estimation (Reich et al. 2019; Wen et al. 2024). However, due to obstacles including privacy issues (Cao and Li 2021), government regulations (Chen et al. 2024a), and data processing costs (Zheng 2015), it is not easy for researchers to obtain high-quality real-world trajectory data. A promising solution to these challenges is trajectory generation, which not only meets privacy requirements but also allows for the creation of diverse high-fidelity trajectories. ...
Preprint
Trajectory generation has garnered significant attention from researchers in the field of spatio-temporal analysis, as it can generate substantial synthesized human mobility trajectories that enhance user privacy and alleviate data scarcity. However, existing trajectory generation methods often focus on improving trajectory generation quality from a singular perspective, lacking a comprehensive semantic understanding across various scales. Consequently, we are inspired to develop a HOlistic SEmantic Representation (HOSER) framework for navigational trajectory generation. Given an origin-and-destination (OD) pair and the starting time point of a latent trajectory, we first propose a Road Network Encoder to expand the receptive field of road- and zone-level semantics. Second, we design a Multi-Granularity Trajectory Encoder to integrate the spatio-temporal semantics of the generated trajectory at both the point and trajectory levels. Finally, we employ a Destination-Oriented Navigator to seamlessly integrate destination-oriented guidance. Extensive experiments on three real-world datasets demonstrate that HOSER outperforms state-of-the-art baselines by a significant margin. Moreover, the model's performance in few-shot learning and zero-shot learning scenarios further verifies the effectiveness of our holistic semantic representation.
... With the rapid growth of ride-hailing services, trajectory data has been collected at an unprecedented speed, leading to the flourishing of related research [1]-[3]. To ensure trip safety, trajectory anomaly detection, distinguishing anomalous ongoing detour trajectories under the given source-destination (SD) pairs, is critical yet fundamental for ride-hailing platforms. ...
Preprint
Trajectory anomaly detection, aiming to estimate the anomaly risk of trajectories given the Source-Destination (SD) pairs, has become a critical problem for many real-world applications. Existing solutions directly train a generative model for observed trajectories and calculate the conditional generative probability $P(T \mid C)$ as the anomaly risk, where $T$ and $C$ represent the trajectory and SD pair respectively. However, we argue that the observed trajectories are confounded by road network preference which is a common cause of both SD distribution and trajectories. Existing methods ignore this issue, limiting their generalization ability on out-of-distribution trajectories. In this paper, we define the debiased trajectory anomaly detection problem and propose a causal implicit generative model, namely CausalTAD, to solve it. CausalTAD adopts do-calculus to eliminate the confounding bias of road network preference and estimates $P(T \mid do(C))$ as the anomaly criterion. Extensive experiments show that CausalTAD can not only achieve superior performance on trained trajectories but also generally improve the performance of out-of-distribution data, with improvements of $2.1\% \sim 5.7\%$ and $10.6\% \sim 32.7\%$ respectively.
... Tracking technologies like GPS gather huge and growing collections of trajectory data, for instance for cars, mobile devices, and animals. The analysis of these collections poses many interesting problems, which has been the subject of much attention recently [1]. One of these problems is the identification of the region, in which an entity spends a large amount of time. ...
Preprint
In this paper we study the problem of finding hotspots, i.e. regions in which a moving entity has spent a significant amount of time, for polygonal trajectories. The fastest optimal algorithm, due to Gudmundsson, van Kreveld, and Staals (2013), finds an axis-parallel square hotspot of fixed side length in $O(n^2)$. Limiting ourselves to the case in which the entity moves in a direction parallel either to the x or the y-axis, we present an approximation algorithm with the time complexity $O(n \log^3 n)$ and approximation factor 1/2.
... 1) Context of travel movement: Given detailed mobility datasets, intelligent data mining strategies can be utilized to derive meaning and context from the locations visited. A recent paper [5] provides a thorough overview of the field, distilling trajectory data mining into the following phases: (a) preprocessing (trajectory compression, stay-point detection, trajectory segmentation and map matching), (b) data management (indexing and storing data so it can be retrieved quickly) and (c) pattern mining (clustering by time/shape/segment, classifying, and detecting outliers). The last phase is particularly interesting for transportation, because its application involves grouping similar trip origins, destinations, times of day, trip durations, and sections of road, in order to extract prevailing patterns and answer transportation-related questions. ...
Preprint
Transportation agencies have an opportunity to leverage increasingly-available trajectory datasets to improve their analyses and decision-making processes. However, this data is typically purchased from vendors, which means agencies must understand its potential benefits beforehand in order to properly assess its value relative to the cost of acquisition. While the literature concerned with trajectory data is rich, it is naturally fragmented and focused on technical contributions in niche areas, which makes it difficult for government agencies to assess its value across different transportation domains. To overcome this issue, the current paper explores trajectory data from the perspective of a road transportation agency interested in acquiring trajectories to enhance its analyses. The paper provides a literature review illustrating applications of trajectory data in six areas of road transportation systems analysis: demand estimation, modeling human behavior, designing public transit, traffic performance measurement and prediction, environment and safety. In addition, it visually explores 20 million GPS traces in Maryland, illustrating existing and suggesting new applications of trajectory data.
... Recent years have witnessed a tremendous growth in the collection of trajectory data and trajectory data analysis has become a prominent research stream with important applications in e.g. urban computing, intelligent transportation, animal ecology (Giannotti and Pedreschi, 2008; Zheng et al, 2014; Zheng, 2015; Parent et al, 2013; Güting et al, 2015). Spatial trajectories, in particular (simply trajectories hereinafter), are sequences of temporally correlated observations describing the movement of an object through a series of points sampling the time-varying location of the object (Zheng and Zhou, 2011). ...
Preprint
We present a framework for the partitioning of a spatial trajectory in a sequence of segments based on spatial density and temporal criteria. The result is a set of temporally separated clusters interleaved by sub-sequences of unclustered points. A major novelty is the proposal of an outlier or noise model based on the distinction between intra-cluster (local noise) and inter-cluster noise (transition): the local noise models the temporary absence from a residence while the transition the definitive departure towards a next residence. We analyze in detail the properties of the model and present a comprehensive solution for the extraction of temporally ordered clusters. The effectiveness of the solution is evaluated first qualitatively and next quantitatively by contrasting the segmentation with ground truth. The ground truth consists of a set of trajectories of labeled points simulating animal movement. Moreover, we show that the approach can streamline the discovery of additional derived patterns, by presenting a novel technique for the analysis of periodic movement. From a methodological perspective, a valuable aspect of this research is that it combines the theoretical investigation with the application and external validation of the segmentation framework. This paves the way to an effective deployment of the solution in broad and challenging fields such as e-science.
... By employing a motion model for home and away team behaviors in soccer, Bialkowski et al. [69] visually summarized a soccer competition and provided indications of dominance and tactics. Recently, Zheng [70] conducted a survey on trajectory data mining, offering a thorough understanding of the field. More recently, a smart coaching assistant (SAETA) designed for professional volleyball training was introduced by Vales-Alonso et al. [71]. ...
Preprint
Sports data analysis is becoming increasingly large-scale, diversified, and shared, but difficulty persists in rapidly accessing the most crucial information. Previous surveys have focused on the methodologies of sports video analysis from the spatiotemporal viewpoint instead of a content-based viewpoint, and few of these studies have considered semantics. This study develops a deeper interpretation of content-aware sports video analysis by examining the insight offered by research into the structure of content under different scenarios. On the basis of this insight, we provide an overview of the themes particularly relevant to the research on content-aware systems for broadcast sports. Specifically, we focus on the video content analysis techniques applied in sportscasts over the past decade from the perspectives of fundamentals and general review, a content hierarchical model, and trends and challenges. Content-aware analysis methods are discussed with respect to object-, event-, and context-oriented groups. In each group, the gap between sensation and content excitement must be bridged using proper strategies. In this regard, a content-aware approach is required to determine user demands. Finally, the paper summarizes the future trends and challenges for sports video analysis. We believe that our findings can advance the field of research on content-aware video analysis for broadcast sports.
... Trajectory mining is a hot research topic nowadays, where a spectrum of applications have been successfully developed, such as city-scale map creation [12], human transportation mode detection [51] and crowd mobility prediction [43]. A nice survey can be found in [50]. More specifically, as the taxi traces are perhaps the most easily accessible large-scale open data for trajectory mining [9,30,35,39,42], plenty of research studies are conducted by using it as an important data source, e.g., anomaly detection [48], environment monitoring [52], bus route planning [14], travel time estimation [44] and personalized trip navigation [13]; a comprehensive survey on taxi trajectory mining can be found in [10]. ...
Preprint
Ridesourcing platforms like Uber and Didi are getting more and more popular around the world. However, unauthorized ridesourcing activities taking advantage of the sharing economy can greatly impair the healthy development of this emerging industry. As the first step to regulate on-demand ride services and eliminate the black market, we design a method to detect ridesourcing cars from a pool of cars based on their trajectories. Since licensed ridesourcing car traces are not openly available and may be completely missing in some cities due to legal issues, we turn to transferring knowledge from public transport open data, i.e., taxis and buses, to ridesourcing detection among ordinary vehicles. We propose a two-stage transfer learning framework. In Stage 1, we take taxi and bus data as input to learn a random forest (RF) classifier using trajectory features shared by taxis/buses and ridesourcing/other cars. Then, we use the RF to label all the candidate cars. In Stage 2, leveraging the subset of high-confidence labels from the previous stage as input, we further learn a convolutional neural network (CNN) classifier for ridesourcing detection, and iteratively refine RF and CNN, as well as the feature set, via a co-training process. Finally, we use the resulting ensemble of RF and CNN to identify the ridesourcing cars in the candidate pool. Experiments on real car, taxi and bus traces show that our transfer learning framework, with no need of a pre-labeled ridesourcing dataset, can achieve similar accuracy as the supervised learning methods.
... There are several settings of recommendation problems for locations and routes, as illustrated in Figure 1. We summarise recent work most related to formulating and solving learning problems on assembling routes from POIs, and refer the reader to a number of recent surveys [1,27,28] for general overviews of the area. The first setting can be called POI recommendation (Figure 1(a)). ...
Preprint
The problem of recommending tours to travellers is an important and broadly studied area. Suggested solutions include various approaches of points-of-interest (POI) recommendation and route planning. We consider the task of recommending a sequence of POIs, that simultaneously uses information about POIs and routes. Our approach unifies the treatment of various sources of information by representing them as features in machine learning algorithms, enabling us to learn from past behaviour. Information about POIs is used to learn a POI ranking model that accounts for the start and end points of tours. Data about previous trajectories are used for learning transition patterns between POIs that enable us to recommend probable routes. In addition, a probabilistic model is proposed to combine the results of POI ranking and the POI to POI transitions. We propose a new F$_1$ score on pairs of POIs that captures the order of visits. Empirical results show that our approach improves on recent methods, and demonstrate that combining points and routes enables better trajectory recommendations.
... For example, with Brightkite you can keep track of your friends or any other Brightkite users nearby using the phone's built-in GPS. A combination of social networking and location-based services has led to a specific style of social networks, termed location-based social networks (LBSN) [Cho et al. 2011; Bao et al. 2012; Zheng 2015]. We present an illustrative example for LBSNs in Fig. 1, and it can be seen that LBSNs usually include both the social network and mobile trajectory data. ...
Preprint
The accelerated growth of mobile trajectories in location-based services brings valuable data resources to understand users' moving behaviors. Apart from recording the trajectory data, another major characteristic of these location-based services is that they also allow the users to connect with whomever they like. A combination of social networking and location-based services is called a location-based social network (LBSN). As shown in previous works, locations that are frequently visited by socially-related persons tend to be correlated, which indicates the close association between social connections and trajectory behaviors of users in LBSNs. In order to better analyze and mine LBSN data, we present a novel neural network model which can jointly model both social networks and mobile trajectories. Specifically, our model consists of two components: the construction of social networks and the generation of mobile trajectories. We first adopt a network embedding method for the construction of social networks: a networking representation can be derived for a user. The key of our model lies in the component of generating mobile trajectories. We have considered four factors that influence the generation process of mobile trajectories, namely user visit preference, influence of friends, short-term sequential contexts and long-term sequential contexts. To characterize the last two contexts, we employ the RNN and GRU models to capture the sequential relatedness in mobile trajectories at different levels, i.e., short term or long term. Finally, the two components are tied by sharing the user network representations. Experimental results on two important applications demonstrate the effectiveness of our model. In particular, the improvement over baselines is more significant when either network structure or trajectory data is sparse.
Article
Trajectory prediction aims to estimate an entity’s future path using its current position and historical movement data, benefiting fields like autonomous navigation, robotics, and human movement analytics. Deep learning approaches have become key in this area, utilizing large-scale trajectory datasets to model movement patterns, but face challenges in managing complex spatial dependencies and adapting to dynamic environments. To address these challenges, we introduce TrajLearn, a novel model for trajectory prediction that leverages generative modeling of higher-order mobility flows based on hexagonal spatial representation. TrajLearn predicts the next k steps by integrating a customized beam search for exploring multiple potential paths while maintaining spatial continuity. We conducted a rigorous evaluation of TrajLearn, benchmarking it against leading state-of-the-art approaches and meaningful baselines. The results indicate that TrajLearn achieves significant performance gains, with improvements of up to ∼40% across multiple real-world trajectory datasets. In addition, we evaluated different prediction horizons (i.e., various values of k), conducted resolution sensitivity analysis, and performed ablation studies to assess the impact of key model components. Furthermore, we developed a novel algorithm to generate mixed-resolution maps by hierarchically subdividing hexagonal regions into finer segments within a specified observation area. This approach supports selective detailing, applying finer resolution to areas of interest or high activity (e.g., urban centers) while using coarser resolution for less significant regions (e.g., rural or uninhabited areas), effectively reducing data storage requirements and computational overhead. We promote reproducibility and adaptability by offering complete code, data, and detailed documentation with flexible configuration options for various applications.
Article
With the widespread use of GPS-enabled devices and services, trajectory data fuels services in a variety of fields, such as transportation and smart cities. However, trajectory data often contains errors stemming from inaccurate GPS measurements, low sampling rates, and transmission interruptions, yielding low-quality trajectory data with negative effects on downstream services. Therefore, a crucial yet tedious endeavor is to assess the quality of trajectory data, serving as a guide for subsequent data cleaning and analyses. Despite some studies addressing general-purpose data quality assessment, no studies exist that are tailored specifically for trajectory data. To more effectively diagnose the quality of trajectory data, we propose T-Assess, an automated trajectory data quality assessment system. T-Assess is built on three fundamental principles: i) extensive coverage, ii) versatility, and iii) efficiency. To achieve comprehensive coverage, we propose assessment criteria spanning validity, completeness, consistency, and fairness. To provide high versatility, T-Assess supports both offline and online evaluations for full-batch trajectory datasets as well as real-time trajectory streams. In addition, we incorporate an evaluation optimization strategy to achieve assessment efficiency. Extensive experiments on four real-life benchmark datasets offer insight into the effectiveness of T-Assess at quantifying trajectory data quality beyond the capabilities of state-of-the-art data quality systems.
Article
With the rapid growth in maritime traffic, navigational safety has become a pressing concern. Some vessels deliberately manipulate their type information to evade regulatory oversight, either to circumvent legal sanctions or engage in illicit activities. Such practices not only undermine the accuracy of maritime supervision but also pose significant risks to maritime traffic management and safety. Therefore, accurately identifying vessel types is essential for effective maritime traffic regulation, combating maritime crimes, and ensuring safe maritime transportation. However, the existing methods fail to fully exploit the long-term sequential dependencies and intricate mobility patterns embedded in vessel trajectory data, leading to suboptimal identification accuracy and reliability. To address these limitations, we propose MESTR, a Multi-Task Enhanced Ship-Type Recognition model based on Automatic Identification System (AIS) data. MESTR leverages a Transformer-based deep learning framework with a motion-pattern-aware trajectory segment masking strategy. By jointly optimizing two learning tasks—trajectory segment masking prediction and ship-type prediction—MESTR effectively captures deep spatiotemporal features of various vessel types. This approach enables the accurate classification of six common vessel categories: tug, sailing, fishing, passenger, tanker, and cargo. Experimental evaluations on real-world maritime datasets demonstrate the effectiveness of MESTR, achieving an average accuracy improvement of 12.04% over the existing methods.
Article
Unregistered illegal facilities that do not qualify for chemical production pose substantial threats to human lives and the environment. For human safety and environmental protection, the government needs to identify the illegal facilities and shut them down. A new, convenient, and affordable approach to detecting such facilities is to analyze the trajectories of hazardous chemicals transportation (HCT) trucks. An existing study leverages a machine learning model to predict how likely a place is to be illegal. However, such a model lacks interpretability and cannot provide the actionable justifications required for decision-making. In this study, we collaborate with HCT experts and propose an interactive visual analytics approach to explore the suspicious stay points, analyze abnormal HCT truck behaviors, and identify unregistered illegal chemical facilities. First, experts receive an initial result from the detection model for reference. Then, they can check the detailed information of the suspicious places with three coordinated views. We apply a visualization that tightly encodes the geo-referred movement activities along the timeline to present the HCT truck behaviors, which helps experts verify their conclusions. We demonstrate the effectiveness of the system with two case studies on real-world data. We also received positive feedback from an expert interview.
Article
Multi-dimensional Flight Trajectory Prediction (MFTP) in Flight Operations Quality Assessment (FOQA) refers to the estimation of flight status at a future time; its goals are the accurate prediction of future flight positions, flight attitude, and aero-engine monitoring parameters. Because flight trajectories differ from other kinds of trajectories, data are difficult to access, and the domain knowledge is complex, MFTP in FOQA is much more challenging than Flight Trajectory Prediction (FTP) in Air Traffic Control (ATC) and other trajectory prediction tasks. In this work, a deep Koopman neural operator-based multi-dimensional flight trajectory prediction framework, called Deep Koopman Neural Operator-Based Multi-Dimensional Flight Trajectories Prediction (FlightKoopman), is first proposed to address this challenge. The framework is based on data-driven Koopman theory, which makes it possible to construct a prediction model using only data, without any prior knowledge, and to approximate the operator pattern to capture flight maneuvers for downstream tasks. The framework recovers the complete state space of the flight dynamics system with Hankel embedding and reconstructs its phase space, and it combines a fully connected neural network to generate the observation function of the state space and the approximation matrix of the Koopman operator, yielding an overall model for predicting the evolution. The paper also introduces a new dataset, Civil Aviation Flight University of China (CAFUC), that can be used for MFTP tasks or other flight trajectory tasks. The CAFUC dataset and code are available at this repository: https://github.com/CAFUC-JJJ/FlightKoopman . Experiments on the real-world dataset demonstrate that FlightKoopman outperforms other baselines.
Article
Full-text available
Time geography is an elegant framework to analyze human activity-travel behaviors across time and space dimensions. Central to this framework are the concepts of space-time paths, representing historical trajectories, and space-time prisms, delimiting potential future activity spaces. However, most existing studies treat space-time paths and prisms in isolation, neglecting the integrative nature of time geography as a theoretical framework that unifies past, present, and future within a continuous temporal dimension. This study addresses this critical gap by developing novel methods for measuring and querying the spatiotemporal proximity of these heterogeneous and heterochronous time-geographical entities. To operationalize these methods, a GIS tool is implemented, enabling local density analysis of time-geographical entities. Comprehensive computational experiments are carried out to validate the developed methods using large-scale, network-constrained time-geographical datasets. Experimental results demonstrate the effectiveness of developed methods in analyzing and visualizing local space-time density of historical space-time paths in the near future. Furthermore, the developed methods exhibit high computational efficiency, i.e., about 2 seconds to compute the local density for each path within extensive prism collections.
Article
Location-based mobile services, while improving users’ daily lives, also raise significant privacy concerns in the sharing of location data. Location trajectories record users’ traveling behavioral traces, with rich semantics that can be derived from open-source information. Behavioral-semantic analysis reveals users’ traveling motivations and underlying behavioral patterns. It helps attackers launch inferential attacks for behavior prediction, identity identification, or other privacy invasions, even when the location data is protected. Behavioral-semantic privacy-risk quantification and privacy-protection evaluation remain open issues. This paper aims to reveal such semantic privacy risks of user behaviors arising from the publication of location trajectories in mobile scenarios. We formalize the user’s semantic-mobility process to analyze the underlying behavior patterns. Then, we design semantic inference algorithms, conditional on the released trajectory, to reason about the observation-based likelihood of the user’s actual staying and transfer behaviors and behavioral-trace tracking. Extensive experiments with real-world data demonstrate their performance on inference accuracy and semantic similarity, offering a quantification criterion for deploying mobile privacy protection.
Book
Full-text available
Embark on a journey into the heart of a new industrial revolution―one that promises to redefine human mobility for generations to come. In this groundbreaking exploration, we confront the promises and perils of new mobility, navigating the intricate landscape where technology intersects with urban society. As cities evolve and technology shapes our daily lives, the ethical dimensions of this transformation remain largely uncharted territory. Amidst the rapid advancement of new mobility systems, this book sheds light on the moral dilemmas and philosophical underpinnings that often go unnoticed. From the ethical implications of technology to the systemic flaws in planning and design, we delve into the core of this paradigm shift. By understanding the foundational principles of mobility and the hidden codes that govern human movement, we pave the way for a more equitable and inclusive future. At the heart of this transformative vision lies a comprehensive framework for building a new mobility ecosystem―one that prioritizes human well-being and equity above all else. Through innovative planning processes and redesign concepts, we aim to bridge the gap between technology and society, ensuring that every individual has access to safe, efficient, and sustainable modes of transportation. From low-emission vehicles to multimodal transit hubs, this book presents a blueprint for reimagining urban spaces and redefining the way we move. By embracing shared values and collective responsibility, we strive to create a world where mobility is not just a privilege, but a fundamental human right. As we embark on this journey towards a more sustainable future, let us remember that the true measure of progress lies not in technological innovation alone, but in our ability to build communities that thrive together. Join us in shaping the future of mobility―one where humanity and equity reign supreme.
Article
Full-text available
Location data is becoming increasingly important. In this paper, we focus on trajectory data and propose a new framework, namely PRESS (Paralleled Road-Network-Based Trajectory Compression), to effectively compress trajectory data under road network constraints. Different from existing work, PRESS proposes a novel representation for trajectories that separates the spatial representation of a trajectory from the temporal representation, and proposes a Hybrid Spatial Compression (HSC) algorithm and an error Bounded Temporal Compression (BTC) algorithm to compress the spatial and temporal information of trajectories, respectively. PRESS also supports common spatial-temporal queries without fully decompressing the data. In an extensive experimental study on a real trajectory dataset, PRESS significantly outperforms existing approaches in terms of saving storage cost of trajectory data with bounded errors.
Article
Full-text available
Recent advances in localization techniques have fundamentally enhanced social networking services, allowing users to share their locations and location-related contents, such as geo-tagged photos and notes. We refer to these social networks as location-based social networks (LBSNs). Location data bridges the gap between the physical and digital worlds and enables a deeper understanding of users’ preferences and behavior. This addition of vast geo-spatial datasets has stimulated research into novel recommender systems that seek to facilitate users’ travels and social interactions. In this paper, we offer a systematic review of this research, summarizing the contributions of individual efforts and exploring their relations. We discuss the new properties and challenges that location brings to recommender systems for LBSNs. We present a comprehensive survey analyzing 1) the data source used, 2) the methodology employed to generate a recommendation, and 3) the objective of the recommendation. We propose three taxonomies that partition the recommender systems according to the properties listed above. First, we categorize the recommender systems by the objective of the recommendation, which can include locations, users, activities, or social media. Second, we categorize the recommender systems by the methodologies employed, including content-based, link analysis-based, and collaborative filtering-based methodologies. Third, we categorize the systems by the data sources used, including user profiles, user online histories, and user location histories. For each category, we summarize the goals and contributions of each system and highlight the representative research effort. Further, we provide comparative analysis of the recommender systems within each category. We also discuss the available datasets and the popular methods used to evaluate the performance of recommender systems. Finally, we point out promising research topics for future work.
This article presents a panorama of the recommender systems in location-based social networks with a balanced depth, facilitating research into this important research theme.
Conference Paper
Full-text available
This demo presents Minnesota Traffic Generator (MNTG), an extensible web-based road network traffic generator. MNTG enables its users to generate traffic data on arbitrary road networks with different traffic generators. Unlike existing traffic generators that require a lot of time and effort to install, configure, and run, MNTG is a web service with a user-friendly interface where users can specify an arbitrary spatial region, select a traffic generator, and submit their traffic generation request. Once the traffic data is generated by MNTG, users can then download and/or visualize the generated data. MNTG can be extended to support: (1) various traffic generators: it is already shipped with the two most common traffic generators, Brinkhoff and BerlinMOD, but other generators can be easily added; (2) various road network sources: it is shipped with U.S. Tiger files and OpenStreetMap, but other sources can also be added. A beta version of MNTG is launched at: http://mntg.cs.umn.edu.
Conference Paper
Full-text available
The popularity of location-based social networks provides us with a new platform to understand users' preferences based on their location histories. In this paper, we present a location-based and preference-aware recommender system that offers a particular user a set of venues (such as restaurants) within a geospatial range, taking into consideration both 1) user preferences, which are automatically learned from her location history, and 2) social opinions, which are mined from the location histories of the local experts. This recommender system can facilitate people's travel not only near their living areas but also to a city that is new to them. As a user can only visit a limited number of locations, the user-locations matrix is very sparse, leading to a big challenge for traditional collaborative filtering-based location recommender systems. The problem becomes even more challenging when people travel to a new city. To this end, we propose a novel location recommender system, which consists of two main parts: offline modeling and online recommendation. The offline modeling part models each individual's personal preferences with a weighted category hierarchy (WCH) and infers the expertise of each user in a city with respect to different categories of locations according to their location histories using an iterative learning model. The online recommendation part selects candidate local experts in a geospatial range that matches the user's preferences using a preference-aware candidate selection algorithm and then infers a score for the candidate locations based on the opinions of the selected local experts. Finally, the top-k ranked locations are returned as the recommendations for the user. We evaluated our system with a large-scale real dataset collected from Foursquare. The results confirm that our method offers more effective recommendations than baselines, while providing location recommendations efficiently.
Conference Paper
Full-text available
Existing algorithms for trajectory-based clustering usually rely on simplex representation and a single proximity-related distance (or similarity) measure. Consequently, additional information markers (e.g., social interactions or the semantics of the spatial layout) are usually ignored, leading to the inability to fully discover the communities in the trajectory database. This is especially true for human-generated trajectories, where additional fine-grained markers (e.g., movement velocity at certain locations, or the sequence of semantic spaces visited) can help capture latent relationships between cluster members. To address this limitation, we propose TODMIS: a general framework for Trajectory cOmmunity Discovery using Multiple Information Sources. TODMIS combines additional information with raw trajectory data and creates multiple similarity metrics. In our proposed approach, we first develop a novel approach for computing semantic level similarity by constructing a Markov Random Walk model from the semantically-labeled trajectory data, and then measuring similarity at the distribution level. In addition, we also extract and compute pair-wise similarity measures related to three additional markers, namely trajectory level spatial alignment (proximity), temporal patterns and multi-scale velocity statistics. Finally, after creating a single similarity metric from the weighted combination of these multiple measures, we apply dense sub-graph detection to discover the set of distinct communities. We evaluated TODMIS extensively using traces of (i) student movement data in a campus, (ii) customer trajectories in a shopping mall, and (iii) city-scale taxi movement data. Experimental results demonstrate that TODMIS correctly and efficiently discovers the real grouping behaviors in these diverse settings.
Article
Full-text available
GPS-based navigation and route guidance systems are becoming increasingly popular among bus operators, fleet managers, and travelers. To provide this functionality, one has to have a GPS receiver, a digital map of the traveled network, and software that can associate (match) the user's position with a location on the digital map. Matching the user's location has to be done even when the GPS location and the underlying digital map have inaccuracies and errors. There are several approaches for solving this map matching task. Some only match the user's location to the nearest street node, while others are able to locate the user at a specific location on the traveled street segment. In this paper, a topologically based matching procedure is presented. The procedure was tested with low-quality GPS data to assess its robustness. The algorithms were found to produce outstanding results.
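The difference between matching to the nearest street node and locating a position on a street segment can be illustrated with a minimal Python sketch. It is not the topological procedure of the paper above: it assumes planar x/y coordinates, and the function names `nearest_node`, `project_to_segment`, and `match_to_segment` are hypothetical.

```python
import math

def nearest_node(point, nodes):
    """Node-based matching: return the street node closest to `point`."""
    return min(nodes, key=lambda n: math.dist(point, n))

def project_to_segment(point, a, b):
    """Project `point` onto segment a-b; return (projected_point, distance)."""
    ax, ay = a
    bx, by = b
    px, py = point
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Relative position of the projection along the segment, clamped to [0, 1].
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    proj = (ax + t * dx, ay + t * dy)
    return proj, math.dist(point, proj)

def match_to_segment(point, segments):
    """Segment-based matching: return (segment, projected_point, distance)."""
    best = None
    for seg in segments:
        proj, d = project_to_segment(point, seg[0], seg[1])
        if best is None or d < best[2]:
            best = (seg, proj, d)
    return best
```

A purely geometric match like this ignores road connectivity and heading, which is the kind of information a topological procedure adds.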
Article
Full-text available
The increasing availability of GPS-embedded mobile devices has given rise to a new spectrum of location-based services, which have accumulated a huge collection of location trajectories. In practice, a large portion of these trajectories are of low-sampling-rate. For instance, the time interval between consecutive GPS points of some trajectories can be several minutes or even hours. With such a low sampling rate, most details of their movement are lost, which makes them difficult to process effectively. In this work, we investigate how to reduce the uncertainty in such kind of trajectories. Specifically, given a low-sampling-rate trajectory, we aim to infer its possible routes. The methodology adopted in our work is to take full advantage of the rich information extracted from the historical trajectories. We propose a systematic solution, History based Route Inference System (HRIS), which covers a series of novel algorithms that can derive the travel pattern from historical data and incorporate it into the route inference process. To validate the effectiveness of the system, we apply our solution to the map-matching problem which is an important application scenario of this work, and conduct extensive experiments on a real taxi trajectory dataset. The experiment results demonstrate that HRIS can achieve higher accuracy than the existing map-matching algorithms for low-sampling-rate trajectories.
Article
Full-text available
Nearest neighbor (NN) queries in trajectory databases have received significant attention in the past, due to their application in spatio-temporal data analysis. Recent work has considered the realistic case where the trajectories are uncertain; however, only simple uncertainty models have been proposed, which do not allow for accurate probabilistic search. In this paper, we fill this gap by addressing probabilistic nearest neighbor queries in databases with uncertain trajectories modeled by stochastic processes, specifically the Markov chain model. We study three nearest neighbor query semantics that take as input a query state or trajectory q and a time interval. For some queries, we show that no polynomial time solution can be found. For problems that can be solved in PTIME, we present exact query evaluation algorithms, while for the general case, we propose a sophisticated sampling approach, which uses Bayesian inference to guarantee that sampled trajectories conform to the observation data stored in the database. This sampling approach can be used in Monte-Carlo based approximation solutions. We include an extensive experimental study to support our theoretical results.
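To make the Markov chain model of an uncertain trajectory concrete, the following Python sketch propagates a probability distribution over discrete locations through time. It is only an illustration of the underlying model, not the query algorithms of the paper above; the state space, the transition matrix, and the names `step` and `distribution_at` are assumptions.

```python
def step(dist, transition):
    """Advance a state distribution by one time step.

    transition[i][j] is the assumed probability of moving from state i to state j.
    """
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

def distribution_at(initial, transition, steps):
    """Return the distribution over states after `steps` transitions."""
    dist = list(initial)
    for _ in range(steps):
        dist = step(dist, transition)
    return dist

# Hypothetical three-location chain: the object either stays or moves one state forward.
transition = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
after_two_steps = distribution_at([1.0, 0.0, 0.0], transition, 2)
```

Conditioning such a chain on later observations, as the abstract describes, requires additional inference that this sketch leaves out.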
Book
Spatial trajectories have been bringing the unprecedented wealth to a variety of research communities. A spatial trajectory records the paths of a variety of moving objects, such as people who log their travel routes with GPS trajectories. The field of moving objects related research has become extremely active within the last few years, especially with all major database and data mining conferences and journals. Computing with Spatial Trajectories introduces the algorithms, technologies, and systems used to process, manage and understand existing spatial trajectories for different applications. This book also presents an overview on both fundamentals and the state-of-the-art research inspired by spatial trajectory data, as well as a special focus on trajectory pattern mining, spatio-temporal data mining and location-based social networks. Each chapter provides readers with a tutorial-style introduction to one important aspect of location trajectory computing, case studies and many valuable references to other relevant research work. Computing with Spatial Trajectories is designed as a reference or secondary text book for advanced-level students and researchers mainly focused on computer science and geography. Professionals working on spatial trajectory computing will also find this book very useful.
Article
Many cities suffer from noise pollution, which compromises people's working efficiency and even mental health. New York City (NYC) has opened a platform, entitled 311, to allow people to complain about the city's issues by using a mobile app or making a phone call; noise is the third largest category of complaints in the 311 data. As each complaint about noise is associated with a location, a time stamp, and a fine-grained noise category, such as "Loud Music" or "Construction", the data is actually a result of "human as a sensor" and "crowd sensing", containing rich human intelligence that can help diagnose urban noises. In this paper, we infer the fine-grained noise situation (consisting of a noise pollution indicator and the composition of noises) at different times of day for each region of NYC, by using the 311 complaint data together with social media, road network data, and points of interest (POIs). We model the noise situation of NYC with a three-dimensional tensor, where the three dimensions stand for regions, noise categories, and time slots, respectively. Supplementing the missing entries of the tensor through a context-aware tensor decomposition approach, we recover the noise situation throughout NYC. The information can inform people's and officials' decision making. We evaluate our method with four real datasets, verifying the advantages of our method over four baselines, such as the interpolation-based approach.
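The region × noise category × time slot tensor described above can be sketched as a plain nested list of complaint counts. This is a hypothetical illustration only: the record layout `(region, category, hour)`, the equal-width time slots, and the name `build_tensor` are assumptions, and the context-aware decomposition that fills in missing entries is not shown.

```python
def build_tensor(records, regions, categories, n_slots):
    """Count complaints into tensor[region][category][time_slot].

    `records` is an iterable of (region, category, hour) tuples with hour in 0..23.
    A zero entry only means that no complaint was observed, not that there was no noise.
    """
    r_idx = {r: i for i, r in enumerate(regions)}
    c_idx = {c: i for i, c in enumerate(categories)}
    tensor = [[[0] * n_slots for _ in categories] for _ in regions]
    for region, category, hour in records:
        slot = hour * n_slots // 24  # map the hour of day to an equal-width slot
        tensor[r_idx[region]][c_idx[category]][slot] += 1
    return tensor

# Hypothetical complaint records.
records = [
    ("R1", "Loud Music", 23),
    ("R1", "Loud Music", 22),
    ("R2", "Construction", 9),
]
tensor = build_tensor(records, ["R1", "R2"], ["Loud Music", "Construction"], 4)
```

Most entries of such a tensor stay empty in practice, which is why the abstract relies on decomposition to recover them.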
Article
Ranking residential real estates based on investment values can provide decision-making support for home buyers and thus plays an important role in the estate marketplace. In this paper, we aim to develop methods for ranking estates based on investment values by mining users' opinions about estates from online user reviews and offline moving behaviors (e.g., taxi traces, smart card transactions, check-ins). While a variety of features could be extracted from these data, these features are intercorrelated and redundant. Thus, selecting good features and integrating the feature selection into the fitting of a ranking model are essential. To this end, we first strategically mine the fine-grained discriminative features from user reviews and moving behaviors, and then propose a probabilistic sparse pairwise ranking method for estates. Specifically, we first extract the explicit features from online user reviews which express users' opinions about points of interest (POIs) near an estate. We also mine the implicit features from offline moving behaviors from multiple perspectives (e.g., direction, volume, velocity, heterogeneity, topic, popularity). Then we learn an estate ranking predictor by combining a pairwise ranking objective and a sparsity regularization in a unified probabilistic framework, and we develop an effective solution for the optimization problem. Finally, we conduct a comprehensive performance evaluation with real-world estate-related data, and the experimental results demonstrate the competitive performance of both the features and the proposed model.
Article
Location-based services allow users to perform geospatial recording actions, which facilitates the mining of the moving activities of human beings. This article proposes to recommend time-sensitive trip routes consisting of a sequence of locations with associated timestamps based on knowledge extracted from large-scale timestamped location sequence data (e.g., check-ins and GPS traces). We argue that a good route should consider (a) the popularity of places, (b) the visiting order of places, (c) the proper visiting time of each place, and (d) the proper transit time from one place to another. By devising a statistical model, we integrate these four factors into a route goodness function that aims to measure the quality of a route. Equipped with the route goodness, we recommend time-sensitive routes for two scenarios. The first is about constructing the route based on the user-specified source location with the starting time. The second is about composing the route between the specified source location and the destination location given a starting time. To handle these queries, we propose a search method, Guidance Search, which consists of a novel heuristic satisfaction function that guides the search toward the destination location and a backward checking mechanism to boost the effectiveness of the constructed route. Experiments on the Gowalla check-in datasets demonstrate the effectiveness of our model on detecting real routes and performing cloze test of routes, comparing with other baseline methods. We also develop a system TripRouter as a real-time demo platform.
Article
We proposed and developed a taxi-sharing system that accepts taxi passengers’ real-time ride requests sent from smartphones and schedules proper taxis to pick them up via ridesharing, subject to time, capacity, and monetary constraints. The monetary constraints provide incentives for both passengers and taxi drivers: passengers will not pay more compared with no ridesharing and get compensated if their travel time is lengthened due to ridesharing; taxi drivers will make money for all the detour distance due to ridesharing. While such a system is of significant social and environmental benefit, e.g., saving energy consumption and satisfying people's commute, real-time taxi-sharing has not been well studied yet. To this end, we devise a mobile-cloud architecture based taxi-sharing system. Taxi riders and taxi drivers use the taxi-sharing service provided by the system via a smartphone app. The cloud first finds candidate taxis quickly for a taxi ride request using a taxi searching algorithm supported by a spatio-temporal index. A scheduling process is then performed in the cloud to select a taxi that satisfies the request with minimum increase in travel distance. We built an experimental platform using the GPS trajectories generated by over 33,000 taxis over a period of three months. A ride request generator is developed (available at http://cs.uic.edu/~sma/ridesharing) in terms of the stochastic process modelling real ride requests learned from the data set. Tested on this platform with extensive experiments, our proposed system demonstrated its efficiency, effectiveness, and scalability. For example, when the ratio of the number of ride requests to the number of taxis is 6, our proposed system serves three times as many taxi riders as when no ridesharing is performed, while saving 11 percent in total travel distance and 7 percent in taxi fare per rider.
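The scheduling idea of selecting the taxi with the minimum increase in travel distance can be sketched in Python as follows. This is a simplified illustration rather than the system's algorithm: it uses straight-line distances instead of road-network distances, ignores the time, capacity, and monetary constraints, and tries every insertion position by brute force. The names `route_length`, `min_detour`, and `choose_taxi` are hypothetical.

```python
import math

def route_length(points):
    """Total straight-line length of a route visiting `points` in order."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def min_detour(position, stops, pickup, dropoff):
    """Smallest extra distance from inserting pickup, then dropoff, into a schedule."""
    base = route_length([position] + stops)
    best = math.inf
    n = len(stops)
    for i in range(n + 1):         # pickup is inserted before stops[i]
        for j in range(i, n + 1):  # dropoff is inserted before stops[j], after pickup
            route = ([position] + stops[:i] + [pickup]
                     + stops[i:j] + [dropoff] + stops[j:])
            best = min(best, route_length(route) - base)
    return best

def choose_taxi(taxis, pickup, dropoff):
    """Return the id of the candidate taxi with the minimum increase in distance.

    `taxis` maps a taxi id to a (current_position, remaining_stops) pair.
    """
    return min(taxis, key=lambda tid: min_detour(*taxis[tid], pickup, dropoff))

# Hypothetical candidates: taxi A already drives along the requested trip.
taxis = {
    "A": ((0, 0), [(10, 0)]),
    "B": ((0, 5), [(0, 10)]),
}
chosen = choose_taxi(taxis, pickup=(2, 0), dropoff=(8, 0))
```

In a real deployment the candidate set would first be narrowed by a spatio-temporal index, as the abstract notes, so that only a few schedules need to be evaluated per request.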
Article
Urbanization's rapid progress has modernized many people's lives but also engendered big issues, such as traffic congestion, energy consumption, and pollution. Urban computing aims to tackle these issues by using the data that has been generated in cities (e.g., traffic flow, human mobility, and geographical data). Urban computing connects urban sensing, data management, data analytics, and service providing into a recurrent process for an unobtrusive and continuous improvement of people's lives, city operation systems, and the environment. Urban computing is an interdisciplinary field where computer sciences meet conventional city-related fields, like transportation, civil engineering, environment, economy, ecology, and sociology in the context of urban spaces. This article first introduces the concept of urban computing, discussing its general framework and key challenges from the perspective of computer sciences. Second, we classify the applications of urban computing into seven categories, consisting of urban planning, transportation, the environment, energy, social, economy, and public safety and security, presenting representative scenarios in each category. Third, we summarize the typical technologies that are needed in urban computing into four folds, which are about urban sensing, urban data management, knowledge fusion across heterogeneous data, and urban data visualization. Finally, we give an outlook on the future of urban computing, suggesting a few research topics that are somehow missing in the community.
Article
Urban transportation is an important factor in energy consumption and pollution, and is of increasing concern due to its complexity and economic significance. Its importance will only increase as urbanization continues around the world. In this article, we explore drivers' refueling behavior in urban areas. Compared to questionnaire-based methods of the past, we propose a complete data-driven system that pushes towards real-time sensing of individual refueling behavior and citywide petrol consumption. Our system provides the following: detection of individual refueling events (REs) from which refueling preference can be analyzed; estimates of gas station wait times from which recommendations can be made; an indication of overall fuel demand from which macroscale economic decisions can be made, and a spatial, temporal, and economic view of urban refueling characteristics. For individual behavior, we use reported trajectories from a fleet of GPS-equipped taxicabs to detect gas station visits. For time spent estimates, to solve the sparsity issue along time and stations, we propose context-aware tensor factorization (CATF), a factorization model that considers a variety of contextual factors (e.g., price, brand, and weather condition) that affect consumers' refueling decision. For fuel demand estimates, we apply a queue model to calculate the overall visits based on the time spent inside the station. We evaluated our system on large-scale and real-world datasets, which contain 4-month trajectories of 32,476 taxicabs, 689 gas stations, and the self-reported refueling details of 8,326 online users. The results show that our system can determine REs with an accuracy of more than 90%, estimate time spent with less than 2 minutes of error, and measure overall visits in the same order of magnitude with the records in the field study.
Article
The step of urbanization and modern civilization fosters different functional zones in a city, such as residential areas, business districts, and educational areas. In a metropolis, people commute between these functional zones every day to engage in different socioeconomic activities, e.g., working, shopping, and entertaining. In this paper, we propose a data-driven framework to discover functional zones in a city. Specifically, we introduce the concept of latent activity trajectory (LAT), which captures socioeconomic activities conducted by citizens at different locations in a chronological order. Later, we segment an urban area into disjointed regions according to major roads, such as highways and urban expressways. We have developed a topic-modeling-based approach to cluster the segmented regions into functional zones leveraging mobility and location semantics mined from LAT. Furthermore, we identify the intensity of each functional zone using Kernel Density Estimation. Extensive experiments are conducted with several urban scale datasets to show that the proposed framework offers a powerful ability to capture city dynamics and provides valuable calibrations to urban planners in terms of functional zones.
Article
It is traditionally a challenge for home buyers to understand, compare and contrast the investment values of real estates. While a number of estate appraisal methods have been developed to value real property, the performances of these methods have been limited by the traditional data sources for estate appraisal. However, with the development of new ways of collecting estate-related mobile data, there is a potential to leverage geographic dependencies of estates for enhancing estate appraisal. Indeed, the geographic dependencies of the value of an estate can be from the characteristics of its own neighborhood (individual), the values of its nearby estates (peer), and the prosperity of the affiliated latent business area (zone). To this end, in this paper, we propose a geographic method, named ClusRanking, for estate appraisal by leveraging the mutual enforcement of ranking and clustering power. ClusRanking is able to exploit geographic individual, peer, and zone dependencies in a probabilistic ranking model. Specifically, we first extract the geographic utility of estates from geography data, estimate the neighborhood popularity of estates by mining taxicab trajectory data, and model the influence of latent business areas via ClusRanking. Also, we use a linear model to fuse these three influential factors and predict estate investment values. Moreover, we simultaneously consider individual, peer and zone dependencies, and derive an estate-specific ranking likelihood as the objective function. Finally, we conduct a comprehensive evaluation with real-world estate related data, and the experimental results demonstrate the effectiveness of our method.
Article
In this paper, we propose a citywide and real-time model for estimating the travel time of any path (represented as a sequence of connected road segments) in a city, based on the GPS trajectories of vehicles received in current time slots and over a period of history as well as map data sources. Though this is a strategically important task in many traffic monitoring and routing systems, the problem has not been well solved yet given the following three challenges. The first is the data sparsity problem, i.e., many road segments may not be traveled by any GPS-equipped vehicles in the present time slot. In most cases, we cannot find a trajectory exactly traversing a query path either. Second, for the fragment of a path with trajectories, there are multiple ways of using (or combining) the trajectories to estimate the corresponding travel time. Finding an optimal combination is a challenging problem, subject to a tradeoff between the length of a path and the number of trajectories traversing the path (i.e., support). Third, we need to instantly answer users' queries, which may occur in any part of a given city. This calls for an efficient, scalable and effective solution that can enable a citywide and real-time travel time estimation. To address these challenges, we model different drivers' travel times on different road segments in different time slots with a three-dimensional tensor. Combined with geospatial, temporal and historical contexts learned from trajectories and map data, we fill in the tensor's missing values through a context-aware tensor decomposition approach. We then devise and prove an objective function to model the aforementioned tradeoff, with which we find the optimal concatenation of trajectories for an estimate through a dynamic programming solution. In addition, we propose using frequent trajectory patterns (mined from historical trajectories) to scale down the candidates of concatenation and a suffix-tree-based index to manage the trajectories received in the present time slot. We evaluate our method based on extensive experiments, using GPS trajectories generated by more than 32,000 taxis over a period of two months. The results demonstrate the effectiveness, efficiency and scalability of our method beyond baseline approaches.
Article
This paper instantly infers the gas consumption and pollution emission of vehicles traveling on a city's road network in a current time slot, using GPS trajectories from a sample of vehicles (e.g., taxicabs). The knowledge can be used to suggest cost-efficient driving routes as well as identifying road segments where gas has been wasted significantly. The instant estimation of the emissions from vehicles can enable pollution alerts and help diagnose the root cause of air pollution in the long run. In our method, we first compute the travel speed of each road segment using the GPS trajectories received recently. As many road segments are not traversed by trajectories (i.e., data sparsity), we propose a Travel Speed Estimation (TSE) model based on a context-aware matrix factorization approach. TSE leverages features learned from other data sources, e.g., map data and historical trajectories, to deal with the data sparsity problem. We then propose a Traffic Volume Inference (TVI) model to infer the number of vehicles passing each road segment per minute. TVI is an unsupervised Bayesian Network that incorporates multiple factors, such as travel speed, weather conditions and geographical features of a road. Given the travel speed and traffic volume of a road segment, gas consumption and emissions can be calculated based on existing environmental theories. We evaluate our method based on extensive experiments using GPS trajectories generated by over 32,000 taxis in Beijing over a period of two months. The results demonstrate the advantages of our method over baselines, validating the contribution of its components and finding interesting discoveries for the benefit of society.
Article
The increasing pervasiveness of location-acquisition technologies has enabled collection of huge amount of trajectories for almost any kind of moving objects. Discovering useful patterns from their movement behaviors can convey valuable knowledge to a variety of critical applications. In this light, we propose a novel concept, called gathering, which is a trajectory pattern modeling various group incidents such as celebrations, parades, protests, traffic jams and so on. A key observation is that these incidents typically involve large congregations of individuals, which form durable and stable areas with high density. In this work, we first develop a set of novel techniques to tackle the challenge of efficient discovery of gathering patterns on archived trajectory dataset. Afterwards, since trajectory databases are inherently dynamic in many real-world scenarios such as traffic monitoring, fleet management and battlefield surveillance, we further propose an online discovery solution by applying a series of optimization schemes, which can keep track of gathering patterns while new trajectory data arrive. Finally, the effectiveness of the proposed concepts and the efficiency of the approaches are validated by extensive experiments based on a real taxicab trajectory dataset.
Conference Paper
Urban transportation is increasingly studied due to its complexity and economic importance. It is also a major component of urban energy use and pollution. The importance of this topic will only increase as urbanization continues around the world. A less researched aspect of transportation is the refueling behavior of drivers. In this paper, we propose a step toward real-time sensing of refueling behavior and citywide petrol consumption. We use reported trajectories from a fleet of GPS-equipped taxicabs to detect gas station visits, measure the time spent, and estimate overall demand. For times and stations with sparse data, we use collaborative filtering to estimate conditions. Our system provides real-time estimates of gas stations' waiting times, from which recommendations could be made, an indicator of overall gas usage, from which macro-scale economic decisions could be made, and a geographic view of the efficiency of gas station placement.
Conference Paper
The advances in mobile computing and social networking services enable people to probe the dynamics of a city. In this paper, we address the problem of detecting and describing traffic anomalies using crowd sensing with two forms of data, human mobility and social media. Traffic anomalies are caused by accidents, control, protests, sport events, celebrations, disasters and other events. Unlike existing traffic-anomaly-detection methods, we identify anomalies according to drivers' routing behavior on an urban road network. Here, a detected anomaly is represented by a sub-graph of a road network where drivers' routing behaviors significantly differ from their original patterns. We then try to describe the detected anomaly by mining representative terms from the social media that people posted when the anomaly happened. The system for detecting such traffic anomalies can benefit both drivers and transportation authorities, e.g., by notifying drivers approaching an anomaly and suggesting alternative routes, as well as supporting traffic jam diagnosis and dispersal. We evaluate our system with a GPS trajectory dataset generated by over 30,000 taxicabs over a period of 3 months in Beijing, and a dataset of tweets collected from WeiBo, a Twitter-like social site in China. The results demonstrate the effectiveness and efficiency of our system.
Conference Paper
Information about urban air quality, e.g., the concentration of PM2.5, is of great importance to protect human health and control air pollution. While there are limited air-quality-monitor-stations in a city, air quality varies in urban spaces non-linearly and depends on multiple factors, such as meteorology, traffic volume, and land uses. In this paper, we infer the real-time and fine-grained air quality information throughout a city, based on the (historical and real-time) air quality data reported by existing monitor stations and a variety of data sources we observed in the city, such as meteorology, traffic flow, human mobility, structure of road networks, and point of interests (POIs). We propose a semi-supervised learning approach based on a co-training framework that consists of two separated classifiers. One is a spatial classifier based on an artificial neural network (ANN), which takes spatially-related features (e.g., the density of POIs and length of highways) as input to model the spatial correlation between air qualities of different locations. The other is a temporal classifier based on a linear-chain conditional random field (CRF), involving temporally-related features (e.g., traffic and meteorology) to model the temporal dependency of air quality in a location. We evaluated our approach with extensive experiments based on five real data sources obtained in Beijing and Shanghai. The results show the advantages of our method over four categories of baselines, including linear/Gaussian interpolations, classical dispersion models, well-known classification models like decision tree and CRF, and ANN.
Conference Paper
The rise of GPS-equipped mobile devices has led to the emergence of big trajectory data. In this paper, we study a new path finding query which finds the most frequent path (MFP) during user-specified time periods in large-scale historical trajectory data. We refer to this query as time period-based MFP (TPMFP). Specifically, given a time period T, a source v_s and a destination v_d, TPMFP searches the MFP from v_s to v_d during T. Though there exist several proposals on defining MFP, they only consider a fixed time period. Most importantly, we find that none of them can well reflect people's common sense notion which can be described by three key properties, namely suffix-optimal (i.e., any suffix of an MFP is also an MFP), length-insensitive (i.e., MFP should not favor shorter or longer paths), and bottleneck-free (i.e., MFP should not contain infrequent edges). The TPMFP with the above properties will reveal not only common routing preferences of the past travelers, but also take the time effectiveness into consideration. Therefore, our first task is to give a TPMFP definition that satisfies the above three properties. Then, given the comprehensive TPMFP definition, our next task is to find TPMFP over huge amount of trajectory data efficiently. Particularly, we propose efficient search algorithms together with novel indexes to speed up the processing of TPMFP. To demonstrate both the effectiveness and the efficiency of our approach, we conduct extensive experiments using a real dataset containing over 11 million trajectories.
Conference Paper
Due to the prevalence of GPS-enabled devices and wireless communications technologies, spatial trajectories that describe the movement history of moving objects are being generated and accumulated at an unprecedented pace. Trajectory data in a database are intrinsically heterogeneous, as they represent discrete approximations of original continuous paths derived using different sampling strategies and different sampling rates. Such heterogeneity can have a negative impact on the effectiveness of trajectory similarity measures, which are the basis of many crucial trajectory processing tasks. In this paper, we pioneer a systematic approach to trajectory calibration that is a process to transform a heterogeneous trajectory dataset to one with (almost) unified sampling strategies. Specifically, we propose an anchor-based calibration system that aligns trajectories to a set of anchor points, which are fixed locations independent of trajectory data. After examining four different types of anchor points for the purpose of building a stable reference system, we propose a geometry-based calibration approach that considers the spatial relationship between anchor points and trajectories. Then a more advanced model-based calibration method is presented, which exploits the power of machine learning techniques to train inference models from historical trajectory data to improve calibration effectiveness. Finally, we conduct extensive experiments using real trajectory datasets to demonstrate the effectiveness and efficiency of the proposed calibration system.
Conference Paper
Taxi ridesharing can be of significant social and environmental benefit, e.g. by saving energy consumption and satisfying people's commute needs. Despite the great potential, taxi ridesharing, especially with dynamic queries, is not well studied. In this paper, we formally define the dynamic ridesharing problem and propose a large-scale taxi ridesharing service. It efficiently serves real-time requests sent by taxi users and generates ridesharing schedules that reduce the total travel distance significantly. In our method, we first propose a taxi searching algorithm using a spatio-temporal index to quickly retrieve candidate taxis that are likely to satisfy a user query. A scheduling algorithm is then proposed. It checks each candidate taxi and inserts the query's trip into the schedule of the taxi which satisfies the query with minimum additional incurred travel distance. To tackle the heavy computational load, a lazy shortest path calculation strategy is devised to speed up the scheduling algorithm. We evaluated our service using a GPS trajectory dataset generated by over 33,000 taxis during a period of 3 months. By learning the spatio-temporal distributions of real user queries from this dataset, we built an experimental platform that simulates user real behaviours in taking a taxi. Tested on this platform with extensive experiments, our approach demonstrated its efficiency, effectiveness, and scalability. For example, our proposed service serves 25% additional taxi users while saving 13% travel distance compared with no-ridesharing (when the ratio of the number of queries to that of taxis is 6).
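The scheduling step described in this abstract (check each candidate taxi and insert the new trip where it adds the least travel distance) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the names `best_insertion` and `assign_request` are hypothetical, distances are straight-line rather than road-network shortest paths, and the time, capacity, and monetary constraints as well as the spatio-temporal index and lazy shortest path strategy are omitted.

```python
from math import hypot

def dist(a, b):
    # Straight-line distance; an assumption standing in for road-network distance.
    return hypot(a[0] - b[0], a[1] - b[1])

def route_length(stops):
    return sum(dist(stops[i], stops[i + 1]) for i in range(len(stops) - 1))

def best_insertion(schedule, pickup, dropoff):
    """Try every way to insert pickup and dropoff (pickup first) into a schedule.

    schedule[0] is the taxi's current position, so nothing is inserted before it.
    Returns (extra_distance, new_schedule) for the cheapest insertion.
    """
    base = route_length(schedule)
    best = None
    n = len(schedule)
    for i in range(1, n + 1):          # position of the pickup
        for j in range(i, n + 1):      # position of the dropoff, never before the pickup
            candidate = schedule[:i] + [pickup] + schedule[i:j] + [dropoff] + schedule[j:]
            extra = route_length(candidate) - base
            if best is None or extra < best[0]:
                best = (extra, candidate)
    return best

def assign_request(taxis, pickup, dropoff):
    """Pick the candidate taxi whose best insertion adds the least distance."""
    best_id, best_extra, best_schedule = None, float("inf"), None
    for taxi_id, schedule in taxis.items():
        extra, candidate = best_insertion(schedule, pickup, dropoff)
        if extra < best_extra:
            best_id, best_extra, best_schedule = taxi_id, extra, candidate
    return best_id, best_extra, best_schedule

# Hypothetical example: taxi A already drives past both the pickup and the dropoff.
taxis = {"A": [(0, 0), (10, 0)], "B": [(0, 5), (0, 20)]}
taxi_id, extra, schedule = assign_request(taxis, pickup=(4, 0), dropoff=(6, 0))
```

In this toy setting the request lies on taxi A's existing route, so A is selected with no added distance. A real system would also have to reject insertions that violate the deadlines of riders already on board.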
Conference Paper
The increasing pervasiveness of location-acquisition technologies has enabled collection of huge amount of trajectories for almost any kind of moving objects. Discovering useful patterns from their movement behaviours can convey valuable knowledge to a variety of critical applications. In this light, we propose a novel concept, called gathering, which is a trajectory pattern modelling various group incidents such as celebrations, parades, protests, traffic jams and so on. A key observation is that these incidents typically involve large congregations of individuals, which form durable and stable areas with high density. Since the process of discovering gathering patterns over large-scale trajectory databases can be quite lengthy, we further develop a set of well thought out techniques to improve the performance. These techniques, including effective indexing structures, fast pattern detection algorithms implemented with bit vectors, and incremental algorithms for handling new trajectory arrivals, collectively constitute an efficient solution for this challenging task. Finally, the effectiveness of the proposed concepts and the efficiency of the approaches are validated by extensive experiments based on a real taxicab trajectory dataset.
Conference Paper
Nowadays, many applications return to the user a set of results that take the query as their nearest neighbor, which are commonly expressed through reverse nearest neighbor (RNN) queries. When considering moving objects, users would like to find objects that appear in the RNN result set for a period of time in some real-world applications such as collaboration recommendation and anti-tracking. In this work, we formally define the problem of interval reverse nearest neighbor (IRNN) queries over moving objects, which return the objects that maintain nearest neighboring relations to the moving query objects for the longest time in the given interval. Location uncertainty of moving data objects and moving query objects is inherent in various domains, and we investigate objects that exhibit Markov correlations, that is, each object's location is only correlated with its own location at the previous timestamp while being independent of other objects. Answering IRNN queries on uncertain moving objects with Markov correlations poses an efficiency challenge, since we have to retrieve not only all the possible locations of each object at the current time but also its historically possible locations. To speed up the query processing, we present a general framework for answering IRNN queries on uncertain moving objects with Markov correlations in two phases. In the first phase, we apply space pruning and probability pruning techniques, which reduce the search space significantly. In the second phase, we verify whether each unpruned object is an IRNN of the query object. During this phase, we propose an approach termed the Probability Decomposition Verification (PDV) algorithm, which avoids computing the exact probability of any object being an RNN of the query object and thus improves the efficiency of verification. The performance of the proposed algorithm is demonstrated by extensive experiments on synthetic and real datasets, and the experimental results show that our algorithm is more efficient than the Monte-Carlo based approximate algorithm.
Conference Paper
We propose a novel two-step mining and optimization framework for inferring the root cause of anomalies that appear in road traffic data. We model road traffic as a time-dependent flow on a network formed by partitioning a city into regions bounded by major roads. In the first step we identify link anomalies based on their deviation from their historical traffic profile. However, link anomalies on their own shed very little light on what caused them to be anomalous. In the second step we take a generative approach by modeling the flow in a network in terms of the origin-destination (OD) matrix, which physically relates the latent flow between origin and destination and the observable flow on the links. The key insight is that instead of using all of the link traffic as the observable vector we only use the link anomaly vector. By solving an L1 inverse problem we infer the routes (the origin-destination pairs) which gave rise to the link anomalies. Experiments on a very large GPS data set consisting of nearly eight hundred million data points demonstrate that we can discover routes which can clearly explain the appearance of link anomalies. The use of optimization techniques to explain observable anomalies in a generative fashion is, to the best of our knowledge, entirely novel.
Article
Taxis are a major mode of transportation in urban areas, offering great benefits and convenience to our daily life. However, one of the major frauds in the taxi business is charging fraud, specifically overcharging for the actual distance. In practice, it is hard for us to always monitor taxis and detect such fraud. Due to the Global Positioning System (GPS) embedded in taxis, we can collect the GPS reports from the taxis' locations, and thus, it is possible for us to retrieve their traces. Intuitively, we can utilize such information to construct taxis' trajectories, compute the actual service distance on the city map, and detect fraudulent behaviors. However, in practice, due to the extremely limited reports, notable location errors, complex city map, and road networks, our task to detect taxi fraud faces significant challenges, and the previous methods cannot work well. In this paper, we have a critical and interesting observation that fraudulent taxis always play a secret trick, i.e., modifying the taximeter to a smaller scale. As a result, it not only makes the service distance larger but also makes the reported taxi speed larger. Fortunately, the speed information collected from the GPS reports is accurate. Hence, we utilize the speed information to design a system, which is called the Speed-based Fraud Detection System (SFDS), to model taxi behaviors and detect taxi fraud. Our method is robust to the location errors and independent of the map information and road networks. At the same time, the experiments on real-life data sets confirm that our method has better accuracy, scalability, and more efficient computation, compared with the previous related methods. Finally, interesting findings of our work and discussions on potential issues are provided in this paper for future city transportation and human behavior research.
Article
Location data becomes more and more important. In this paper, we focus on the trajectory data, and propose a new framework, namely PRESS (Paralleled Road-Network-Based Trajectory Compression), to effectively compress trajectory data under road network constraints. Different from existing work, PRESS proposes a novel representation for trajectories to separate the spatial representation of a trajectory from the temporal representation, and proposes a Hybrid Spatial Compression (HSC) algorithm and error Bounded Temporal Compression (BTC) algorithm to compress the spatial and temporal information of trajectories respectively. PRESS also supports common spatial-temporal queries without fully decompressing the data. Through an extensive experimental study on real trajectory dataset, PRESS significantly outperforms existing approaches in terms of saving storage cost of trajectory data with bounded errors.
Conference Paper
Destination prediction is an essential task for many emerging location-based applications such as recommending sightseeing places and targeted advertising based on destination. A common approach to destination prediction is to derive the probability of a location being the destination based on historical trajectories. However, existing techniques using this approach suffer from the “data sparsity problem”, i.e., the available historical trajectories are far from being able to cover all possible trajectories. This problem considerably limits the number of query trajectories that can obtain predicted destinations. We propose a novel method named the Sub-Trajectory Synthesis (SubSyn) algorithm to address the data sparsity problem. The SubSyn algorithm first decomposes historical trajectories into sub-trajectories comprising two neighbouring locations, and then connects the sub-trajectories into “synthesised” trajectories. The number of query trajectories that can have predicted destinations is exponentially increased by this means. Experiments based on real datasets show that the SubSyn algorithm can predict destinations for up to ten times more query trajectories than a baseline algorithm while the SubSyn prediction algorithm runs over two orders of magnitude faster than the baseline algorithm. In this paper, we also consider the privacy protection issue in case an adversary uses the SubSyn algorithm to derive sensitive location information of users. We propose an efficient algorithm to select a minimum number of locations a user has to hide on her trajectory in order to avoid privacy leaks. Experiments also validate the high efficiency of the privacy protection algorithm.
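The decompose-then-connect idea in this abstract can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: locations are opaque labels, the function names (`decompose`, `can_synthesise`, `synthesised_destinations`, `full_match_destinations`) are hypothetical, and plain reachability over observed location pairs stands in for whatever probabilistic scoring the paper actually uses.

```python
from collections import defaultdict

def decompose(trajectories):
    """Count every pair of consecutive locations seen in the history."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return counts

def can_synthesise(counts, trajectory):
    """A trajectory can be synthesised if each consecutive pair was observed somewhere."""
    return all(b in counts.get(a, {}) for a, b in zip(trajectory, trajectory[1:]))

def synthesised_destinations(counts, start, max_steps):
    """Locations reachable from start by chaining at most max_steps observed pairs."""
    frontier, reached = {start}, set()
    for _ in range(max_steps):
        frontier = {b for a in frontier for b in counts.get(a, {})}
        reached |= frontier
    return reached

def full_match_destinations(trajectories, query):
    """Baseline for comparison: only whole historical trajectories that start with the query."""
    n = len(query)
    return {t[-1] for t in trajectories if t[:n] == query}

# Hypothetical history: no single trajectory goes E -> A -> B -> D.
history = [["A", "B", "C"], ["B", "D"], ["E", "A"]]
pairs = decompose(history)
```

With this toy history, a query starting at `E` has no full-trajectory match ending in `D`, yet the pairs `E→A`, `A→B`, and `B→D` chain into a synthesised trajectory, which is the sparsity relief the abstract describes.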
Article
The increasing availability of large-scale trajectory data provides us with a great opportunity to explore them for knowledge discovery in transportation systems using advanced data mining techniques. Nowadays, a large number of taxicabs in major metropolitan cities are equipped with a GPS device. Since taxis are on the road nearly 24 h a day (with drivers changing shifts), they can now act as reliable sensors to monitor the behavior of traffic. In this article, we use GPS data from taxis to monitor the emergence of unexpected behavior in the Beijing metropolitan area, which has the potential to estimate and improve traffic conditions in advance. We adapt the likelihood ratio test statistic (LRT), which has previously been used mostly in epidemiological studies, to describe traffic patterns. To the best of our knowledge, the use of LRT in the traffic domain is not only novel but also results in accurate and rapid detection of anomalous behavior.
Article
The availability of massive network and mobility data from diverse domains has fostered the analysis of human behavior and interactions. Broad, extensive, and multidisciplinary research has been devoted to the extraction of non-trivial knowledge from this novel form of data. We propose a general method to determine the influence of social and mobility behavior over a specific geographical area in order to evaluate to what extent the current administrative borders represent the real basin of human movement. We build a network representation of human movement starting with vehicle GPS tracks and extract relevant clusters, which are then mapped back onto the territory, finding a good match with the existing administrative borders. The novelty of our approach is the focus on a detailed spatial resolution: we map emerging borders in terms of individual municipalities, rather than macro regional or national areas. We present a series of experiments to illustrate and evaluate the effectiveness of our approach.
Article
With the increasing popularity of location-based services, we have accumulated a lot of location data on the Web. In this paper, we are interested in answering two popular location-related queries in our daily life: 1) if we want to do something such as sightseeing or dining in a large city like Beijing, where should we go? 2) If we want to visit a place such as the Bird's Nest in Beijing Olympic park, what can we do there? We develop a mobile recommendation system to answer these queries. In our system, we first model the users' location and activity histories as a user-location-activity rating tensor. Because each user has limited data, the resulting rating tensor is essentially very sparse. This makes our recommendation task difficult. In order to address this data sparsity problem, we propose three algorithms based on collaborative filtering (CF). The first algorithm merges all the users' data together, and uses a collective matrix factorization model to provide general recommendation [3]. The second algorithm treats each user differently and uses a collective tensor and matrix factorization model to provide personalized recommendation [4]. The third algorithm is a new algorithm which further improves our previous two algorithms by using a ranking-based collective tensor and matrix factorization model. Instead of trying to predict the missing entry values as accurately as possible, it focuses on directly optimizing the ranking loss w.r.t. user preferences on the locations and activities. Therefore, it is more consistent with our ultimate goal of ranking locations/activities for recommendations. For these three algorithms, we also exploit some additional information, such as user-user similarities, location features, activity-activity correlations and user-location preferences, to help the CF tasks. We extensively evaluate our algorithms using a real-world GPS dataset collected by 119 users over 2.5 years. We show that all three of our algorithms can consistently outperform the competing baselines, and our newly proposed third algorithm can also outperform our two previous algorithms.
Article
The advance of mobile technologies leads to huge volumes of spatio-temporal data collected in the form of trajectory data streams. In this study, we investigate the problem of discovering object groups that travel together (i.e., traveling companions) from trajectory data streams. Such techniques have broad applications in the areas of scientific study, transportation management and military surveillance. To discover traveling companions, the monitoring system should cluster the objects of each snapshot and intersect the clustering results to retrieve moving-together objects. Since both the clustering and intersection steps involve high computational overhead, the key issue of companion discovery is to improve the efficiency of the algorithms. We propose the models of closed companion candidates and smart intersection to accelerate data processing. A data structure termed traveling buddy is designed to facilitate scalable and flexible companion discovery from trajectory streams. The traveling buddies are micro-groups of objects that are tightly bound together. By storing only the object relationships rather than their spatial coordinates, the buddies can be dynamically maintained along the trajectory stream at low cost. Based on traveling buddies, the system can discover companions without accessing the object details. In addition, we extend the proposed framework to discover companions in more complicated scenarios with spatial and temporal constraints, such as on a road network or a battlefield. The proposed methods are evaluated with extensive experiments on both real and synthetic datasets. Experimental results show that our proposed buddy-based approach is an order of magnitude faster than the baselines and achieves higher accuracy in companion discovery.
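The cluster-then-intersect baseline described in this abstract can be sketched as follows. This is only a minimal illustration of the naive procedure, not the buddy-based method the authors propose: the distance-threshold clustering, the function names, and the parameters `eps`, `min_size`, and `min_duration` are all assumptions introduced here.

```python
from itertools import combinations
from math import dist


def cluster_snapshot(positions, eps):
    """Group object ids into connected components; two objects are
    linked when they lie within eps of each other (assumed clustering rule)."""
    parent = {oid: oid for oid in positions}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(positions, 2):
        if dist(positions[a], positions[b]) <= eps:
            parent[find(a)] = find(b)

    groups = {}
    for oid in positions:
        groups.setdefault(find(oid), set()).add(oid)
    return [frozenset(g) for g in groups.values()]


def find_companions(snapshots, eps, min_size, min_duration):
    """Return groups of at least min_size objects that share a cluster
    for at least min_duration consecutive snapshots."""
    candidates = {}  # candidate group -> consecutive snapshots survived
    results = set()
    for positions in snapshots:
        clusters = cluster_snapshot(positions, eps)
        next_candidates = {}
        # Intersect every surviving candidate with every current cluster.
        for cand, age in candidates.items():
            for cluster in clusters:
                common = cand & cluster
                if len(common) >= min_size:
                    best = max(next_candidates.get(common, 0), age + 1)
                    next_candidates[common] = best
        # Every sufficiently large cluster also starts a fresh candidate.
        for cluster in clusters:
            if len(cluster) >= min_size:
                next_candidates.setdefault(cluster, 1)
        candidates = next_candidates
        results.update(g for g, age in candidates.items() if age >= min_duration)
    return results


# Hypothetical toy stream: three snapshots of four objects.
snapshots = [
    {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (10, 10)},
    {"a": (5, 5), "b": (5, 6), "c": (20, 20), "d": (21, 20)},
    {"a": (9, 9), "b": (9, 10), "c": (0, 0), "d": (30, 30)},
]
companions = find_companions(snapshots, eps=1.5, min_size=2, min_duration=3)
```

On this toy stream only objects `a` and `b` stay clustered across all three snapshots. The pairwise distance checks and the candidate-by-cluster intersections are the overhead that the abstract says the closed-candidate, smart-intersection, and traveling-buddy ideas are meant to reduce.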
Article
Location-based social networks have flourished in recent years. In this paper, we aim to estimate the similarity between users according to their physical location histories (represented by GPS trajectories). This similarity can be regarded as a potential social tie between users, thereby enabling friend and location recommendations. Different from previous work using social structures or directly matching users' physical locations, our approach models a user's GPS trajectories with a semantic location history (SLH), e.g., shopping malls → restaurants → cinemas. Then, we measure the similarity between different users' SLHs by using our maximal travel match (MTM) algorithm. The advantage of our approach lies in two aspects. First, SLH carries more semantic meaning about a user's interests beyond low-level geographic positions. Second, our approach can estimate the similarity between two users without overlaps in geographic space, e.g., people living in different cities. When matching SLHs, we consider the sequential property, the granularity and the popularity of semantic locations. We evaluate our method on a real-world GPS dataset collected by 109 users over a period of 1 year. The results show that SLH outperforms a physical-location-based approach and that MTM is more effective than several widely used sequence matching approaches in this application scenario.
Article
This is a tutorial on location-based social networks (LBSNs), introducing the concept, unique features, and research philosophy of LBSNs and the representative research into LBSNs. The homepage of LBSN is http://research.microsoft.com/en-us/projects/lbsn/default.aspx.
Article
Region-based analysis is fundamental and crucial in many geospatial-related applications and research themes, such as trajectory analysis, human mobility study and urban planning. In this paper, we report on an image-processing-based approach to segment urban areas into regions by road networks. Here, each segmented region is bounded by the high-level road segments, covering some neighborhoods and low-level streets. Typically, road segments are classified into different levels (e.g., highways and expressways are usually high-level roads), providing us with a more natural and semantic segmentation of urban spaces than the grid-based partition method. We show that through simple morphological operators, an urban road network can be efficiently segmented into regions. In addition, we present a case study in trajectory mining to demonstrate the usability of the proposed segmentation method. Please cite the following papers when using this segmentation tool: [1] Yu Zheng, Yanchi Liu, Jing Yuan, and Xing Xie. Urban Computing with Taxicabs, ACM Ubicomp, 16 September 2011. [2] Nicholas Jing Yuan, Yu Zheng and Xing Xie, Segmentation of Urban Areas Using Road Networks, MSR-TR-2012-65, 2012.
Article
This paper presents a recommender system for both taxi drivers and people expecting to take a taxi, using the knowledge of 1) passengers' mobility patterns and 2) taxi drivers' picking-up/dropping-off behaviors learned from the GPS trajectories of taxicabs. First, this recommender system provides taxi drivers with some locations and the routes to these locations, towards which they are more likely to pick up passengers quickly (along the routes or at these locations) and maximize the profit of the next trip. Second, it recommends to people some locations (within walking distance) where they can easily find vacant taxis. In our method, we learn the above-mentioned knowledge (represented by probabilities) from GPS trajectories of taxis. Then, we feed the knowledge into a probabilistic model which estimates the profit of the candidate locations for a particular driver based on where and when the driver requests the recommendation. We build our system using historical trajectories generated by over 12,000 taxis during 110 days and validate the system with extensive evaluations including in-the-field user studies.
Article
Advanced technology in GPS and sensors enables us to track physical events, such as human movements and facility usage. Periodicity analysis of the recorded data is an important data mining task which provides useful insights into the physical events and enables us to report outliers and predict future behaviors. To mine periodicity in an event, we have to face the real-world challenges of inherently complicated periodic behaviors and imperfect data collection. Specifically, the hidden temporal periodic behaviors could be oscillating and noisy, and the observations of the event could be incomplete. In this paper, we propose a novel probabilistic measure for periodicity and design a practical method to detect periods. Our method thoroughly considers the uncertainties and noise in periodic behaviors and is provably robust to incomplete observations. Comprehensive experiments on both synthetic and real datasets demonstrate the effectiveness of our method.
Article
The advances in location-acquisition technologies have led to a myriad of spatial trajectories. These trajectories are usually generated at a low or an irregular frequency due to applications' characteristics or energy saving, leaving the routes between two consecutive points of a single trajectory uncertain (called an uncertain trajectory). In this paper, we present a Route Inference framework based on Collective Knowledge (abbreviated as RICK) to construct the popular routes from uncertain trajectories. Explicitly, given a location sequence and a time span, the RICK is able to construct the top-k routes which sequentially pass through the locations within the specified time span, by aggregating such uncertain trajectories in a mutual reinforcement way (i.e., uncertain + uncertain → certain). Our work can benefit trip planning, traffic management, and animal movement studies. The RICK comprises two components: routable graph construction and route inference. First, we explore the spatial and temporal characteristics of uncertain trajectories and construct a routable graph by collaborative learning among the uncertain trajectories. Second, in light of the routable graph, we propose a routing algorithm to construct the top-k routes according to a user-specified query. We have conducted extensive experiments on two real datasets, consisting of Foursquare check-in datasets and taxi trajectories. The results show that RICK is both effective and efficient.
Article
The problem of modeling and managing uncertain data has received a great deal of interest, due to its manifold applications in spatial, temporal, multimedia and sensor databases. There exists a wide range of work covering spatial uncertainty in the static (snapshot) case, where only one point of time is considered. In contrast, the problem of modeling and querying uncertain spatio-temporal data has only been treated as a simple extension of the spatial case, disregarding time dependencies between consecutive timestamps. In this work, we present a framework for efficiently modeling and querying uncertain spatio-temporal data. The key idea of our approach is to model possible object trajectories by stochastic processes. This approach has three major advantages over previous work. First, it allows answering queries in accordance with the possible worlds model. Second, dependencies between object locations at consecutive points in time are taken into account. Third, it is possible to reduce all queries on this model to simple matrix multiplications. Based on these concepts, we propose efficient solutions for different probabilistic spatio-temporal queries. In an experimental evaluation, we show that our approaches are several orders of magnitude faster than state-of-the-art competitors.
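To make the "queries as matrix multiplications" idea concrete, here is a small sketch. It assumes, purely for illustration, a first-order Markov chain over a handful of discrete locations; the abstract itself only says "stochastic processes", so the state space, the transition matrix, and the function names below are hypothetical and not taken from the paper.

```python
def step(distribution, transition):
    """One time step: multiply the row vector of location probabilities
    by the transition matrix (transition[i][j] = P(next=j | current=i))."""
    n = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def distribution_at(initial, transition, t):
    """Location distribution after t steps, via t vector-matrix products."""
    current = list(initial)
    for _ in range(t):
        current = step(current, transition)
    return current


def prob_in_region(initial, transition, t, region):
    """Probability that the object is inside `region` (a set of state
    indices) at time t."""
    current = distribution_at(initial, transition, t)
    return sum(current[s] for s in region)


# Hypothetical three-location chain; the object starts in location 0.
transition = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
initial = [1.0, 0.0, 0.0]
p = prob_in_region(initial, transition, t=2, region={1, 2})  # 0.75
```

Because each step depends on the previous distribution, the dependency between consecutive timestamps is carried through the product, which is the property the abstract contrasts with treating each snapshot independently.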
Article
The development of a city gradually fosters different functional regions, such as educational areas and business districts. In this paper, we propose a framework (titled DRoF) that Discovers Regions of different Functions in a city using both human mobility among regions and points of interest (POIs) located in a region. Specifically, we segment a city into disjoint regions according to major roads, such as highways and urban expressways. We infer the functions of each region using a topic-based inference model, which regards a region as a document, a function as a topic, categories of POIs (e.g., restaurants and shopping malls) as metadata (like authors, affiliations, and key words), and human mobility patterns (when people reach/leave a region and where people come from and leave for) as words. As a result, a region is represented by a distribution of functions, and a function is featured by a distribution of mobility patterns. We further identify the intensity of each function in different locations. The results generated by our framework can benefit a variety of applications, including urban planning, location choosing for a business, and social recommendations. We evaluated our method using large-scale, real-world datasets, consisting of two POI datasets of Beijing (in 2010 and 2011) and two 3-month GPS trajectory datasets (representing human mobility) generated by over 12,000 taxicabs in Beijing in 2010 and 2011, respectively. The results justify the advantages of our approach over baseline methods solely using POIs or human mobility.
Article
The implementation of automated cartography has resulted in the digitization of linear data and the development of simplification algorithms for generalizing these data. Some algorithms, such as the nth point and random point methods, are simple in both practice and operation. Others, such as polynomial reconstruction, appear to be conceptually overly complex and computationally time consuming. This study presents a method for the evaluation of simplification algorithms. Thirty mathematical measures are developed for this purpose, including both single attribute measurements, which may be applied to a single line, and measures of displacement for evaluating differences between a line and its simplification.
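As a rough illustration of the two kinds of measure mentioned in this abstract, the sketch below applies nth point simplification to a polyline and then computes one single-attribute measure (line length) and one displacement measure (the largest distance from an original vertex to the simplified line). These two measures and all function names are assumptions chosen for illustration; the abstract does not list the thirty measures, so they are not claimed to be among them.

```python
from math import dist


def nth_point(line, n):
    """Keep every nth vertex, always retaining the final vertex."""
    if not line:
        return []
    kept = line[::n]
    if (len(line) - 1) % n != 0:
        kept.append(line[-1])
    return kept


def length(line):
    """Single-attribute measure: total length of a polyline."""
    return sum(dist(p, q) for p, q in zip(line, line[1:]))


def point_segment_distance(p, a, b):
    """Distance from point p to the closed segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        return dist(p, a)
    # Clamp the projection parameter so the foot stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return dist(p, (ax + t * dx, ay + t * dy))


def max_displacement(line, simplified):
    """Displacement measure: the largest distance from any original
    vertex to the nearest segment of the simplified line."""
    segments = list(zip(simplified, simplified[1:]))
    return max(
        min(point_segment_distance(p, a, b) for a, b in segments)
        for p in line
    )


# Hypothetical zigzag line with five vertices.
line = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
simplified = nth_point(line, 2)            # [(0, 0), (2, 0), (4, 0)]
shortening = length(line) - length(simplified)
displacement = max_displacement(line, simplified)
```

Here the simplification flattens the zigzag onto the x-axis, so the length shrinks and the dropped peaks each sit one unit away from the simplified line.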