Article

Habit2vec: Trajectory Semantic Embedding for Living Pattern Recognition in Population


Abstract

Recognizing representative living patterns in a population is extremely valuable for urban planning and decision making. Thanks to the growing popularity of location-based applications and check-ins on social networking sites, the Point of Interest (POI) associated with a location is often available in trajectory data, and it expresses the living semantics of users. However, adopting trajectory semantics for living pattern recognition is an open and challenging research problem due to three major technical challenges: effective feature representation, suitable granularity selection for the habit unit, and reliable habit distance measurement. In this paper, we propose a representation learning based system named habit2vec to represent user trajectory semantics in vector space, which preserves the original user living habit information. We evaluated our proposed system on a large-scale real-world dataset provided by a popular social network operator, covering 123,803 users in Beijing over 1.5 months. The results demonstrate the ability of our system to preserve user habit patterns and the effectiveness of clustering users with similar living habits.
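The abstract names reliable habit distance measurement as a challenge but does not state which measure habit2vec uses. The sketch below assumes each user is summarized as the mean of their habit-unit embeddings and measures user similarity with cosine similarity, taking distance as 1 - similarity. The helper names and toy vectors are hypothetical.

```python
import math
from typing import Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Average a non-empty list of equal-length habit embeddings into one user vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def habit_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical habit distance: 0 for identical directions, up to 2 for opposite ones."""
    return 1.0 - cosine_similarity(a, b)


# Toy 3-d habit embeddings (hypothetical, not learned).
user_a = mean_vector([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # -> [0.5, 0.5, 0.0]
user_b = mean_vector([[1.0, 1.0, 0.0]])                   # -> [1.0, 1.0, 0.0]
user_c = mean_vector([[0.0, 0.0, 1.0]])
print(habit_distance(user_a, user_b), habit_distance(user_a, user_c))
```

Users A and B point in the same direction, so their distance is essentially 0. User C is orthogonal to A, so their distance is 1.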


... State-of-the-art papers include the adoption of neural embeddings to model locations and points of interest (PoIs) [10][11][12], or to learn the temporal interactions between users and items [13]. The papers [14][15][16] also focus on the next location prediction task from trajectories obtained by external sensors and assume that a road network constrains object movements. ...
... Zipf's law governs the word frequency distribution in natural language [22]. In [10], the authors observed the same Zipf's law behavior in human mobility habits and used this analogy to apply natural language models to learning representations of living habits. Figure 3 (observed in the dataset utilized in our experiments) shows the distribution of sensor observations, which roughly follows Zipf's law, as expected. ...
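One rough way to check a Zipf-like rank-frequency relationship on your own data is to fit the least-squares slope of log frequency against log rank. A slope near -1 is the classic Zipf signature. The token counts below are synthetic, built for illustration, and do not come from the cited dataset.

```python
import math
from collections import Counter


def zipf_slope(tokens: list[str]) -> float:
    """Least-squares slope of log(frequency) versus log(rank).

    A slope near -1 is the classic Zipf signature. This is a rough diagnostic,
    not a rigorous power-law goodness-of-fit test.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic data: the sensor of rank r is observed 840 // r times (840 is divisible by 1..8).
tokens = [f"sensor_{r}" for r in range(1, 9) for _ in range(840 // r)]
print(round(zipf_slope(tokens), 3))  # -1.0
```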
... Habit2vec [10] models a person's habit as a vector, adapting the word2vec model to the particular characteristics of trajectories. The paper aims to find similarities between the living patterns of people who engage in similar behavior at similar times, rather than people who stay in geographically neighboring locations. ...
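The snippet does not describe how habit2vec adapts word2vec, so this sketch shows only the general idea. It assumes a hypothetical habit token that combines a time slot with a POI category, then generates the (center, context) pairs that a skip-gram model trains on. A library such as gensim can learn the vectors themselves, e.g. `Word2Vec(sentences, sg=1)`, where `sg=1` selects skip-gram.

```python
def to_habit_tokens(visits: list[tuple[int, str]], slot_hours: int = 3) -> list[str]:
    """Map (hour_of_day, poi_category) visits to hypothetical 'slot:category' tokens."""
    return [f"{hour // slot_hours}:{category}" for hour, category in visits]


def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """All (center, context) pairs within `window` positions of each center token."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


day = [(8, "cafe"), (9, "office"), (19, "gym")]  # hypothetical one-day trajectory
tokens = to_habit_tokens(day)                    # ['2:cafe', '3:office', '6:gym']
print(skipgram_pairs(tokens, window=1))
# [('2:cafe', '3:office'), ('3:office', '2:cafe'), ('3:office', '6:gym'), ('6:gym', '3:office')]
```

Because tokens are keyed by time slot and activity rather than by coordinates, two users who visit different gyms in the same evening slot would share the token `6:gym`. This matches the snippet's emphasis on similar behavior at similar times.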
Article
Representation learning seeks to extract useful, low-dimensional attributes from complex, high-dimensional data. In natural language processing (NLP), representation learning models extract feature vectors for words from their sequential order in text, using word embeddings and language models that preserve semantic meaning. Inspired by NLP, in this paper we tackle the representation learning problem for trajectories, using NLP methods to encode external sensors positioned in the road network and to generate the feature space used to predict the next vehicle movement. We evaluate the vector representations of on-road sensors and trajectories using extrinsic and intrinsic strategies. Our results show the potential of natural language models to describe the feature space of trajectory applications such as next location prediction.
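The abstract does not specify the prediction model. One simple extrinsic check, sketched here, assumes learned sensor embeddings are already available. It ranks candidate next sensors by cosine similarity to the current one and reports accuracy@k. The sensor IDs and vectors are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_next(current: str, emb: dict[str, list[float]]) -> list[str]:
    """Rank all other sensors by cosine similarity to the current sensor."""
    others = [s for s in emb if s != current]
    return sorted(others, key=lambda s: cosine(emb[current], emb[s]), reverse=True)


def accuracy_at_k(trips: list[tuple[str, str]], emb: dict[str, list[float]], k: int) -> float:
    """Share of (current, true_next) pairs whose true next sensor is ranked in the top k."""
    hits = sum(1 for cur, nxt in trips if nxt in rank_next(cur, emb)[:k])
    return hits / len(trips)


emb = {"s1": [1.0, 0.1], "s2": [0.9, 0.2], "s3": [0.0, 1.0]}  # hypothetical embeddings
trips = [("s1", "s2"), ("s2", "s3")]                          # observed transitions
print(accuracy_at_k(trips, emb, k=1), accuracy_at_k(trips, emb, k=2))  # 0.5 1.0
```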
... From the perspective of raw data type, research on urban mobility pattern recognition can be divided into two categories: research based on trajectory data and research based on AFC data. The former mainly reproduces the movement tracks of residents from GPS data, social media data, or mobile phone signaling data to identify mobility patterns [6][7][8][9][10][11][12]. In contrast, the latter often uses passengers' tap-in or tap-out data to describe the travel process and analyze travel patterns [1][13][14][15][16][17][18][19][20]. ...
... First, transforming concrete spatial and temporal information into abstract vectors makes it convenient for computers to process at large scale (for example, in similarity calculation). Second, vectorization can extract the characteristics of travel records to the maximum extent while saving storage space, which helps explore the internal mechanism of passenger mobility [7]. The contribution of this paper is threefold. ...
... Third, a density-based clustering algorithm is used to identify passenger mobility patterns. It determines the number of clusters from the data distribution rather than requiring it to be specified manually, avoiding the human intervention required by existing methods [7,24]. The structure of this paper is as follows. ...
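The snippet does not name the density-based algorithm. DBSCAN is a common choice with exactly this property: the number of clusters emerges from the `eps` radius and the `min_pts` density threshold rather than being fixed in advance. Below is a minimal pure-Python sketch. Its neighbor search is O(n²), so it suits only small inputs, and toy 2-d points stand in for trip vectors. For production use, scikit-learn provides `sklearn.cluster.DBSCAN(eps=..., min_samples=...)`.

```python
import math
from typing import Optional

NOISE = -1


def dbscan(points: list[tuple[float, float]], eps: float, min_pts: int) -> list[int]:
    """Return a cluster id (0, 1, ...) per point, or NOISE (-1) for outliers."""
    labels: list[Optional[int]] = [None] * len(points)  # None = not yet visited

    def region(i: int) -> list[int]:
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_pts:  # j is a core point, so expand through it
                queue.extend(neighbors)
    return labels  # every entry is an int by now


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Two clusters and one outlier come out without the number of clusters ever being specified.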
Article
Urban mobility pattern recognition has great potential for revealing the mechanisms of human travel, discovering passengers' travel purposes, and predicting and managing traffic demand. This paper aims to propose a data-driven method to identify metro passenger mobility patterns based on Automatic Fare Collection (AFC) data and geo-based data. First, Point of Interest (POI) data within 500 meters of the metro stations are collected to characterize the spatial attributes of the stations. In particular, a fusion method for multisource geo-based data is proposed to convert raw POI data into weighted POI data that account for service capabilities. Second, an unsupervised learning framework based on a stacked auto-encoder (SAE) is designed to embed the spatiotemporal information of trips into low-dimensional dense trip vectors. In detail, the embedded spatiotemporal information includes spatial features (POI categories around the origin station and around the destination station) and temporal features (start time, day of the week, and travel time). Third, a density-based clustering algorithm is introduced to identify passenger mobility patterns based on the embedded dense trip vectors. Finally, a case study of the Beijing metro network is used to verify the feasibility of the methodology. The results show that the proposed method performs well in recognizing mobility patterns and outperforms existing methods.
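The 500-meter POI capture step can be sketched with the haversine great-circle distance. The abstract does not specify its weighted-POI fusion method, so the per-POI `weight` below is a hypothetical stand-in for service capability. The coordinates are illustrative only.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def station_poi_profile(station: tuple[float, float],
                        pois: list[tuple[float, float, str, float]],
                        radius_m: float = 500.0) -> dict[str, float]:
    """Sum per-category POI weights for POIs within radius_m of the station."""
    profile: Counter = Counter()
    for lat, lon, category, weight in pois:
        if haversine_m(station[0], station[1], lat, lon) <= radius_m:
            profile[category] += weight
    return dict(profile)


station = (39.9087, 116.3975)                # illustrative coordinates
pois = [
    (39.9097, 116.3975, "office", 2.0),      # about 111 m north
    (39.9107, 116.3975, "office", 1.0),      # about 222 m north
    (39.9187, 116.3975, "mall", 5.0),        # about 1.1 km north, excluded
]
print(station_poi_profile(station, pois))    # {'office': 3.0}
```

The resulting category-to-weight dictionary is one possible input to the spatial part of the trip vector described in the abstract.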
... Figure 4 shows various stay points of moving objects. Various stay-point and Point-of-Interest (POI) algorithms have been employed to predict future locations [16,65,80,86]. ...
... Such a model captures the concept of social relationships among living entities. Social relations [24,49,51] and semantic data [65] are some examples of the group model. Zhu et al. [47] studied social relationships and their features, such as the number of visits and friends' recommendations. Cao et al. [65] analysed the similar living habits of the population. ...
... Social relations [24,49,51] and semantic data [65] are some examples of the group model. Zhu et al. [47] studied social relationships and their features, such as the number of visits and friends' recommendations. Cao et al. [65] analysed the similar living habits of the population. Tsai et al. [23] suggested a distributed algorithm to group similar movement patterns into the same cluster. ...
Article
The recent explosive growth in telecommunication and telepathy technology has flooded the market with location-based data, which paves the way for location-aware prediction services. These applications have a vast domain of influence in route navigation, recommendation systems, traffic-congestion control, ecological study, climatological forecasting, and many more. Research effort has been devoted to presenting an overall picture of location prediction from trajectory data. This survey offers an extensive overview of location prediction, covering basic definitions and concepts, data sources, approaches, and applications. Moreover, spatial–temporal pattern-based prediction models are discussed, highlighting the advantages and disadvantages of each. Advances in sequential, periodic, and frequent pattern mining are noted. This paper presents recent deep learning methodologies for extracting features from large trajectories. Distributed big data models using the Hadoop and MapReduce frameworks are recorded. Location prediction using social media platforms is mentioned. Content-based and semantic mining models are studied. Tables and diagrams provide an at-a-glance view to facilitate understanding. Furthermore, applications and challenges related to next location prediction are addressed. The overall conclusion of the survey and future directions are also listed.
... A trajectory is traditionally described as a sequence of spatially located points with timestamps [1,2]. As a temporal record of interactions between users and the spatial environment, driving trajectories can reveal users' behavioural characteristics and travel intentions, which can be further exploited in various applications such as user portrait analysis [3,4], next location recommendation [5][6][7][8], and human activity classification [9]. Patterns mined from trajectories can also inform transportation system optimisation and city planning. ...
... Ying et al. [21,22] presented semantic trajectory mining methods for location prediction. Most current models incorporate spatial-temporal records with POIs to extract more explicit information [4,9]. However, additional data sources can provide more detailed information. ...
Article
Driving trajectory representation learning is of great significance for various location-based services such as driving pattern mining and route recommendation. However, previous representation generation approaches rarely address three challenges: (1) how to represent the intricate semantic intentions of mobility inexpensively, (2) complex and weak spatial–temporal dependencies due to the sparsity and heterogeneity of trajectory data, and (3) route selection preferences and their correlation to driving behaviour. In this study, we propose a novel multimodal fusion model, DouFu, for trajectory representation joint learning, which applies a multimodal learning and attention fusion module to capture the internal characteristics of trajectories. We first design movement, route, and global features generated from the trajectory data and urban functional zones, and then analyse them with an attention encoder or a fully connected network. The attention fusion module incorporates route features with movement features to create more effective spatial–temporal embeddings. Combined with the global semantic feature, DouFu produces a comprehensive embedding for each trajectory. We evaluated the representations generated by our method and other baseline models on classification and clustering tasks. The empirical results show that, with most downstream learning algorithms, such as linear regression and support vector machines, DouFu outperforms other models by more than 10%.
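The abstract does not give the internals of DouFu's attention fusion module, so the following is only a generic sketch. In scaled dot-product attention, a movement feature acts as the query and a set of route features serves as keys and values. The attended result is then concatenated with the query. A real module would add learned projections. All vectors here are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query: list[float], keys: list[list[float]],
                   values: list[list[float]]) -> list[float]:
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) weighted sum of values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


movement = [1.0, 0.0]                        # hypothetical movement feature (query)
route_segments = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical route features (keys = values)
fused = attention_fuse(movement, route_segments, route_segments)
combined = movement + fused                  # list concatenation -> 4-d fused feature
print([round(x, 3) for x in combined])
```

The route segment aligned with the movement query receives the larger attention weight, so it dominates the fused part of the vector.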
... A trajectory is traditionally described as a sequence of spatially located points with timestamps [1,2]. As a temporal record of interactions between users and the spatial environment, driving trajectories can reveal users' behavioural characteristics and travel intentions, which can be further exploited in various applications such as user portrait analysis [3,4], next location recommendation [5][6][7][8], and human activity classification [9]. Patterns mined from trajectories can also inform transportation system optimisation and city planning. ...
... Ying et al. [21,22] presented semantic trajectory mining methods for location prediction. Most current models incorporate spatial-temporal records with POIs to extract more explicit information [4,9]. However, additional data sources can provide more detailed information. ...
Preprint
Driving trajectory representation learning is of great significance for various location-based services such as driving pattern mining and route recommendation. However, previous representation generation approaches rarely address three challenges: (1) how to represent the intricate semantic intentions of mobility inexpensively, (2) complex and weak spatial–temporal dependencies due to the sparsity and heterogeneity of trajectory data, and (3) route selection preferences and their correlation to driving behaviour. In this study, we propose a novel multimodal fusion model, DouFu, for trajectory representation joint learning, which applies a multimodal learning and attention fusion module to capture the internal characteristics of trajectories. We first design movement, route, and global features generated from the trajectory data and urban functional zones, and then analyse them with an attention encoder or a fully connected network. The attention fusion module incorporates route features with movement features to create better spatial–temporal embeddings. Using the global semantic feature, DouFu produces a comprehensive embedding for each trajectory. We evaluated the representations generated by our method and other baseline models on classification and clustering tasks. The empirical results show that, with most downstream learning algorithms, such as linear regression and support vector machines, DouFu outperforms other models by more than 10%.
... Reachability summary generation and its contractive reconstruction can together be viewed as a generative pretext task that encodes the co-occurrence relationships inherent in geospatial transitions (resulting from the interaction of traffic and local transport infrastructure) to obtain reachability embeddings. GPS Trajectory Embeddings: Most existing methods using GPS records or trajectory representations process trajectories as sequences with NLP-inspired methods like skip-gram or RNNs, e.g., for location similarity prediction [7], motion modality classification [9,25,16], demographic attribute prediction [42], and living pattern recognition [4]. To the best of our knowledge, no prior work uses the Markovian concept of reachability and a computer vision-based SSL pretext task to learn self-supervised, contextual representations of geographic locations. ...
... There are multiple ways, called local aggregate representations (LAR), that are simpler than reachability embeddings for creating semantically meaningful, multi-channel, image-like representations of GPS trajectory datasets, making them amenable for use by downstream computer vision models for geospatial tasks. The pixel values for a given zoom-24 tile are obtained by analyzing records observed only in that tile (local) to yield one (single-channel) or ... (Footnote 4: Probe data is similar to publicly available GPS trajectory datasets such as [33,49]. More details can be found in the section titled "Probe data and privacy" in [36].) ...
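The snippet does not state which tiling scheme the zoom-24 tiles use. The sketch below assumes the common Web Mercator XYZ ("slippy map") scheme and computes a record's tile index plus a simple single-channel LAR, namely the record count per tile. The coordinates are illustrative.

```python
import math
from collections import Counter


def latlon_to_tile(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Web Mercator XYZ tile (x, y) containing a WGS84 point at the given zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases (lon = 180, or latitudes beyond the Mercator limit).
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def count_per_tile(records: list[tuple[float, float]], zoom: int = 24) -> Counter:
    """Single-channel local aggregate: how many records fall in each tile."""
    return Counter(latlon_to_tile(lat, lon, zoom) for lat, lon in records)


records = [(39.9087, 116.3975), (39.9087, 116.3975), (31.2304, 121.4737)]
print(latlon_to_tile(0.0, 0.0, 1))  # (1, 1)
print(count_per_tile(records))
```

Further channels, such as mean speed or heading per tile, would follow the same per-tile grouping pattern.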
Preprint
Self-supervised representation learning techniques utilize large datasets without semantic annotations to learn meaningful, universal features that can be conveniently transferred to solve a wide variety of downstream supervised tasks. In this paper, we propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories to solve downstream geospatial computer vision tasks. Tiles resulting from a raster representation of the earth's surface are modeled as nodes on a graph or pixels of an image. GPS trajectories are modeled as allowed Markovian paths on these nodes. A scalable and distributed algorithm is presented to compute image-like representations, called reachability summaries, of the spatial connectivity patterns between tiles and their neighbors implied by the observed Markovian paths. A convolutional, contractive autoencoder is trained to learn compressed representations, called reachability embeddings, of reachability summaries for every tile. Reachability embeddings serve as task-agnostic feature representations of geographic locations. Using reachability embeddings as pixel representations for five different downstream geospatial tasks, cast as supervised semantic segmentation problems, we quantitatively demonstrate that reachability embeddings are semantically meaningful representations and yield a 4-23% gain in performance, as measured by the area under the precision-recall curve (AUPRC), while using up to 67% less trajectory data, when compared with baseline models that use pixel representations that do not account for the spatial connectivity between tiles. Reachability embeddings transform sequential, spatiotemporal mobility data into semantically meaningful image-like representations that can be combined with other sources of imagery and are designed to facilitate multimodal learning in geospatial computer vision.
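The abstract does not give the exact definition of a reachability summary, so this is a deliberately simplified approximation, not the paper's algorithm. For one tile, it counts how often observed paths leaving that tile reach each neighbor within a few steps, on a (2r+1) x (2r+1) grid of neighbor offsets. Tile coordinates and paths are hypothetical.

```python
def reachability_summary(paths: list[list[tuple[int, int]]], tile: tuple[int, int],
                         radius: int = 1, horizon: int = 1) -> list[list[int]]:
    """Count how often paths leaving `tile` reach each neighbour offset.

    Cell [dy + radius][dx + radius] counts visits to (tile_x + dx, tile_y + dy)
    within `horizon` steps after the path passes through `tile`.
    """
    size = 2 * radius + 1
    grid = [[0] * size for _ in range(size)]
    tx, ty = tile
    for path in paths:
        for i, (x, y) in enumerate(path):
            if (x, y) != tile:
                continue
            # Look only at the next `horizon` tiles of this Markovian path.
            for nx, ny in path[i + 1 : i + 1 + horizon]:
                dx, dy = nx - tx, ny - ty
                if abs(dx) <= radius and abs(dy) <= radius:
                    grid[dy + radius][dx + radius] += 1
    return grid


paths = [[(5, 5), (6, 5), (7, 5)], [(5, 5), (5, 6)], [(4, 4), (5, 5), (6, 5)]]
print(reachability_summary(paths, (5, 5)))  # [[0, 0, 0], [0, 0, 2], [0, 1, 0]]
```

Each tile's grid is an image-like array. In the paper's pipeline, a convolutional autoencoder would then compress such arrays into embeddings.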
... Li et al. [32] aimed to geographically mine the similarity between users based on their location histories by considering both the sequence property of people's movement behaviors and the hierarchy property of geographic spaces. Cao et al. [11] proposed a representation learning-based system called habit2vec to represent user trajectory semantics in vector space, which preserves the original user living habit information. The system is evaluated on social network data of over 123,000 users and the results show the effectiveness of clustering users with similar living habits. ...
... Our sample was diverse in terms of demographics and geographic location. There were 68 female and 117 male participants, ranging in age from 18 to over 50 (15% aged 18-23, 69% aged 24-29, 11.9% aged 30-35, 2.7% aged 36-40, 0.5% aged 41-45, 0 aged 46-50, and 1.1% aged 50+). ...
Article
Qualitative and quantitative user studies can reveal valuable insights into user behavior, which in turn can help system designers provide better user experiences. Car sharing (e.g., Zipcar and car2go), an emerging app-based online shared mobility mode, has grown dramatically worldwide in recent years. However, to date, user behavior in car sharing systems has not been comprehensively investigated, although doing so is essential for understanding their characteristics and promotion roadblocks. With the goal of understanding various facets of user behavior in online car sharing systems, in this paper we performed a qualitative and quantitative user study using a mixed-methods approach. We first designed an attitude-aware online survey with a set of qualitative questions to capture people's subjective attitudes toward online car sharing, which a total of 185 participants (68 females) completed. Next, we quantitatively analyzed a one-year real-world car sharing operation dataset collected from the Chinese city of Beijing, which involves over 68,000 unique users and over 587,850 usage records. We dissected this attitude-free dataset to understand objective car sharing user behavior along different dimensions, e.g., spatial, temporal, and demographic. Furthermore, we conducted a comparative study using one-year data from two other representative Chinese cities, Fuzhou and Lanzhou, to test whether the findings from the Beijing data generalize to cities with different urban features, e.g., city size, population density, wealth, and climate conditions. We also conducted a case study by designing a user behavior-aware usage prediction model (i.e., BeXGBoost) based on findings from our user study (e.g., unbalanced spatiotemporal usage patterns, weekly regularity, demographic-related usage differences, and low-frequency revisitation), which is the basis for car sharing service station deployment and vehicle rebalancing.
Finally, we summarize a set of findings from our study about the unique user behavior in online car sharing systems, together with detailed discussions of implications for design.
... Human mobility is generally modelled as a stochastic process around a fixed point [1], and various models for next location prediction [2], [3], [4], [5], [6] have been proposed. The main shortcoming of these mobility models, however, is that they overlook the activity (often referred to as the semantics of a trajectory [11], [12], [13]) that a person engages in at a location within a certain time, ...
... Inspired by PTE [45], Zhang et al. [15] dynamically model the semantic meaning of spatial-temporal points based on their co-occurrence with texts in social media check-ins by constructing a spatial-temporal-textual network. Yan et al. [46] adapt the skip-gram model [42] to learn representations of place types, and Cao et al. [11] propose a representation learning based framework to embed trajectory semantics for living pattern recognition in population. Zhang et al. [21] propose an embedding-based method for online local event detection. ...
Article
Understanding human mobility benefits numerous applications such as urban planning, traffic control, and city management. In this paper, we propose a novel semantics-aware mobility model that captures human mobility motivation using large-scale, semantic-rich spatial-temporal data from location-based social networks. In our system, we first develop a multimodal embedding method to project user, location, time, and activity into the same embedding space in an unsupervised way while preserving the original trajectory semantics. Then, we use a hidden Markov model to learn latent states and the transitions between them in the embedding space (that is, over location embedding vectors), jointly considering spatial, temporal, and user motivations. To tackle the sparsity of individual mobility data, we further propose a von Mises-Fisher mixture clustering for user grouping, so as to learn a reliable and fine-grained model for groups of users sharing mobility similarity. We evaluate our proposed method on two large-scale real-world datasets, where we validate the ability of our method to produce high-quality mobility models. We also conduct extensive experiments on the specific task of location prediction. The results show that our model outperforms state-of-the-art mobility models with higher prediction accuracy and much higher efficiency.
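This sketch does not reproduce the paper's von Mises-Fisher mixture. Instead it uses spherical k-means, a common simplification that corresponds to the hard-assignment limit of a vMF mixture with equal, large concentrations. Vectors are normalized to the unit sphere and clustered by cosine similarity. The 2-d user embeddings are hypothetical.

```python
import math
import random


def normalize(v: list[float]) -> list[float]:
    """Project a vector onto the unit sphere (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def spherical_kmeans(vectors: list[list[float]], k: int, iters: int = 20, seed: int = 0) -> list[int]:
    """Cluster vectors by cosine similarity; returns one cluster index per vector."""
    rng = random.Random(seed)
    data = [normalize(v) for v in vectors]
    centers = [list(c) for c in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iters):
        # Hard assignment: each unit vector goes to the most similar (largest dot product) center.
        labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(x, centers[c])))
            for x in data
        ]
        # Update: each center becomes the normalized mean direction of its members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = normalize([sum(col) for col in zip(*members)])
    return labels


users = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1], [0.0, 1.0], [0.1, 0.9], [-0.1, 1.0]]
print(spherical_kmeans(users, k=2))
```

Only direction matters here, which suits embeddings compared by cosine similarity. A full vMF mixture would additionally estimate per-cluster concentrations and soft memberships.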
... Considering the important role of the semantic layer in target behaviour analysis, scholars have semantically enriched trajectories [6][7][8] and explored semantic representation methods [9][10][11]. On this basis, some scholars performed cluster analysis [12][13][14] of trajectory points around discrete semantic information, such as geographic tags [15] and attribute tags [16]. However, semantic trajectory clustering methods do not identify how the moving target interacts with the environment, and recognizing the moving target's behaviour is key to analysing and judging the target's situation. ...
Article
The semantic representation of trajectories helps enrich the content of trajectory data mining. A trajectory summarisation generation method based on mobile robot behaviour analysis was proposed to realize the abstract expression and semantic representation of the spatio-temporal motion features of the robot and its environmental interaction state. First, the behavioural semantic modelling and representation of the mobile robot are completed by modelling sub-trajectories and calculating the topological behaviour (TOP). Second, Chinese word segmentation and semantic slot filling methods are combined with hierarchical clustering to perform basic word extraction and classification for describing trajectory sentences. Then, the description language frame is extracted based on the TOP, and the final trajectory summarisation is generated. The results show that the proposed method can semantically represent robot behaviours with different motion and topological features, extract two verb frameworks for describing sentences according to their topological features, and dynamically adjust the syntactic structure for the different topological behaviours between the target and the environment. The proposed method can generate relatively high-quality semantic information for spatio-temporal data and help in understanding the higher-order semantics of moving robot behaviour.
... Ref. [94] recommended trajectories. Ref. [95] clustered trajectories. Ref. [96] predicted the next locations visited, and matched trajectories to corresponding users. ...
Article
Self-supervised representation learning (SSRL) concerns the problem of learning a useful data representation without the requirement for labelled or annotated data. This representation can, in turn, be used to support solutions to downstream machine learning problems. SSRL has been demonstrated to be a useful tool in the field of geographical information science (GIS). In this article, we systematically review the existing research literature in this space to answer the following five research questions. What types of representations were learnt? What SSRL models were used? What downstream problems were the representations used to solve? What machine learning models were used to solve these problems? Finally, does using a learnt representation improve the overall performance?
... Additionally, this identifies the social roles of users and provides assistance to urban planners and decision makers. Even so, there have been few studies dedicated to living pattern recognition via semantic-rich trajectory data [2]. ...
Chapter
With the development of fifth-generation mobile communication technology, a huge volume of mobile data has been generated, enabling a wide range of location-based services. As a result, user location prediction has attracted attention from researchers. However, existing methods have low accuracy due to the sparsity of user check-ins. To address this issue, we propose a method for user location prediction based on similar living patterns. We first obtain a vector representation of each user's living habits in order to cluster users with similar living patterns. Then, embedding vectors of POI category and POI location are learned. Finally, we construct an activity prediction model and a location prediction model for each user cluster using the Gated Recurrent Unit (GRU). The experimental results on real user check-ins show that the proposed method outperforms the baseline methods in most cases.
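The chapter does not list its GRU configuration. For reference, here is a single GRU time step in plain Python, following the formulation of Cho et al. (2014). In that formulation the reset gate r scales the previous state before the recurrent matrix product, and the new state is z * h + (1 - z) * h_tilde. Other formulations place r differently or swap the roles of z and 1 - z. All weights are hypothetical zeros so the result can be checked by hand. A trained model would learn them from the check-in sequences of one user cluster.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x: list[float], h: list[float], p: dict) -> list[float]:
    """One GRU step; p holds input weights W*, recurrent weights U*, biases b*."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    reset_h = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
    h_tilde = [math.tanh(a + b + c)
               for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], reset_h), p["bh"])]
    # Update gate z keeps part of the old state and mixes in the candidate state.
    return [zi * hi + (1.0 - zi) * hti for zi, hi, hti in zip(z, h, h_tilde)]


def zeros(rows: int, cols: int) -> list[list[float]]:
    return [[0.0] * cols for _ in range(rows)]


H, D = 2, 3  # hidden size and input size (hypothetical)
params = {
    "Wz": zeros(H, D), "Uz": zeros(H, H), "bz": [0.0] * H,
    "Wr": zeros(H, D), "Ur": zeros(H, H), "br": [0.0] * H,
    "Wh": zeros(H, D), "Uh": zeros(H, H), "bh": [0.0] * H,
}
h_next = gru_step([1.0, 2.0, 3.0], [0.4, -0.8], params)
print(h_next)  # [0.2, -0.4]: z = 0.5 and the candidate is tanh(0) = 0
```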
... On the one hand, the campus data record students' daily physical exercise activities over time. On the other hand, the trajectories of students' physical exercise activities carry strong semantic information, reflecting the association between trajectory locations and different physical exercise activities in a certain time series [12]. Therefore, fully considering the connectivity, time series, and semantic meaning among trajectory locations helps improve the accuracy of spatiotemporal behavior prediction. ...
Article
Physical exercise is an important means of safeguarding college students' health and an important route to a healthy campus. Besides building good physical fitness, physical exercise significantly relieves psychological stress and alleviates psychological problems and mental illnesses among college students. It is therefore important to predict and analyze the physical exercise behavior of college students and to explore its positive value for college education. To overcome the low prediction accuracy of traditional algorithms, this paper uses the improved gray wolf algorithm (IGWO) and a support vector machine (SVM) for predictive analysis of college students' physical exercise behavior. A nonlinear decreasing convergence factor strategy and an inertia weight strategy are introduced to improve the gray wolf optimization algorithm, which is used to determine the SVM parameters and thereby improve model accuracy. The college students' physical exercise data are then input into the model for validation. In experiments on a constructed campus behavior dataset of college students, the algorithm achieves 90.45% behavior prediction accuracy, better than that of typical prediction models. Finally, individual growth monitoring is used to warn students with abnormal behaviors. In addition, higher-order information such as college students' physical exercise habits is explored to provide a meaningful reference for constructing personalized training.
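In the paper, IGWO searches for SVM parameters, typically the penalty C and the RBF kernel width gamma. This sketch covers the optimizer alone and makes three simplifications. The nonlinear schedule a(t) = 2(1 - (t/T)^2) is one hypothetical choice, the paper's inertia-weight strategy is not reproduced, and the objective is a toy sphere function. In practice, f would return a cross-validated SVM error, for example computed with scikit-learn.

```python
import random
from typing import Callable


def gwo_minimize(f: Callable[[list[float]], float], dim: int, bounds: tuple[float, float],
                 n_wolves: int = 20, iters: int = 200, seed: int = 0) -> list[float]:
    """Grey wolf optimizer with a hypothetical nonlinear convergence factor."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=f)
    for t in range(iters):
        wolves.sort(key=f)
        leaders = wolves[:3]  # alpha, beta, delta
        # Nonlinear decrease of a from 2 to 0: slow early (explore), fast late (exploit).
        a = 2.0 * (1.0 - (t / iters) ** 2)
        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                guided = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    guided.append(leader[d] - A * abs(C * leader[d] - x[d]))
                new_x.append(min(max(sum(guided) / 3.0, lo), hi))  # stay in bounds
            new_wolves.append(new_x)
        wolves = new_wolves
        best = min(wolves + [best], key=f)  # keep the best position seen so far
    return best


def sphere(v: list[float]) -> float:
    """Toy objective with its minimum 0 at the origin."""
    return sum(x * x for x in v)


best = gwo_minimize(sphere, dim=2, bounds=(-10.0, 10.0))
print(best, sphere(best))
```

To tune an SVM, `dim` would be 2 and `f` would map a candidate (C, gamma) pair to a validation error.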
... However, geographical coordinates (i.e., latitude and longitude) simply represent a point in space, which does not contribute to recognizing the user's situation. In the literature, this drawback has typically been addressed by associating a semantic label with GPS data, describing the corresponding most likely point of interest (e.g., restaurant, park, or bar) [66,67]. However, this approach also presents several shortcomings. ...
Article
Context modeling and recognition are complex tasks that allow mobile and ubiquitous computing applications to adapt to the user’s situation. The real advantage of context-awareness in mobile environments mainly relies on the prompt reaction of the system and its applications to context changes. Current solutions mainly focus on limited context information, generally processed on centralized architectures, potentially exposing users’ personal data to privacy leakage and missing personalization features. For these reasons, on-device context modeling and recognition represent the current research trend in this area. Among the different kinds of information characterizing the user’s context in mobile environments, social interactions and visited locations contribute remarkably to the characterization of daily life scenarios. In this paper we propose a novel, unsupervised, and lightweight approach to model the user’s social context and her locations based on ego networks, directly on the user’s mobile device. Relying on this model, the system is able to extract high-level, semantic-rich context features from smartphone-embedded sensor data. Specifically, for the social context it exploits data related to both physical and cyber social interactions among users and their devices. As far as location context is concerned, we assume that it is more relevant to model the degree of familiarity of a specific location for the user’s context than the raw location data, both in terms of GPS coordinates and proximity devices. We demonstrate the effectiveness of the proposed approach with 3 different sets of experiments using 5 real-world datasets collected from a total of 956 personal mobile devices. Specifically, we assess the structure of the social and location ego networks, provide a semantic evaluation of the proposed models, and provide a complexity evaluation in terms of mobile computing performance.
Finally, we demonstrate the relevance of the extracted features by showing the performance of 3 different machine learning algorithms in recognizing daily-life situations, obtaining improvements of 3% in AUROC, 9% in Precision, and 5% in Recall compared with using only features related to physical context.
... However, geographical coordinates (i.e., latitude and longitude) simply represent a point in space, which by itself does not help recognize the user's situation. In the literature, this drawback has typically been addressed by associating a semantic label with GPS data, describing the most likely corresponding point of interest (e.g., restaurant, park, or bar) [66,67]. However, this approach also presents several shortcomings. ...
Preprint
Context modeling and recognition are complex tasks that allow mobile and ubiquitous computing applications to adapt to the user's situation. Current solutions mainly focus on limited context information, generally processed on centralized architectures, which potentially exposes users' personal data to privacy leakage and lacks personalization. For these reasons, on-device context modeling and recognition represent the current research trend in this area. Among the different information characterizing the user's context in mobile environments, social interactions and visited locations contribute remarkably to the characterization of daily-life scenarios. In this paper we propose a novel, unsupervised, and lightweight approach to model the user's social context and her locations based on ego networks, directly on the user's mobile device. Relying on this model, the system is able to extract high-level, semantically rich context features from smartphone-embedded sensor data. Specifically, for the social context it exploits data related to both physical and cyber social interactions among users and their devices. As far as location context is concerned, we assume that modeling the degree of familiarity of a specific location is more relevant to the user's context than the raw location data, both in terms of GPS coordinates and proximity devices. Using 5 real-world datasets, we assess the structure of the social and location ego networks, provide a semantic evaluation of the proposed models, and evaluate their complexity in terms of mobile computing performance. Finally, we demonstrate the relevance of the extracted features by showing the performance of 3 machine learning algorithms in recognizing daily-life situations, obtaining improvements of 3% in AUROC, 9% in Precision, and 5% in Recall compared with using only features related to physical context.
... Wang et al. [124], for instance, used the Flexible regularity index to estimate the regularity of behaviors sensed through smartphone activity and, using this index, found a positive association between ambient sound regularity and openness as measured by the Big Five Inventory. While these measures may be effective at estimating the routineness of a specific sensor measurement, they typically provide this estimate for only a single measure at a time, such as a participant's movement patterns [25,27,77,79,101,132], whereas the present focus is on multimodal routineness. Multimodal measures may provide a more holistic view of what a healthy routine entails by describing the joint dynamics of multiple health-related behaviors and taking into account their interaction in determining personal outcomes such as stress, anxiety, and work performance. ...
Article
Although some research highlights the benefits of behavioral routines for individual functioning, other research indicates that routines can reflect an individual's inflexibility and lower well-being. Given conflicting accounts of the benefits of routine, research is needed to examine how routineness versus flexibility in health-related behaviors relates to personality traits, health, and occupational outcomes. We adopt a nonlinear dynamical systems approach to understanding routine, using automatically sensed health-related behaviors collected from 483 information workers over a roughly two-month period. We used multidimensional recurrence quantification analysis to derive a measure of health regularity (routineness) from daily step count, sleep duration, and heart rate variability (which relates to stress). Participants also completed measures of personality, health, and job performance at the start of the study and, over the two months, via Ecological Momentary Assessments. Greater regularity was associated with higher neuroticism, lower agreeableness, and greater interpersonal and organizational deviance. Importantly, these results were independent of the overall level of each health indicator and of demographics. Routine is often believed to be desirable, but the results suggest that associations with routineness are more nuanced, and that wearable sensors can provide insights into beneficial health behaviors.
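As a rough illustration of the regularity measure mentioned above, the following sketch computes only the recurrence rate. This is the simplest measure in (multidimensional) recurrence quantification analysis. It is applied to daily vectors of step count, sleep duration, and HRV. The z-scoring, the radius, and the absence of time-delay embedding are assumptions made here, not the study's settings. Full MdRQA also reports measures such as determinism.

```python
import math
import statistics

def zscore_columns(rows):
    """Standardize each dimension (e.g. steps, sleep hours, HRV) so that no
    single unit dominates the Euclidean distance; constant columns map to 0."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]

def recurrence_rate(states, radius):
    """Fraction of distinct day pairs whose multidimensional states lie within
    `radius` of each other (main diagonal excluded). Higher means more routine."""
    n = len(states)
    if n < 2:
        raise ValueError("need at least two observations")
    close = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if math.dist(states[i], states[j]) <= radius:
                close += 1
    return close / total

# Hypothetical daily records: (step count, sleep hours, HRV in ms).
days = [(8000, 7.0, 50.0), (8100, 7.1, 52.0), (3000, 5.0, 30.0), (7900, 6.9, 49.0)]
print(recurrence_rate(zscore_columns(days), radius=0.5))
```

A perfectly repetitive participant scores 1.0. Days that rarely resemble each other drive the rate toward 0.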
... The authors of [6] propose another NLP technique that can be adapted to analyse trajectory embeddings. They define a person's habit signature unit transition $H_u$ as a feature representing the typical specific locations a person visits in a specific time slice; $H_u$ can be written in the format $(u_i, h_1^{p_1}, h_2^{p_1}, \ldots, h_m^{p_1})$. ...
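To make the $(u_i, h_1, \ldots, h_m)$ format concrete, the following sketch turns time-stamped POI check-ins into one "sentence" of habit units per user. Each habit unit is a `slot:category` token. This is a hypothetical construction: the slot width, the token format, and the function name `habit_sentences` are assumptions, not the habit2vec authors' exact unit definition. Sequences like these could then be fed to a word2vec-style embedding model.

```python
from collections import defaultdict

def habit_sentences(records, slot_hours=3):
    """records: iterable of (user_id, day, hour_of_day, poi_category).
    Returns {user_id: (user_id, h_1, ..., h_m)}, where each habit unit h is a
    'time_slot:category' token and units are in chronological order."""
    per_user = defaultdict(list)
    for user, day, hour, category in records:
        per_user[user].append((day, hour, category))
    sentences = {}
    for user, visits in per_user.items():
        visits.sort()  # chronological: by day, then by hour
        units = [f"{hour // slot_hours}:{category}" for _, hour, category in visits]
        sentences[user] = (user, *units)
    return sentences

records = [("u1", 0, 9, "office"), ("u1", 0, 1, "home"), ("u1", 0, 19, "restaurant")]
print(habit_sentences(records)["u1"])
```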
Conference Paper
Analysing unstructured data with minimal contextual information is a challenge faced in application areas of text and spatial data, especially movement data. Movement data are sequences of time-stamped locations of a moving entity, much as text data are sequences of words in a document. Text analytics is rich in methods to learn word embeddings and latent semantic clusters from unstructured data. In this work, we draw on the successes of probabilistic topic models used in natural language processing (NLP). Our motivation is that topic models possess characteristics of both clustering and dimensionality reduction techniques. Furthermore, because of their sparse representation, they not only provide data compression but also produce interpretable topics. We illustrate this application on jaguar movement data based on GPS locations and timestamps. First, we apply a level of abstraction to the raw data by calculating speed, which we use as input to an NLP model, latent Dirichlet allocation (LDA). The evaluation of the inferred topics reveals movement behaviours that are similar to behavioural clusters found in the literature. This technique can be generalised to movement data from any object, whether animals, humans, or technology, with minimal contextual or semantic information.
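The abstraction step described above (raw GPS fixes, then speed, then LDA input) can be sketched as follows. Each pair of consecutive fixes becomes a discrete speed "word". Each trajectory segment's list of words can then serve as one LDA document. The bin edges (in metres per second) and the token names are hypothetical. The topic fitting itself would be done by a library implementation such as scikit-learn's `LatentDirichletAllocation` on the resulting token counts.

```python
import bisect
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def speed_tokens(fixes, bin_edges=(0.1, 1.0, 5.0)):
    """fixes: list of (unix_seconds, lat, lon) sorted by time.
    Emits one token per consecutive pair: 'speed_0' is the slowest bin."""
    tokens = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(fixes, fixes[1:]):
        dt = t2 - t1
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        v = haversine_m(la1, lo1, la2, lo2) / dt  # metres per second
        tokens.append(f"speed_{bisect.bisect_right(bin_edges, v)}")
    return tokens

print(speed_tokens([(0, 0.0, 0.0), (60, 0.0, 0.0), (10060, 1.0, 0.0)]))
```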
... Many works have already investigated the use of word embeddings for HAR. For instance, Cao et al. [24] applied the Word2Vec model to cluster population habits and model the semantic relationships between them, whereas Matsuki et al. [25] and Shimoda et al. [26] used pretrained public word embeddings to associate labels with unknown activities from wearable sensors. Abramova et al. [27] exploited a similar approach to annotate unknown activities and studied zero-shot learning in a smart home. ...
Preprint
Long Short-Term Memory (LSTM)-based structures have demonstrated their efficiency for recognizing activities of daily living in smart homes by capturing the order of sensor activations and their temporal dependencies. Nevertheless, they still fail to deal with the semantics and the context of the sensors. Beyond their isolated IDs and ordered activation values, sensors also carry meaning. Indeed, their nature and type of activation can translate into various activities. Their logs are correlated with each other, creating a global context. We propose to use and compare two Natural Language Processing embedding methods to enhance LSTM-based structures in activity-sequence classification tasks: Word2Vec, a static semantic embedding, and ELMo, a contextualized embedding. Results on real smart home datasets indicate that this approach provides useful information, such as a sensor organization map, and reduces confusion between daily activity classes. It also improves performance on datasets with competing activities of other residents or pets. Our tests also show that the embeddings can be pretrained on datasets other than the target one, enabling transfer learning. We thus demonstrate that taking into account the context of the sensors and their semantics increases classification performance and enables transfer learning.
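To show how a sensor log can be treated like text for Word2Vec, the following sketch turns (sensor, value) events into tokens. It then generates the (center, context) pairs that skip-gram training consumes. The `SENSOR_VALUE` token scheme and the window size are assumptions for illustration only. In practice a library such as gensim's `Word2Vec` (with `sg=1` for skip-gram) would generate pairs and train the vectors internally.

```python
def sensor_tokens(events):
    """events: list of (sensor_id, value) in activation order -> 'ID_VALUE' words."""
    return [f"{sensor}_{value}" for sensor, value in events]

def skipgram_pairs(sequence, window=2):
    """(center, context) pairs within `window` positions, as used by skip-gram."""
    pairs = []
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

# Hypothetical event log: a motion sensor, then a door sensor, then motion again.
log = [("M003", "ON"), ("D001", "OPEN"), ("M003", "OFF")]
print(skipgram_pairs(sensor_tokens(log), window=1))
```

Because sensors that fire in similar contexts share context words, their learned vectors end up close. This is the intuition behind the "sensor organization map" mentioned in the abstract.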
... Many works have already investigated the use of word embeddings for HAR. For instance, Cao et al. [24] applied the Word2Vec model to cluster population habits and model the semantic relationships between them, whereas Matsuki et al. [25] and Shimoda et al. [26] used pretrained public word embeddings to associate labels with unknown activities from wearable sensors. Abramova et al. [27] exploited a similar approach to annotate unknown activities and studied zero-shot learning in a smart home. ...
Article
Long Short-Term Memory (LSTM)-based structures have demonstrated their efficiency for recognizing activities of daily living in smart homes by capturing the order of sensor activations and their temporal dependencies. Nevertheless, they still fail to deal with the semantics and the context of the sensors. Beyond their isolated IDs and ordered activation values, sensors also carry meaning. Indeed, their nature and type of activation can translate into various activities. Their logs are correlated with each other, creating a global context. We propose to use and compare two Natural Language Processing embedding methods to enhance LSTM-based structures in activity-sequence classification tasks: Word2Vec, a static semantic embedding, and ELMo, a contextualized embedding. Results on real smart home datasets indicate that this approach provides useful information, such as a sensor organization map, and reduces confusion between daily activity classes. It also improves performance on datasets with competing activities of other residents or pets. Our tests also show that the embeddings can be pretrained on datasets other than the target one, enabling transfer learning. We thus demonstrate that taking into account the context of the sensors and their semantics increases classification performance and enables transfer learning.
... Graphs are often used to construct networks that represent intricate relationships, and many recent studies have focused on graph representation learning [58], [59], [60], [61], [62]. As a result, many researchers have applied it to a wide range of areas [63], [64], [65], [66], [67], [68]. For example, in the field of human mobility, to capture human mobility motivation, Shi et al. ...
Article
With the increasing diversity of mobile apps, users install many apps on their smartphones and often use several apps together to meet a specific requirement. Because user habits and app functions evolve, the set of apps used at the same time, i.e., the app usage context, may change over time; this change reflects the dynamic correlation between apps and even the evolution trend of the whole app ecosystem. Therefore, understanding how an app's usage context changes over time is very meaningful. In this paper, based on a seven-year app usage dataset, we explore long-term app usage context dynamics and the underlying reasons and influencing factors behind them. Specifically, we build app co-occurrence graphs for different periods and learn app embeddings from them using a graph embedding algorithm. We then measure the change in app usage context by the distance between app embeddings from neighboring periods. For the app ecosystem as a whole, we find that the rate of change of app usage context goes through up and down phases and varies across app categories. Furthermore, we explore three influencing factors correlated with these dynamics. These results will help stakeholders better understand the evolution of mobile users' app usage behavior.
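The measurement idea above can be sketched in a simplified form: build per-period app co-occurrence counts, then compare an app's context across two periods with cosine distance. As an assumption, raw co-occurrence rows over a shared app vocabulary stand in for the learned graph embeddings used in the paper. Embeddings trained separately per period would generally need to be aligned first (for example with an orthogonal Procrustes rotation) before cross-period distances are meaningful. The shared-vocabulary count vectors here avoid that step.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence(sessions):
    """sessions: list of sets of apps used together -> symmetric pair counts."""
    counts = Counter()
    for apps in sessions:
        for a, b in combinations(sorted(apps), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(y * y for y in v))
    return 1.0 if nu == 0 or nv == 0 else 1.0 - dot / (nu * nv)

def context_change(sessions_t1, sessions_t2, app):
    """Distance between an app's co-occurrence rows in two periods (0 = unchanged)."""
    vocab = sorted({a for s in sessions_t1 + sessions_t2 for a in s})
    c1, c2 = cooccurrence(sessions_t1), cooccurrence(sessions_t2)
    row1 = [c1[(app, other)] for other in vocab]
    row2 = [c2[(app, other)] for other in vocab]
    return cosine_distance(row1, row2)

period1 = [{"maps", "taxi"}, {"maps", "food"}]
period2 = [{"maps", "music"}]
print(context_change(period1, period2, "maps"))
```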
... Social science theories have identified social demographic factors as important determinants of human behavior (Skinner 1953;Zastrow and Kirst-Ashman 2006), and empirical studies in diverse fields have confirmed the close link between social demographic attributes and user behavior (Abrahamse and Steg 2009;Dong et al. 2016;Kalmus et al. 2011;Lenormand et al. 2015). Based on user behavior data from different sources (e.g., communication, credit card transaction, traffic, online social networks, wearable sensors), numerous attempts have been made to predict user gender (Zhu et al. 2017), ethnicity (Pennacchiotti and Popescu 2011), age (Felbo et al. 2016), profession, living pattern (Cao et al. 2019), income, employment status (Almaatouq et al. 2016), financial well-being (Singh et al. 2015), social relationship (Eagle et al. 2009), etc. Studies have also shown that user behavior change is linked with change in his/her social demographic status (Jessor and Jessor 1977;Moffitt 2017;Ollendick et al. 1992). ...
Article
In the past decade, mobile app usage has played an important role in our daily life. Existing studies have shown that app usage is intrinsically linked with, among others, demographics, social and economic factors. However, due to data limitations, most of these studies have a short time span and treat users in a static manner. To date, no study has shown whether changes in socioeconomic status or other demographics are reflected in long-term app usage behavior. In this paper, we contribute by presenting the first ever long-term study of individual mobile app usage dynamics and of how the app usage behavior of individuals is influenced by changes in socioeconomic demographic factors over time. Using a novel app dataset we collected, from which we extracted the records of 1608 long-term users with more than 3 years of app usage together with their detailed socioeconomic attributes, we verify the stable correlation between user app usage and user socioeconomic attributes over time and identify a number of representative app usage patterns connected with specific user attributes. On this basis, we analyze long-term app usage dynamics and reveal significant evolution in long-term app usage: 60–70% of users change their app usage patterns over a period of more than 3 years. We further discover a variety of app usage pattern change modes and demonstrate that changes in long-term app usage behavior reflect corresponding transitions in socioeconomic attributes, such as changes in civil status, family size, job, or economic status.
... Previous work mainly focuses on leveraging mobility data to infer social relationships from co-location behavior [16,17], or on combining mobility data with other sources of ubiquitous data, such as app usage and light sensors, to infer user attributes like age and gender [4,5]. Other work has investigated the possibility of sensing users' living patterns from mobility [32][33][34]. Also, studies [35,36] have found that important functional locations (i.e., residence and workplace) can be identified from individual trajectory data. ...
Article
Can health conditions be inferred from an individual's mobility pattern? Existing research has discussed the relationship between individual physical activity/mobility and well-being, yet no systematic study has investigated the predictability of fine-grained health conditions from mobility, largely due to the unavailability of data and unsatisfactory modelling techniques. Here, we present a large-scale longitudinal study in which we collected the health conditions of 747 individuals who visited a hospital and tracked their mobility for 2 months in Beijing, China. To facilitate fine-grained individual health condition sensing, we propose HealthWalks, an interpretable machine learning model that takes user location traces, the associated points of interest, and user social demographics as input; at its core, a Deterministic Finite Automaton (DFA) model auto-generates explainable features to capture useful signals. We evaluate the effectiveness of our proposed model, which achieves 40.29% in Micro-F1 and 31.63% in Macro-F1 for 8-class disease category prediction, outperforming the best baseline by 22.84% in Micro-F1 and 31.79% in Macro-F1. In addition, deeper analysis based on SHapley Additive exPlanations (SHAP) shows that HealthWalks can derive meaningful insights into the correlation between mobility and health conditions, which provides important research insights and design implications for mobile sensing and health informatics.
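To give a feel for how a DFA can turn a location trace into an explainable feature, the following sketch runs a hand-written automaton over a sequence of POI categories. It counts how often an accepting state is reached. The pattern (home, then hospital, then home), the state numbering, and the name `ROUND_TRIP` are hypothetical. HealthWalks generates its DFAs automatically, and this sketch does not attempt that.

```python
def dfa_count(sequence, transitions, start, accepting):
    """Run a DFA over a symbol sequence and count visits to accepting states.
    Undefined (state, symbol) pairs reset the automaton to `start`."""
    state, hits = start, 0
    for symbol in sequence:
        state = transitions.get((state, symbol), start)
        if state in accepting:
            hits += 1
    return hits

# Hypothetical pattern: home -> hospital (possibly repeated) -> home.
# States: 0 = start, 1 = at home, 2 = went to hospital, 3 = back home (accept).
ROUND_TRIP = {
    (0, "home"): 1,
    (1, "home"): 1,
    (1, "hospital"): 2,
    (2, "hospital"): 2,
    (2, "home"): 3,
    (3, "home"): 1,      # after a completed trip, staying home restarts the wait
    (3, "hospital"): 2,  # the last symbol was home, so a new trip can begin
}

trace = ["home", "hospital", "home", "hospital", "home", "office"]
print(dfa_count(trace, ROUND_TRIP, start=0, accepting={3}))
```

The resulting count is interpretable by construction: it says how many complete hospital round trips appear in the trace. This is the kind of feature SHAP analysis can then attribute.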
... Yuan et al. [38] propose a Bayesian non-parametric model to discover periodic mobility patterns by jointly modeling geographic and temporal information. Xu et al. [35] and Cao et al. [9] further propose methods to detect periodic temporal modes from mobility traces, which reflect users' living patterns. Unlike these works, which consider only periodic patterns, our work focuses on more general revisitation behavior, including aperiodic revisitations. ...
Article
Recent years have witnessed much work unraveling human mobility patterns through urban visitation and location check-in data. Traditionally, user visitation and check-in have been assumed to be the same behavior, yet this fundamental assumption is questionable and lacks supporting evidence. In this paper, we seek to understand the similarities and differences between visitation and check-in by presenting a large-scale systematic analysis in the specific setting of urban revisitation and re-check-in, which reflect people's periodic behaviors and regularities. Leveraging a localization dataset to model urban revisitation and a Foursquare dataset to delineate re-check-in, we identify features concerning POI visitation patterns, POI background information, user visitation patterns, user preferences, and users' behavioral characteristics to understand their effects on urban revisitation and re-check-in. We examine the relationship between the revisitation/re-check-in rate and the features we identify, highlighting the similarities and differences between urban revisitation and re-check-in. We demonstrate the predictive effectiveness of the identified characteristics using machine learning models, with an overall ROC AUC of 0.92 for urban revisitation and 0.82 for re-check-in. This study has important research implications, including improved modeling of human mobility and a better understanding of human behavior, and sheds light on the design of novel ubiquitous computing applications.
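The study compares a revisitation rate with a re-check-in rate. One simple per-user definition is sketched below: the share of all visits that go to a POI the user had already visited earlier. This definition is an assumption, since the abstract does not give the exact formula. The point is that the same function can be applied to visit logs and to check-in logs, so the two behaviors are measured on an equal footing.

```python
def revisitation_rate(visits):
    """visits: chronological list of POI ids for one user (visits or check-ins).
    Returns the fraction of all entries that repeat an earlier POI."""
    seen = set()
    revisits = 0
    for poi in visits:
        if poi in seen:
            revisits += 1
        seen.add(poi)
    return revisits / len(visits) if visits else 0.0

visit_log = ["home", "cafe", "home", "home", "gym"]  # hypothetical localization data
checkin_log = ["cafe", "gym"]                         # hypothetical check-ins
print(revisitation_rate(visit_log), revisitation_rate(checkin_log))
```

In this toy example the user revisits often but never re-checks in. That gap between the two behaviors is the one the paper sets out to characterize.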
... Note that the learned embeddings can effectively capture important relationships and features of items in the training dataset. Recently, embeddings have been extended to many sequential tasks, including trajectory data mining [3,13], sequential recommender systems [35,43], question answering [19], graph representation [14], and so on. As for POI recommendation, a check-in at a target POI would be influenced by the user's previously visited POIs, and would in turn influence the POIs the user visits after the target POI. ...
Article
Effective Point-of-Interest (POI) recommendation can significantly help users find their preferred POIs and help POI owners attract more customers. As a result, a variety of methods have recently been proposed for POI recommendation. However, it is still very difficult to precisely model the strong correlations between the POIs a user has visited and the POIs to be visited next, which leads to poor recommendation performance. In this paper, we propose a context- and preference-aware model (CPAM) that incorporates both contextual influence and user preferences into POI recommendation. Firstly, we design a Skip-Gram based POI Embedding Model (SG-PEM) to capture the contextual influence of POIs and learn vector representations (embeddings) of POIs from visiting sequences. Users' preferences for target POIs are obtained from the learned embeddings via a similarity metric. Secondly, to exploit the implicit feedback contained in check-in data, we use the Logistic Matrix Factorization (LMF) algorithm to model users' personalized preferences for POIs. Finally, we unify SG-PEM and LMF into the CPAM model to perform personalized recommendation leveraging both contextual influence and user preferences. Experimental results on two real-world datasets, Foursquare and Gowalla, show that the proposed model outperforms state-of-the-art baselines.
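The LMF component mentioned above models the probability that user $u$ prefers POI $i$ as $\sigma(x_u \cdot y_i + b_u + b_i)$. Observed check-ins are treated as confidence-weighted positives. The following sketch trains such a model with stochastic gradient ascent. Two simplifications are assumptions made here, not CPAM's procedure: one randomly sampled unobserved POI per positive serves as a negative, and the confidence weight is `1 + alpha * count`. The original LMF formulation optimizes over all user-POI pairs.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_lmf(checkins, n_users, n_pois, dim=8, epochs=200, lr=0.05,
              reg=0.01, alpha=1.0, seed=0):
    """checkins: {(user, poi): visit_count}. Returns score(user, poi) in (0, 1)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Y = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_pois)]
    bu, bi = [0.0] * n_users, [0.0] * n_pois

    def logit(u, i):
        return sum(a * b for a, b in zip(X[u], Y[i])) + bu[u] + bi[i]

    def step(u, i, label, weight):
        # Gradient ascent on the weighted log-likelihood, L2 penalty on factors.
        g = weight * (label - sigmoid(logit(u, i)))
        for k in range(dim):
            xu, yi = X[u][k], Y[i][k]
            X[u][k] += lr * (g * yi - reg * xu)
            Y[i][k] += lr * (g * xu - reg * yi)
        bu[u] += lr * g
        bi[i] += lr * g

    for _ in range(epochs):
        for (u, i), count in checkins.items():
            step(u, i, 1.0, 1.0 + alpha * count)  # confidence-weighted positive
            j = rng.randrange(n_pois)             # one sampled candidate negative
            if (u, j) not in checkins:
                step(u, j, 0.0, 1.0)

    return lambda u, i: sigmoid(logit(u, i))

score = train_lmf({(0, 0): 2, (0, 1): 2, (1, 2): 2, (1, 3): 2}, n_users=2, n_pois=4)
print(round(score(0, 0), 3), round(score(0, 2), 3))
```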
... Kumar et al. [19] used embedding trajectories of users and items to predict future user-item interactions in e-commerce systems. Cao et al. [7] proposed a representation learning based system called habit2vec to represent user trajectory semantics in vector space. Pandhre et al. [32] presented STWalk, a novel approach for learning trajectory representations of nodes in temporal graphs. ...
Article
Anomalous trajectory detection is a crucial task in trajectory mining. Traditional anomalous trajectory detection methods mainly focus on the differences between a new trajectory and historical trajectories using density and isolation techniques, which may suffer from two disadvantages: (1) they cannot capture the sequential information of the trajectory well, and (2) they cannot make use of the common information of the trajectory points. To overcome these shortcomings, we propose a novel method called Anomalous Trajectory Detection using Recurrent Neural Network (ATD-RNN), which characterizes the trajectory with a learned trajectory embedding. The trajectory embedding can capture the sequential information of the trajectory and depict the internal characteristics that distinguish abnormal from normal trajectories. To learn high-quality trajectory embeddings, we further propose an attention mechanism to aggregate long sequential information. Furthermore, to alleviate the data sparsity problem, we augment the datasets between a source and a destination by taking relevant trajectories into consideration simultaneously. Extensive experiments on real-world datasets validate the effectiveness of our proposed methods.
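The attention step described above, which aggregates long sequential information into one trajectory embedding, can be sketched generically. Each per-point hidden state (for example an RNN output) is scored against a query vector. The scores are softmax-normalized, and the states are averaged with those weights. The dot-product scoring and the fixed query vector are assumptions. In a trained model the query would be learned, and the abstract does not specify ATD-RNN's exact attention form.

```python
import math

def attention_pool(hidden_states, query):
    """Softmax attention over per-point hidden states.
    Returns (pooled trajectory embedding, attention weights)."""
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]
    return pooled, weights

# Hypothetical 2-d hidden states for a 3-point trajectory.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
emb, w = attention_pool(states, query=[2.0, 0.0])
print(emb, w)
```

A detector could then compare `emb` against embeddings of normal trajectories between the same source and destination.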
... Retrieving similar trajectories means searching a large dataset for trajectories that have a similar movement pattern. This is essential for many trajectory processing tasks (e.g., clustering [1], [2], classification [3]) and many applications (e.g., human mobility analysis [4], [5], [6], [7] and transportation planning [8], [9]). Many works have defined new similarity measures [10], [11] or implemented search operations in parallel [12]. ...
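As a concrete example of a trajectory similarity measure, the following sketch implements dynamic time warping (DTW), one classical choice. It also shows a brute-force top-k retrieval loop. DTW is used here only for illustration. It is not necessarily the measure proposed in [10] or [11], and a large-scale system would add pruning or indexing rather than scanning every trajectory.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two trajectories of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: repeat a point of b, repeat a point of a, or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def most_similar(query, trajectories, k=1):
    """Indices of the k trajectories with the smallest DTW distance to `query`."""
    return sorted(range(len(trajectories)),
                  key=lambda i: dtw_distance(query, trajectories[i]))[:k]

db = [[(5, 5), (6, 5)], [(0, 0), (1, 0), (1, 0)]]
print(most_similar([(0, 0), (1, 0)], db))
```

DTW tolerates different sampling rates (the repeated `(1, 0)` costs nothing), which is why it is popular for trajectories recorded by heterogeneous devices.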
Preprint
Retrieving similar trajectories from a large trajectory dataset is important for a variety of applications, like transportation planning and mobility analysis. Unlike previous works based on fine-grained GPS trajectories, this paper investigates the feasibility of identifying similar trajectories from cellular data observed by mobile infrastructure, which provides more comprehensive coverage. To handle the large localization errors and low sample rates of cellular data, we develop a holistic system, cellSim, which seamlessly integrates map matching and similar trajectory search. A set of map matching techniques is proposed to transform cell tower sequences into moving trajectories on a road map by considering the unique features of cellular data, like the dynamic density of cell towers and bidirectional roads. To further improve the accuracy of similarity search, map matching outputs M trajectory candidates of different confidence, and a new similarity measure scheme is developed to process the map matching results. Meanwhile, M is dynamically adapted to maintain a low false positive rate for the similarity search, and two pruning schemes are proposed to minimize the computation overhead. Extensive experiments on a dataset of 3,186,000 mobile users and real-world trajectories of 1701 km reveal that cellSim provides high accuracy (precision of 62.4% and recall of 89.8%).
... Based on check-in information collected from a location-based online social network, Cranshaw et al. [3] tried to understand the dynamics of the city. Other works have been dedicated to recognizing user living patterns from check-in data [29,30]. Because of their sparsity in the temporal and spatial dimensions, social data provide few details about individual mobility. ...
Article
Understanding crowd mobility in a metropolitan area is extremely valuable for city planners and decision makers. However, crowd mobility is a relatively new area of research and faces significant technical challenges: lack of large-scale fine-grained data, difficulties in large-scale trajectory processing, and issues with spatial resolution. In this article, we propose a novel approach for analyzing crowd mobility at the “city block” level. We first propose algorithms to detect homes, workplaces, and stay regions from individual user trajectories. Next, we propose a method for analyzing commute patterns and spatial correlation at the city block level. Using mobile cellular access trace data collected from users in Shanghai, we discover commute patterns, spatial correlation rules, and a hidden structure of the city based on crowd mobility analysis. Our proposed methods thus contribute to our understanding of human mobility in a large metropolitan area.
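Home and workplace detection is often done with simple frequency heuristics. A minimal sketch of that idea, assuming stays have already been extracted: home is the most frequent location at night, and work is the most frequent other location during weekday working hours. This heuristic and its hour windows are assumptions for illustration, not the article's algorithm.

```python
from collections import Counter

def detect_home_work(stays, night=(22, 6), work_hours=(9, 18)):
    """stays: list of (weekday, hour, location_id), weekday 0 = Monday.
    `night` may wrap past midnight (start > end). Returns (home, work)."""
    start, end = night

    def is_night(h):
        return h >= start or h < end

    night_counts = Counter(loc for _, h, loc in stays if is_night(h))
    home = night_counts.most_common(1)[0][0] if night_counts else None
    work_counts = Counter(
        loc for d, h, loc in stays
        if d < 5 and work_hours[0] <= h < work_hours[1] and loc != home
    )
    work = work_counts.most_common(1)[0][0] if work_counts else None
    return home, work

stays = [(0, 23, "A"), (0, 2, "A"), (1, 10, "B"), (2, 14, "B"), (5, 11, "C"), (1, 12, "A")]
print(detect_home_work(stays))
```

Aggregating the detected (home, work) pairs over city blocks yields the kind of block-to-block commute flows the abstract analyzes.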
Article
The purpose of this study is to summarize the pattern recognition (PR) and deep learning (DL) artificial intelligence methods developed for data management in the last six years. The study applies content analysis to the documents. For this study, 186 references are considered, of which 120 are selected for the literature review. First, a general introduction to artificial intelligence is presented, in which PR/DL methods are studied and their relevance to data management evaluated. Next, a literature review of the most recent applications of PR/DL is provided, and the capacity of these methods to process large volumes of data is evaluated. The analysis of the literature also reveals the main applications, challenges, approaches, advantages, and disadvantages of using these methods. Moreover, we discuss the main measurement instruments; the methodological contributions by study area and research domain; and the major databases, journals, and countries that contribute to the field of study. Finally, we identify emerging research trends, their limitations, and possible future research paths.
Article
Recent advances in localization techniques have fundamentally enhanced social networking services, allowing users to share their locations and location-related content. This has further increased the popularity of location-based social networks (LBSNs) and produced a huge number of trajectories composed of continuous and complex spatio-temporal points from people’s daily lives. How to accurately aggregate large-scale trajectories is an important and challenging task. Conventional clustering algorithms (e.g., k-means or k-medoids) cannot be directly applied to trajectory data because of its serialization, triviality, and redundancy. To overcome the drawbacks of traditional k-means and k-medoids, including their sensitivity to the choice of the initial k value and of the cluster centers, and their tendency to converge to a locally optimal solution, we first propose an optimized k-means algorithm (OKM) that obtains k optimal initial clustering centers based on the density of trajectory points. Second, because k-means is sensitive to noisy points, we propose an improved k-medoids algorithm called IKMD based on an acceptable radius r that considers users’ geographic locations in LBSNs. The value of k can be calculated from r, and the optimal k high-density points are selected as the initial clustering centers to reduce the cost of distance calculation. Third, we thoroughly analyze the advantages of IKMD by comparing it with commonly used clustering approaches through illustrative examples. Last, we conduct extensive experiments to evaluate the performance of IKMD against seven clustering approaches, including the proposed optimized k-means algorithm, the k-medoids algorithm, the traditional density-based k-medoids algorithm, and state-of-the-art trajectory clustering methods. The results demonstrate that IKMD significantly outperforms existing algorithms in distance calculation cost and convergence speed. The proposed methods are shown to contribute to a larger effort aimed at advancing the study of intelligent trajectory data analytics.
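The initialization idea above (derive k from a radius r and seed clustering with high-density points) can be sketched as follows. Each point's density is its number of neighbours within r. Points are visited from densest to sparsest, and a point becomes a centre only if it lies more than r from every centre already chosen. The number of centres, and hence k, falls out of r. This is a generic density-based seeding written under that assumption, not the authors' OKM or IKMD.

```python
import math

def density_initial_centers(points, r, max_k=None):
    """Seed centres for k-means/k-medoids from point density within radius r.
    A point is skipped if it lies within r of an already chosen centre."""
    density = [sum(1 for q in points if math.dist(p, q) <= r) - 1 for p in points]
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)
    centers = []
    for i in order:
        if all(math.dist(points[i], c) > r for c in centers):
            centers.append(points[i])
            if max_k is not None and len(centers) == max_k:
                break
    return centers

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (10, 10)]
centers = density_initial_centers(pts, r=0.5)
print(len(centers), centers)
```

Starting from dense, well-separated seeds tends to reduce the number of iterations a k-medoids refinement needs. This matches the motivation given in the abstract, though the actual gains depend on the data.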
The wide adoption of mobile devices has provided us with a massive volume of human mobility records. However, a large portion of these records is unlabeled, i.e., they only have GPS coordinates without semantic information (e.g., Point of Interest (POI)). To associate these unlabeled records with more information for further applications, it is of great importance to annotate the original data with POI information based on external context. Nevertheless, semantic annotation of mobility records is challenging for three reasons: the complex relationship among multiple domains of context, the sparsity of mobility records, and the difficulty of balancing personal preference and crowd preference. To address these challenges, we propose CAP, a context-aware personalized semantic annotation model, which uses a Bayesian mixture model to capture the complex relationship among five domains of context: location, time, POI category, personal preference, and crowd preference. We evaluate our model on two real-world datasets and demonstrate that our proposed method significantly outperforms state-of-the-art algorithms by over 11.8%.
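The following sketch illustrates the balance between personal and crowd preference that the abstract describes. It ranks candidate POIs for an unlabeled GPS record by a spatial kernel on distance multiplied by a `lam`-weighted mix of the user's and the crowd's category preferences for the current time slot. This is a deliberately simple heuristic stand-in, not CAP's Bayesian mixture model. The parameters `lam` and `bandwidth_m`, and the preference tables, are hypothetical.

```python
import math

def annotate(candidates, personal, crowd, lam=0.5, bandwidth_m=100.0):
    """candidates: {poi_id: (distance_m, category)} near the GPS record.
    personal / crowd: {category: probability} for the record's time slot.
    Returns POI ids ranked from most to least likely."""
    scores = {}
    for poi, (dist, cat) in candidates.items():
        spatial = math.exp(-(dist / bandwidth_m) ** 2)  # closer POIs score higher
        pref = lam * personal.get(cat, 0.0) + (1 - lam) * crowd.get(cat, 0.0)
        scores[poi] = spatial * pref
    return sorted(scores, key=scores.get, reverse=True)

cands = {"cafe1": (20.0, "cafe"), "gym1": (20.0, "gym")}
me = {"gym": 0.8, "cafe": 0.2}     # this user's evening habits (hypothetical)
crowd = {"gym": 0.1, "cafe": 0.9}  # everyone else's (hypothetical)
print(annotate(cands, me, crowd, lam=1.0), annotate(cands, me, crowd, lam=0.0))
```

With sparse personal histories, a smaller `lam` leans on the crowd. CAP learns this trade-off within its mixture model instead of fixing it by hand.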
Article
Location-based services have significantly affected mobile users’ everyday life, and location privacy has become essential. Some applications (e.g., location-based recommendation, mobility analytics) do not need the raw location data, and service providers adopt aggregation to protect users’ location traces. However, some works show that even such aggregate data may disclose users’ locations when additional prior knowledge is available to an adversary. We consider the location privacy problem in the presence of Location Uniqueness, a property by which some geographical locations can be re-identified based on aggregated point-of-interest information. We first study whether existing protection mechanisms are adequate for defending against this type of attack. Then we present two practical attacks for inferring users’ actual locations based on POI aggregates. A secure POI aggregate release mechanism is proposed to defend against this type of re-identification attack while also achieving differential privacy. We conduct extensive experiments on real-world datasets. The results show that the existing protection mechanisms cannot provide sufficient protection against location re-identification attacks. The proposed attacks can significantly improve the inference performance, and the proposed protection mechanism achieves satisfactory performance.
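A standard building block for differentially private release of count aggregates is the Laplace mechanism. The following sketch applies it to a POI-category count vector: each count receives Laplace noise of scale `sensitivity / epsilon`, which gives epsilon-differential privacy when `sensitivity` bounds the L1 change of the whole vector between neighbouring inputs. Clamping at zero is post-processing and does not weaken that guarantee. This sketch does not reproduce the paper's release mechanism. What counts as neighbouring inputs, and hence the right sensitivity, depends on the threat model.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def release_poi_aggregate(counts, epsilon, sensitivity=1.0, seed=None):
    """counts: {poi_category: count} around a location.
    Returns noisy counts, clamped at 0 (post-processing)."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {k: max(0.0, v + laplace_noise(scale, rng)) for k, v in counts.items()}

print(release_poi_aggregate({"cafe": 12, "bank": 3, "hospital": 1}, epsilon=0.5, seed=7))
```

Smaller `epsilon` means more noise. Rare categories, which are exactly the ones that make a location unique, are the most heavily masked relative to their true counts.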
Article
Despite the increasing availability and spatial granularity of individuals' time-activity (TA) data, the missing data problem, particularly long-term gaps, remains a major limitation of TA data as a primary source for human mobility studies. In the present study, we propose a two-step imputation method to address missing TA data with long-term gaps, based on both an efficient representation of TA patterns and the high regularity of TA data. The method consists of two steps: (1) a continuous bag-of-words (CBOW) word2vec model that converts daily TA sequences into a low-dimensional numerical representation to reduce complexity; (2) a multi-scale residual Convolutional Neural Network (CNN)-stacked Long Short-Term Memory (LSTM) model that captures multi-scale temporal dependencies across historical observations and predicts the missing TAs. We evaluated the performance of the proposed imputation method using mobile phone-based TA data collected from 180 individuals in western New York, USA, from October 2016 to May 2017, with 10-fold out-of-sample cross-validation. The proposed imputation method achieved excellent performance, with 84% prediction accuracy, which led us to conclude that it successfully reconstructs the sequence, duration, and spatial extent of activities from incomplete TA data. We believe that the proposed method can be applied to impute incomplete TA data with relatively long-term gaps with high accuracy.
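Step (1) above can be made concrete with two small pieces. The first is the (context, target) examples a CBOW word2vec model learns from, when a day is written as a sequence of activity labels. The second is one way to collapse a day into a single vector by averaging token embeddings. The hourly labels, the window size, and the averaging step are assumptions, since the abstract does not say how the day-level representation is formed. In gensim, `Word2Vec(..., sg=0)` selects CBOW and would handle the training itself.

```python
def cbow_examples(day, window=2):
    """day: list of activity labels (e.g. one per hour).
    Returns (context_tokens, target) pairs for CBOW training."""
    examples = []
    for i, target in enumerate(day):
        context = day[max(0, i - window):i] + day[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples

def day_vector(day, embeddings):
    """Average the embeddings of a day's tokens into one fixed-length vector."""
    vecs = [embeddings[t] for t in day]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

day = ["home", "home", "work"]
print(cbow_examples(day, window=1))
print(day_vector(["a", "b"], {"a": [1.0, 0.0], "b": [0.0, 1.0]}))
```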
Article
Trajectory prediction of multiple agents in a crowded scene is an essential component of many applications, including intelligent monitoring, autonomous robotics, and self-driving cars. Accurate agent trajectory prediction remains a significant challenge because of the complex dynamic interactions among the agents and between the agents and the surrounding scene. To address this challenge, the proposed trajectory prediction method adopts a decoupled, attention-based spatial-temporal modeling strategy. The past and current interactions among agents are dynamically and adaptively summarized by two separate attention-based networks, which have proven effective in improving prediction accuracy. Moreover, the method can optionally use the road map and the plan of the ego-agent to produce scene-compliant and accurate predictions. The road map features are efficiently extracted by a convolutional neural network, and the features of the ego-agent’s plan are extracted by a gated recurrent network with an attention module based on temporal characteristics. Experiments on benchmark trajectory prediction datasets demonstrate that the proposed method is effective when the ego-agent plan and the surrounding scene information are provided, and that it achieves state-of-the-art performance with only the observed trajectories.
Article
Trajectory representation models have become a common method for calculating trajectory similarity. Existing works use an encoder–decoder model trained to reconstruct the original trajectory from a noisy version of it. However, this reconstructive model ignores the point-level differences between the two trajectories and captures only trajectory-level features; as a result, it achieves low accuracy on ranking tasks. To solve this problem, we propose a novel contrastive model that learns trajectory representations by distinguishing both trajectory-level and point-level differences between trajectories. Furthermore, to address the lack of training data, we propose a self-supervised approach to augment training pairs of trajectories. Compared with existing models, our model achieves significant performance improvements on various trajectory similarity tasks.
Article
Efficient representations of spatio-temporal cellular Signaling Data (SD) are essential for many human mobility applications. Traditional representation methods are mainly designed for GPS data with high spatio-temporal continuity, and thus suffer from poor embedding performance because of the Ping Pong Effect that is unique to SD. To address this issue, we exploit the large number of available human mobility traces and mine the inherent neighboring tower connection patterns. More specifically, we design HERMAS, a novel representation learning framework for large-scale cellular SD with three steps: (1) extracting rich context information from each trajectory by adding neighboring tower information as extra knowledge to each mobility observation; (2) designing a sequence encoding model to aggregate the embeddings of the observations; and (3) obtaining the embedding of a trajectory. We evaluate HERMAS on two human mobility applications, trajectory similarity measurement and user profiling, using a 30-day SD dataset with 130,612 users and 2,369,267 moving trajectories. Experimental results show that (1) for trajectory similarity measurement, HERMAS improves the Hitting Rate (HR@10) from 15.2% to 39.2%; and (2) for user profiling, HERMAS improves the F1-score by around 9%. More importantly, HERMAS improves computational efficiency by over 30x.
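The HR@10 figures above measure how often the correct trajectory appears among the top 10 retrieved results. The sketch below uses one common definition of HR@k: the fraction of queries whose single ground-truth item appears in the top k of the ranked results. It assumes exactly one relevant item per query. The function name and the trajectory IDs are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Hashable, List, Sequence


def hit_rate_at_k(ranked_lists: Sequence[List[Hashable]],
                  ground_truth: Sequence[Hashable], k: int = 10) -> float:
    """Fraction of queries whose ground-truth item is in the top-k results.

    ranked_lists[q] holds candidate IDs for query q, most similar first;
    ground_truth[q] is the single relevant ID for that query.
    """
    if len(ranked_lists) != len(ground_truth):
        raise ValueError("one ground-truth item is required per query")
    if not ranked_lists:
        return 0.0
    hits = sum(1 for ranked, truth in zip(ranked_lists, ground_truth)
               if truth in ranked[:k])
    return hits / len(ranked_lists)


# Two hypothetical queries: the first hits within the top 2, the second misses.
print(hit_rate_at_k([["t1", "t2", "t3"], ["t4", "t5", "t6"]], ["t2", "t9"], k=2))
```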
Article
The ever-increasing number of automobiles, especially private cars, has irrevocably altered social life, revolutionized the travel experience, and reshaped urban transportation systems. Existing research mainly focuses on trajectory data from floating cars, while human mobility and travel behavior by private car are still poorly understood. To bridge this gap, we employ a private car trajectory data set collected from real-world urban environments. This article provides a new perspective on human mobility, with a focus on private cars, by investigating mobility patterns and capturing the spatiotemporal evolution of urban hot zones from people’s arrive–stay–leave (ASL) behavior. Understanding these hot zones and travel behavior patterns is necessary when considering changes or improvements to urban networks. Decisions affecting the safety and emissions of urban travel, for example design improvements at curves or to reduce left-turn conflicts, improvements to pedestrian facilities, and the placement of alternative energy refueling stations, will rely on similar analyses across many different urban areas. Our article provides a proof of concept, and we then outline new research opportunities and challenges involving private car trajectory data.
Along with increasing urbanization and industrialization, recent years have witnessed an ever-increasing number of automobiles, especially private cars [1], also known as passenger cars: vehicles with fewer than seven seats, usually registered to individuals for personal use. Compared with floating cars [2], private cars constitute the vast majority of automobiles. For instance, nationwide automobile ownership in China exceeded 263 million by the end of 2019 [3], and more than 88% of these vehicles (225 million) were registered to individuals. As shown in Figure 1(a), ownership of private cars grew from 2013 to 2019.
As reported in Sivak’s technical report [4], per capita car ownership in the United States was 0.756 in 2015 and increased to 0.766 in 2016. According to [5], private cars accounted for more than 75% of passenger transport activity in the European Union over the last two decades. As shown in Figure 1(b), although a slight decrease is expected by 2030, this share is still expected to exceed 72%.
Chapter
Massive amounts of redundant vehicle trajectory data are continuously sent to data centers via vehicle-mounted GPS devices, causing a number of issues in storage, communication, and computation. Online trajectory compression is a promising way to alleviate these issues. In this chapter, we first propose an online trajectory data compression algorithm that builds on the SD-Matching algorithm. Like SD-Matching, the new online compression algorithm, named Heading Change Compression (HCC), uses the heading change at intersections to find a concise and compact trajectory representation. Furthermore, we implement both the SD-Matching and HCC algorithms in a real system called VTracer, running on the Android platform. Since both online map-matching and compression are resource-hungry, and GPS devices cannot afford the heavy computation, we offload these tasks to the drivers’ nearby smartphones by leveraging mobile edge computing. We conduct experiments to evaluate the effectiveness and efficiency of the HCC algorithm using real-world datasets from Beijing, China, and we deploy the system in the real world in Chongqing, China. Experimental results in real cases demonstrate the excellent performance of the HCC algorithm and the VTracer system.
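The core intuition behind compressing a trajectory by heading change is that only the points where the vehicle turns need to be kept. The sketch below is a simplified, map-free stand-in for that intuition and is not HCC itself. HCC as described works on map-matched trajectories at intersections, which this sketch does not model. The sketch assumes planar (x, y) coordinates and a hypothetical turn threshold in degrees. It keeps both endpoints plus every point where the heading changes by more than that threshold.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def heading(p: Point, q: Point) -> float:
    """Heading of segment p -> q in degrees, in [0, 360), planar coordinates."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0


def heading_change_filter(points: List[Point], threshold_deg: float = 30.0) -> List[Point]:
    """Keep the endpoints plus points where the heading turns by more than threshold_deg."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        change = abs(heading(cur, nxt) - heading(prev, cur))
        change = min(change, 360.0 - change)  # smallest angle between the two headings
        if change > threshold_deg:
            kept.append(cur)
    kept.append(points[-1])
    return kept


# Drive east, turn north at (2, 0): only the start, the turn, and the end remain.
print(heading_change_filter([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]))
```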
Article
Human trajectory data, collected from various location-based services, is of great significance for understanding users. However, trajectory-based user understanding is very challenging because of the large semantic gap between the available low-level geospatial information and the target high-level semantic information. In this work, we propose a sequential state model and a multi-task learning method to bridge this semantic gap. First, we propose a sequential state model to organize the human trajectory data together with POI information. Second, we employ an LSTM-based representation method to extract semantic representations from the sequential state model, learning a separate representation for each user tag with that tag as the supervision signal. Finally, we devise a multi-task fine-tuning LSTM method to exploit the dependencies among the tags. We also demonstrate the use and effectiveness of the proposed method in a real-world demand-side-platform system.
Article
With the increasing accumulation of spatial-temporal trajectory data, location-based data mining has recently been studied extensively. A fundamental research topic in this field is learning embedding vectors of locations through self-supervised pre-training. Pre-trained embedding vectors can exploit abundant unlabeled trajectory data and benefit downstream tasks in multiple ways. However, most existing methods ignore the temporal information contained in the visit times of locations in trajectories. Because human activities are strongly tied to specific periods of the day, temporal information can reflect intrinsic characteristics of locations, so it should be fused into location embedding vectors. In this paper, we propose a Time-Aware Location Embedding (TALE) pre-training method based on the CBOW framework, which incorporates temporal information into the learned location embeddings. A novel temporal tree structure is designed to extract temporal information during the computation of the Hierarchical Softmax. To verify the effectiveness of TALE, we apply the learned embedding vectors to three downstream location-based prediction tasks: location classification, location visitor flow prediction, and user next-location prediction. Experiments on four real-world user trajectory datasets demonstrate that TALE clearly helps downstream tasks achieve better performance.
Article
Recent years have witnessed a rapid proliferation of personalized mobile Apps, which creates a pressing need to improve user experience. A promising solution is to model App usage by learning semantic-aware App usage representations that capture the relations among time, locations, and Apps. However, this is non-trivial because of the complexity, dynamics, and heterogeneity of App usage. To overcome these obstacles, we propose SA-GCN, a novel representation learning model that maps Apps, locations, and time units into dense embedding vectors while considering spatio-temporal characteristics and unit properties simultaneously. To handle complexity and dynamics, we build an App usage graph with App, time, and location units as nodes and their co-occurrence relations as edges. To handle heterogeneity, we develop a Graph Convolutional Network with a meta-path-based objective function that combines the graph structure and the unit attributes into semantic-aware representations. We evaluate SA-GCN on a large-scale real-world dataset. In-depth analysis shows that SA-GCN characterizes the complex relationships among different units and recovers meaningful spatio-temporal patterns. Moreover, we use the learned representations in an App usage prediction task without post-training and achieve an 8.3% performance gain over state-of-the-art baselines.
Article
Forecasting fire risk is of great importance for fire prevention deployment in a city, because it can reduce losses and even deaths caused by fires. However, it is very challenging because fires are influenced by many complex factors, including spatial correlations, temporal dependencies, mixtures of the two, and external factors. First, the fire risk of a region is influenced by the temporal effects of internal factors (e.g., historical fire risk records) and of external factors (e.g., weather). Second, a region's fire risk is influenced not only by its inherent geospatial attributes (e.g., POIs) but also by other regions through spatial dependencies. To address these challenges, we propose NeuroFire, a machine learning approach to forecasting fire risk. NeuroFire represents internal and external temporal effects and then combines the temporal representation with spatial dependencies through a spatial-temporal loss function. Experimental evaluations on real-world datasets show that NeuroFire outperforms 9 baselines, and several visualizations further illustrate its performance. Moreover, we implement a citywide fire forecasting system named CityGuard to display the analysis and forecasting results, which can assist the fire rescue department in deploying fire prevention.
Article
Synthesizing a fake trajectory with the same lifestyle and meaningful mobility as the actual one is the most popular way to protect location privacy in trajectory sharing. Recent work on location privacy preservation shows a strong personalized requirement arising from the mobile semantics between users and locations. However, existing techniques cannot fully satisfy such personalized requirements, resulting in either over-protection or under-protection. How to characterize and quantify the personalized requirement for location privacy preservation remains an open problem. In this paper, we propose a mobile semantic-aware privacy model, named MSP. Specifically, we first characterize a new kind of user-related mobile semantics on location sets by constructing a hierarchical semantic tree according to the user’s roles at locations. Then, a dedicated approach is proposed to evaluate each location’s privacy sensitivity and integrate it into the user-related mobile semantics. Finally, an adaptive privacy-preserving mechanism, MSP, is developed that fully considers the personalized requirements of both the user and the location. With this model in place, mobile semantic-aware synthetic trajectories are constructed adaptively. Extensive experiments on a real-world dataset demonstrate that our MSP model achieves an effective and flexible balance between personalized privacy preservation and the data availability of synthetic trajectories.
Article
We present the first large-scale analysis of POI revisitation patterns, which aims to model the periodic behavior in human mobility. We apply the revisitation analysis technique, which has previously been used to understand website revisitation and smartphone app revisitation. We analyze a 1.5-year-long Foursquare check-in dataset with 266,909 users in 415 cities around the globe, as well as a Chinese social networking dataset on the continuous localization of 15,000 users in Beijing. Our analysis identifies four major POI revisitation patterns and four user revisitation patterns with distinct characteristics, and demonstrates the role of POI functions and geographic constraints in shaping these patterns. We compare our results with previous analyses of website and app revisitation, and highlight the similarities and differences between physical and cyber revisitation activities. These point to fundamental characteristics of human behavior.
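Revisitation analysis is built on the time intervals between successive visits to the same place. Before any revisitation curve or pattern can be derived, those intervals have to be extracted from raw check-ins. The minimal sketch below shows only that extraction step. It assumes that check-ins arrive as (POI ID, timestamp in hours) pairs in arbitrary order. The function name and the sample data are hypothetical, and the paper's own pattern construction and clustering are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def revisit_intervals(checkins: List[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Map each POI to the gaps between consecutive visits to it.

    `checkins` holds (poi_id, timestamp_in_hours) pairs in any order;
    a POI visited only once maps to an empty list.
    """
    visits: Dict[str, List[float]] = defaultdict(list)
    for poi, t in checkins:
        visits[poi].append(t)
    intervals = {}
    for poi, times in visits.items():
        times.sort()
        intervals[poi] = [later - earlier for earlier, later in zip(times, times[1:])]
    return intervals


# A daily gym habit and a weekly cafe habit (hypothetical data).
print(revisit_intervals([("gym", 0.0), ("cafe", 1.0), ("gym", 48.0),
                         ("gym", 24.0), ("cafe", 169.0)]))
```

A histogram of these intervals, pooled per POI or per user, is the kind of input on which the paper's revisitation patterns are defined.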
Article
We investigate the potential for privacy leaks when users reveal their nearby Points-of-Interest (POIs). Specifically, we investigate whether and how a person's location can be reverse-engineered when that person simply reveals their nearby POI types (e.g. 2 schools and 3 restaurants). We approach our analysis by introducing a "Location Re-identification" algorithm that is computationally efficient. Using data from Open Street Map, we conduct our analysis on datasets of multiple representative cities: New York City, Melbourne, Vancouver, Zurich and Shanghai. Our analysis indicates that urban morphology has a clear link to location privacy, and highlights a number of urban factors that contribute to location privacy. Our findings can be used in any system or platform where users reveal their proximal POIs, such as recommendation systems, advertising platforms, and appstores.
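The attack sketched in this abstract compares a revealed POI-type profile, such as "2 schools and 3 restaurants", against the profiles of candidate areas. If only one area matches, the user is re-identified. The naive exact-match baseline below is not the paper's efficient algorithm. It assumes the city is pre-divided into cells with known nearby POI-type counts, and the cell names are hypothetical. It returns every candidate cell whose counts match the revealed ones.

```python
from collections import Counter
from typing import Dict, List


def reidentify(revealed: Dict[str, int], cells: Dict[str, Counter]) -> List[str]:
    """Return candidate cells whose nearby POI-type counts match exactly.

    Zero counts are ignored on both sides, so {"bank": 0} is treated as
    "no bank nearby". A single returned cell means the user's location
    has been re-identified at cell granularity.
    """
    target = Counter({poi: n for poi, n in revealed.items() if n > 0})
    return sorted(cell for cell, poi_counts in cells.items()
                  if +poi_counts == target)  # unary + drops zero/negative counts


# Hypothetical grid cells and the POI types found near each one.
cells = {
    "A": Counter(school=2, restaurant=3),
    "B": Counter(school=2, restaurant=3, bank=1),
    "C": Counter(school=1, restaurant=3),
}
print(reidentify({"school": 2, "restaurant": 3}, cells))  # only cell "A" matches
```

A linear scan like this costs O(number of cells) per query. Making the search efficient at city scale is where the paper's algorithm comes in.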
Article
With rapid urbanization, revealing the underlying mechanisms behind urban mobility has become a crucial research problem. The movements of urban dwellers are often structured by their daily routines and exhibit distinct, contextual temporal modes, i.e., the patterns by which individuals allocate their time across different locations. In this paper, we investigate the novel problem of detecting popular temporal modes in population-scale unlabelled trajectory data. Our key finding is that the detected temporal modes capture the semantic features of people's living styles and can unravel meaningful correlations between urban mobility and human behavior. Specifically, we represent the temporal mode of a trajectory as a partition of its time duration, where time slices associated with the same location are placed in the same subset. This abstraction decouples temporal modes from actual physical locations, allowing individuals with similar temporal modes but completely different physical locations to have similar representations. Based on this insight, we propose a pipeline system composed of three components: 1) a noise handler that eliminates noise in the raw mobility records, 2) a representation extractor for temporal modes, and 3) a popular temporal mode detector. By applying our system to three real-world mobility datasets, we demonstrate that it effectively detects the popular temporal modes embedded in population-scale mobility datasets; these modes are easy to interpret and can be justified through the associated PoIs and mobile application usage. More importantly, further experiments reveal insightful correlations between the popular temporal modes and individuals' socioeconomic status, i.e., occupation information, which sheds light on the mechanisms behind urban mobility.
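Representing a temporal mode as a partition of time slices means that only "which slices share a location" matters, not which location it is. One simple way to encode such a partition canonically is to relabel locations by order of first appearance. The sketch below does this as an illustration of the idea and is not necessarily the paper's representation extractor. The function name, the one-location-per-slice input, and the place labels are assumptions.

```python
from typing import Dict, Hashable, List, Sequence


def temporal_mode(slots: Sequence[Hashable]) -> List[int]:
    """Canonical partition label for a sequence of per-slot locations.

    Locations are relabelled 0, 1, 2, ... in order of first appearance,
    so two people who split their day the same way get identical labels
    even if their physical places differ.
    """
    first_seen: Dict[Hashable, int] = {}
    labels = []
    for loc in slots:
        if loc not in first_seen:
            first_seen[loc] = len(first_seen)
        labels.append(first_seen[loc])
    return labels


# Two hypothetical users with different places but the same temporal mode.
print(temporal_mode(["home", "home", "office", "office", "home"]))
print(temporal_mode(["flat_7", "flat_7", "lab", "lab", "flat_7"]))
```

Because the encoding ignores place identity, the two users above become indistinguishable, which is exactly the decoupling the abstract describes.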
Conference Paper
The availability of massive geo-annotated social media data sheds light on the study of human mobility patterns. Among these, the periodic pattern, i.e., an individual visiting a geographical region at some specific time interval, has been recognized as one of the most important. Mining periodic patterns has a variety of applications, such as location prediction, anomaly detection, and location- and time-aware recommendation. However, it is a challenging task: both the regions of a person and the period of each region are unknown, and the interdependency between them makes the task even harder. Hence, existing methods are far from satisfactory for detecting periodic patterns in low-sampling and noisy social media data. We propose a Bayesian non-parametric model, named Periodic REgion Detection (PRED), to discover periodic mobility patterns by jointly modeling geographical and temporal information. Our method differs from previous studies in that it is non-parametric and thus does not require a priori knowledge about an individual's mobility (e.g., number of regions, period length, region size). Meanwhile, it models the time gap between two consecutive records rather than the exact visit time, making it less sensitive to data noise. Extensive experimental results on both synthetic and real-world datasets show that PRED significantly outperforms state-of-the-art methods in four tasks: periodic region discovery, outlier movement finding, period detection, and location prediction.
Conference Paper
Understanding human mobility is of great importance to various applications, such as urban planning, traffic scheduling, and location prediction. While there has been fruitful research on modeling human mobility using tracking data (e.g., GPS traces), the recent growth of geo-tagged social media (GeoSM) brings new opportunities to this task because of its sheer size and multi-dimensional nature. Nevertheless, how to obtain quality mobility models from the highly sparse and complex GeoSM data remains a challenge that cannot be readily addressed by existing techniques. We propose GMove, a group-level mobility modeling method using GeoSM data. Our insight is that the GeoSM data usually contains multiple user groups, where the users within the same group share significant movement regularity. Meanwhile, user grouping and mobility modeling are two intertwined tasks: (1) better user grouping offers better within-group data consistency and thus leads to more reliable mobility models; and (2) better mobility models serve as useful guidance that helps infer the group a user belongs to. GMove thus alternates between user grouping and mobility modeling, and generates an ensemble of Hidden Markov Models (HMMs) to characterize group-level movement regularity. Furthermore, to reduce text sparsity of GeoSM data, GMove also features a text augmenter. The augmenter computes keyword correlations by examining their spatiotemporal distributions. With such correlations as auxiliary knowledge, it performs sampling-based augmentation to alleviate text sparsity and produce high-quality HMMs. Our extensive experiments on two real-life data sets demonstrate that GMove can effectively generate meaningful group-level mobility models. Moreover, with context-aware location prediction as an example application, we find that GMove significantly outperforms baseline mobility models in terms of prediction accuracy.
Conference Paper
Identifying patterns in urban mobility is important for a variety of tasks, such as transportation planning, urban resource allocation, and emergency planning. This is evident from the large body of research on the topic, which has exploded with the vast amount of geo-tagged user-generated content from online social media. However, most existing work focuses on a specific setting, taking a statistical approach to describe and model the observed patterns. In contrast, in this work we introduce EigenTransitions, a spectrum-based, generic framework for analyzing spatiotemporal mobility datasets. EigenTransitions capture the anatomy of aggregate and/or individual mobility as a compact set of latent mobility patterns. Using a large corpus of geo-tagged content collected from Twitter, we use EigenTransitions to analyze the structure of urban mobility. In particular, we identify the EigenTransitions of a flow network between urban areas and derive a hypothesis testing framework to evaluate urban mobility from both temporal and demographic perspectives. We further show that EigenTransitions not only identify latent mobility patterns but also have the potential to support applications such as mobility prediction and inter-city comparison. In particular, by identifying neighbors with similar latent mobility patterns and incorporating their historical transition behaviors, we propose an EigenTransitions-based k-nearest neighbor algorithm that significantly improves the performance of individual mobility prediction. The proposed method is especially effective in “cold-start” scenarios, where traditional methods are known to perform poorly.
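Spectrum-based frameworks like this one rest on eigendecomposition: the leading eigenvectors of a suitable matrix act as the latent patterns. The dependency-free sketch below shows power iteration, the simplest way to recover the dominant eigenvector and eigenvalue of a symmetric matrix. Such a matrix could be, for instance, a covariance of users' transition-count vectors. The paper's actual matrix construction is not reproduced here, and the function name and matrices are assumptions. A practical implementation would call a linear-algebra library instead.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def power_iteration(a: Matrix, iters: int = 1000,
                    tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Leading eigenvalue and unit eigenvector of a symmetric matrix.

    Repeatedly multiplies a unit vector by `a` and renormalizes; the
    Rayleigh quotient of the current vector estimates the eigenvalue.
    Assumes the start vector is not orthogonal to the dominant eigenvector.
    """
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n  # uniform unit start vector
    eigval = 0.0
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        w = [x / norm for x in w]
        new_val = sum(w[i] * sum(a[i][j] * w[j] for j in range(n)) for i in range(n))
        if abs(new_val - eigval) < tol:
            return new_val, w
        eigval, v = new_val, w
    return eigval, v


# Hypothetical 2x2 symmetric matrix with eigenvalues 3 and 1.
value, vector = power_iteration([[2.0, 1.0], [1.0, 2.0]])
print(round(value, 6), [round(x, 6) for x in vector])
```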
Conference Paper
Location prediction enables us to use a person's mobility history to realize various applications such as efficient temperature control, opportunistic meeting support, and automated receptionists. Indoor location prediction is a challenging problem, particularly due to a high density of possible locations and short transition distances between these locations. In this paper we present Indoor-ALPS, an Adaptive Indoor Location Prediction System that uses temporal-spatial features to create individual daily models for the prediction of when a user will leave their current location (transition time) and the next location she will transition to. We tested Indoor-ALPS on the Augsburg Indoor Location Tracking Benchmark and compared our approach to the best performing temporal-spatial mobility prediction algorithm, Prediction by Partial Match (PPM). Our results show that Indoor-ALPS improves the temporal-spatial prediction accuracy over PPM for look-aheads up to 90 minutes by 6.2%, and for up to 30 minute look-aheads by 10.7%. These results demonstrate that Indoor-ALPS can be used to support a wide variety of indoor mobility prediction-based applications.
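Prediction by Partial Match (PPM), the baseline named above, predicts the next symbol from the longest previously seen context and falls back to shorter contexts when needed. The sketch below is a simplified variable-order Markov predictor with that backoff. It deliberately omits PPM's escape probabilities and blending, so it illustrates the idea and is not a faithful PPM or the Indoor-ALPS model. The class name and the location labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Optional, Sequence, Tuple


class BackoffPredictor:
    """Variable-order Markov next-location predictor with simple backoff.

    Counts which location follows each context of length 0..max_order and
    predicts from the longest context seen in training.
    """

    def __init__(self, max_order: int = 2) -> None:
        self.max_order = max_order
        self.counts: Dict[Tuple[Hashable, ...], Counter] = defaultdict(Counter)

    def fit(self, history: Sequence[Hashable]) -> "BackoffPredictor":
        for i in range(1, len(history)):
            for k in range(0, self.max_order + 1):
                if i - k < 0:
                    break
                context = tuple(history[i - k:i])  # k locations before position i
                self.counts[context][history[i]] += 1
        return self

    def predict(self, recent: Sequence[Hashable]) -> Optional[Hashable]:
        # Try the longest usable context first, then back off to shorter ones.
        for k in range(min(self.max_order, len(recent)), -1, -1):
            context = tuple(recent[len(recent) - k:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # nothing learned yet


history = ["home", "office", "gym", "home", "office", "cafe", "home", "office", "gym"]
model = BackoffPredictor(max_order=2).fit(history)
print(model.predict(["home", "office"]))  # order-2 context seen: gym twice, cafe once
```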
Article
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale to real-world information networks, which usually contain millions of nodes. In this paper, we propose a novel network embedding method called LINE, which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments demonstrate the effectiveness of LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient: it can learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of LINE is available online.
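In LINE's first-order proximity model, the probability that an edge (i, j) exists is the sigmoid of the dot product of the two vertex embeddings. Training uses negative sampling, with noise vertices drawn proportionally to degree raised to the 3/4 power. The sketch below computes the per-edge negative-sampling loss and that noise distribution. It uses a numerically stable log-sigmoid. The embeddings, vertex names, and helper names are hypothetical, and gradient updates and the edge-sampling procedure are left out.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def first_order_edge_loss(emb: Dict[str, Vector], i: str, j: str,
                          negatives: List[str]) -> float:
    """Negative-sampling loss for edge (i, j) under first-order proximity:
    -log sigma(u_j . u_i) - sum over negatives n of log sigma(-u_n . u_i)."""
    loss = -log_sigmoid(dot(emb[j], emb[i]))
    for n in negatives:
        loss -= log_sigmoid(-dot(emb[n], emb[i]))
    return loss


def noise_distribution(degrees: Dict[str, float]) -> Dict[str, float]:
    """Noise distribution P_n(v) proportional to d_v ** 0.75."""
    weights = {v: d ** 0.75 for v, d in degrees.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.0]}
# A plausible edge (a, b) with c as a negative scores a lower loss than (a, c).
print(first_order_edge_loss(emb, "a", "b", ["c"]), first_order_edge_loss(emb, "a", "c", ["b"]))
```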
Article
Focus on movement data has increased as a consequence of the larger availability of such data due to current GPS, GSM, RFID, and sensors techniques. In parallel, interest in movement has shifted from raw movement data analysis to more application-oriented ways of analyzing segments of movement suitable for the specific purposes of the application. This trend has promoted semantically rich trajectories, rather than raw movement, as the core object of interest in mobility studies. This survey provides the definitions of the basic concepts about mobility data, an analysis of the issues in mobility data management, and a survey of the approaches and techniques for: (i) constructing trajectories from movement tracks, (ii) enriching trajectories with semantic information to enable the desired interpretations of movements, and (iii) using data mining to analyze semantic trajectories and extract knowledge about their characteristics, in particular the behavioral patterns of the moving objects. Last but not least, the article surveys the new privacy issues that arise due to the semantic aspects of trajectories.
Article
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
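The paper's subsampling of frequent words discards each occurrence of word w with probability P(w) = 1 - sqrt(t / f(w)). Here f(w) is the word's relative frequency and t is a threshold, typically around 1e-5. The minimal sketch below evaluates that formula, clipped at zero for words rarer than t. The function name is hypothetical. The same idea carries over to location or POI "vocabularies" in which a few places dominate the visit counts.

```python
import math


def discard_probability(freq: float, t: float = 1e-5) -> float:
    """Probability of discarding a token whose relative frequency is `freq`.

    Implements P(w) = 1 - sqrt(t / f(w)), clipped at 0 so that tokens rarer
    than the threshold t are always kept.
    """
    if freq <= 0:
        raise ValueError("frequency must be positive")
    return max(0.0, 1.0 - math.sqrt(t / freq))


# A token four times more frequent than the threshold is dropped half the time.
print(discard_probability(4e-5, t=1e-5))
```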
Article
Data mining and statistical learning techniques are powerful analysis tools yet to be incorporated in the domain of urban studies and transportation research. In this work, we analyze an activity-based travel survey conducted in the Chicago metropolitan area over a demographic representative sample of its population. Detailed data on activities by time of day were collected from more than 30,000 individuals (and 10,552 households) who participated in a 1-day or 2-day survey implemented from January 2007 to February 2008. We examine this large-scale data in order to explore three critical issues: (1) the inherent daily activity structure of individuals in a metropolitan area, (2) the variation of individual daily activities—how they grow and fade over time, and (3) clusters of individual behaviors and the revelation of their related socio-demographic information. We find that the population can be clustered into 8 and 7 representative groups according to their activities during weekdays and weekends, respectively. Our results enrich the traditional divisions consisting of only three groups (workers, students and non-workers) and provide clusters based on activities of different time of day. The generated clusters combined with social demographic information provide a new perspective for urban and transportation planning as well as for emergency response and spreading dynamics, by addressing when, where, and how individuals interact with places in metropolitan areas.
Conference Paper
The collection of huge amounts of tracking data, made possible by the widespread use of GPS devices, has enabled the analysis of such data in several application domains, ranging from traffic management to advertising and social studies. However, raw positioning data, as detected by GPS devices, lacks semantic information, because it does not natively provide additional contextual information such as the places people visited or the activities they performed. Traditionally, this information is collected through hand-filled questionnaires in which a limited number of users are asked to annotate their tracks with the activities they performed. With the aim of obtaining large amounts of semantically rich trajectories, we propose an algorithm for automatically annotating raw trajectories with the activities performed by users. To do this, we analyse the stop points and try to infer the Point Of Interest (POI) the user visited. Based on the category of the POI and a probability measure derived from the gravity law, we infer the activity performed. We experimented with and evaluated the method in a real case study of car trajectories manually annotated by users with their activities. Experimental results are encouraging and will drive our future work.
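The abstract does not give its exact probability measure, so the sketch below shows only one common gravity-style form. Each candidate POI near a stop point is weighted by its attractiveness divided by distance squared, and the weights are then normalized into probabilities. The attractiveness values, the planar coordinates, the exponent of 2, and the function name are all assumptions. The paper's measure may be defined differently.

```python
import math
from typing import Dict, Tuple


def gravity_poi_probabilities(stop: Tuple[float, float],
                              pois: Dict[str, Tuple[Tuple[float, float], float]],
                              exponent: float = 2.0) -> Dict[str, float]:
    """Gravity-style probability that a stop point corresponds to each POI.

    `pois` maps a POI name to ((x, y), attractiveness). Each weight is
    attractiveness / distance ** exponent; weights are normalized to sum to 1.
    """
    if not pois:
        return {}
    weights = {}
    for name, ((x, y), mass) in pois.items():
        dist = math.hypot(x - stop[0], y - stop[1])
        weights[name] = mass / max(dist, 1e-9) ** exponent  # guard against zero distance
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Two equally attractive hypothetical POIs; the closer one is four times as likely.
print(gravity_poi_probabilities((0.0, 0.0), {"cafe": ((1.0, 0.0), 1.0),
                                             "mall": ((2.0, 0.0), 1.0)}))
```

The inferred activity could then follow from the category of the most probable POI, as the abstract describes.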
Article
Much work has been done on predicting where one is going to be in the immediate future, typically within the next hour. By contrast, we address the open problem of predicting human mobility far into the future, on a scale of months and years. We propose an efficient nonparametric method that extracts significant and robust patterns in location data, learns their associations with contextual features (such as day of week), and then leverages this information to predict the most likely location at any given time in the future. The entire process is formulated in a principled way as an eigendecomposition problem. Evaluation on a massive dataset with more than 32,000 days' worth of GPS data across 703 diverse subjects shows that our model predicts the correct location with high accuracy, even years into the future. This result opens a number of interesting avenues for future research and applications.
Article
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Article
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
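One of the two architectures is the skip-gram model, whose training examples are (center, context) pairs drawn from a sliding window over a token sequence. The minimal sketch below generates those pairs with a fixed window; the released word2vec tool samples the effective window size per position and adds subsampling and negative sampling, which are omitted here. In a trajectory setting such as habit2vec, the tokens could be POI categories in visit order, which is an assumption for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return the (center, context) pairs a skip-gram model would train on,
    using every token within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

For example, `skipgram_pairs(["home", "cafe", "office"], window=1)` yields `[("home", "cafe"), ("cafe", "home"), ("cafe", "office"), ("office", "cafe")]`.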
Article
Mobile phones or smartphones are rapidly becoming the central computer and communication device in people's lives. Application delivery channels such as the Apple AppStore are transforming mobile phones into App Phones, capable of downloading a myriad of applications in an instant. Importantly, today's smartphones are programmable and come with a growing set of cheap powerful embedded sensors, such as an accelerometer, digital compass, gyroscope, GPS, microphone, and camera, which are enabling the emergence of personal, group, and community-scale sensing applications. We believe that sensor-equipped mobile phones will revolutionize many sectors of our economy, including business, healthcare, social networks, environmental monitoring, and transportation. In this article we survey existing mobile phone sensing algorithms, applications, and systems. We discuss the emerging sensing paradigms, and formulate an architectural framework for discussing a number of the open issues and challenges emerging in the new area of mobile phone sensing research.
Conference Paper
Home heating is a major factor in worldwide energy use. Our system, PreHeat, aims to heat homes more efficiently by using occupancy sensing and occupancy prediction to automatically control home heating. We deployed PreHeat in five homes, three in the US and two in the UK. In UK homes, we controlled heating on a per-room basis to enable further energy savings. We compared PreHeat's prediction algorithm with a static program over an average of 61 days per house, alternating days between these conditions, and measuring actual gas consumption and occupancy. In UK homes, PreHeat both saved gas and reduced MissTime (the time that the house was occupied but not warm). In US homes, PreHeat decreased MissTime by a factor of 6-12, while consuming a similar amount of gas. In summary, PreHeat enables more efficient heating while removing the need for users to program thermostat schedules.
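The abstract does not describe the prediction algorithm itself. As an assumed illustration of occupancy prediction from history, the sketch below uses a nearest-neighbour scheme: compare today's observed occupancy prefix with past days by Hamming distance, then average the `k` closest days over the remaining time slots. The slot encoding, `k`, and the function name are hypothetical choices, not PreHeat's documented parameters.

```python
def predict_occupancy(history, today_so_far, k=3):
    """Predict occupancy probability for the rest of today.

    history: past days, each a list of 0/1 occupancy values per time slot.
    today_so_far: 0/1 values for the slots already observed today.
    Returns one probability per remaining slot (mean of the k most similar days).
    """
    n = len(today_so_far)

    def hamming(day):
        # Number of observed slots where the past day disagrees with today.
        return sum(a != b for a, b in zip(day[:n], today_so_far))

    nearest = sorted(history, key=hamming)[:k]
    n_slots = len(history[0])
    return [sum(day[t] for day in nearest) / len(nearest) for t in range(n, n_slots)]
```

A controller could then start heating a room ahead of any slot whose predicted probability exceeds a chosen threshold, trading gas use against MissTime.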
Conference Paper
We propose automatically learning probabilistic Hierarchical Task Networks (pHTNs) in order to capture a user's preferences on plans, by observing only the user's behavior. HTNs are a common choice of representation for a variety of purposes in planning, including work on learning in planning. Our contributions are (a) learning structure and (b) representing preferences. In contrast, prior work employing HTNs considers learning method preconditions (instead of structure) and representing domain physics or search control knowledge (rather than preferences). Initially we will assume that the observed distribution of plans is an accurate representation of user preference, and then generalize to the situation where feasibility constraints frequently prevent the execution of preferred plans. In order to learn a distribution on plans we adapt an Expectation-Maximization (EM) technique from the discipline of (probabilistic) grammar induction, taking the perspective of task reductions as productions in a context-free grammar over primitive actions. To account for the difference between the distributions of possible and preferred plans we subsequently modify this core EM technique, in short, by rescaling its input.
Article
The Hausdorff distance is commonly used as a similarity measure between two point sets. Using this measure, a set X is considered similar to Y iff every point in X is close to at least one point in Y. Formally, the Hausdorff distance HausDist(X, Y) can be computed as the Max-Min distance from X to Y, i.e., find the maximum of the distance from an element in X to its nearest neighbor (NN) in Y. Although this is similar to the closest pair and farthest pair problems, computing the Hausdorff distance is a more challenging problem since its Max-Min nature involves both maximization and minimization rather than just one or the other. A traditional approach to computing HausDist(X, Y) performs a linear scan over X and utilizes an index to help compute the NN in Y for each x in X. We present a pair of basic solutions that avoid scanning X by applying the concept of aggregate NN search to searching for the element in X that yields the Hausdorff distance. In addition, we propose a novel method which incrementally explores the indexes of the two sets X and Y simultaneously. As an example application of our techniques, we use the Hausdorff distance as a measure of similarity between two trajectories (represented as point sets). We also use this example application to compare the performance of our proposed method with the traditional approach and the basic solutions. Experimental results show that our proposed method outperforms all competitors by one order of magnitude in terms of the tree traversal cost and total response time.
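The Max-Min definition above translates directly into a brute-force computation, which corresponds to the traditional linear scan over X but without any spatial index for the nearest-neighbour step. This sketch treats trajectories as point sets of coordinate tuples, as in the abstract's example application; the indexed and incremental methods proposed in the paper are not reproduced here.

```python
import math

def directed_hausdorff(X, Y):
    """HausDist(X, Y): the largest distance from a point of X to its nearest neighbour in Y.
    Brute force, O(|X| * |Y|) distance evaluations."""
    return max(min(math.dist(x, y) for y in Y) for x in X)

def hausdorff(X, Y):
    """Symmetric variant: the larger of the two directed distances."""
    return max(directed_hausdorff(X, Y), directed_hausdorff(Y, X))
```

Because the directed form is asymmetric, `directed_hausdorff([(0, 0), (1, 0)], [(0, 0)])` is 1.0 while the reverse direction is 0.0; similarity between trajectories is often reported with the symmetric version.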
Article
Information theoretic measures form a fundamental class of measures for comparing clusterings, and have recently received increasing interest. Nevertheless, a number of questions concerning their properties and inter-relationships remain unresolved. We perform an organized study of information theoretic measures for clustering comparison, including several existing popular measures in the literature, as well as some newly proposed ones. We discuss and prove their important properties, such as the metric property and the normalization property. We then highlight to the clustering community the importance of correcting information theoretic measures for chance, especially when the data size is small compared to the number of clusters present therein. Of the available information theoretic based measures, we advocate the normalized information distance (NID) as a general measure of choice, for it possesses concurrently several important properties, such as being both a metric and a normalized measure, admitting an exact analytical adjusted-for-chance form, and using the nominal [0,1] range better than other normalized variants.
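The normalized information distance between two clusterings U and V of the same items is commonly written as NID(U, V) = 1 - I(U; V) / max(H(U), H(V)). The sketch below computes it from two label lists using natural logarithms (the base cancels in the ratio). It implements only the plain form; the adjusted-for-chance variant the abstract mentions is not shown, and the convention of returning 0 when both clusterings are a single cluster is an assumption.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of a clustering given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """I(U; V) from the contingency counts of two labelings of the same items."""
    n = len(u)
    cu, cv = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum((c / n) * math.log(c * n / (cu[a] * cv[b])) for (a, b), c in joint.items())

def nid(u, v):
    """Normalized information distance: 0 for identical partitions, up to 1 for independent ones."""
    hu, hv = entropy(u), entropy(v)
    if max(hu, hv) == 0:
        return 0.0  # both clusterings put every item in one cluster
    return 1.0 - mutual_information(u, v) / max(hu, hv)
```

Cluster labels are arbitrary, so `nid([0, 0, 1, 1], [1, 1, 0, 0])` is 0, while the statistically independent split `[0, 1, 0, 1]` gives 1.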
Article
A growing number of sociologists, political scientists, economists, and organizational theorists have invoked the concept of social capital in the search for answers to a broadening range of questions being confronted in their own fields. Seeking to clarify the concept and help assess its utility for organizational theory, we synthesize the theoretical research undertaken in these various disciplines and develop a common conceptual framework that identifies the sources, benefits, risks, and contingencies of social capital.
Article
A range of applications, from predicting the spread of human and electronic viruses to city planning and resource management in mobile communications, depend on our ability to foresee the whereabouts and mobility of individuals, raising a fundamental question: To what degree is human behavior predictable? Here we explore the limits of predictability in human dynamics by studying the mobility patterns of anonymized mobile phone users. By measuring the entropy of each individual’s trajectory, we find a 93% potential predictability in user mobility across the whole user base. Despite the significant differences in the travel patterns, we find a remarkable lack of variability in predictability, which is largely independent of the distance users cover on a regular basis.
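The step from an entropy value to a predictability limit goes through Fano's inequality: for a user visiting N distinct locations with entropy S (bits), the maximum predictability Π solves S = H(Π) + (1 - Π) log2(N - 1), where H is the binary entropy. The sketch below solves that equation by bisection. As a simplifying assumption it feeds in the order-insensitive (temporal-uncorrelated) entropy of visit frequencies rather than an estimate of the actual entropy that accounts for temporal order; since the uncorrelated entropy is never below the true entropy rate, the resulting Π is never above the one obtained from the actual entropy.

```python
import math
from collections import Counter

def uncorrelated_entropy(locations):
    """Shannon entropy (bits) of visitation frequencies, ignoring temporal order."""
    n = len(locations)
    return -sum((c / n) * math.log2(c / n) for c in Counter(locations).values())

def fano_rhs(p, n_locations):
    """Right-hand side of Fano's equation: H(p) + (1 - p) * log2(N - 1)."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0:
            h -= q * math.log2(q)
    return h + (1.0 - p) * math.log2(n_locations - 1)

def max_predictability(entropy_bits, n_locations, tol=1e-9):
    """Solve entropy_bits = fano_rhs(p, N) for p in [1/N, 1] by bisection;
    fano_rhs decreases from log2(N) at p = 1/N to 0 at p = 1."""
    if n_locations < 2:
        return 1.0
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano_rhs(mid, n_locations) > entropy_bits:
            lo = mid  # entropy is lower than the bound at mid, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2
```

As a sanity check, a user who spreads visits uniformly over four locations has S = 2 bits and Π = 0.25, i.e. no better than random guessing.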
Conference Paper
With the ever-increasing urbanization process, systematically modeling people's activities in the urban space is being recognized as a crucial socioeconomic task. This task was nearly impossible years ago due to the lack of reliable data sources, yet the emergence of geo-tagged social media (GTSM) data sheds new light on it. Recently, there have been fruitful studies on discovering geographical topics from GTSM data. However, their high computational costs and strong distributional assumptions about the latent topics hinder them from fully unleashing the power of GTSM. To bridge the gap, we present CrossMap, a novel cross-modal representation learning method that uncovers urban dynamics with massive GTSM data. CrossMap first employs an accelerated mode seeking procedure to detect spatiotemporal hotspots underlying people's activities. Those detected hotspots not only address spatiotemporal variations, but also largely alleviate the sparsity of the GTSM data. With the detected hotspots, CrossMap then jointly embeds all spatial, temporal, and textual units into the same space using two different strategies: one is reconstruction-based and the other is graph-based. Both strategies capture the correlations among the units by encoding their co-occurrence and neighborhood relationships, and learn low-dimensional representations to preserve such correlations. Our experiments demonstrate that CrossMap not only significantly outperforms state-of-the-art methods for activity recovery and classification, but also achieves much better efficiency.
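To give a concrete sense of hotspot detection by mode seeking, the sketch below runs plain mean shift with a flat kernel: each start point is repeatedly moved to the centroid of all points within `bandwidth` until it stops moving, and nearby end points are merged into one hotspot. This is the basic, non-accelerated procedure, not CrossMap's accelerated variant; the bandwidth, merge radius, and the use of pre-scaled (x, y) or (x, y, t) tuples are assumptions.

```python
import math

def mean_shift_mode(start, points, bandwidth, max_iter=100, tol=1e-6):
    """Shift `start` to the centroid of the points within `bandwidth` (flat kernel)
    until the shift falls below `tol`; the fixed point is a local density mode."""
    current = tuple(start)
    for _ in range(max_iter):
        nbrs = [p for p in points if math.dist(current, p) <= bandwidth]
        if not nbrs:
            break
        centroid = tuple(sum(coord) / len(nbrs) for coord in zip(*nbrs))
        shift = math.dist(current, centroid)
        current = centroid
        if shift < tol:
            break
    return current

def find_hotspots(points, bandwidth, merge_radius=None):
    """Run mean shift from every point; keep modes farther apart than merge_radius."""
    if merge_radius is None:
        merge_radius = bandwidth / 2
    modes = []
    for p in points:
        mode = mean_shift_mode(p, points, bandwidth)
        if all(math.dist(mode, m) > merge_radius for m in modes):
            modes.append(mode)
    return modes
```

For spatiotemporal hotspots, time would have to be rescaled so that one unit of time is comparable to one unit of distance before the points are passed in; this scaling is a modelling choice the sketch leaves to the caller.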
Conference Paper
Human routines are blueprints of behavior, which allow people to accomplish purposeful repetitive tasks at many levels, ranging from the structure of their day to how they drive through an intersection. People express their routines through actions that they perform in the particular situations that triggered those actions. An ability to model routines and understand the situations in which they are likely to occur could allow technology to help people improve their bad habits, inexpert behavior, and other suboptimal routines. However, existing routine models do not capture the causal relationships between situations and actions that describe routines. Our main contribution is the insight that byproducts of an existing activity prediction algorithm can be used to model those causal relationships in routines. We apply this algorithm to two example datasets and show that the modeled routines are meaningful: they are predictive of people's actions, and the modeled causal relationships provide insights about the routines that match findings from previous research. Our approach offers a generalizable solution to model and reason about routines.
Article
Driven by the advance of positioning technology and the popularity of location-sharing services, semantic-enriched trajectory data have become unprecedentedly available. The sequential patterns hidden in such data, when properly defined and extracted, can greatly benefit tasks like targeted advertising and urban planning. Unfortunately, classic sequential pattern mining algorithms developed for transactional data cannot effectively mine patterns in semantic trajectories, mainly because the places in the continuous space cannot be regarded as independent "items". Instead, similar places need to be grouped to collaboratively form frequent sequential patterns. That said, it remains a challenging task to mine what we call fine-grained sequential patterns, which must satisfy spatial compactness, semantic consistency and temporal continuity simultaneously. We propose SPLITTER to effectively mine such fine-grained sequential patterns in two steps. In the first step, it retrieves a set of spatially coarse patterns, each attached with a set of trajectory snippets that precisely record the pattern's occurrences in the database. In the second step, SPLITTER breaks each coarse pattern into fine-grained ones in a top-down manner, by progressively detecting dense and compact clusters in a higher-dimensional space spanned by the snippets. SPLITTER uses an effective algorithm called weighted snippet shift to detect such clusters, and leverages a divide-and-conquer strategy to speed up the top-down pattern splitting process. Our experiments on both real and synthetic data sets demonstrate the effectiveness and efficiency of SPLITTER.
Article
Most existing approaches aiming at measuring trajectory similarity are focused on two-dimensional sequences of points, called raw trajectories. However, recent proposals have used background geographic information and social media data to enrich these trajectories with a semantic dimension, giving rise to the concept of semantic trajectories. Only a few works have proposed similarity measures for semantic trajectories or multidimensional sequences, having limitations such as predefined weight of the dimensions, sensitivity to noise, tolerance for gaps with different sizes, and the prevalence of the worst dimension similarity. In this article we propose MSM, a novel similarity measure for multidimensional sequences that overcomes the aforementioned limitations by considering and weighting the similarity in all dimensions. MSM is evaluated through an extensive experimental study that, based on a seed trajectory, creates sets of semantic trajectories with controlled transformations to introduce different kinds and levels of dissimilarity. For each set, we compute the similarity between the seed and the transformed trajectories, using different measures. The results showed that MSM was more robust and efficient than related approaches in the domain of semantic trajectories.
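A simplified reading of the idea of "considering and weighting the similarity in all dimensions" is sketched below. Two points match on a dimension when their distance in that dimension is within a threshold. A point pair scores the weighted sum of the dimensions on which it matches. The sequence similarity averages each element's best score against the other sequence, in both directions. The example dimensions, thresholds, and weights are assumptions chosen for illustration, and this is not presented as the authors' reference implementation.

```python
import math

# Example dimensions (assumed): (distance function, match threshold, weight).
# A point is (xy, category); the weights sum to 1.
EXAMPLE_DIMS = [
    (lambda a, b: math.dist(a, b), 1.0, 0.5),         # spatial: match if within 1 unit
    (lambda a, b: 0.0 if a == b else 1.0, 0.0, 0.5),  # semantic: match if same category
]

def match_score(p, q, dims):
    """Weighted sum over the dimensions on which p and q match."""
    return sum(w for k, (dist, thr, w) in enumerate(dims) if dist(p[k], q[k]) <= thr)

def msm(P, Q, dims):
    """Sum of each element's best match in the other sequence, both directions,
    divided by the total number of elements; 1.0 means every element fully matches."""
    parity_pq = sum(max(match_score(p, q, dims) for q in Q) for p in P)
    parity_qp = sum(max(match_score(q, p, dims) for p in P) for q in Q)
    return (parity_pq + parity_qp) / (len(P) + len(Q))
```

Because each dimension contributes its own weight, a stop at the right place with the wrong category still earns partial credit instead of being dominated by the worst-matching dimension.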
Article
This paper explores the sequential structure of work processes in a task unit whose work involves high numbers of exceptions, low analyzability of search, frequent interruptions, and extensive deliberation and that cannot be characterized as routine under any traditional definition. Yet a detailed analysis of the sequential pattern of action in a sample of 168 service interactions reveals that most interactions follow a repetitive, functionally similar pattern. This apparent contradiction presents a challenge to our theoretical understanding of routines: How can apparently nonroutine work display such a high degree of regularity? To answer this question, we propose a new definition of organizational routines as a set of functionally similar patterns and illustrate a new methodology for studying the sequential structure of work processes using rule-based grammatical models. This approach to organizational routines juxtaposes the structural features of the organization against the reflective agency of organizational members. Members enact specific performances from among a constrained, but potentially large set of possibilities that can be described by a grammar, giving rise to the regular patterns of action we label routines.
Article
Microlevel mobility research argues that job changes depend on the job seeker's social network and social ties. Job seekers find better jobs by contacting persons with superior knowledge and influence. These contact persons are usually others with whom the job seeker has only weak ties. Life history data from Germany demonstrate the necessity of considering the multidimensional nature of social ties and the interaction between social ties and status of prior job when predicting job mobility. Results suggest some modification of micromobility theory because individuals with high status prior jobs benefit from weak social ties, whereas individuals with low status prior jobs do not.
Article
In recent years, research on location prediction by mining user trajectories has attracted a lot of attention. Existing studies on this topic mostly treat such prediction as just a type of location recommendation, that is, they predict the next location of a user using location recommenders. However, a user often visits a place for reasons other than its interestingness. In this article, we propose a novel mining-based location prediction approach called Geographic-Temporal-Semantic-based Location Prediction (GTS-LP), which takes into account a user's geographic-triggered intentions, temporal-triggered intentions, and semantic-triggered intentions to estimate the probability of the user visiting a location. The core idea underlying our proposal is the discovery of trajectory patterns of users, namely GTS patterns, to capture frequent movements triggered by the three kinds of intentions. To achieve this goal, we define a new trajectory pattern to capture the key properties of the behaviors that are motivated by the three kinds of intentions from trajectories of users. In our GTS-LP approach, we propose a series of novel matching strategies to calculate the similarity between the current movement of a user and discovered GTS patterns based on various moving intentions. Based on this similarity, we make an online prediction of the location the user intends to visit. To the best of our knowledge, this is the first work on location prediction based on trajectory pattern mining that explores the geographic, temporal, and semantic properties simultaneously. By means of a comprehensive evaluation using various real trajectory datasets, we show that our proposed GTS-LP approach delivers excellent performance and significantly outperforms existing state-of-the-art location prediction methods.
Article
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
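The generative story behind LDA can be sketched in a few lines: draw per-document topic proportions θ from a Dirichlet(α), then for each word draw a topic z from θ and a word from that topic's distribution β_z. The sketch below samples the Dirichlet by normalising independent Gamma draws. It simplifies the paper's model in two labelled ways: the document length is fixed rather than drawn from a Poisson distribution, and the topic-word distributions `beta` are supplied by the caller rather than estimated. Inference (variational EM) is not shown.

```python
import random

def dirichlet(alpha, rng):
    """Sample from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, beta, rng):
    """Generate one document as word indices.

    alpha: Dirichlet parameters, one per topic.
    beta: beta[k] is topic k's probability distribution over vocabulary indices 0..V-1.
    Returns (theta, words).
    """
    theta = dirichlet(alpha, rng)  # per-document topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(theta)), weights=theta)[0]        # topic assignment
        w = rng.choices(range(len(beta[z])), weights=beta[z])[0]    # word drawn from topic z
        words.append(w)
    return theta, words
```

In mobility work such as the topic models listed here, the "words" are typically discretised locations or mobility patterns and the "documents" are users or regions; that mapping is a modelling choice, not part of LDA itself.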
Article
We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.
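The "walks as sentences" step can be sketched as follows: for each pass, shuffle the vertices, start one truncated uniform random walk from each, and collect the walks. The resulting walks would then be fed to a skip-gram model exactly as sentences would be. The adjacency-list representation, parameter names, and early stop at dead-end vertices are assumptions of this sketch; the embedding training step is omitted.

```python
import random

def random_walks(adj, walks_per_node, walk_length, rng):
    """Generate truncated uniform random walks.

    adj: dict mapping each vertex to a list of its neighbours.
    Returns walks_per_node * len(adj) walks, each at most walk_length vertices long.
    """
    walks = []
    nodes = list(adj)
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start vertices in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break  # dead end: truncate the walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

Each pass is independent of the others, which is one reason the approach parallelises easily.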
Conference Paper
Mining patterns of human behavior from large-scale mobile phone data has the potential to help understand certain phenomena in society. The study of such human-centric massive datasets requires new mathematical models. In this paper, we propose a probabilistic topic model that we call the distant n-gram topic model (DNTM) to address the problem of learning long-duration human location sequences. The DNTM is based on Latent Dirichlet Allocation (LDA). We define the generative process for the model, derive the inference procedure and evaluate our model on real mobile data. We consider two different real-life human datasets collected from mobile phone locations, the first consisting of GPS locations and the second of cell tower connections. The DNTM successfully discovers topics on the two datasets. Finally, the DNTM is compared to LDA in terms of log-likelihood on unseen data, showing the predictive power of the model. We find that the DNTM consistently outperforms LDA as the sequence length increases.
Article
The frequency distribution of words has been a key object of study in statistical linguistics for the past 70 years. This distribution approximately follows a simple mathematical form known as Zipf's law. This article first shows that human language has a highly complex, reliable structure in the frequency distribution over and above this classic law, although prior data visualization methods have obscured this fact. A number of empirical phenomena related to word frequencies are then reviewed. These facts are chosen to be informative about the mechanisms giving rise to Zipf's law and are then used to evaluate many of the theoretical explanations of Zipf's law in language. No prior account straightforwardly explains all the basic facts or is supported with independent evaluation of its underlying assumptions. To make progress at understanding why language obeys Zipf's law, studies must seek evidence beyond the law itself, testing assumptions and evaluating novel predictions with new, independent data.
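To check for Zipf-like behaviour in a corpus, or in the POI-visit "corpus" of a mobility dataset (an assumed use), one common quick diagnostic is to sort frequencies by rank and fit a straight line in log-log space, since f(r) ∝ r^(-s) becomes log f = c - s log r. The sketch below does exactly that with ordinary least squares. As the abstract warns, this visualisation-style fit can hide structure beyond the law, and maximum-likelihood estimators are generally preferred for rigorous exponent estimates.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Token counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)

def zipf_exponent(freqs):
    """Least-squares slope of log(frequency) vs log(rank); returns s for f(r) ~ r**(-s).
    Needs at least two ranks."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return -cov / var
```

Frequencies that follow 1000/r exactly give an exponent of 1, the classic Zipf value.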
Article
The development of a city gradually fosters different functional regions, such as educational areas and business districts. In this paper, we propose a framework (titled DRoF) that Discovers Regions of different Functions in a city using both human mobility among regions and points of interest (POIs) located in a region. Specifically, we segment a city into disjoint regions according to major roads, such as highways and urban expressways. We infer the functions of each region using a topic-based inference model, which regards a region as a document, a function as a topic, categories of POIs (e.g., restaurants and shopping malls) as metadata (like authors, affiliations, and key words), and human mobility patterns (when people reach/leave a region and where people come from and leave for) as words. As a result, a region is represented by a distribution of functions, and a function is featured by a distribution of mobility patterns. We further identify the intensity of each function in different locations. The results generated by our framework can benefit a variety of applications, including urban planning, location choosing for a business, and social recommendations. We evaluated our method using large-scale and real-world datasets, consisting of two POI datasets of Beijing (in 2010 and 2011) and two 3-month GPS trajectory datasets (representing human mobility) generated by over 12,000 taxicabs in Beijing in 2010 and 2011 respectively. The results justify the advantages of our approach over baseline methods solely using POIs or human mobility.
Chapter
In step with the rapidly growing volumes of available moving-object trajectory data, there is also an increasing need for techniques that enable the analysis of trajectories. Such functionality may benefit a range of application areas and services, including transportation, the sciences, sports, and prediction-based and social services, to name but a few. The chapter first provides an overview of trajectory patterns and a categorization of trajectory patterns from the literature. Next, it examines relative motion patterns, which serve as fundamental background for the chapter's subsequent discussions. Relative patterns enable the sp