Article

Potentials of using social media to infer the longitudinal travel behavior: A sequential model-based clustering method

Abstract

This study explores the possibility of employing social media data to infer longitudinal travel behavior. Geo-tagged social media data show some unique features, including location-aggregated features, distance-separated features, and Gaussian-distributed features. Compared to a conventional household travel survey, social media data are less expensive, easier to obtain and, most importantly, can monitor an individual's longitudinal travel behavior over a much longer observation period. This paper proposes a sequential model-based clustering method to group high-resolution Twitter locations and extract Twitter displacements. Further, this study details the unique features of the displacements extracted from Twitter, including the demographics of Twitter users, as well as their advantages and limitations. The results are also compared with those from a traditional household travel survey, showing promise in using displacement distribution, length, duration and start time to infer an individual's travel behavior. The results indicate the potential of social media for inferring longitudinal travel behavior, and they also reveal a large quantity of short-distance Twitter displacements. The results will supplement the traditional travel survey and support travel behavior modeling in a metropolitan area.
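
As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch groups geotagged points one at a time and then extracts displacements between location clusters. It is not the authors' method: the abstract gives no algorithmic detail, so this is a simplified stand-in in which every cluster is treated as an isotropic Gaussian with a fixed spread, which reduces the likelihood comparison to a nearest-center test. The names `cluster_sequentially` and `extract_displacements` and the parameters `sigma_km` and `max_sigmas` are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


@dataclass
class Cluster:
    lat: float
    lon: float
    n: int = 1

    def add(self, lat, lon):
        # Running-mean update of the cluster center.
        self.n += 1
        self.lat += (lat - self.lat) / self.n
        self.lon += (lon - self.lon) / self.n


def cluster_sequentially(points, sigma_km=0.1, max_sigmas=3.0):
    """Assign each (lat, lon) point, in order, to the nearest cluster center.

    Assumption: all clusters share the same isotropic spread sigma_km, so the
    most likely cluster is simply the nearest one. A point farther than
    max_sigmas * sigma_km from every center starts a new cluster.
    """
    clusters, labels = [], []
    for lat, lon in points:
        best, best_d = None, float("inf")
        for idx, c in enumerate(clusters):
            d = haversine_km((lat, lon), (c.lat, c.lon))
            if d < best_d:
                best, best_d = idx, d
        if best is not None and best_d <= max_sigmas * sigma_km:
            clusters[best].add(lat, lon)
        else:
            clusters.append(Cluster(lat, lon))
            best = len(clusters) - 1
        labels.append(best)
    return clusters, labels


def extract_displacements(tweets, labels, clusters):
    """Return one record per consecutive tweet pair that changes cluster.

    tweets: list of (timestamp_seconds, lat, lon), sorted by time.
    """
    moves = []
    for i in range(len(tweets) - 1):
        a, b = labels[i], labels[i + 1]
        if a == b:
            continue  # same location cluster, no displacement
        ca, cb = clusters[a], clusters[b]
        moves.append({
            "from": a,
            "to": b,
            "start": tweets[i][0],
            "duration_s": tweets[i + 1][0] - tweets[i][0],
            "length_km": haversine_km((ca.lat, ca.lon), (cb.lat, cb.lon)),
        })
    return moves
```

Under these assumptions, the displacement length, duration and start time named in the abstract correspond to the `length_km`, `duration_s` and `start` fields. Note that the duration here is only the gap between two tweets, which is an upper bound on the actual travel time.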

... Similar to research using regional properties to study survey respondents' living conditions (e.g., in urban sociology), research using Twitter data can examine the spatial distribution of tweets, compare the content of tweets across regions, or link Twitter data with external data sources by way of regional identifiers to study a variety of phenomena. Recent studies in the social sciences have used Twitter geoinformation to study the COVID-19 pandemic (Ntompras et al., 2022), influenza trends (Gao et al., 2018), crime (Hipp et al., 2018), language dialects (Huang et al., 2016), conspiracy theories (Stephens, 2020), polling (Beauchamp, 2017), travel and mobility (Blanford et al., 2015;Zhang et al., 2017;Wang et al., 2018;Levy et al., 2020), health behavior and outcomes (Wiedener and Li, 2014;Nguyen et al., 2017;Martinez et al., 2018), anti-immigrant attitudes (Menshikova and van Tubergen, 2022), happiness (Mitchell et al., 2013), and human behavior in environmental disasters (Murthy and Gross, 2017). ...
... Among these, GPS geotags are the most obvious source of location information, as they come in the form of geographic coordinates (longitude and latitude) and represent precise locations on the Earth's surface without any further processing. Thanks to their ease of use, tweet geotags are utilized by many researchers to locate tweets and users in their analysis (Mitchell et al., 2013;Hawelka et al., 2014;Wiedener and Li, 2014;Blanford et al., 2015;Shelton et al., 2015;Huang et al., 2016;Murthy and Gross, 2017;Nguyen et al., 2017;Zhang et al., 2017;Hipp et al., 2018;Martinez et al., 2018;Wang et al., 2018;Levy et al., 2020). However, this information is available for not even 1% of all tweets (Sloan and Morgan, 2015). ...
Article
Full-text available
More and more, social scientists are using (big) digital behavioral data for their research. In this context, the social network and microblogging platform Twitter is one of the most widely used data sources. In particular, geospatial analyses of Twitter data are proving to be fruitful for examining regional differences in user behavior and attitudes. However, ready-to-use spatial information in the form of GPS coordinates is only available for a tiny fraction of Twitter data, limiting research potential and making it difficult to link with data from other sources (e.g., official statistics and survey data) for regional analyses. We address this problem by using the free text locations provided by Twitter users in their profiles to determine the corresponding real-world locations. Since users can enter any text as a profile location, automated identification of geographic locations based on this information is highly complicated. With our method, we are able to assign over a quarter of the more than 866 million German tweets collected to real locations in Germany. This represents a vast improvement over the 0.18% of tweets in our corpus to which Twitter assigns geographic coordinates. Based on the geocoding results, we are not only able to determine a corresponding place for users with valid profile locations, but also the administrative level to which the place belongs. Enriching Twitter data with this information ensures that they can be directly linked to external data sources at different levels of aggregation. We show possible use cases for the fine-grained spatial data generated by our method and how it can be used to answer previously inaccessible research questions in the social sciences. We also provide a companion R package, nutscoder, to facilitate reuse of the geocoding method in this paper.
... Twitter messages often describe everyday events together with location data, which suits CFGA well for identifying traffic events. Recently, harnessing Twitter data for various transport-related applications, such as incident detection, accessing transport information [11], travel behavior [12], [13], and mobility pattern analysis [14], [15], [16], [17], [18], [19], [20], has become widely popular. In [21], the authors detected traffic accidents using geotagged Twitter data. ...
... The authors developed a comprehensive approach to identify traffic-related tweets using historical tweets and further applied tf-idf to generate the influential word set [29]. In [13], clustering was applied to group tweet locations in order to infer travel behavior. The results were compared with traditional household survey methods and verified the potential of social media data for travel behavior studies. ...
Preprint
Full-text available
The confined use of road sensors limits the effectiveness of detecting traffic-disturbing events. In this context, Twitter is becoming a popular channel for people to share events that affect daily life. In this paper, a novel dictionary formation and a new feature generation approach are proposed to build an integrated machine learning framework for detecting traffic events. The proposed combinatorial feature generation approach (CFGA) uncovers associations among the keywords of a tweet and extracts the keyword sets that are correlated with the collected data. Such keyword sets are denoted as set phrases. A set phrase may comprise a single word or multiple words of a tweet. These set phrases may be used as keywords for event-related data collection or further analysis. The frequently occurring set phrases are identified using the notion of support, which signifies the percentage of tweets containing the relevant keywords. Since the nature of different events may vary, a hardcoded value for the support threshold will not be beneficial. A hyper-parameter designated as support (!) is tuned to find the threshold value that is used to obtain the set phrases. This process sets up a database of frequently occurring set phrases that can signal traffic-related events using an ML classifier. The results of the proposed approach suggest that, if a suitable support is chosen, the proposed CFGA increases the accuracy of supervised classification models for extracting traffic information from Twitter data. The classification results obtained by using the proposed approach outperform their existing counterparts in terms of precision, recall, and F-measure.
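
The notion of support used above can be sketched in a few lines of Python. This is a minimal illustration of support-based filtering of keyword sets, not the CFGA implementation itself: the function name `frequent_set_phrases`, the whitespace tokenization, and the `min_support` and `max_size` parameters are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_set_phrases(tweets, min_support=0.5, max_size=2):
    """Return {keyword set: support} for sets meeting the support threshold.

    Support is the fraction of tweets that contain every word of the set.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    n = len(tweets)
    if n == 0:
        return {}
    counts = Counter()
    for text in tweets:
        # Deduplicate and sort so each set has one canonical tuple form.
        words = sorted(set(text.lower().split()))
        for size in range(1, max_size + 1):
            counts.update(combinations(words, size))
    return {
        phrase: count / n
        for phrase, count in counts.items()
        if count / n >= min_support
    }


if __name__ == "__main__":
    sample = ["accident on i95 north", "major accident i95", "sunny day today"]
    for phrase, support in frequent_set_phrases(sample, min_support=0.6).items():
        print(phrase, round(support, 2))
```

Raising `min_support` shrinks the dictionary of set phrases and lowering it admits rarer combinations, which is why the abstract treats the threshold as a tunable hyper-parameter rather than a fixed value.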
... Unlike traditional data, social media offer semantically rich information, not to mention frequent, high-granular updates that cost little or nothing to extract. For these reasons, researchers have been using social media to study human practices (Zhu, Blanke, & Gerhard, 2016;Bodnar, Dering, Tucker, & Hopkinson, 2017) such as travel behavior (Bocconi, Bozzon, Psyllidis, Bolivar, & Houben, 2015;Rashidi, Abbasi, Maghrebi, Hasan, & Waller, 2017;Zhang, He, & Zhu, 2017), modes of transportation (Zhang, He, & Zhu, 2017), and nutrition patterns (Fried, Surdeanu, Kobourov, Hingie, & Bell, 2014;Abbar, Mejova, & Weber, 2015;Fard, Hadadi, & Targhi, 2016). ...
... Furthermore, we use administrative suburb boundaries to designate user activity areas rather than calculating them as exact trajectories in time and space. [27] is another study that explicitly determined geographical displacement for unique Twitter users. They used a model-based clustering approach to determine clusters of travel locations that have similar travel motifs. ...
... They used a model-based clustering approach to determine clusters of travel locations that have similar travel motifs. In comparison to [27], this work does not cluster based on geographical coordinates but rather on inclusion in predetermined suburb boundaries around Melbourne. ...
Conference Paper
The population of Melbourne is growing at over 100,000 every year. This is impacting on all aspects of society: house prices, health, multi-ethnic society, and transport amongst many others. Social media data is a hugely popular data source for research in many domains from understanding urban environments to predicting election results. In this paper we present an approach to identify commuter travel patterns and calculate the average travel time of commuters around Melbourne using social media data from Twitter, Instagram, FourSquare and Flickr. At present there is no other technology or system that captures this information other than by randomly sampling subsets of the population. To achieve this, the social media data was transformed into travel vectors and subsequently filtered in order to facilitate analysis and reduce noise in the data. Travel patterns were then learned using K-means clustering and use of force-directed graphing techniques. Travel pattern identification was based on social media user movement between suburb boundaries in the city of Melbourne. The calculated travel time was compared to the expected travel time by train. A web based platform was developed to allow analysis of the results. The results show that social media can indeed be used to better understand commuting behaviours, although it is highly dependent on the amount of data and especially geo-coded data from individual users.
... According to Rashidi et al. (2017), as social media data encompasses information that is revealed by users in realistic situations, such data is free from sampling, surveying or laboratory biases. The location effectiveness and timeliness features of Twitter can be proved in a recent accident detection study that uses the GPS-enabled smartphones (White et al., 2011) and travel behavior study which has been validated by the household travel survey (Zhang et al., 2017). ...
Preprint
This paper employs deep learning to detect traffic accidents from social media data. First, we thoroughly investigate the contents of over 3 million tweets collected over one year in two metropolitan areas: Northern Virginia and New York City. Our results show that paired tokens can capture the association rules inherent in accident-related tweets and further increase the accuracy of traffic accident detection. Second, two deep learning methods, Deep Belief Network (DBN) and Long Short-Term Memory (LSTM), are investigated and implemented on the extracted tokens. Results show that DBN can obtain an overall accuracy of 85% with about 44 individual token features and 17 paired token features. The classification results from DBN outperform those of Support Vector Machines (SVMs) and supervised Latent Dirichlet allocation (sLDA). Finally, to validate this study, we compare the accident-related tweets with both the traffic accident log on freeways and traffic data on local roads from 15,000 loop detectors. It is found that nearly 66% of the accident-related tweets can be located by the accident log and more than 80% of them can be tied to nearby abnormal traffic data. The comparison raises several important issues in using Twitter to detect traffic accidents, including location and time bias as well as the characteristics of influential users and hashtags.
... Previously, numerous studies have utilized social media data for transportation research, including the classification of urban activity patterns [5], estimation of travel activity spaces [6], examination of longitudinal travel behavior [7], incidents detection [8], and so on. Specifically, the authors in [5] utilize Latent Dirichlet Allocation to classify individual activity patterns. ...
Preprint
Full-text available
Social media has become an important platform for people to express their opinions towards transportation services and infrastructure, which holds the potential for researchers to gain a deeper understanding of individuals' travel choices, for transportation operators to improve service quality, and for policymakers to regulate mobility services. A significant challenge, however, lies in the unstructured nature of social media data. In other words, textual data like social media is not labeled, and large-scale manual annotations are cost-prohibitive. In this study, we introduce a novel methodological framework utilizing Large Language Models (LLMs) to infer the mentioned travel modes from social media posts, and reason people's attitudes toward the associated travel mode, without the need for manual annotation. We compare different LLMs along with various prompting engineering methods in light of human assessment and LLM verification. We find that most social media posts manifest negative rather than positive sentiments. We thus identify the contributing factors to these negative posts and, accordingly, propose recommendations to traffic operators and policymakers.
... Social media data, reflecting real-life user situations, often avoids the biases typical in traditional data collection methods [9]. Twitter's timeliness and location accuracy have been validated in accident detection studies leveraging GPS-enabled smartphones [34] and travel behavior studies validated by household travel surveys [35]. ...
Article
Full-text available
Highlights
What are the main findings?
Demonstrates the effectiveness of a novel multitask learning (MTL) framework utilizing large language models (LLMs) for real-time analysis of road traffic crashes (RTCs) through the integration of social media data.
Fine-tuning GPT-2 for language modeling demonstrated that it outperformed baseline models, including GPT-4o mini in zero-shot mode and XGBoost, across various classification and information retrieval tasks. This study benchmarks the performance of the fine-tuned GPT-2 model against these baselines, highlighting its superior performance in these tasks.
The study collected and curated a dataset of 26,226 RTC-related tweets from Australia over a year. This dataset extracted fifteen unique features, with six used in classification tasks and nine in information retrieval tasks.
Developed an advanced automated labeling system using GPT-3.5, followed by rigorous expert verification to ensure the accuracy and reliability of feature extraction from tweets. The resulting meticulously curated dataset serves as a foundational resource for training and validating subsequent models, establishing a new standard for RTC analysis.
What is the implication of the main finding?
Offers a transformative approach to traffic safety analytics, providing detailed, timely insights crucial for emergency responders, urban planners, and policymakers.
By leveraging cutting-edge AI techniques within an MTL framework, this study demonstrates a transformative approach to real-time RTC analysis, setting the stage for future advancements in the field.
The curated dataset generated in this research not only advances traffic safety measures but also serves as a valuable resource for extracting insights, developing models, and conducting further research. This resource provides a solid foundation for future studies aimed at enhancing road safety.
Abstract Road traffic crashes (RTCs) are a global public health issue, with traditional analysis methods often hindered by delays and incomplete data. Leveraging social media for real-time traffic safety analysis offers a promising alternative, yet effective frameworks for this integration are scarce. This study introduces a novel multitask learning (MTL) framework utilizing large language models (LLMs) to analyze RTC-related tweets from Australia. We collected 26,226 traffic-related tweets from May 2022 to May 2023. Using GPT-3.5, we extracted fifteen distinct features categorized into six classification tasks and nine information retrieval tasks. These features were then used to fine-tune GPT-2 for language modeling, which outperformed baseline models, including GPT-4o mini in zero-shot mode and XGBoost, across most tasks. Unlike traditional single-task classifiers that may miss critical details, our MTL approach simultaneously classifies RTC-related tweets and extracts detailed information in natural language. Our fine-tuned GPT-2 model achieved an average accuracy of 85% across the six classification tasks, surpassing the baseline GPT-4o mini model's 64% and XGBoost's 83.5%. In information retrieval tasks, our fine-tuned GPT-2 model achieved a BLEU-4 score of 0.22, a ROUGE-I score of 0.78, and a WER of 0.30, significantly outperforming the baseline GPT-4o mini model's BLEU-4 score of 0.0674, ROUGE-I score of 0.2992, and WER of 2.0715. These results demonstrate the efficacy of our fine-tuned GPT-2 model in enhancing both classification and information retrieval, offering valuable insights for data-driven decision-making to improve road safety. This study is the first to explicitly apply social media data and LLMs within an MTL framework to enhance traffic safety.
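
A reported WER above 1, such as the 2.0715 baseline figure, can look surprising. The sketch below uses the standard definition of word error rate (word-level edit distance divided by the number of reference words) to show how it arises when the output is much longer than the reference. It is an illustration only, assuming whitespace tokenization; the function name `word_error_rate` is hypothetical and the study's exact evaluation code is not given in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Four inserted words against a two-word reference give a WER of 2.0.
    print(word_error_rate("a b", "a x y z b q"))
```

Because insertions are counted against the reference length, a verbose model output can yield a WER well above 1 even when it contains every reference word.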
... However, this method has some limitations as the individuals using and sharing trackable social media content can be behaviorally different from people that do not share this content and it depends on the willingness-to-share the venues visited. This problem is challenging for researchers when they want to generalize the population's travel behavior (Zhang et al. 2017). ...
Article
Full-text available
When observing the temporal trajectory of an individual, there is a high probability of them visiting an already-known place due to habit and routine in human mobility behavior. To collect data and understand these routine activities, we propose the Place Generator and the Place Interpreter, a survey adapted from the Name Generator and Name Interpreter methodology of social network studies for travel behavior. In the survey, we asked the participants to name the venues they regularly visit for leisure by category. This methodology captures the characteristics of the venues and the reasons to be chosen. We tested this method in the Zurich Metropolitan Area in Switzerland, focusing on leisure activities and the social environment of the venues. Hence, we ask the individuals to describe the reasons for choosing that specific location and the sociodemographic characteristics of the other visitors. This methodology worked well when compared with earlier long-duration GPS tracking surveys. Respondents report, on average, 9.85 locations for nine types of venues, mainly supermarkets and restaurants or cafes, and respondents can describe their similarities with other visitors to that location. The survey is complemented with a survey of sociodemographic characteristics and the respondent’s ego-centric social network to get information on social connections and their impact on leisure activity.
... Supplementing the analysis of travel and shopping with longitudinal data that covers an extended period could provide valuable insights, but it would require a well-designed survey to ensure a high response rate and would likely be costly (Eisenmann et al., 2019). Another promising approach, which was used by Zhang et al. (2017), is to complement traditional travel behavior studies with data from social media that captures meaningful purchasing information. Innovative technologies could also be leveraged to collect data, as demonstrated by Hong et al. (2021), who compared the accuracy and level of detail between GPS-based household travel surveys and traditional recall-based methods. ...
Article
Full-text available
How does e-shopping impact household travel? To answer this question, which is particularly relevant for policymakers concerned with congestion, air pollution, and greenhouse gas emissions from transportation, we analyzed data from the 2017 National Household Travel Survey using propensity score matching. This allowed us to tackle the bias from households self-selecting into various levels of e-shopping and gain causal inference. Unlike other related papers in the literature, our unit of analysis is a household because travel and shopping decisions within a household are interrelated. We classified households into three groups based on how many orders per person per month they placed online: low (up to one), medium (more than once but less than four), and high (over four). We found that more e-shopping results in more household travel (number of trips, miles, and VMT), but this effect depends on e-shopping frequency and population density, and it affects weekdays more than weekends. E-shopping impacts household travel more for medium frequency e-shoppers in low density areas: compared to similar low frequency e-shoppers, on weekdays, they take on average 8 more monthly trips and travel ~104 extra miles (including 31 miles for shopping). At the other end of the spectrum, high frequency e-shoppers in dense areas do not travel more on weekends than similar low e-shopping frequency households. To help reduce e-shopping induced travel, policymakers could encourage the creation of neighborhood depots where households would pickup and return unwanted orders, and foster the development of virtual reality tools for shopping from home.
... Existence of saturated pixels, blooming effects and failure to remove gas flaring, have reduced the quality of NTL imagery in mapping socioeconomic variables; however, such problems are nonexistent in the tweet images (Zhao et al. 2018). Although the number of geotagged tweets has great potential to be deemed as a substitute for brightness of NTL to assess socioeconomic variables over large geographic areas (Zhao et al. 2018), Twitter data have some inherent disadvantages, such as relatively low heterogeneity in sample representativeness and sampling frequency (Jiang et al. 2016;Patel et al. 2017;Paule, Sun, and Thakuriah 2019;Steiger et al. 2016;Zhang, He, and Zhu 2017). Owing to a high spatio-temporal resolution, mobile phone call records data are likely to be the one of the best social sensing data sources for mapping population (Chen et al. 2018;Kubíček et al. 2019;Liu et al. 2018;Lulli et al. 2016;Thomas et al. 2017;Yao et al. 2017). ...
Article
Full-text available
In this study, we proposed a multi-source approach for mapping local-scale population density of England. Specifically, we mapped both the working and daytime population densities by integrating the multi-source data such as residential population density, point-of-interest density, point-of-interest category mix, and nighttime light intensity. It is demonstrated that combining remote sensing and social sensing data provides a plausible way to map annual working or daytime population densities. In this paper, we trained models with England-wide data and subsequently tested these models with Wales-wide data. In addition, we further tested the models with England-wide data at a higher level of spatial granularity. Particularly, the random forest and convolutional neural network models were adopted to map population density. The estimated results and validation suggest that the three built models have high prediction accuracies at the local authority district level. It is shown that the convolutional neural network models have the greatest prediction accuracies at the local authority district level though they are most time-consuming. The models trained with the data at the local authority district level are less appropriately applicable to test data at a higher level of spatial granularity. The proposed multi-source approach performs well in mapping local-scale population density. It indicates that combining remote sensing and social sensing data is advantageous to mapping socioeconomic variables.
... Using this method, they can accurately predict the productivity and effectiveness of the network, whether it is a business or a whole city (Pentland, 2015). Furthermore, various social media-based studies, mainly utilizing Twitter data thanks to the free Twitter API, are available for investigating users' travel behavior (Zhang et al., 2017) or the capability of detecting road traffic congestion (Moyano et al., 2021). In Adetiloye and Awasthi (2019), Twitter data are used alongside traditional sensors for traffic congestion prediction. ...
Article
Full-text available
The estimation and analysis of road traffic represent the preliminary steps towards satisfying the current needs for smooth, safe, and green transportation. Therefore, effective traffic monitoring is an essential topic alongside the planning of sustainable transportation systems and the development of new traffic management concepts. In contrast to classical traffic detection solutions, this study investigates the correlation between travelers' social activities and road traffic. The study's primary goal is to investigate whether a relationship exists between social activity and road traffic, which might also allow an infrastructure-independent traffic monitoring technique. People's general activities at Point of Interest (POI) locations (measured as an occupancy parameter) are correlated with traffic data so that, finally, proper proxies can be defined for link-level average traffic speed estimation. The method is tested and evaluated using real-world traffic and POI occupancy data from Budapest (District XI.). The results of the correlation investigation justify an indirect relationship between activity at POIs and road traffic, which holds promise for future practical applicability.
... However, this method can have some limitations as the individuals using and sharing trackable social media content can be behaviorally different from people that do not share this content. This problem is challenging for researchers when they want to estimate the population's average trip length and start and end times (Zhang et al, 2017). ...
Preprint
Full-text available
When observing the temporal trajectory of an individual, there is a high probability of visiting a known place; this is due to the central component of habit and routine in human mobility behavior. To understand those routine activities, we propose a new survey method: the Place Generator & the Place Interpreter, based on the name generator and name interpreter survey methodology for ego-centric social networks. In the survey, we asked the participants to name the locations they regularly visit for leisure, by category. This methodology captures the characteristics of the locations and the reasons to be chosen. We tested the methodology in the Zurich Metropolitan Area in Switzerland, focusing on urban leisure activities and the social environment of the locations. Hence, we ask the individuals to describe the reasons for choosing that specific location and the socio-demographic characteristics of the other visitors. This methodology worked well when compared with earlier long-duration GPS tracking surveys. Respondents report, on average, 9.85 locations for nine types of locations, mainly supermarkets and restaurants or cafes, and respondents can describe their similarities with other visitors to that location. The survey is complemented with a survey of sociodemographic characteristics and the respondent's ego-centric social network to get information on social connections and their impact on leisure activity.
... Existing work leverages social media data to analyze travel behaviors, including activity pattern classification [11], location inference [12], travel activity estimation [13], and longitudinal travel behavior inference [14]. A forecasting model is proposed to predict mode choices according to the check-in information of individual tweets [15]. ...
Article
Full-text available
This paper aims to leverage Twitter data to understand travel mode choices during the pandemic. Tweets related to different travel modes in New York City (NYC) are fetched from Twitter in the two most recent years (January 2020–January 2022). Building on these data, we develop travel mode classifiers, adapted from natural language processing (NLP) models, to determine whether individual tweets are related to some travel mode (subway, bus, bike, taxi/Uber, and private vehicle). Sentiment analysis is performed to understand people’s attitudinal changes about mode choices during the pandemic. Results show that a majority of people had a positive attitude toward buses, bikes, and private vehicles, which is consistent with the phenomenon of many commuters shifting away from subways to buses, bikes and private vehicles during the pandemic. We analyze negative tweets related to travel modes and find that people were worried about those who did not wear masks on subways and buses. Based on users’ demographic information, we conduct regression analysis to analyze what factors affected people’s attitude toward public transit. We find that the attitude of users in the service industry was more easily affected by MTA subway service during the pandemic.
... They were able to extract interesting properties of supposed journeys from the data, including travel time, travel distance, and travel speed. Zhang et al. (2017) also discovered several properties of journeys that can be built from social media data. In utilizing Twitter data to build longitudinal travel records, they were able to infer distributions of inter-tweet displacement, length of displacement, duration of displacement, and travel start time. ...
Article
Full-text available
Background In this paper, we consider the applicability of the customer journey framework from retailing as a driver for urban informatics at individual scales within urban science. The customer journey considers shopper experiences in the context of shopping paths, retail service spaces, and touch-points that draw them into contact. Around this framework, retailers have developed sophisticated data science for observation, identification, and measurement of customers in the context of their shopping behavior. This knowledge supports broad data-driven understanding of customer experiences in physical spaces, economic spaces of decision and choice, persuasive spaces of advertising and branding, and inter-personal spaces of customer-staff interaction. Method We review the literature on pedestrian and high street retailing, and on urban informatics. We investigate whether the customer journey could be usefully repurposed for urban applications. Specifically, we explore the potential use of the customer journey framework for producing new insight into pedestrian behavior, where a sort of empirical hyperopia has long abounded because data are always in short supply. Results Our review addresses how the customer journey might be used as a structure for examining how urban walkers come into contact with the built environment, how people actively and passively sense and perceive ambient city life as they move, how pedestrians make sense of urban context, and how they use this knowledge to build cognition of city streetscapes. Each of these topics has relevance to walking studies specifically, but also to urban science more generally. We consider how retailing might reciprocally benefit from urban science perspectives, especially in extending the reach of retailers' insight beyond store walls, into the retail high streets from which they draw custom. 
Conclusion We conclude that a broad set of theoretical frameworks, data collection schemes, and analytical methodologies that have advanced retail data science closer and closer to individual-level acumen might be usefully applied to accomplish the same in urban informatics. However, we caution that differences between retailers’ and urban scientists’ viewpoints on privacy present potential controversy.
... [12][13] Discussed the extraction of various aspects of transportation information from social media, which is examined and revised based on human perceptions. [14] Proposed a methodology that explores the ability to use geotagged Twitter data for longitudinal travel behavior analysis. [15][16] Presented an approach that integrates text mining techniques and ensemble methods, which increases the performance of models for predicting the severity of rail accidents. ...
Article
Capturing public insights related to transit systems from social media has recently gained considerable popularity. Regional transportation agencies use social media as a tool to provide information to the public and to seek their input and ideas for meaningful decision making in transportation activities. This exploratory study attempts to gauge the impact of social media use in transportation planning, which in turn would help transportation administrations identify the day-to-day challenges faced by customers and suggest suitable solutions. This paper presents the effect of pre-processing techniques on transit opinion analysis to improve performance. The performance of different pre-processing methods, namely stop word removal, stemming, lemmatization, negation handling, and URL removal, is evaluated on social media transit riders' opinions using two feature representation models (TF-IDF with unigrams and TF-IDF with bigrams) and three feature selection techniques (information gain, standard deviation, and chi-square). The experimental results are evaluated using four classifiers, namely Support Vector Machine, Naïve Bayes, Decision Tree, and K-Nearest Neighbors, in terms of accuracy, precision, recall, and F-measure. On analyzing the social media transit opinion data, it is observed that pre-processing with the bigram technique performs better than the other approaches, specifically with Support Vector Machine and Naïve Bayes.
... A criticism of this approach is that the model deals with spatio-temporal observations, while density-based models process spatial noise but not temporal noise (Ester et al., 1996). Finally, Zhang et al. (2017) present a sequential model-based approach that allocates points. The sequential nature of this model makes it possible to overcome some of the limitations of the traditional model-based approach, which is purely data-driven and can lead to unrealistic clusters. ...
Article
This paper proposes a data fusion approach to automatically detect activity patterns in a GPS dataset based on travel diaries and correct misclassification errors. The Activity Patterns Detection consists of a Supervised Learning framework, thanks to which the activity purposes in the travel diaries are learned and then predicted in the GPS dataset. Furthermore, we deploy Unsupervised Learning to identify similar spatial and temporal activities in the GPS dataset and, based on travel diaries, to correct the misclassification errors. This work shows that, based on a few observations in the travel diaries and a set of features such as the resting time before the activity takes place, the number of occurrences of the same trip and the percentage of the trip made during daytime and the speed, it is possible to detect activities with an overall accuracy of 90%. Since the GPS dataset does not have information on the activity performed by the user, in reality, the aggregated results are validated based on the Kolmogorov-Smirnov test. The experiment shows that, with a confidence level of 99%, the majority of spatial and temporal feature distributions of activities in the travel diaries dataset are similar to those in the GPS dataset. Thanks to this approach, planners and transport operators can automatically obtain spatial and temporal patterns of frequent activities in urban areas.
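The validation described in this abstract relies on the two-sample Kolmogorov-Smirnov test. As a minimal sketch, assuming two small lists of activity durations (the sample values and the function names `ks_statistic` and `ks_critical_value` are hypothetical and not taken from the paper), the test statistic is the largest gap between the two empirical distribution functions, and it can be compared with the usual large-sample critical value:

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs are step functions, so the largest gap
    # is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, alpha=0.01):
    """Large-sample approximation of the critical value at significance alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical activity durations in minutes (illustration only).
diary_durations = [10, 12, 15, 20, 22, 30]
gps_durations = [11, 13, 16, 19, 25, 28]

d = ks_statistic(diary_durations, gps_durations)
threshold = ks_critical_value(len(diary_durations), len(gps_durations))
similar = d < threshold  # True means no evidence that the distributions differ
```

With samples this small the approximation is rough; an exact or permutation-based p-value would be preferable in practice.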
... Combined with the point-of-interest data, they can be used to forecast the next activity besides the current activity (Cui et al., 2018). Twitter and other social media data have been used to study different aspects of longitudinal travel behaviour, such as destination choice (Chen et al., 2018;Llorca et al., 2018;Zhang et al., 2017) and mode choice (Maghrebi et al., 2016). When combined with census and land-use data, Twitter data can help estimate OD demand matrices with adequate accuracy (Osorio-Arjona & García-Palomares, 2019). ...
Article
Data play an indispensable role in transport modelling. The availability of data from non-conventional sources, such as mobile phones, social media, and public transport smart cards, changes the way we conduct mobility analyses and travel forecasting. Existing studies have demonstrated the multitude and varied applications of these emerging data in transport modelling. The transferability of current research and further endeavours depend mostly on the availability of these data. Therefore, the openness or public availability of the prominent data for transport modelling needs to be adequately investigated. Such a discussion should also encompass these data’s application aspects to provide a holistic overview. This paper defines a typology for the data classification based on a set of availability or openness attributes from the existing literature. Subsequently, we use the developed typology to classify the prominent transport data into four categories: (i) Commercial data, (ii) Inaccessible data, (iii) Gratis and accessible data with restricted use, and (iv) Open data. Using this typology, we conclude that the public data, which refer to the data that are accessible and free of cost, are a superset of open data. Further, we discuss the applications and limitations of the selected data in transport modelling and highlight in which task(s) certain data excel. Lastly, we synthesise our review using a Strengths, Weaknesses, Opportunities and Threats (SWOT) analysis to bring out the aspects relevant to data owners and data consumers. Public availability of data can help in various modelling steps such as trip generation, accessibility, destination choice, route choice, network modelling. Complementary datasets such as General Transit Feed Specification (GTFS) and Volunteered Geographic Information (VGI) increase the usability of other data. Thus, modellers can gain from the positive cascade effect by prioritising these data. 
There is also a potential for data owners to release proprietary data, such as mobile phone data, with restricted-use licenses after addressing privacy risks. Our study contributes by dealing with two problems at the same time. On the one hand, the paper analyses existing data based on their potential for mobility studies. On the other hand, we classify them based on how open they are. Hence, we identify the most promising public data for developing the next generation of transport models.
... The other group of studies placed attention on more fine-grained travel behaviors. Zhang et al. [4] proposed a sequential model-based clustering method to group locations and extract displacements from geotagged social media data, which can supplement traditional travel surveys. Hot spots of customized buses were identified by Qiu et al. [5] using passenger trip behavior data. ...
Article
We investigate how to effectively and efficiently embed users' personalized travel behaviors into vectors in this paper. Based on an example scenario of travel mode choice in intelligent transportation systems, three data structures representing users' travel behaviors are defined, namely heterogeneous graph of users' travel behaviors, user travel behavior k-partite graph, and personalized user travel behavior sentence set. This paper systematically analyzes the principles of existing methods and provides intuitions for the problem of learning travel behavior representation in intelligent transportation systems. Then we propose Behavior2vector, an improved method tailored for embedding users' personalized travel behaviors into vectors. In our experiments, we design a travel mode choice model based on machine learning, which uses both hand-crafted basic features and embedded vector features. We further quantify the impact of various factors on travel mode choice and use travel big data to test the hypotheses of traffic assignment models, e.g., that travelers always choose the shortest path. In addition, we compare our method with existing graph embedding methods and discuss their advantages and disadvantages.
... There have been studies comparing multiple data sources to identify/adjust the biases [e.g., 64,65] and to validate against "ground truth" [e.g., 59]. When validating geotagged tweets against travel surveys, one study shows that geotagged social media data capture the displacement distribution, length, duration, and start time of trips reasonably well for inferring individual travel behaviour [66]. Validations using CDR need to be interpreted carefully as CDR and geotagged tweets have similar passive data collection manners that might share some shortcomings. ...
Thesis
Transportation presents a major challenge to curb climate change due in part to its ever-increasing travel demand. Better informed policy-making requires up-to-date empirical mobility data to model viable mitigation options for reducing emissions from the transport sector. On the one hand, the prevalence of digital technologies enables a large-scale collection of human mobility traces, providing big potentials for improving the understanding of mobility patterns and transport modal disparities. On the other hand, the advancement in data science has allowed us to continue pushing the boundary of the potentials and limitations, for new uses of big data in transport. This thesis uses emerging data sources, including Twitter data, traffic data, OpenStreetMap (OSM), and trip data from new transport modes, to enhance the understanding of mobility and transport modal disparities, e.g., how car and public transit support mobility differently. Specifically, this thesis aims to answer two research questions: (1) What are the potentials and limitations of using these emerging data sources for modelling mobility? (2) How can these new data sources be properly modelled for characterising transport modal disparities? Papers I-III model mobility mainly using geotagged social media data, and reveal the potentials and limitations of this data source by validating against established sources (Q1). Papers IV-V combine multiple data sources to characterise transport modal disparities (Q2) which further demonstrate the modelling potentials of the emerging data sources (Q1). Despite a biased population representation and low and irregular sampling of the actual mobility, the geolocations of Twitter data can be used in models to produce good agreements with the other data sources on the fundamental characteristics of individual and population mobility. However, its feasibility for estimating travel demand depends on spatial scale, sparsity, sampling method, and sample size. 
To extend the use of social media data, this thesis develops two novel approaches to address the sparsity issue: (1) An individual-based mobility model that fills the gaps in the sparse mobility traces for synthetic travel demand; (2) A population-based model that uses Twitter geolocations as attractions instead of trips for estimating the flows of people between regions. This thesis also presents two reproducible data fusion frameworks for characterising transport modal disparities. They demonstrate the power of combining different data sources to gain new insights into the spatiotemporal patterns of travel time disparities between car and public transit, and the competition between ride-sourcing and public transport.
... In this case, the trip purpose is mostly predicted based on surveys, with the development of related technologies enabling extensive research, including the addition of GPS data that provide the travel routes [15,29]. In addition, the analysis of complex space and time data has become more sophisticated through the fusion of heterogeneous data such as social media data (e.g., tweets) and POI data (e.g., Google API) [14,29,30]. However, predicting the trip purpose only through direct traveler surveys has severely limited the existing research. ...
Article
Public bike-sharing is eco-friendly, connects excellently with other transportation modes, and provides a means of mobility that is highly suitable in the current era of climate change. This study proposes a methodology for inferring the bike trip purpose based on bike-share and point-of-interest (POI) data. Because the purpose of a trip involves decision-making, its inference necessitates an understanding of the spatiotemporal complexity of human activities. Thus, the spatiotemporal features affecting bike trips were selected from the bike-share data, and the land uses at the origin and destination of the trips were extracted from the POI data. During POI type embedding, the data were augmented considering the geographical distance between the POIs and the number of bike rentals at each bike station. We further developed a ground truth data construction method that uses temporal mobile and POI data. The inference model was built using machine learning and applied to experiments involving bike stations in Seocho-gu, Seoul, Korea. The experimental results revealed that optimal performance was achieved with the use of decision tree algorithms, as demonstrated by a 78.95% overall accuracy and 66.43% F1-score. The proposed method contributes to a better understanding of the causes of movement within cities.
... Similar s indicate similar travel behaviors between survey data and Twitter data. This is also verified in a previous study (Zhang et al. 2017). Table 8 shows the comparison of parameters of truncated power-law for 4 categories with most people of survey data and resample Twitter data. ...
Article
Many studies have demonstrated that social media data, especially Twitter data, have significant potential for developing models to estimate travel demand, manage operations, and support long-term planning. However, it is well known that research with social media data faces a looming challenge in sampling bias. The population of Twitter users differs greatly from the overall population. Therefore, social media data, when used directly for travel behavior analysis, contain biases and errors to some degree. The objective of this study is to correct the sampling bias of Twitter data for travel behavior analysis by inferring Twitter users’ socio-demographics. This study first links travelers’ Twitter accounts with their Facebook accounts and verifies their socio-demographics using Facebook data, assuming that one’s Facebook information is real. Second, several models are proposed for predicting socio-demographics, including gender, age, ethnicity, and education level. Afterward, this paper resamples the social media data and compares them to the 2009 California Household Travel Survey data. The resampled data show comparable characteristics to the survey data. This research sheds light on tackling sampling bias issues when social media data are incorporated to augment travel behavior analysis and urban planning.
... Twitter has gradually become a viable and even primary data source for some transportation research and applications. Recent studies valued its easy accessibility, low cost, and "Big Data" features, and made several breakthroughs in traditional research fields including traffic accident detection (1)(2), traffic flow prediction (3)(4), travel behavior and pattern analysis (5), transportation planning (6), infrastructure management (7), crisis management (8), etc. Of all these studies, travel behavior analysis focuses on the spatial and temporal travel features according to the GPS locations of the Twitter users and gives important information to transportation planners to evaluate the traffic operation. ...
Preprint
Social media platforms, such as Twitter, provide a totally new perspective on dealing with traffic problems and are anticipated to complement traditional methods. Geo-tagged tweets can provide Twitter users' location information and are being applied in traveler behavior analysis. This paper explores the full potential of Twitter in deriving travel behavior information and conducts a case study in the Manhattan area. A systematic method is proposed to extract displacement information from Twitter locations. Our study shows that Twitter has unique demographics that combine not only local residents but also tourists and passengers. For an individual user, Twitter can uncover his/her travel behavior features, including the time-of-day and location distributions on both weekdays and weekends. For all Twitter users, the aggregated travel behavior results also show that the time-of-day travel patterns on Manhattan Island resemble those of the traffic flow; the identification of OD patterns is also promising when compared with the results of a travel survey.
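To make the idea of extracting displacements from Twitter locations concrete, the following is a minimal sketch and not the authors' method: it simply pairs consecutive geotagged tweets of one user, measures the great-circle distance between them, and keeps pairs that moved farther than a threshold. The function names, the `min_km` threshold, and the sample coordinates are assumptions made for illustration.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def extract_displacements(tweets, min_km=0.1):
    """Turn one user's (timestamp, lat, lon) tweets into displacement records.

    min_km is a hypothetical threshold that filters out tweets sent
    from (almost) the same place.
    """
    ordered = sorted(tweets, key=lambda t: t[0])
    displacements = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(ordered, ordered[1:]):
        length = haversine_km(lat0, lon0, lat1, lon1)
        if length >= min_km:
            displacements.append({
                "start": t0,
                "length_km": length,
                # Time between the two tweets: an upper bound on travel time.
                "duration_h": (t1 - t0).total_seconds() / 3600.0,
            })
    return displacements


# Hypothetical tweets of a single user in Manhattan (illustration only).
sample_tweets = [
    (datetime(2017, 1, 2, 8, 0), 40.7580, -73.9855),
    (datetime(2017, 1, 2, 8, 30), 40.7580, -73.9855),
    (datetime(2017, 1, 2, 10, 30), 40.7061, -73.9969),
]
sample_displacements = extract_displacements(sample_tweets)
```

The distributions of `length_km`, `duration_h`, and the hour of `start` over many users are the kind of quantities that can then be compared with a travel survey.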
... When cross-validating against data with higher temporal resolution such as CDR (Lenormand et al. 2014), good agreement is generally found regarding, for instance, trip distance distribution. When validating geotagged tweets against travel surveys, studies show that geotagged social media data capture the displacement distribution, length, duration, and start time of trips reasonably well for the purpose of inferring individual travel behaviour (Zhang et al. 2017;Liao et al. 2019). Validations using CDR need careful interpretation, as CDR and geotagged tweets are both passive data collection methods that share some similar shortcomings. ...
Article
Travel demand estimation, as represented by an origin–destination (OD) matrix, is essential for urban planning and management. Compared to data typically used in travel demand estimation, the key strengths of social media data are that they are low-cost, abundant, available in real-time, and free of geographical partition. However, the data also have significant limitations: population and behavioural biases, and lack of important information such as trip purpose and social demographics. This study systematically explores the feasibility of using geolocations of Twitter data for travel demand estimation by examining the effects of data sparsity, spatial scale, sampling methods, and sample size. We show that Twitter data are suitable for modelling the overall travel demand for an average weekday but not for commuting travel demand, due to the low reliability of identifying home and workplace. Collecting more detailed, long-term individual data from user timelines for a small number of individuals produces more accurate results than short-term data for a much larger population within a region. We developed a novel approach using geotagged tweets as attraction generators as opposed to the commonly adopted trip generators. This significantly increases usable data, resulting in better representation of travel demand. This study demonstrates that Twitter can be a viable option for estimating travel demand, though careful consideration must be given to sampling method, estimation model, and sample size.
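As an illustration of the commonly adopted trip-generator view mentioned in this abstract (not the attraction-generator approach the paper develops), an OD matrix can be sketched by counting zone-to-zone transitions between consecutive geotagged tweets of the same user. The record layout, zone labels, and the function name `od_counts` are assumptions; assigning tweets to zones is taken as already done.

```python
from collections import Counter, defaultdict


def od_counts(records):
    """Count origin-destination pairs from (user_id, timestamp, zone) records.

    Two consecutive tweets of the same user in different zones are
    treated as one trip from the earlier zone to the later zone.
    """
    by_user = defaultdict(list)
    for user, timestamp, zone in records:
        by_user[user].append((timestamp, zone))

    od = Counter()
    for visits in by_user.values():
        visits.sort()  # order each user's tweets by time
        for (_, origin), (_, destination) in zip(visits, visits[1:]):
            if origin != destination:
                od[(origin, destination)] += 1
    return od


# Hypothetical records (illustration only); input order does not matter.
sample_records = [
    ("u1", 4, "A"),
    ("u1", 1, "A"),
    ("u1", 2, "B"),
    ("u1", 3, "B"),
    ("u2", 1, "A"),
    ("u2", 5, "B"),
]
sample_od = od_counts(sample_records)
```

Because only pairs of consecutive tweets in different zones contribute, sparse timelines yield few trips, which is one reason the abstract stresses data sparsity and sample size.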
... Cats et al. (2015) used the hierarchical clustering method to classify urban clusters based on temporal mobility profiles. The cluster-based analysis has been used to group citizens by travel behavior (Zhang et al. 2017) and bike-sharing stations by trip numbers (Hyland et al. 2018). Although the clustering approach has been used in various domains, there are advantages and disadvantages associated with it. ...
Article
This study estimates the impact of population size, population density, urbanization level and economy-related parameters on the proportion of trips of varied lengths for work purposes in India, using a fractional multinomial logit model. Data from 232 districts of India that have at least one city or town with a population >0.1 million are used. The analysis shows the positive impact of the district's population size and economy on the proportion of trips longer than 5 km. In addition, there is a positive impact from the overall population density and a negative impact from urban density on the share of long-distance trips. This study shows that as districts in India continue to urbanize and develop economically, the propensity for traveling long distances will probably increase. This will result in an increased dependency on motorized modes of transport that negatively impacts the environment, resource consumption, and human health. This study identifies the need for the adoption of development control regulations and policies to maintain the proportion of long-distance trips at a low level as India continues to urbanize. The findings of this study could be useful to project externalities of the transport sector in future years when the urbanization pattern is considered.
... Furthermore, clustering methods incorporating spatial, temporal, and textual information have been widely applied to infer activity types or travel purpose and segment user groups at an aggregated scale [30,31]. Results were generally verified with travel surveys or census data [9,32], and it was concluded that working and commercially related tweets or topics gave a better estimate. For long-term and even larger spatial scale movement patterns, analysis around migration was explored, such as that in [33,34]. ...
Article
Knowledge discovery about people and cities from emerging location data has been an active research field but is still relatively unexplored. In recent years, a considerable amount of work has been developed around the use of social media data, most of which focusses on mining the content, with comparatively less attention given to the location information. Furthermore, what aggregated scale spatial patterns show still needs extensive discussion. This paper proposes a tweet-topic-function-structure framework to reveal spatial patterns from individual tweets at aggregated spatial levels, combining an unsupervised learning algorithm with spatial measures. Two-year geo-tweets collected in Greater London were analyzed as a demonstrator of the framework and as a case study. The results indicate, at a disaggregated level, that the distribution of topics possesses a fair degree of spatial randomness related to tweeting behavior. When aggregating tweets by zones, the areas with the same topics form spatial clusters but of entangled urban functions. Furthermore, hierarchical clustering generates a clear spatial structure with orders of centers. Our work demonstrates that although uncertainties exist, geo-tweets should still be a useful resource for informing spatial planning, especially for the strategic planning of economic clusters.
... Pereira et al. conducted a study to recognize user activity and trip pattern by applying the historical data-matching rules approach with the help of the GPS trajectories dataset along with user activity duration, points of interest, socio-demographics, and work hours' travel time data (35). In another study, Zhang et al. performed a GPS-based survey to recognize user activity through applying the sequential model-based clustering method by using visiting frequency, most frequently visited locations, distance between visited locations, and the relation between a location and its surrounding environment (36). In addition to GPS trajectories, the Light Detection and Ranging (LiDAR) dataset was used to predict traffic flow and user activities by analyzing real-time spatial-temporal information (37). ...
Article
Analyzing travel behavior in transportation networks within a city is significant to understand the user’s activity and travel pattern in relation to making improved city plans for the future. Unlike the traditional travel diary survey, GPS data have helped researchers to analyze Big Data with enriched travel information in an automated way. The focus of this research was to identify user activity and travel pattern from GPS data logs. We proposed three different approaches, including Geohash clustering, the GIS-based approach, and Combined Geohash–GIS approach, for automatic user activity and trip recognition in a continuous and aggregate manner. We developed different individual models considering different dwell times for the above three approaches. We considered three different testing scenarios based on specified tolerance levels, including simple, moderate, and critical testing to identify trip only, activity only, and sequential activity–trip analysis. In comparison with other approaches, the Combined Geohash–GIS approach considering 5 min dwell time accurately classified data with about 95% accuracy. The proposed Combined Geohash–GIS approach could significantly enhance the efficiency and accuracy of GPS travel surveys by correctly recognizing user activity and trip patterns. This proposed combined approach could serve as a foundation for a future model system of full-scale travel information identification with GPS data.
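The Geohash clustering with a dwell time mentioned in this abstract can be illustrated with a small sketch. It is not the paper's Combined Geohash-GIS approach: it only bins GPS fixes into geohash cells with the standard encoding and labels a run of consecutive fixes in one cell as an activity when it lasts at least the dwell time. The precision of 7 characters, the function names, and the sample fixes are assumptions; the 5 min dwell time follows the abstract.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=7):
    """Standard geohash: interleave longitude and latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # the first bit refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def detect_stays(fixes, precision=7, min_dwell_s=300):
    """Return (cell, t_start, t_end) for runs of fixes that stay in one cell.

    fixes: time-ordered list of (t_seconds, lat, lon).
    """
    stays = []
    run_cell, run_start, run_end = None, None, None
    for t, lat, lon in fixes:
        cell = geohash_encode(lat, lon, precision)
        if cell == run_cell:
            run_end = t
            continue
        if run_cell is not None and run_end - run_start >= min_dwell_s:
            stays.append((run_cell, run_start, run_end))
        run_cell, run_start, run_end = cell, t, t
    if run_cell is not None and run_end - run_start >= min_dwell_s:
        stays.append((run_cell, run_start, run_end))
    return stays


# Hypothetical GPS log (illustration only).
sample_fixes = [
    (0, 57.64911, 10.40744),
    (200, 57.64911, 10.40744),
    (400, 57.64911, 10.40744),
    (500, 57.70, 10.50),   # single fix in passing: treated as part of a trip
    (600, 57.80, 10.60),
    (1000, 57.80, 10.60),
]
sample_stays = detect_stays(sample_fixes)
```

Fixes that do not belong to any stay can be treated as trip points, which gives the alternating activity and trip sequence the abstract refers to. A fix near a cell boundary can flip between neighbouring cells, so a real pipeline would likely need to merge adjacent cells.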
... Several researchers have analyzed data with different techniques to extract useful information and support marketing decisions, including Clustering techniques. In recent years the number of articles and publications on Social Media Data Analysis for marketing purposes using Clustering has increased significantly (Bello-Orgaz et al. 2020;Jisun et al. 2017;Ianni et al. 2019;Zhang et al. 2017), so there is a need to study which marketing benefits are driven by these techniques in the analysis of Social Media Big Data. ...
Chapter
The technological revolution and the appearance of Social Media have made it possible to generate large volumes of heterogeneous data called Big Data. Today, Big Data Analytics plays a very important role for businesses in making marketing decisions. Social Media Data represent a large part of Big Data and are characterized by complex and unstructured formats, which makes their analysis a difficult task. The challenge for researchers and decision-makers is to find a path to facilitate the analysis of these huge data in order to extract relevant information and to improve marketing decisions and strategies. In this context, previous research proposed several methods and techniques such as Data Mining, visualization and machine learning. Data Mining techniques are among the most widely used techniques and include Clustering techniques. Clustering provides a wide range of techniques that classify unstructured data and detect useful knowledge from large data sets. In this regard, numerous articles on analyzing Social Media Data using Clustering have been published and there has been a rapid increase in the number of publications in the areas of Social Media Data and marketing, in which several Clustering methods have been proposed. Despite this increase, there is a lack of articles organizing these publications according to Clustering techniques and added value. The aim of this paper is to answer the following questions: What are the techniques for aggregating Social Media Data? What are the marketing decisions generated by Social Media Data Clustering? Thus, it will be useful to present a review and a classification of research articles on Social Media Analysis in the field of marketing using Clustering to provide an overview to researchers and managers looking to use these techniques.
... Several researchers have analyzed data with different techniques to extract useful information and support marketing decisions, including Clustering techniques. In recent years the number of articles and publications on Social Media Data Analysis for marketing purposes using Clustering has increased significantly (Bello-Orgaz et al. 2020;Jisun et al. 2017;Ianni et al. 2019;Zhang et al. 2017), so there is a need to study which marketing benefits are driven by these techniques in the analysis of Social Media Big Data. ...
Book
This book constitutes the refereed proceedings of the 5th International Conference, ICDEc 2020, held in Bucharest, Romania, in June 2020. Due to the COVID-19 pandemic the conference took place virtually. The 13 full papers presented in this volume together with 3 abstracts of keynotes and 1 introductory paper by the steering committee were carefully reviewed and selected from a total of 41 submissions. The core theme of this year’s conference was “Emerging Technologies & Business Innovation”. The papers were organized in four topical sections named: digital transformation, data analytics, digital marketing, and digital business models.
... Overall, the above-mentioned statistical modeling approaches have demonstrated their strength in explaining and predicting the ride-hailing service demand which allows planners to identify significant parameters for informed decision making. Regarding the use of other big data sources, social media data sets have been utilized in a variety of applications to capture individual activity patterns (Gu et al., 2018;Zhang et al., 2017;Hasnat and Hasan, 2018) and detect traffic incidents (Wang et al., 2016;Kuflik et al., 2017). On the other hand, using app-based data (i.e., multisourced data with high variances in both location accuracy and time of travel), He and Shen (2015) and Wang et al. (2019) have proposed conceptual frameworks to estimate the impact of the disruptive mobility services on taxi markets. ...
Article
As app-based ride-hailing services have been widely adopted within existing traditional taxi markets, researchers have devoted considerable effort to understanding the important factors that influence the demand for the new mobility services. Econometric models (EMs) are mainly utilized to interpret the significant factors of the demand, and deep neural networks (DNNs) have been recently used to improve the forecasting performance by capturing complex patterns in large datasets. However, to mitigate possible (induced) traffic congestion and balance utilization rates for the current taxi drivers, an effective strategy of proactively managing a quota system for both emerging services and regular taxis is still critically needed. This paper aims to systematically design an explainable deep learning model capable of assessing the quota system balancing the demand volumes between two modes. A two-stage interpretable machine learning modeling framework was developed using a linear regression (LR) model coupled with a long short-term memory (LSTM) neural network. The first stage investigates the correlation between the existing taxis and on-demand ride-hailing services while controlling for other explanatory variables. The second stage applies the LSTM network to capture the residuals from the first estimation stage in order to enhance the forecasting performance. The proposed stepwise modeling approach (LR-LSTM) forecasts the demand of taxi rides, and it is implemented in the application of pick-up demand prediction using New York City (NYC) taxi data. The experiment result indicates that the integrated model can capture the inter-relationships between existing taxis and ride-hailing services as well as identify the influence of additional factors, namely, the day of the week, weather, and holidays.
Overall, this modeling approach can be applied to construct an effective active demand management (ADM) for the short-term period as well as a quota control strategy between on-demand ride-hailing services and traditional taxis.
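The two-stage idea in this abstract (a linear model first, then a sequence model fitted to what the linear model leaves unexplained) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: a single hypothetical predictor stands in for the explanatory variables, and a first-order autoregressive model, AR(1), is swapped in for the LSTM so that the example stays self-contained. All function names and data are hypothetical.

```python
def fit_ols(x, y):
    """Stage 1: ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_ar1(residuals):
    """Stage 2 stand-in (NOT an LSTM): least-squares phi in r[t] ~ phi * r[t-1]."""
    num = sum(a * b for a, b in zip(residuals[:-1], residuals[1:]))
    den = sum(a * a for a in residuals[:-1])
    return num / den if den else 0.0


def forecast_next(x, y, x_next):
    """Linear prediction plus a correction predicted from the last residual."""
    intercept, slope = fit_ols(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    phi = fit_ar1(residuals)
    return intercept + slope * x_next + phi * residuals[-1]


# Hypothetical hourly counts: x = ride-hailing pick-ups, y = taxi pick-ups.
ride_hailing = [1.0, 2.0, 3.0, 4.0]
taxi = [3.0, 5.0, 7.0, 9.0]
prediction = forecast_next(ride_hailing, taxi, x_next=5.0)
```

Keeping the first stage linear preserves interpretable coefficients, while the second stage only has to model the structure left in the residuals.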
... The role of social media in transport analysis has rapidly grown over the last years, allowing the ability to obtain information regarding trips [64] and activities [65], while highlighting the benefits of using these unstructured data sources [66]. There are a few approaches to understand transport perception using this data source: in our literature search, we found a descriptive analysis of public transport perception from tweets in the city of Chicago [67], the monitoring of malfunctions in public transport in Madrid [68], the analysis of satisfaction with public transport in Santiago [69], and the public transport opinion (in terms of polarity, from negative to positive) in Nanjing [70]. ...
Article
People often base their mobility decisions on subjective aspects of travel experience, such as time perception, space usage, and safety. It is well recognized that different groups within a population will react differently to the same trip; however, current data collection methods might not consider the multidimensional aspects of travel perception, which could lead to overlooking the needs of large population groups. In this paper, we propose to measure several aspects of the travel experience from the social media platform Twitter, with a focus on differences with respect to gender. We analyzed more than 400,000 tweets from 100,000 users about transportation from Santiago, Chile. Our main findings show that the genders express themselves differently: women write about their emotions regarding travel (both positive and negative feelings), whereas men express themselves using slang, which makes their emotions difficult to interpret. The strongest difference relates to harassment, not only on transportation but also in public space. Since these aspects are usually omitted from travel surveys, our work provides evidence on how Twitter allows the measurement of aspects of the transportation system in a city that have been studied in qualitative terms, complementing surveys with emotional and safety aspects that are as relevant as those traditionally measured.
... An increasingly broad range of data sources can provide insights into travel behaviour. Multiday travel behaviour has been analysed using mobile phone data (Järv et al., 2014; Masso et al., 2019), Bluetooth data (Crawford et al., 2018), smartcard data (Kieu et al., 2015; Kim et al., 2017; Goulet-Langlois et al., 2018) and social media data (Zhang et al., 2017). Passively collected data has a disadvantage for the current research, however, as it does not provide the context required to separate out work travel (Pajević and Shearmur, 2017). ...
Article
Travel needs for commute and business trips are complex and choices are not made based on the characteristics of individual trips, but instead based on the needs over weeks and months. For example, the cost per trip of commuting by bus varies depending upon the frequency of travel, and the cost of a monthly subway pass depends upon the number of zones visited during that period. Intrapersonal variability, namely the variation in an individual's travel behaviour from day to day, therefore shapes our transport choices and should influence service provision. Changes in working patterns such as increases in part time working, self-employment and tele-commuting challenge the traditionally held assumptions that work activities are fixed in time and space, thus making intrapersonal variability increasingly relevant. This research uses a data-driven approach to segment workers based on their work-related travel behaviour, including frequency of travel and both spatial and time of day intrapersonal variability. The analysis uses survey and seven day travel diary data for over 110,000 people collected over a 19 year period in England. Four groups of workers were identified: infrequent, spatially variable, temporally variable and regular travellers. These groups do not align closely with self-reported working arrangements such as self-employment or part time working. The group of regular travellers has decreased in size between 1998 and 2016 but remains the largest group, containing just under 60% of workers in 2016. Both the infrequent and spatially variable groups have grown over the same period. For a small but growing group of workers, a seven day diary is insufficient to understand their work-related transport needs as little or no work travel is recorded. These findings have implications for the design of public transport ticketing, the design of mobility as a service packages and the appraisal of congestion charging schemes.
... Jurdak et al. (2015) also concluded that the three inevitable issues (i.e., potential sampling bias, location bias, and communication modality) in using geo-tagged social media data (e.g., Twitter) did not strongly influence the performance of LBS data in characterizing dynamic population distributions at city scale. Moreover, LBS data from social media have been successfully used in pioneering studies on human mobility (Huang and Li, 2016; Luo et al., 2016), travel behavior (Rashidi et al., 2017; Zhang et al., 2017), environmental exposure (Chen et al., 2018b; Zheng et al., 2019), land-use classification (Chen et al., 2017b; Liu et al., 2017), and urban planning (Zhang and Zhou, 2018). These studies illustrated that LBS data from social media could be a useful proxy for describing dynamic population distribution. ...
... The rapid development of information and communication technology (ICT) has the potential to address some of the shortcomings mentioned above and broaden the types of questions that can be explored in travel behaviour studies [8]. Emerging data sources, such as records from Global Positioning System (GPS) devices, smart cards, mobile phones, and other online systems, have deepened the understanding of human mobility [9,10]. ...
Article
This paper examines the population heterogeneity of travel behaviours from a combined perspective of individual actors and collective behaviours. We use a social media dataset of 652,945 geotagged tweets generated by 2,933 Swedish Twitter users covering an average time span of 3.6 years. No explicit geographical boundaries, such as national borders or administrative boundaries, are applied to the data. We use spatial features, such as geographical characteristics and network properties, and apply a clustering technique to reveal the heterogeneity of geotagged activity patterns. We find four distinct groups of travellers: local explorers (78.0%), local returners (14.4%), global explorers (7.3%), and global returners (0.3%). These groups exhibit distinct mobility characteristics, such as trip distance, diffusion process, percentage of domestic trips, visiting frequency of the most-visited locations, and total number of geotagged locations. Geotagged social media data are gradually being incorporated into travel behaviour studies as user-contributed data sources. While such data have many advantages, including easy access and the flexibility to capture movements across multiple scales (individual, city, country, and globe), more attention is still needed on data validation and identifying potential biases associated with these data. We validate against data from a household travel survey and find good agreement in trip distances (one-day and long-distance trips) but some differences in home location and the frequency of international trips, possibly due to population bias and behaviour distortion in Twitter data. Future work includes identifying and removing additional biases so that results from geotagged activity patterns may be generalised to human mobility patterns. This study explores the heterogeneity of behavioural groups and their spatial mobility, including travel and day-to-day displacement.
The findings of this paper could be relevant for disease prediction, transport modelling, and the broader social sciences.
... Rashidi et al. [37] confirmed the applicability of social media data for modeling daily travel behavior, based on the results of a qualitative survey. Zhang et al. [38] proposed a sequential model-based clustering method to group high-resolution Twitter locations and extract Twitter displacements, thereby showing the application of social media data in predicting the travel behavior of individuals. Furthermore, they suggested that social media data is less expensive in comparison with conventional household travel surveys as it is easier to obtain and, most importantly, it can monitor the longitudinal travel behavior features of an individual over a longer observation period. ...
Article
Intercity transport systems have long been plagued by low efficiency and overutilization, due to unhealthy competition among multiple transport modes. Hence, this study aims to estimate the dominant trip distance of intercity passenger transport modes in order to optimize the allocation of intercity passenger transport resources and improve the efficiency of intercity transport systems. Dominant trip distance was classified into two types, absolute dominant trip distance and relative dominant trip distance, and their respective models were developed using passenger transport mode share functions and fitting curves. In particular, mode share data for intercity passenger transport in more than 360 cities in China were obtained using a web crawler, and a mode share function and curve were proposed for each passenger transport mode. Furthermore, the dominant trip distance estimation models for intercity passenger transport were developed and solved. The results show that there are significant differences in dominant trip distance between the transport modes. For example, the absolute and relative dominant trip distances of highway are 8–119 km and 8–463 km, respectively, while those of airway are 1594–3000 km and 2477–3000 km, respectively.
Chapter
Before the age of modern influencer marketing, most business organizations relied on traditional marketing, in which advertisements appeared as pop-up ads on websites, on TV, and on the radio. However, this form of advertising is no longer effective, because many potential customers simply ignore the ad, click out of it, or skip it. This phenomenon is holding back the growth of advertising. It is reported that one in five people worldwide uses ad blocking on their phones, and that this number has increased tremendously in the past year alone. People tend to dislike such ads for many reasons: they are annoying, they interrupt focus, and they are a disturbance. Thus, there is universal demand for a new type of marketing. With more than a billion active users, Instagram, one of the social media platforms, is the most popular social network in the world. Many users have followed trends and used their strengths and skills to become internet celebrities, who in turn have found ways to profit from their fame. Members of this group are called influencers. According to Influencer Marketing Hub, influencers are people with the power to affect the buying habits of customers through social media platforms, and influencer marketing is a brand's work with such people to market products or services online and to improve the brand's reputation. An influencer can be anyone in the world and has more creative freedom than a celebrity to attract and influence followers, who browse the platform in their free time to view or buy products. Platforms such as Instagram and TikTok have potential as informal and transparent marketing channels.
Meanwhile, data analytics has become an extremely important tool for decision making in the 21st century, for businesses, organizations, and individuals alike. With advanced technology and increasing data availability, it has become imperative for organizations to make data-driven decisions. The massive amounts of data generated in the modern era require versatile analysis techniques, which is where data analytics plays its role. Data analytics is the process of examining and transforming raw data into meaningful and valuable information that can help with decision making. Businesses and organizations rely on data analytics to understand their customers, competitors, and market trends and to make decisions for growth and sustainable operations. Artificial intelligence (AI) is reported to be becoming a long-term part of commercial entities worldwide. New trends in AI-driven automation reflect changes in the AI landscape, including reconfigured ideas, interests, and advertisements in enterprise AI adoption. The technology is versatile: it can recognize faces and objects, which has various business applications. For security purposes, facial recognition can distinguish individuals, while objects can be detected by image analysis. Such services can meet more customer preferences, and facial recognition allows business owners to gauge their customers’ moods, which in turn can help with product recommendations and satisfaction. Therefore, this chapter aims to discuss in detail the role of data analytics and AI in marketing. Issues surrounding these applications, including ethics, privacy, and security, are also discussed. Future recommendations for these platforms are also analyzed.
Article
The analysis of social media data to extract new insights has attracted much attention, especially in the field of Marketing. Few researchers, however, have studied both the concepts of Social Media Data Analytics (SMDA) and Marketing Strategies. Previous publications have only focused on a particular technique or a well-defined Marketing Strategy in a specific context. To address this gap, this paper aims to explore how Social Media Data Analytics can guide and affect Marketing Strategies, and provide an overview of the range of Social Media Data Analytics techniques related to Marketing Strategies. We conducted a systematic review of 120 papers published between 2015 and 2021 on SMDA in Marketing. The findings are presented in terms of the main social media platforms, publication date, journal quality, social media data types, analytical techniques, fields of application, firm size, and related Marketing Strategies. The SMDA techniques are classified into six categories: Sentiment Analysis, Artificial Intelligence, Data Mining, Statistics, Coding and Modelling, and Simulation. A set of detailed Marketing Strategies guided by SMDA are also presented, as well as an integrative framework mapping how SMDA creates value. The results highlight several SMDA techniques that still lack exploration and outline their relevance.
Article
Governments and healthcare organizations increasingly pay attention to social media for handling a disease outbreak. The institutions and organizations need information support to gain insights into the situation and act accordingly. Currently, they primarily rely on ground‐level data, collecting which is a long and cumbersome process. Social media data present immense opportunities to use ground data quickly and effectively. Governments and HOs can use these data in launching rapid and speedy remedial actions. Social media data contain rich content in the form of people's reactions, calls‐for‐help, and feedback. However, in healthcare operations, the research on social media for providing information support is limited. Our study attempts to fill the gap mentioned above by investigating the relationship between the activity on social media and the quantum of the outbreak and further using content analytics to construct a model for segregating tweets. We use the case example of the COVID‐19 outbreak. The pandemic has advantages in contributing to the generalizability of results and facilitating the model's validation through data from multiple waves. The findings show that social media activity reflects the outbreak situation on the ground. In particular, we find that negative tweets posted by people during a crisis outbreak concur with the quantum of a disease outbreak. Further, we find a positive association between this relationship and increased information sharing through retweets. Building further on this insight, we propose a model using advanced analytical methods to reduce a large amount of unstructured data into four key categories—irrelevant posts, emotional outbursts, distress alarm, and relief measures. The supply‐side stakeholders (such as policy makers and humanitarian organizations) could use this information on time and optimize resources and relief packages in the right direction proactively.
Article
Nowadays, a large percentage of people use smartphones frequently. The mobile phone signaling data contains various attributes that can be used to infer when and where the user is. Compared with other big data sources (e.g., social media and GPS data) for the human movement, mobile phone signaling data demonstrate the advantages of a high coverage of population, strong temporal continuity, and low cost of collection. Taking advantage of such mobile phone signaling data, this work aims to identify tourists and locals from a large volume of mobile phone signaling data in a tourism city and analyze their spatiotemporal patterns to better promote tourism service and alleviate possible disturbance to local residents. In this paper, we present a framework to differentiate these two types of people by the following procedure: first, the hidden behavior characteristics of users are extracted from mobile phone signaling data; and then, the K-means clustering method is adopted to identify tourists and locals. With the identification of both tourists and local residents, we analyze the distribution and interaction characteristics of tourists and locals in an urban area. An experimental study is conducted in a famous tourism city, Xiamen, China. The results indicate that the proposed method can successfully identify the most popular scenic spots and major transportation corridors for tourists. The feature extraction, identification, and spatiotemporal analysis presented in this paper are of great significance for analyzing the urban tourism demand, managing the urban space, and mining the tourist behavior.
Article
Pattern clustering is an effective method for exploring the regularities of human mobility scheduling and daily activities. There still remains the challenge of measuring the similarity between pairs of activity patterns that are in the form of categorical time series sequences. Existing studies measured similarity using binary vector or edit distance, but these methods were insufficient to characterize routine arrangement and time scheduling of daily activities. To address this issue, we cluster daily activities and identify regular patterns using a Markov-chain-based mixture model, which captures features of activity scheduling by Markov transition matrix as well as measures similarity with probability distribution. Logistic regression models are further built to test hypothetical relationships between activity patterns and socio-demographic characteristics. Results show there are three main human activity patterns in terms of daily routine arrangement and activity scheduling: working-education-oriented (WE-oriented), recreation-shopping-oriented (RS-oriented), and schooling-drop-off/pick-up-oriented (SDP-oriented). People in the WE-oriented pattern mainly engage with regular home-based commuting trips, while people in the RS-oriented pattern are involved in home-based shopping and entertainment events. With regard to the SDP-oriented pattern, people plan their trips under a restricted scheduling of schooling pickup/drop-off. Each pattern clearly indicates long-term regularity of daily activity behaviors and corresponds to specific socio-demographics. Distinguishing three categories of residents with distinct life styles, this research would help accommodate travel demand from different groups of people in urban transportation planning.
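One building block named in this abstract, the Markov transition matrix that summarizes how activities follow one another, can be illustrated as follows. This is a minimal sketch that estimates a single first-order transition matrix from categorical activity sequences; it does not implement the mixture model or the clustering step. The activity codes, the example days, and the function name are hypothetical.

```python
from collections import Counter


def transition_matrix(sequences, states):
    """Estimate first-order transition probabilities P[current][next] by counting."""
    counts = {s: Counter() for s in states}
    for seq in sequences:
        # Count every consecutive pair of activities within a day.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        # States with no observed outgoing transition keep an all-zero row.
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix


# Hypothetical daily activity sequences: H = home, W = work, S = shopping.
days = [
    ["H", "W", "H"],
    ["H", "W", "S", "H"],
    ["H", "S", "H"],
]
P = transition_matrix(days, states=["H", "W", "S"])
```

In a mixture model, each cluster would carry its own matrix of this kind, and a sequence would be assigned to clusters according to how probable it is under each matrix.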
Article
The effectiveness of traditional traffic prediction methods, such as autoregressive or spatio-temporal models, is often extremely limited when forecasting traffic dynamics in the early morning. The reason is that traffic can break down drastically during the early morning commute, and the time and duration of this breakdown vary substantially from day to day. Early morning traffic forecasts are crucial for morning-commute traffic management, but they are generally challenging to make in advance, particularly by midnight (called ‘next-day morning traffic prediction’ hereafter). In this paper, we propose to mine Twitter messages as a probing method to understand the impacts of people’s work and rest patterns in the evening/midnight of the previous day on next-day morning traffic. The model is tested on freeway networks in Pittsburgh. The resulting relationship is surprisingly simple and powerful. We find that, in general, the earlier people rest, as indicated by tweets, the more congested roads will be the next morning. The occurrence of big events the evening before, represented by higher or lower tweet sentiment than normal, often implies lower travel demand the next morning than on normal days. In addition, people’s tweeting activities the night before and in the early morning (by 5am) are statistically associated with congestion in morning peak hours. We make use of such relationships to build a predictive framework which forecasts morning commute congestion using people’s tweeting profiles extracted by 5am. In most cases, the tweet information collected by the midnight before is sufficient to make a good prediction of next-day morning traffic. The Pittsburgh study supports that this framework can precisely predict morning congestion, particularly for some road segments upstream of roadway bottlenecks with large day-to-day congestion variation, while its prediction performance is no worse than baseline methods on other roads.
Through experiments, we demonstrate our approach considerably outperforms those existing methods without Twitter message features, and it can learn meaningful representation of demand from tweeting profiles that offer managerial insights. The proposed social media empowered framework can be a promising tool for real-time traffic management and potentially extended for traffic prediction at other times of day.
Article
Outlier detection is an important branch of data mining. This paper proposes an advanced fast density peak outlier detection algorithm based on the characteristics of big data. The algorithm is an outlier detection method that improves on the original density peak clustering algorithm. From the perspective of outlier detection, although it builds on a clustering idea, it avoids the clustering process itself, reduces the time complexity of cluster-based outlier detection, and absorbs the advantages of neighbor-based outlier detection, such as insensitivity to data dimensionality. In the power industry, outlier detection can be used in areas such as grid fault detection, equipment fault detection, and power abnormality detection. A simulation experiment on outlier detection using the daily load curves of single and multiple transformers in a certain province shows that the improved algorithm can effectively detect outliers in the data.
Chapter
Continuous growth of information and the increasing volume of data with high coverage in space and time open up new possibilities in the field of transport planning. In recent years, there has been great research interest in how big data can be applied to the modelling and planning of transport systems. A literature survey of existing methodologies and applications of big data in transportation is a useful tool for identifying strengths and capabilities for big data exploitation in different fields of application. The main objective of this paper is to provide a comprehensive overview of big data usage in transport planning, focusing on travel demand modelling. More specifically, the paper aims to examine whether analyzing big data can facilitate transport planning and to summarize the related scientific discussion. Three big data sources have been examined: smart cards, mobile phones and social networks. Existing theories and studies are presented and classified according to the source of data, the methodology, the extracted transportation features and the validation of results. In the course of the review, the different big data sources are further analyzed regarding their special applications in the transport planning field. Finally, the paper concludes by presenting the barriers and gaps in the existing approaches as well as new research challenges.
Article
State-of-the-art traffic sign recognition (TSR) algorithms are designed to recognize the textual information of a traffic sign at over 95% accuracy. Even so, they are still not ready for complex roadworks near ramps. In real-world applications, vehicles running on the freeway may mistakenly detect traffic signs intended for the ramp, which becomes inaccurate feedback to autonomous driving applications and results in unexpected speed reduction. These misdetection problems have drawn minimal attention in recent TSR studies. This paper proposes that TSR studies should move from point-based sign recognition to path-based sign learning. In the proposed pipeline, the confidence of the TSR observations from normal vehicles can be increased by clustering and location adjustment. A supervised learning model is employed to classify the clustered learned signs and complement their path information. Test drives were conducted in 12 European countries to calibrate the models and validate the path information of the learned signs. After model implementation, the path accuracy over 1,000 learned signs increased from 75.04% to 89.80%. This study demonstrates the necessity of path-based TSR studies near freeway ramps, and the proposed pipeline shows good utility and broad applicability for sensor-based autonomous vehicle applications.
Article
A rapidly aging population in the United States has brought significant challenges to transportation planners and service operators. One of the most important challenges is to figure out aging Americans' travel needs in a timely manner, in order to promptly provide them with better transportation access as they age. The present research investigates how and where to collect such information. In particular, it constructs a comprehensive survey questionnaire specifically targeting aging Americans' transportation options and implements the survey using an innovative crowdsourcing platform, Amazon Mechanical Turk (MTurk). This study has dual aims. First, it shows the design of an all-inclusive survey that gathers meaningful insights on senior transportation options from different categories of respondents, including 1) older adults, 2) young, caregiving adults, and 3) young, non-caregiving adults. It is important to include all of these respondents, because they represent either current or future customers of senior transportation services. Second, the study demonstrates a resourceful survey implementation tool, MTurk, which provides a valuable platform for investigating timely issues such as aging Americans' travel needs. The MTurk platform is able to capture all three categories of respondents, and the survey respondents show a distribution pattern similar to those seen in generally accepted, representative surveys (e.g., Census and National Household Travel Survey).
Article
Social media platforms are seeing increasing adoption by public transport agencies, as they provide a cost-effective, reliable, and timely mechanism for sharing information with passengers and other travellers. In this paper, we use a case study of the @GamesTravel2014 Twitter account to evaluate how this social media platform was used over the course of the 2014 Commonwealth Games in Glasgow, Scotland to provide and share transport-related information and respond to information requests. The case study provides an exemplar for the public co-ordination of information from multiple partners in a complex environment during a time of transport disruption. We evaluate both the structure and intent of the @GamesTravel2014 social media strategy via interviews with involved parties and an analysis of Tweets related to the account. Findings indicate the potential for future applications of social media by transport operators and authorities in producing a more effective network of communication with passengers.
Article
Background: Although a number of environmental and policy interventions to promote physical activity are being widely used, there is sparse systematic information on the most effective approaches to guide population-wide interventions. Methods: We reviewed studies that addressed the following environmental and policy strategies to promote physical activity: community-scale urban design and land use policies and practices to increase physical activity; street-scale urban design and land use policies to increase physical activity; and transportation and travel policies and practices. These systematic reviews were based on the methods of the independent Task Force on Community Preventive Services. Exposure variables were classified according to the types of infrastructures/policies present in each study. Measures of physical activity behavior were used to assess effectiveness. Results: Two interventions were effective in promoting physical activity (community-scale and street-scale urban design and land use policies and practices). Additional information about applicability, other effects, and barriers to implementation are provided for these interventions. Evidence is insufficient to assess transportation policy and practices to promote physical activity. Conclusions: Because community- and street-scale urban design and land-use policies and practices met the Community Guide criteria for being effective physical activity interventions, implementing these policies and practices at the community-level should be a priority of public health practitioners and community decision makers.
Article
Full-text available
In this paper, we demonstrate the use of an inexpensive and easy-to-collect long-term dataset to address the problems caused by basing activity space studies on short-term data. In total, we use 63,114 geo-tagged tweets from 116 unique users to create individuals’ activity spaces based on minimum bounding geometry (convex hull). By using polygon density maps of activity space, we found clear differences between weekday and weekend activity spaces, and were able to observe the growth trajectory of activity space over 17 weeks. In order to reflect the heterogeneous nature of spatial behavior and tweeting habits, we used Latent Class Analysis twice. First, we identify five unique patterns of location-based activity spaces that differ in shape and anchoring. Second, we identify three unique growth trajectories. The comparison among these latent growth trajectories shows that in order to capture the extent of activity spaces we need long time periods for some individuals and shorter periods of observation for others. We also show that past studies using a single-digit number of weeks may not be sufficient to capture individuals’ activity space. The major activity locations identified using a multilevel latent class model do not appear to be statistically related to the growth patterns of Twitter users’ activity spaces. The evidence here shows Twitter data can be a valuable complementary source of information for heterogeneity analysis in activity-based modeling and simulation.
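As a rough illustration of the minimum bounding geometry mentioned in this abstract, the sketch below computes a convex hull and its area for one user's tweet locations. It assumes the coordinates have already been projected to a planar system (for example, kilometres east and north of a reference point); the function names and sample points are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical tweet locations for one user, in kilometres.
tweets = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0), (1.0, 2.0)]
hull = convex_hull(tweets)
activity_space_km2 = polygon_area(hull)
```

Recomputing the hull area on a growing window of tweets (week 1, weeks 1 to 2, and so on) would give one simple way to trace a growth trajectory of the kind the abstract describes.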
Article
Full-text available
The last decade has witnessed very active development in two broad, but separate fields, both involving understanding and modeling of how individuals move in time and space (hereafter called “travel behavior analysis” or “human mobility analysis”). One field comprises transportation researchers who have been working in the field for decades and the other involves newcomers from a wide range of disciplines, but primarily computer scientists and physicists. Researchers in these two fields work with different datasets, apply different methodologies, and answer different but overlapping questions. It is our view that there is much hidden synergy between the two fields that needs to be brought out. It is thus the purpose of this paper to introduce datasets, concepts, knowledge and methods used in these two fields, and most importantly raise cross-discipline ideas for conversations and collaborations between the two. It is our hope that this paper will stimulate many future cross-cutting studies that involve researchers from both fields.
Article
Full-text available
Predicting human mobility flows at different spatial scales is challenged by the heterogeneity of individual trajectories and the multi-scale nature of transportation networks. As vast amounts of digital traces of human behaviour become available, an opportunity arises to improve mobility models by integrating into them proxy data on mobility collected by a variety of digital platforms and location-aware services. Here we propose a hybrid model of human mobility that integrates a large-scale publicly available dataset from a popular photo-sharing system with the classical gravity model, under a stacked regression procedure. We validate the performance and generalizability of our approach using two ground-truth datasets on air travel and daily commuting in the United States: using two different cross-validation schemes we show that the hybrid model affords enhanced mobility prediction at both spatial scales.
Article
Full-text available
Recently, there has been increased interest in quantifying and modeling the impact of inclement weather on transportation system performance. One problem that the majority of research studies on the topic have faced was the great dependence on weather data merely from atmospheric weather stations, which lack information about road surface condition. The emergence of social media platforms, such as Twitter and Facebook, provides a new opportunity to extract more weather-related data from such platforms. This study had two primary objectives: (a) examine whether real-world weather events can be inferred from social media data and (b) determine whether including weather variables extracted from social media data can improve the predictive accuracy of models developed to quantify the impact of inclement weather on freeway traffic speed. To achieve those objectives, weather data, Twitter data, and traffic information were compiled for the Buffalo–Niagara, New York, metropolitan area as a case study. A method called the Twitter Weather Events Observation was then applied to the Twitter data, and the sensitivity and false alarm rate for the method were evaluated against real-world weather data. Then, linear regression models for predicting the impact of inclement weather on freeway speed were developed with and without the Twitter-based weather variables incorporated. The results indicated that Twitter data have a relatively high sensitivity for predicting inclement weather (i.e., snow), especially during the daytime and for areas with significant snowfall. The results also showed that the incorporation of Twitter-based weather variables could help improve the predictive accuracy of the models.
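The sensitivity and false alarm rate mentioned in this abstract can be computed from paired detections and observations roughly as follows. This is a sketch under assumptions: the abstract does not define the false alarm rate, so it is taken here as FP / (FP + TN), and the function name and sample data are hypothetical.

```python
from typing import Sequence, Tuple


def detection_rates(predicted: Sequence[bool], observed: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, false_alarm_rate) for paired boolean series.

    sensitivity      = TP / (TP + FN)
    false_alarm_rate = FP / (FP + TN)   (assumed definition)
    """
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if p and o:
            tp += 1
        elif p and not o:
            fp += 1
        elif not p and o:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else float("nan")
    return sensitivity, false_alarm_rate


# Hypothetical hourly flags: did tweets indicate snow, and did the station record snow?
tweet_says_snow = [True, True, False, False, True]
station_says_snow = [True, False, False, True, True]
sensitivity, false_alarm_rate = detection_rates(tweet_says_snow, station_says_snow)
```

Some meteorological work instead reports a false alarm ratio, FP / (FP + TP); swapping the denominator in the function is enough to obtain that variant.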
Article
Full-text available
We analyse a large dataset with more than six million geo-tagged tweets posted in Australia, and demonstrate that Twitter can be a reliable source for studying human mobility patterns. We find that crucial information of human mobility, such as its multi-scale and multi-modal nature, returning tendency and regularity, as well as the heterogeneous moving scale among individuals, can be extracted from geo-tagged tweets using various statistical indicators. Our analysis of the spatial-temporal patterns for people with different moving scales shows that long-distance travellers have highly concentrated urban movements. Our study not only deepens overall understanding of human mobility but also opens new avenues for tracking human mobility.
Conference Paper
Full-text available
Affordances from the urban space shape the way we interact with our environment, whether manifested as driving into the city centre for work or playing sports in designated arenas. Given today's abundance of crowd-generated digital traces on location-based social network (LBSN) platforms, an opportunity arises to grasp deeper semantic characterization of urban affordances beyond static representations found in traditional GIS systems. Complementing this perception of the city, travel surveys capture mobility dynamics of people with absolute trajectory recordings and explicit travel purposes. By marrying rich LBSN data with travel surveys, we ask if crowdsourced urban characteristics can be used to explain user behaviour when interacting with the city. Concretely, our objective is to model and infer the purpose of travel, or the activity at the destination of a trip, in daily life scenarios. To this end, we generate features to correspond to time, location, and demographics in order to construct a fused understanding of people's travel purposes. Using LBSN data to augment a travel survey of 87,600 trips by 10,372 people, we show that fusion of extracted features can achieve an interpersonal prediction accuracy of >75% for 9 broad classes of travel purposes covering typical aspects of life. This represents an increase of nearly 20% compared to without LBSN augmentation.
Article
Full-text available
The pervasive use of new mobile devices has allowed a better characterization in space and time of human concentrations and mobility in general. Besides its theoretical interest, describing mobility is of great importance for a number of practical applications ranging from the forecast of disease spreading to the design of new spaces in urban environments. While classical data sources, such as surveys or census, have a limited level of geographical resolution (e.g., districts, municipalities, counties are typically used) or are restricted to generic workdays or weekends, the data coming from mobile devices can be precisely located both in time and space. Most previous works have used a single data source to study human mobility patterns. Here we perform instead a cross-check analysis by comparing results obtained with data collected from three different sources: Twitter, census and cell phones. The analysis is focused on the urban areas of Barcelona and Madrid, for which data of the three types is available. We assess the correlation between the datasets on different aspects: the spatial distribution of people concentration, the temporal evolution of people density and the mobility patterns of individuals. Our results show that the three data sources are providing comparable information. Even though the representativeness of Twitter geolocated data is lower than that of mobile phone and census data, the correlations between the population density profiles and mobility patterns detected by the three datasets are close to one in a grid with cells of 2x2 and 1x1 square kilometers. This level of correlation supports the feasibility of interchanging the three data sources at the spatio-temporal scales considered.
Article
Full-text available
Cluster analysis is reformulated as a problem of estimating the parameters of a mixture of multivariate distributions. The maximum-likelihood theory and numerical solution techniques are developed for a fairly general class of distributions. The theory is applied to mixtures of multivariate normals (NORMIX) and mixtures of multivariate Bernoulli distributions (Latent Classes). The feasibility of the procedures is demonstrated by two examples of computer solutions for normal mixture models of the Fisher Iris data and of artificially generated clusters with unequal covariance matrices.
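To make the idea of clustering as mixture-parameter estimation concrete, here is a minimal sketch of maximum-likelihood fitting by expectation-maximization (EM). It is deliberately simplified to one dimension and two normal components, uses a fixed iteration count, and is not the NORMIX program; all names and the sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_mixture(
    data: Sequence[float], iterations: int = 100
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1-D normal mixture by EM; returns (weights, means, variances)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n

    # Deterministic starting values: equal weights, means at the data extremes.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]

    for _ in range(iterations):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: re-estimate the parameters from the weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a degenerate component

    return weights, means, variances


# Two well-separated hypothetical groups of observations.
sample = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = fit_two_component_mixture(sample)
```

Each point can then be assigned to the component with the larger posterior probability, which is what turns the fitted mixture into a clustering. A production implementation would work with log densities to avoid underflow for points far from every component.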
Article
Full-text available
Human mobility is differentiated by time scales. While the mechanism for long time scales has been studied, the underlying mechanism on the daily scale is still unrevealed. Here, we uncover the mechanism responsible for the daily mobility patterns by analysing the temporal and spatial trajectories of thousands of persons as individual networks. Using the concept of motifs from network theory, we find only 17 unique networks are present in daily mobility and they follow simple rules. These networks, called here motifs, are sufficient to capture up to 90 per cent of the population in surveys and mobile phone datasets for different countries. Each individual exhibits a characteristic motif, which seems to be stable over several months. Consequently, daily human mobility can be reproduced by an analytically tractable framework for Markov chains by modelling periods of high-frequency trips followed by periods of lower activity as the key ingredient.
Article
Full-text available
We report that human walk patterns contain statistically similar features observed in Levy walks. These features include heavy-tail flight and pause-time distributions and the super-diffusive nature of mobility. Human walks are not random walks, but it is surprising that the patterns of human walks and Levy walks contain some statistical similarity. Our study is based on 226 daily GPS traces collected from 101 volunteers in five different outdoor sites. The heavy-tail flight distribution of human mobility induces the super-diffusivity of travel, but up to 30 min to 1 h due to the boundary effect of people's daily movement, which is caused by the tendency of people to move within a predefined (also confined) area of daily activities. These tendencies are not captured in common mobility models such as random way point (RWP). To evaluate the impact of these tendencies on the performance of mobile networks, we construct a simple truncated Levy walk mobility (TLW) model that emulates the statistical features observed in our analysis and under which we measure the performance of routing protocols in delay-tolerant networks (DTNs) and mobile ad hoc networks (MANETs). The results indicate the following. Higher diffusivity induces shorter intercontact times in DTN and shorter path durations with higher success probability in MANET. The diffusivity of TLW is in between those of RWP and Brownian motion (BM). Therefore, the routing performance under RWP, as commonly used in mobile network studies, tends to be overestimated for DTNs and underestimated for MANETs compared to the performance under TLW.
Article
Full-text available
Join the Club. An important question for policy-makers is how to communicate information (for example, about public health interventions) and promote behavior change most effectively across a population. The structure of a social network can dramatically affect the diffusion of behavior through a population. Centola (p. 1194) examined whether the number of individuals choosing to register for a health forum could be influenced by an artificially constructed network of neighbors that were signed up for the forum. The behavior spread more readily on clustered networks than on random, poorly clustered ones. Certain types of behavior within human systems are thus more likely to spread if people are exposed to many other people who have already adopted the behavior (for example, in the circumstances where your friends know each other, as well as yourself).
Article
Full-text available
A range of applications, from predicting the spread of human and electronic viruses to city planning and resource management in mobile communications, depend on our ability to foresee the whereabouts and mobility of individuals, raising a fundamental question: To what degree is human behavior predictable? Here we explore the limits of predictability in human dynamics by studying the mobility patterns of anonymized mobile phone users. By measuring the entropy of each individual’s trajectory, we find a 93% potential predictability in user mobility across the whole user base. Despite the significant differences in the travel patterns, we find a remarkable lack of variability in predictability, which is largely independent of the distance users cover on a regular basis.
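The link between trajectory entropy and predictability in this abstract can be sketched as follows. Two assumptions are made loudly here: the entropy is computed from visit frequencies only (ignoring the order of visits, which an order-aware estimator would exploit), and the upper bound on predictability is obtained from Fano's inequality, which the abstract itself does not name. Function names and the sample trajectory are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def visit_entropy(locations: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the visit-frequency distribution."""
    counts = Counter(locations)
    n = len(locations)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def max_predictability(entropy: float, n_locations: int, tol: float = 1e-9) -> float:
    """Solve S = H_b(p) + (1 - p) * log2(N - 1) for p by bisection (Fano bound)."""
    if n_locations < 2:
        return 1.0

    def gap(p: float) -> float:
        binary_entropy = 0.0
        if 0.0 < p < 1.0:
            binary_entropy = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
        return binary_entropy + (1.0 - p) * math.log2(n_locations - 1) - entropy

    # The left-hand side decreases from log2(N) at p = 1/N to 0 at p = 1.
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical hourly location labels for one user.
trajectory = ["home"] * 9 + ["work"]
entropy_bits = visit_entropy(trajectory)
bound = max_predictability(entropy_bits, n_locations=len(set(trajectory)))
```

Because the frequency-only entropy is never smaller than an order-aware entropy of the same trajectory, the bound computed this way is, if anything, lower than the one an order-aware estimate would give.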
Article
Full-text available
Among the realistic ingredients to be considered in the computational modeling of infectious diseases, human mobility represents a crucial challenge both on the theoretical side and in view of the limited availability of empirical data. To study the interplay between short-scale commuting flows and long-range airline traffic in shaping the spatiotemporal pattern of a global epidemic we (i) analyze mobility data from 29 countries around the world and find a gravity model able to provide a global description of commuting patterns up to 300 km and (ii) integrate in a worldwide-structured metapopulation epidemic model a timescale-separation technique for evaluating the force of infection due to multiscale mobility processes in the disease dynamics. Commuting flows are found, on average, to be one order of magnitude larger than airline flows. However, their introduction into the worldwide model shows that the large-scale pattern of the simulated epidemic exhibits only small variations with respect to the baseline case where only airline traffic is considered. The presence of short-range mobility increases, however, the synchronization of subpopulations in close proximity and affects the epidemic behavior at the periphery of the airline transportation infrastructure. The present approach outlines the possibility for the definition of layered computational approaches where different modeling assumptions and granularities can be used consistently in a unifying multiscale framework.
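A gravity model of commuting, as referred to in this abstract, predicts the flow between two places from their populations and the distance separating them. The sketch below uses one common generic form with an exponential distance decay; the functional form, the parameter names, and every default value are illustrative assumptions and not the fitted model from the study.

```python
import math


def gravity_flow(
    pop_origin: float,
    pop_dest: float,
    distance_km: float,
    scale: float = 1e-3,      # hypothetical proportionality constant
    alpha: float = 1.0,       # hypothetical exponent on the origin population
    gamma: float = 1.0,       # hypothetical exponent on the destination population
    decay_km: float = 50.0,   # hypothetical characteristic commuting distance
) -> float:
    """Expected flow: scale * N_i^alpha * N_j^gamma * exp(-d_ij / decay_km)."""
    return scale * pop_origin ** alpha * pop_dest ** gamma * math.exp(-distance_km / decay_km)


# The same pair of places at increasing separations.
flow_adjacent = gravity_flow(1000.0, 2000.0, 0.0)
flow_50_km = gravity_flow(1000.0, 2000.0, 50.0)
flow_100_km = gravity_flow(1000.0, 2000.0, 100.0)
```

In practice the constant and exponents would be estimated from observed commuting counts, for example by regressing the logarithm of the flow on the logarithms of the populations and on the distance.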
Article
Full-text available
Dispersal has long been recognized as a crucial factor affecting population dynamics. Several studies on long-distance dispersal revealed a peculiarity now widely known as a problem of "fat tail": instead of the rate of decay in the population density over large distances being described by a normal distribution, which is apparently predicted by the standard diffusion approach, field data often show much lower rates such as exponential or power law. The question as to what are the processes and mechanisms resulting in the fat tail is still largely open. In this note, by introducing the concept of a statistically structured population, we show that a fat-tailed long-distance dispersal is a consequence of the fundamental observation that individuals of the same species are not identical. Fat-tailed dispersal thus appears to be an inherent property of any real population. We show that our theoretical predictions are in good agreement with available data.
Article
Full-text available
The dynamic spatial redistribution of individuals is a key driving force of various spatiotemporal phenomena on geographical scales. It can synchronize populations of interacting species, stabilize them, and diversify gene pools. Human travel, for example, is responsible for the geographical spread of human infectious disease. In the light of increasing international trade, intensified human mobility and the imminent threat of an influenza A epidemic, the knowledge of dynamical and statistical properties of human travel is of fundamental importance. Despite its crucial role, a quantitative assessment of these properties on geographical scales remains elusive, and the assumption that humans disperse diffusively still prevails in models. Here we report on a solid and quantitative assessment of human travelling statistics by analysing the circulation of bank notes in the United States. Using a comprehensive data set of over a million individual displacements, we find that dispersal is anomalous in two ways. First, the distribution of travelling distances decays as a power law, indicating that trajectories of bank notes are reminiscent of scale-free random walks known as Lévy flights. Second, the probability of remaining in a small, spatially confined region for a time T is dominated by algebraically long tails that attenuate the superdiffusive spread. We show that human travelling behaviour can be described mathematically on many spatiotemporal scales by a two-parameter continuous-time random walk model to a surprising accuracy, and conclude that human travel on geographical scales is an ambivalent and effectively superdiffusive process.
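The power-law decay of travel distances described in this abstract can be checked on data with a maximum-likelihood estimate of the tail exponent. The sketch below draws synthetic distances from a continuous power law and recovers the exponent; the exponent value 1.6, the cut-off `x_min`, and the function names are illustrative assumptions, and the estimator shown is the standard continuous power-law MLE rather than the procedure used in the study.

```python
import math
import random
from typing import List, Sequence


def sample_power_law(n: int, alpha: float, x_min: float, rng: random.Random) -> List[float]:
    """Draw n values with density proportional to x^(-alpha) for x >= x_min (alpha > 1)."""
    # Inverse-transform sampling; 1 - u lies in (0, 1] so the power is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_exponent(distances: Sequence[float], x_min: float) -> float:
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x_i / x_min)) over the tail."""
    tail = [x for x in distances if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


rng = random.Random(0)
synthetic_distances = sample_power_law(20000, alpha=1.6, x_min=1.0, rng=rng)
alpha_hat = estimate_exponent(synthetic_distances, x_min=1.0)
```

For a real displacement dataset the choice of `x_min` matters a great deal, since short displacements often do not follow the tail behaviour.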
Article
Full-text available
Despite its popularity for general clustering, K-means suffers three major shortcomings: it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. We propose solutions for the first two problems, and a partial remedy for the third. Building on prior work for algorithmic acceleration that is not based on approximation, we introduce a new algorithm that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure. The innovations include two new ways of exploiting cached sufficient statistics and a new very efficient test that in one K-means sweep selects the most promising subset of classes for refinement. This gives rise to a fast, statistically founded algorithm that outputs both the number of classes and their parameters. Experiments show this technique reveals the true number of classes in the underlying distribution, and that it is much faster than repeatedly using accelerated K-means for different values of K.
Article
Understanding how destination choice and business clusters are connected is of great importance for designing sustainable cities, fostering flourishing business clusters, and building livable communities. As sharing locations and activities on social media platforms becomes increasingly popular, such data can reveal destination choice and activity space which can shed light on human-environment relationships. To this end, this research models the relationship between characteristics of business clusters and check-in activities from Los Angeles County, California. Business clusters are analyzed via two lenses: the supply side (employment data by industry) and the demand side (on-line check-in data). Spatial and statistical analyses are performed to understand how land use and transportation network features affect the popularity of the identified clusters and their relationships. Our results suggest that a cluster with more employment opportunities and more types of employment is associated with more check-ins. A business cluster that has access to parks or recreational services is also more popular. A business cluster with a longer road network and better connectivity of roads is associated with more check-ins. The visualization of the common visitors between clusters reveals that there are a few clusters with outstanding strong ties, while most have modest ties with each other. Our findings have implications on the influence of urban design on the popularity of business clusters.
Article
Harnessing the potential of new generation transport data and increasing public participation are high on the agenda for transport stakeholders and the broader community. The initial phase in the program of research reported here proposed a framework for mining transport-related information from social media, demonstrated and evaluated it using transport-related tweets associated with three football matches as case studies. The goal of this paper is to extend and complement the previously published studies. It reports an extended analysis of the research results, highlighting and elaborating the challenges that need to be addressed before a large-scale application of the framework can take place. The focus is specifically on the automatic harvesting of relevant, valuable information from Twitter. The results from automatically mining transport-related messages in two scenarios are presented, i.e., with a small-scale labelled dataset and with a large-scale dataset of 3.7 million tweets. Tweets authored by individuals that mention a need for transport, express an opinion about transport services or report an event, with respect to different transport modes, were mined. The challenges faced in automatically analysing Twitter messages, written in Twitter’s specific language, are illustrated. The results presented show a strong degree of success in the identification of transport-related tweets, with similar success in identifying tweets that expressed an opinion about transport services. The identification of tweets that expressed a need for transport services or reported an event was more challenging, a finding mirrored during the human-based message annotation process. Overall, the results demonstrate the potential of automatic extraction of valuable information from tweets while pointing to areas where challenges were encountered and additional research is needed. The impact of a successful solution to these challenges (thereby creating efficient harvesting systems) would be to enable travellers to participate more effectively in the improvement of transport services.
Article
This article investigates the type and quality of changes in mobility behaviour, with regard to household income, caused by the persistent economic and social shock manifested in Greece from 2010 onwards. A trip survey was conducted in 2014 to explore the impacts of the economic crisis on the trip characteristics between the city centre and the greater area of Thessaloniki, the second largest city of Greece. The sample consisted of 853 randomly selected users of the city centre and is representative of the sex and age distribution of the overall population of the urban agglomeration. Aiming to minimise their expenses, the individuals have reduced the trip frequency by private car, notably for optional trip purposes like shopping and entertainment, or they have shifted to public transport, motorbike, walking and cycling for downtown trips. In some cases, this reduction in expenses led to household relocation. These changes were more evident in the lowest income groups. In general, the effects of the economic crisis are proving more effective in limiting car use compared to any sustainable mobility measure that has been implemented in the past. However, households, regardless of their income, appeared mostly uncertain about maintaining any sustainable mobility behaviour. In fact, their decision seems to depend on the future economic conditions.
Article
In the past few years, the social science literature has shown significant attention to extracting information from social media to track and analyse human movements. In this paper the transportation aspect of social media is investigated and reviewed. A detailed discussion is provided about how social media data from different sources can be used to indirectly and with minimal cost extract travel attributes such as trip purpose, mode of transport, activity duration and destination choice, as well as land use variables such as home, job and school location and socio-demographic attributes including gender, age and income. The evolution of the field of transport and travel behaviour around applications of social media over the last few years is studied. Further, this paper presents results of a qualitative survey of travel demand modelling experts around the world on the applicability of social media data for modelling daily travel behaviour. The result of the survey reveals a positive view among the experts about the usefulness of such data sources.
Article
Social media receive increasing attention as a crowdsourced information resource in traffic operations and management. Tweets, which are blogged and shared by great masses of people, may be associated with some major social activities. In this study these tweets are called “Twitter concentrations.” The public activities behind Twitter concentrations potentially pose more pressure on the traffic network and cause traffic surges within a specified time and location. However, it is still unknown how closely the Twitter concentrations and traffic surges are correlated. This study fuses a set of tweets and traffic data collected during 2014 in Northern Virginia and investigates the correlation between Twitter concentrations and traffic surges in July. The results show the promise and effectiveness of the proposed methods and even provide insights into the causality of nonrecurrent traffic surges.
Article
Subway passenger flow prediction is strategically important in metro transit system management. The prediction under event occurrences turns into a very challenging task. In this paper, we adopt a new kind of data source, social media, to tackle this challenge. We develop a systematic approach to examine social media activities and sense event occurrences. Our initial analysis demonstrates that there exists a moderate positive correlation between passenger flow and the rates of social media posts. This finding motivates us to develop a novel approach for improved flow forecast. We first develop a hashtag-based event detection algorithm. Furthermore, we propose a parametric and convex optimization-based approach, called optimization and prediction with hybrid loss function (OPL), to fuse the linear regression and the results of seasonal autoregressive integrated moving average (SARIMA) model jointly. The OPL hybrid model takes advantage of the unique strengths of linear correlation in social media features and SARIMA model in time series prediction. Experiments on events near a subway station show that OPL reports the best forecasting performance compared with other state-of-the-art techniques. In addition, an ensemble model is developed to leverage the weighted results from OPL and support vector machine regression together. As a result, the prediction accuracy and the robustness further increase.
Article
Traffic flow pattern identification, as well as anomaly detection, is an important component for traffic operations and control. To reveal the characteristics of regional traffic flow patterns in large road networks, this paper employs dictionary-based compression theory to identify the features of both spatial and temporal patterns by analyzing the multi-dimensional traffic-related data. An anomaly index is derived to quantify the network traffic in both spatial and temporal perspectives. Both pattern identifications are conducted in three different geographic levels: detector, intersection, and sub-region. From different geographic levels, this study finds several important features of traffic flow patterns, including the geographic distribution of traffic flow patterns, pattern shifts at different times-of-day, pattern fluctuations over different days, etc. Both spatial and temporal traffic flow patterns defined in this study can jointly characterize pattern changes and provide a good performance measure of traffic operations and management. The proposed method is further implemented in a case study for the impact of a newly constructed subway line. The before-and-after study identifies the major changes of surrounding road traffic near the subway stations. It is found that new metro stations attract more commute traffic on weekdays as well as entertainment traffic during weekends.
Article
The effectiveness of traditional incident detection is often limited by sparse sensor coverage, and reporting incidents to emergency response systems is labor-intensive. We propose to mine tweet texts to extract incident information on both highways and arterials as an efficient and cost-effective alternative to existing data sources. This paper presents a methodology to crawl, process and filter tweets that are accessible by the public for free. Tweets are acquired from Twitter using the REST API in real time. The process of adaptive data acquisition establishes a dictionary of important keywords and their combinations that can imply traffic incidents (TI). A tweet is then mapped into a high dimensional binary vector in a feature space formed by the dictionary, and classified into either TI related or not. All the TI tweets are then geocoded to determine their locations, and further classified into one of the five incident categories.
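The mapping from a tweet to a binary vector over a keyword dictionary, as described in this abstract, might look like the following sketch. The dictionary entries, the tokenization rule, and the function name are hypothetical; the study's actual dictionary and its downstream classifier are not reproduced here.

```python
import re
from typing import List, Sequence


def tweet_to_binary_vector(text: str, dictionary: Sequence[str]) -> List[int]:
    """Return one 0/1 feature per dictionary entry.

    An entry is a single keyword or a space-separated keyword combination;
    a combination fires only if every one of its words appears in the tweet.
    """
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return [
        1 if all(word in tokens for word in entry.split()) else 0
        for entry in dictionary
    ]


# Hypothetical dictionary of incident keywords and combinations.
incident_dictionary = ["accident", "crash", "lane closed", "traffic"]
features = tweet_to_binary_vector("Crash on I-95, right lane closed", incident_dictionary)
```

The resulting vectors could be fed to any binary classifier to separate incident-related tweets from the rest; which classifier to use is left open here.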
Article
Travel time is critical for emergency response and emergency vehicle (EV) operations. Compared to ordinary vehicles (OVs), EVs are permitted to break conventional road rules to reach the destination in a shorter time. However, very few previous studies address the travel time performance of EVs. This study obtained nearly four years of EV travel time data in the Northern Virginia (NOVA) region using 76,000 preemption records at signalized intersections. First, the special characteristics of EV travel time are explored in terms of the mean, median, standard deviation and distribution, which differ substantially from those of OVs in previous studies. Second, a utility-based model is proposed to quantify the travel time performance of EVs. Third, this paper further investigates two important components of the utility model: benchmark travel time and standardized travel time. The mode of the distribution is chosen as benchmark travel time, and its nonlinear decreasing relationship with the link length is revealed. At the same time, the distribution of standardized travel time is fitted with different candidate distributions, and the inverse Gaussian distribution proves to be the most suitable one. Finally, to validate the proposed model, we implement the model in case studies to estimate link and route travel time performance. The results of route comparisons also show that the proposed model can support EV route choice and eventually improve EV service and operations.
Article
In the last decade, crowdsourcing has emerged as a novel mechanism for accomplishing temporal and spatial critical tasks in transportation with the collective intelligence of individuals and organizations. This paper presents a timely literature review of crowdsourcing and its applications in intelligent transportation systems (ITS). We investigate the ITS services enabled by crowdsourcing and the keyword co-occurrence and coauthorship networks formed by ITS publications, and identify the problems and challenges that need further research. Finally, we briefly introduce our future work focusing on using geospatial tagged data to analyze real-time traffic conditions and the management of traffic flow in urban environments. This review aims to help ITS practitioners and researchers build a state-of-the-art understanding of crowdsourcing in ITS, as well as to call for more research on the application of crowdsourcing in transportation systems.
Article
This study investigates non-travelers’ behavior, focusing on the influence of spatial and temporal distances on decisions not to travel and their effects on the gap between travel intention and actual behavior. The results show that intention formed at a greater temporal distance from an event reflects a stronger actualization but that spatial distance acts as impedance to traveling to distant destinations. The longer the time interval between intention formation and the action is, and the greater the spatial distance to a destination is, the higher the probability to change behaviors. The results indicate that in addition to understanding factors that facilitate travelers without an original travel intention, marketing efforts should target non-travelers to induce the intended travel.
Article
The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of H. P. Friedman and J. Rubin [J. Am. Stat. Assoc. 62, 1159-1178 (1967)]. However, as currently implemented, it does not allow the specification of which features (orientation, size, and shape) are to be common to all clusters and which may differ between clusters. Also, it is restricted to Gaussian distributions and it does not allow for noise. We propose ways of overcoming these limitations. A reparameterization of the covariance matrix allows us to specify that some, but not all, features be the same for all clusters. A practical framework for non-Gaussian clustering is outlined, and a means of incorporating noise in the form of a Poisson process is described. An approximate Bayesian method for choosing the number of clusters is given. The performance of the proposed methods is studied by simulation, with encouraging results. The methods are applied to the analysis of a data set arising in the study of diabetes, and the results seem better than those of previous analyses. A magnetic resonance image (MRI) of the brain is also analyzed, and the methods appear successful in extracting the main features of anatomical interest. The methods described here have been implemented in both Fortran and S-PLUS versions, and the software is freely available through StatLib.
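As a reading aid for the abstract above: the covariance reparameterization it mentions is commonly written as an eigenvalue decomposition of each cluster's covariance matrix. The notation below is the usual one in the model-based clustering literature and is an assumption here, since the abstract itself does not give the formula; the exact normalization of the shape matrix also varies between presentations.

```latex
% Eigenvalue decomposition of the covariance matrix of cluster k
% (symbols are assumed notation, not taken from the abstract)
\Sigma_k = \lambda_k \, D_k \, A_k \, D_k^{\top}
% \lambda_k : positive scalar controlling the size (volume) of cluster k
% D_k       : orthogonal matrix of eigenvectors, controlling orientation
% A_k       : diagonal matrix of scaled eigenvalues, controlling shape
```

Holding some of the three factors equal across clusters (for example a common shape with cluster-specific orientations) while letting the others vary is what allows some, but not all, features to be shared.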
Article
Location-based check-in services in various social media applications have enabled individuals to share their activity-related choices providing a new source of human activity data. Although geo-location data has the potential to infer multi-day patterns of individual activities, appropriate methodological approaches are needed. This paper presents a technique to analyze large-scale geo-location data from social media to infer individual activity patterns. A data-driven modeling approach, based on topic modeling, is proposed to classify patterns in individual activity choices. The model provides an activity generation mechanism which when combined with the data from traditional surveys is potentially a useful component of an activity-travel simulator. Using the model, aggregate patterns of users’ weekly activities are extracted from the data. The model is extended to also find user-specific activity patterns. We extend the model to account for missing activities (a major limitation of social media data) and demonstrate how information from activity-based diaries can be complemented with longitudinal geo-location information. This work provides foundational tools that can be used when geo-location data is available to predict disaggregate activity patterns.
Conference Paper
The advances in mobile computing and social networking services enable people to probe the dynamics of a city. In this paper, we address the problem of detecting and describing traffic anomalies using crowd sensing with two forms of data, human mobility and social media. Traffic anomalies are caused by accidents, control, protests, sport events, celebrations, disasters and other events. Unlike existing traffic-anomaly-detection methods, we identify anomalies according to drivers' routing behavior on an urban road network. Here, a detected anomaly is represented by a sub-graph of a road network where drivers' routing behaviors significantly differ from their original patterns. We then try to describe the detected anomaly by mining representative terms from the social media that people posted when the anomaly happened. The system for detecting such traffic anomalies can benefit both drivers and transportation authorities, e.g., by notifying drivers approaching an anomaly and suggesting alternative routes, as well as supporting traffic jam diagnosis and dispersal. We evaluate our system with a GPS trajectory dataset generated by over 30,000 taxicabs over a period of 3 months in Beijing, and a dataset of tweets collected from WeiBo, a Twitter-like social site in China. The results demonstrate the effectiveness and efficiency of our system.
Conference Paper
The study of human activity patterns traditionally relies on the continuous tracking of user location. We approach the problem of activity pattern discovery from a new perspective which is rapidly gaining attention. Instead of actively sampling increasing volumes of sensor data, we explore the participatory sensing potential of multiple mobile social networks, on which users often disclose information about their location and the venues they visit. In this paper, we present automated techniques for filtering, aggregating, and processing combined social networking traces with the goal of extracting descriptions of regularly-occurring user activities, which we refer to as “user routines”. We report our findings based on two localized data sets about a single pool of users: the former contains public geotagged Twitter messages, the latter Foursquare check-ins that provide us with meaningful venue information about the locations we observe. We analyze and combine the two datasets to highlight their properties and show how the emergent features can enhance our understanding of users' daily schedule. Finally, we evaluate and discuss the potential of routine descriptions for predicting future user activity and location.
Conference Paper
The potential to moderate travel demand through changes in the built environment is the subject of more than 50 recent empirical studies. The majority of recent studies are summarized. Elasticities of travel demand with respect to density, diversity, design, and regional accessibility are then derived from selected studies. These elasticity values may be useful in travel forecasting and sketch planning and have already been incorporated into one sketch planning tool, the Environmental Protection Agency's Smart Growth Index model. In weighing the evidence, what can be said, with a degree of certainty, about the effects of built environments on key transportation "outcome" variables: trip frequency, trip length, mode choice, and composite measures of travel demand, vehicle miles traveled (VMT) and vehicle hours traveled (VHT)? Trip frequencies have attracted considerable academic interest of late. They appear to be primarily a function of socioeconomic characteristics of travelers and secondarily a function of the built environment. Trip lengths have received relatively little attention, which may account for the various degrees of importance attributed to the built environment in recent studies. Trip lengths are primarily a function of the built environment and secondarily a function of socioeconomic characteristics. Mode choices have received the most intensive study over the decades. Mode choices depend on both the built environment and socioeconomics (although they probably depend more on the latter). Studies of overall VMT or VHT find the built environment to be much more significant, a product of the differential trip lengths that factor into calculations of VMT and VHT.
Article
This paper analyzes empirically measured values of Travel Liking––how much individuals like to travel, in various overall, mode-, and purpose-based categories. The study addresses two questions: what types of people enjoy travel, and under what circumstances is travel enjoyed? We first review and augment some previously hypothesized reasons why individuals may enjoy travel. Then, using data from 1358 commuting residents of three San Francisco Bay Area neighborhoods, a total of 13 ordinary least-squares linear regression models are presented: eight models of short-distance Travel Liking and five models of long-distance Travel Liking. The results indicate that travelers’ attitudes and personality (representing motivations) are more important determinants of Travel Liking than objective travel amounts. For example, while those who commute long distances do tend to dislike commute travel (as expected), the variables entering the models that hold the most importance relate to the personality and attitudes of the traveler. Most of the hypothesized reasons for liking travel are empirically supported here.
Article
Numerous studies have found that suburban residents drive more and walk less than residents in traditional neighborhoods. What is less well understood is the extent to which the observed patterns of travel behavior can be attributed to the residential built environment itself, as opposed to the prior self-selection of residents into a built environment that is consistent with their predispositions toward certain travel modes and land use configurations. To date, most studies addressing this attitudinal self-selection issue fall into seven categories: direct questioning, statistical control, instrumental variables models, sample selection models, joint discrete choice models, structural equations models, and longitudinal designs. This paper reviews and evaluates these alternative approaches with respect to this particular application (a companion paper focuses on the empirical findings of 28 studies using these approaches). We identify some advantages and disadvantages of each approach, and note the difficulties in actually quantifying the absolute and/or relative extent of the true influence of the built environment on travel behavior. Although time and resource limitations are recognized, we recommend usage of longitudinal structural equations modeling with control groups, a design which is strong with respect to all causality requisites.
Article
This paper contests the conventional wisdom that travel is a derived demand, at least as an absolute. Rather, we suggest that under some circumstances, travel is desired for its own sake. We discuss the phenomenon of undirected travel – cases in which travel is not a byproduct of the activity but itself constitutes the activity. The same reasons why people enjoy undirected travel (a sense of speed, motion, control, enjoyment of beauty) may motivate them to undertake excess travel even in the context of mandatory or maintenance trips. One characteristic of undirected travel is that the destination is ancillary to the travel rather than the converse which is usually assumed. We argue that the destination may be to some degree ancillary more often than is realized. Measuring a positive affinity for travel is complex: in self-reports of attitudes toward travel, respondents are likely to confound their utility for the activities conducted at the destination, and for activities conducted while traveling, with their utility for traveling itself. Despite this measurement challenge, preliminary empirical results from a study of more than 1900 residents of the San Francisco Bay Area provide suggestive evidence for a positive utility for travel, and for a desired travel time budget (TTB). The issues raised here have clear policy implications: the way people will react to policies intended to reduce vehicle travel will depend in part on the relative weights they assign to the three components of a utility for travel. Improving our forecasts of travel behavior may require viewing travel literally as a “good” as well as a “bad” (disutility).
Article
The sprawling patterns of land development common to metropolitan areas of the US have been blamed for high levels of automobile travel, and thus for air quality problems. In response, smart growth programs—designed to counter sprawl—have gained popularity in the US. Studies show that, all else equal, residents of neighborhoods with higher levels of density, land-use mix, transit accessibility, and pedestrian friendliness drive less than residents of neighborhoods with lower levels of these characteristics. These studies have shed little light, however, on the underlying direction of causality—in particular, whether neighborhood design influences travel behavior or whether travel preferences influence the choice of neighborhood. The evidence thus leaves a key question largely unanswered: if cities use land use policies to bring residents closer to destinations and provide viable alternatives to driving, will people drive less and thereby reduce emissions? Here a quasi-longitudinal design is used to investigate the relationship between neighborhood characteristics and travel behavior while taking into account the role of travel preferences and neighborhood preferences in explaining this relationship. A multivariate analysis of cross-sectional data shows that differences in travel behavior between suburban and traditional neighborhoods are largely explained by attitudes. However, a quasi-longitudinal analysis of changes in travel behavior and changes in the built environment shows significant associations, even when attitudes have been accounted for, providing support for a causal relationship.
Article
Summary: A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
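The finite mixture case named in the summary above can be illustrated with a minimal sketch of expectation-maximization for a two-component, one-dimensional Gaussian mixture. This is not code from the cited paper: the function name `em_two_gaussians`, the initialization at the data extremes, and the variance floor are all hypothetical choices made for illustration. The recorded log-likelihoods let a reader check the monotone behaviour the summary refers to.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch).

    Returns (weights, means, variances, log_liks), where log_liks[i] is the
    log-likelihood of the parameters held at the start of iteration i.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Assumed initialization: means at the data extremes, shared variance.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    log_liks = []

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = joint[0] + joint[1]
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        log_liks.append(log_lik)

        # M-step: responsibility-weighted maximum likelihood updates.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # assumed floor to avoid collapse

    return weights, means, variances, log_liks


# Usage on synthetic data (the "incomplete" part is the unobserved component label).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
w, m, v, ll = em_two_gaussians(sample)
print("weights:", w)
print("means:", m)
print("variances:", v)
```

The hard-coded two components keep the sketch short; choosing the number of components is a separate model-selection question that EM itself does not answer.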
Article
Despite their importance for urban planning, traffic forecasting and the spread of biological and mobile viruses, our understanding of the basic laws governing human motion remains limited owing to the lack of tools to monitor the time-resolved location of individuals. Here we study the trajectory of 100,000 anonymized mobile phone users whose position is tracked for a six-month period. We find that, in contrast with the random trajectories predicted by the prevailing Lévy flight and random walk models, human trajectories show a high degree of temporal and spatial regularity, each individual being characterized by a time-independent characteristic travel distance and a significant probability to return to a few highly frequented locations. After correcting for differences in travel distances and the inherent anisotropy of each trajectory, the individual travel patterns collapse into a single spatial probability distribution, indicating that, despite the diversity of their travel history, humans follow simple reproducible patterns. This inherent similarity in travel patterns could impact all phenomena driven by human mobility, from epidemic prevention to emergency response, urban planning and agent-based modelling.
Article
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as "How many clusters are there?", "Which clustering method should be used?" and "How should outliers be handled?". We outline a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
Cross-Platform Future in Focus
  • L Adam
  • L Andrew
A study of users’ movements based on check-in data in location-based social networks
  • J Cao
  • Q Hu
  • Q Li
Using social media to predict traffic flow under special event conditions
  • M Ni
  • Q He
  • J Gao
On-site traffic accident detection with both social media and traffic data
  • Z Zhang
  • Q He