
Abstract and Figures

In this paper, we demonstrate the use of an inexpensive and easy-to-collect long-term dataset to address the problems caused by basing activity space studies on short-term data. In total, we use 63,114 geo-tagged tweets from 116 unique users to create individuals’ activity spaces based on minimum bounding geometry (convex hull). By using polygon density maps of activity space, we found clear differences between weekday and weekend activity spaces, and were able to observe the growth trajectory of activity space over 17 weeks. In order to reflect the heterogeneous nature of spatial behavior and tweeting habits, we used Latent Class Analysis twice: first, to identify five unique patterns of location-based activity spaces that differ in shape and anchoring; second, to identify three unique growth trajectories. The comparison among these latent growth trajectories shows that, in order to capture the extent of activity spaces, we need long periods of observation for some individuals and shorter periods for others. We also show that the single-digit number of weeks used in past studies may not be sufficient to capture individuals’ activity space. The major activity locations identified using a multilevel latent class model do not appear to be statistically related to the growth patterns of Twitter users’ activity spaces. The evidence here shows that Twitter data can be a valuable complementary source of information for heterogeneity analysis in activity-based modeling and simulation.
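The convex-hull activity space and its week-by-week growth described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`convex_hull`, `polygon_area`, `cumulative_hull_areas`), the `(week, x, y)` record layout, and the assumption that coordinates are already projected to a planar system are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def cumulative_hull_areas(tweets: List[Tuple[int, float, float]], n_weeks: int) -> List[float]:
    """Hull area of all points observed up to and including each week (1-based).

    Assumes (week, x, y) records for ONE user, with x and y in planar units.
    """
    areas = []
    for week in range(1, n_weeks + 1):
        pts = [(x, y) for (wk, x, y) in tweets if wk <= week]
        areas.append(polygon_area(convex_hull(pts)))
    return areas


# Toy data for one hypothetical user: the hull only grows when a new point
# falls outside the area already covered.
toy = [(1, 0, 0), (1, 1, 0), (1, 0, 1), (2, 0.2, 0.2), (3, 2, 2)]
growth = cumulative_hull_areas(toy, n_weeks=3)
print(growth)  # [0.5, 0.5, 2.0]
```

In the toy data the week-2 point lies inside the week-1 triangle, so the cumulative area stays flat before jumping in week 3; a per-user sequence of this kind is the sort of growth trajectory the abstract refers to.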
Activity space estimation with longitudinal observations of social media data
Jae Hyun Lee¹, Adam W. Davis¹, Seo Youn Yoon², Konstadinos G. Goulias¹
Published online: 22 June 2016
© Springer Science+Business Media New York 2016
Paper accepted for presentation at the 95th Annual Meeting of the Transportation Research Board (TRB),
Washington, D.C., January 10–14, 2016. Also published as GEOTRANS Report 2015-7-01, Santa Barbara,
CA.
Jae Hyun Lee: lee@geog.ucsb.edu
Adam W. Davis: awdavis@geog.ucsb.edu
Seo Youn Yoon: syyoon@krihs.re.kr
Konstadinos G. Goulias: goulias@geog.ucsb.edu
¹ GeoTrans Lab, Department of Geography, University of California Santa Barbara, Santa Barbara, CA 93106, USA
² Korea Research Institute for Human Settlements, 254 Simin-daero, Gyeonggi-do, Dongan-Gu, Anyang-Si 431-712, South Korea
Transportation (2016) 43:955–977
DOI 10.1007/s11116-016-9719-1
... Previously, numerous studies have utilized social media data for transportation research, including the classification of urban activity patterns [5], estimation of travel activity spaces [6], examination of longitudinal travel behavior [7], incidents detection [8], and so on. Specifically, the authors in [5] utilize Latent Dirichlet Allocation to classify individual activity patterns. ...
... Specifically, the authors in [5] utilize Latent Dirichlet Allocation to classify individual activity patterns. [6] estimates the differences between weekday and weekend activity spaces through geo-tagged tweets from different users. Gu et al. [8] first manually annotate tweets relevant to the traffic incidents and develop a Semi-Naive-Bayes model to classify the results. ...
Preprint
Social media has become an important platform for people to express their opinions towards transportation services and infrastructure, which holds the potential for researchers to gain a deeper understanding of individuals' travel choices, for transportation operators to improve service quality, and for policymakers to regulate mobility services. A significant challenge, however, lies in the unstructured nature of social media data. In other words, textual data like social media is not labeled, and large-scale manual annotations are cost-prohibitive. In this study, we introduce a novel methodological framework utilizing Large Language Models (LLMs) to infer the mentioned travel modes from social media posts, and reason people's attitudes toward the associated travel mode, without the need for manual annotation. We compare different LLMs along with various prompting engineering methods in light of human assessment and LLM verification. We find that most social media posts manifest negative rather than positive sentiments. We thus identify the contributing factors to these negative posts and, accordingly, propose recommendations to traffic operators and policymakers.
... Sham Shui Po (SSP) and (B) Tin Shui Wai (TSW). All disparities are significantly different from 0 at the 0.01 level. Small = activity space smaller than -5 in the natural logarithmic value; middle = activity space of 0 in the natural logarithmic value; large = activity space larger than 5 in the natural logarithmic value. Kamruzzaman and Hine 2012; J. H. Lee et al. 2016). ...
... Leveraging social media platforms as data collection tools for activity behavior presents an alternative yet promising avenue. For example, the authors in [39] used geotagged tweets as longitudinal observations for the activity pattern estimation. Similarly, another study conducted by [40] emphasizes the importance of utilizing longitudinal geolocation data obtained from social media platforms, where users freely share their activity-related preferences. ...
Article
The necessity for an external control mechanism that optimizes daily urban trips becomes evident when considering numerous factors at play within a complex environment. This research introduces an activity-based travel personalization tool that incorporates 10 travel decision-making factors driven by the genetic algorithm. To evaluate the framework, a complex artificial scenario is created comprising six activities in a daily plan. Afterwards, the scenario is simulated for predefined user profiles, and the results of the simulation are compared based on the users’ characteristics. The simulations of the scenario successfully demonstrate the appropriate utilization of activity constraints and the efficient implementation of users’ spatiotemporal priorities. In comparison to the base case, significant time savings ranging from 31.2% to 70.2% are observed in the daily activity chains of the simulations. These results indicate that the magnitude of time savings in daily activity simulations depends on how users assign values to the travel decision-making parameters, reflecting the attitudinal differences among the predefined users in this study. This tool holds promise for advancing longitudinal travel behavior research, particularly in gaining a more profound understanding of travel patterns.
... Over the years, numerous methods have been proposed to represent, visualize, and measure the external descriptive statistics (e.g., shape, size) or the internal structures (e.g., randomness and regularity) of activity space (Yuan and Xu 2022). Two simple and straightforward methods are minimum convex hull (Harding et al. 2012; Lee et al. 2016) and the total distance traveled (Páez et al. 2010). The former, however, might be affected by the spatial distribution of activity locations, and the latter relies heavily on data spatial and temporal resolution. ...
Article
Destination, as a key concept in tourism geography, has largely determined the scale at which tourist activity space was modeled and studied. Existing studies usually focused on investigating tourists’ activities and movements either at the intradestination (e.g., within a city) or interdestination scale. Although useful in numerous research contexts, these models based on fixed spatial scales are incapable of portraying the complex spatial structure of tourist activity spaces, which sometimes exhibit hierarchical structures, and could span across different spatial scales. In this study, we propose a new representation of tourist activity space to bridge these gaps. The representation takes tourists’ accommodation locations as key reference points. At the macroscale, the sequence of accommodation locations forms the backbone of tourist activity space, denoted as itinerary type. At the microscale, we introduce the concept of territory to describe how individuals organize activities around these overnight “base camps” (i.e., accommodation locations). We apply this representation over a large-scale mobile phone data set of international travelers visiting South Korea to demonstrate its capability. Results show that four generic itinerary types capture the activity space structure of 89 percent of the tourists. The interrelationships of territories and their topological structures further categorize activity spaces into subtypes, leading to a new method of tourist classification based on their spatiotemporal activity patterns. We believe the proposed representation could enrich new perspectives and debates on how tourist activities can be studied. The representation can also be extended as a generic framework to delineate complex forms of human activity space.
... The structure of an activity space mainly includes home and peripheral movements, other activity points represented by workplace and peripheral movements, and movements and travel between activity points (Golledge, 1997). There are various models of activity space, one of which involves the construction of geometric models or sets of activity points, including the standard deviational ellipse (Hu et al., 2020; Järv et al., 2015; Schönfelder and Axhausen, 2003), the minimum convex polygon (Chen and Akar, 2016; Kraft et al., 2019; Lee et al., 2016; Parthasarathi et al., 2015), the set of activity points (Day et al., 2016), and the shortest-path network estimation method (Ravensbergen et al., 2016). Another approach is the construction of a model by categorizing activity points with reference to the place of residence (Vallée et al., 2011). ...
Article
The activity space of residents is an important mediator for the exploration of the relationship between residents' activities and the city. In this study, one-month mobile phone data are used to study the activity space of residents from three aspects, namely the radius of the activity space, the frequency of activity, and the diversity of activities. Combined with census data and point-of-interest data, a random forest model is established, and partial dependency plots are used to explore the non-linear effects of the built environment and socio-demographic attributes on the activity space. After analyzing mobile phone data from the central area of Shanghai, China, over one month, it is found that the facility density, accessibility, location, housing conditions, and marital status are the most important factors affecting the activity space of residents. Different from previous studies, it is found that the facility density and accessibility have a diminishing marginal effect on the activity space, and both the built environment and socio-demographic attributes have threshold effects on the activity space. The location relative to the Huangpu River is also found to influence the frequency of activity. Moreover, there are obvious differences between the activity spaces of home-owning residents and tenants. Among tenants, a too-low or too-high monthly rent can lead to a small radius of activity space and a low diversity of activities. Married residents often have a large radius of the activity space, a high frequency of activity, and a high diversity of activities. Some factors are also found to affect the influences of other factors on the activity space, and can even change the direction of correlation. The findings of this study can provide references for urban and transportation planning.
Article
Decentralized machine learning, such as Federated Learning (FL), is widely adopted in many application domains. Especially in domains like recommendation systems, sharing gradients instead of private data has recently caught the research community’s attention. Personalized travel route recommendation utilizes users’ location data to recommend optimal travel routes. Location data is extremely privacy sensitive, presenting increased risks of exposing behavioural patterns and demographic attributes. FL for route recommendation can mitigate the sharing of location data. However, this paper shows that an adversary can recover the user trajectories used to train the federated recommendation models with high proximity accuracy. To this effect, we propose a novel attack called DeepSneak, which uses shared gradients obtained from global model training in FL to reconstruct private user trajectories. We formulate the attack as a regression problem and train a generative model by minimizing the distance between gradients. We validate the success of DeepSneak on two real-world trajectory datasets. The results show that we can recover the location trajectories of users with reasonable spatial and semantic accuracy.
Article
Constructing a data-driven spatial contact network model is challenging in epidemiological research. In this study, we examine the applicability of geotagged Twitter data as an instrumental data source for tackling such a challenge. Geotagged Twitter data carrying geolocations of the account users have the strength of supporting longitudinal data collection at a massive scale. Still, the unstructured nature of the data poses significant methodological and computational difficulties. We focus on methodological solutions and develop a novel approach that lets a spatial contact network emerge naturally from the massive amount of geospatial tweets. We show that such a data-driven network reflects the assumptions made by network models regarding human behaviors and has the potential to be used for epidemiological research. To this end, we investigate the network properties and study the spread of pathogens on the proposed spatial contact network by using the homogeneous and heterogeneous susceptible–infectious–recovered (SIR) network models and the event-driven Gillespie’s algorithm. Our simulation results strongly suggest that it is feasible to explicitly construct data-driven spatial models using massive longitudinal Twitter data for public health research.
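For readers unfamiliar with event-driven simulation, the following is a minimal sketch of Gillespie's algorithm for a homogeneous SIR process on a contact network. It is a generic textbook-style illustration, not the cited study's code: the function name `gillespie_sir`, the adjacency-dictionary input, the rates `beta` and `gamma`, and the six-node ring network are all assumptions chosen for the example. A heterogeneous variant would replace the single `beta` and `gamma` with per-node or per-edge rates.

```python
import random
from typing import Dict, List, Tuple


def gillespie_sir(
    adj: Dict[int, List[int]],
    beta: float,
    gamma: float,
    seeds: List[int],
    rng: random.Random,
) -> List[Tuple[float, int, int, int]]:
    """Simulate SIR on an undirected network until no infectious node remains.

    beta  : infection rate per susceptible-infectious edge (assumed homogeneous)
    gamma : recovery rate per infectious node (must be > 0)
    Returns a list of (time, S, I, R) tuples, one per event plus the start.
    """
    status = {node: "S" for node in adj}
    for node in seeds:
        status[node] = "I"

    def counts() -> Tuple[int, int, int]:
        vals = list(status.values())
        return vals.count("S"), vals.count("I"), vals.count("R")

    t = 0.0
    history = [(t, *counts())]
    while True:
        infected = [n for n in adj if status[n] == "I"]
        if not infected:
            break
        # Every S-I edge is an independent infection channel.
        si_edges = [(i, j) for i in infected for j in adj[i] if status[j] == "S"]
        rate_infect = beta * len(si_edges)
        rate_recover = gamma * len(infected)
        total = rate_infect + rate_recover
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        # Choose the event type in proportion to its share of the total rate.
        if rng.random() * total < rate_infect:
            _, target = rng.choice(si_edges)
            status[target] = "I"
        else:
            status[rng.choice(infected)] = "R"
        history.append((t, *counts()))
    return history


# Hypothetical six-node ring network with one initially infectious node.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
history = gillespie_sir(ring, beta=1.0, gamma=0.5, seeds=[0], rng=random.Random(42))
final_time, final_s, final_i, final_r = history[-1]
```

Each loop iteration processes exactly one event (an infection or a recovery), so the run always terminates: the infectious count can rise only while susceptible neighbors remain, and every infectious node eventually recovers.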
Chapter
With the increasing use of the Internet for getting information, transacting business and interacting with people, a wide range of activities in everyday life can now be undertaken in cyberspace. As traditional models of accessibility are based on physical notions of distance and proximity, they are inadequate for conceptualizing or analyzing individual accessibility in the physical world and cyberspace (hereafter referred to as hybrid-accessibility). To address the need for new models of space and time that enable us to represent individual accessibility in the information age, there are at least three major research areas: (a) the conceptual and/or behavioral foundation of individual accessibility; (b) appropriate methods for representing accessibility; and (c) feasible operational measures for evaluating individual accessibility. With the recent development and application of GIS methods in the study of accessibility in the physical world (e.g., Forer 1998, Hanson, Kominiak, and Carlin 1997, Huisman and Forer 1998, Kwan 1998, 1999a, 1999b, Miller 1991, 1999, Scott 1999, Talen 1997, Talen and Anselin 1998), it is apparent that GIS have considerable potential in each of these research areas. As shown in some of these studies, a focus on the individual enabled by GIS methods also reveals the spatial-temporal complexity in individual activity patterns and accessibility through 3D visualization or computational procedures.
Article
Applied Latent Class Analysis introduces several innovations in latent class analysis to a wider audience of researchers. Many of the world's leading innovators in the field of latent class analysis contributed essays to this volume, each presenting a key innovation to the basic latent class model and illustrating how it can prove useful in situations typically encountered in actual research.
Conference Paper
In this paper we use Twitter data and a recently developed algorithm at the University of California Santa Barbara to extract Origin-Destination pairs in the Greater Los Angeles metropolitan area known as the Southern California Association of Governments (SCAG) region. This algorithm contains two steps: individual-based trajectory detection and place-based trip aggregation. In essence, if a person tweeted in different TAZs within 4 hours, it is considered to be one OD-trip. The extracted OD-trips were aggregated into 30 minute intervals. Then, we compare these trips with a traditional travel demand model (SCAG, 2012, 4-step model). Substantial heterogeneity is found due to zero trip zones and a variety of social factors including the tweeting demographics. In this paper we illustrate the results from a Tobit model and a three-class latent class regression model that convert tweet derived trips to four-step trips accounting for zonal and trip-maker heterogeneity. In these regression models we use measures of business density and diversity, and population density as added explanatory/control variables, so that a unit contribution of a tweet trip can be adjusted by land-use effects and the zero trip producing zones in the twitter data can be explained in a more complete way. Preliminary results are encouraging and show the usefulness of harvested large-scale mobility data from location-based social media. The results also show the added value of latent class regression models in this experiment. The paper concludes with a review of next steps.
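The two-step rule stated in this abstract (a tweet pair in different TAZs within 4 hours counts as one OD-trip, and trips are aggregated into 30-minute intervals) can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm developed at UCSB: it pairs consecutive tweets per user, treats the 4-hour limit as inclusive, bins each trip by the time of its origin tweet, and represents time as integer minutes. The names `extract_od_trips` and `aggregate_trips` and the `(user_id, minute, taz)` record layout are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple

MAX_GAP_MIN = 4 * 60  # "within 4 hours" (assumed inclusive)
BIN_MIN = 30          # aggregation interval in minutes

Tweet = Tuple[str, int, str]     # (user_id, minute, taz), hypothetical layout
Trip = Tuple[str, str, str, int]  # (user_id, origin_taz, dest_taz, start_minute)


def extract_od_trips(tweets: Iterable[Tweet]) -> List[Trip]:
    """Step 1: individual-based detection from consecutive tweets of each user."""
    by_user = defaultdict(list)
    for user, minute, taz in tweets:
        by_user[user].append((minute, taz))
    trips: List[Trip] = []
    for user, records in by_user.items():
        records.sort()  # chronological order
        for (t0, z0), (t1, z1) in zip(records, records[1:]):
            if z0 != z1 and (t1 - t0) <= MAX_GAP_MIN:
                trips.append((user, z0, z1, t0))
    return trips


def aggregate_trips(trips: Iterable[Trip]) -> Counter:
    """Step 2: place-based aggregation into (origin, destination, 30-min bin) counts."""
    counts: Counter = Counter()
    for _user, origin, dest, start in trips:
        counts[(origin, dest, start // BIN_MIN)] += 1
    return counts


# Toy input: minutes since midnight on a single day, zones labeled A, B, C.
toy_tweets = [
    ("u1", 480, "A"), ("u1", 540, "B"), ("u1", 900, "C"),  # B to C gap is 6 h: no trip
    ("u2", 485, "A"), ("u2", 500, "B"), ("u2", 510, "B"),  # B to B: same zone, no trip
]
trips = extract_od_trips(toy_tweets)
od_counts = aggregate_trips(trips)
print(od_counts)  # Counter({('A', 'B', 16): 2})
```

Both toy users produce one A-to-B trip starting between 08:00 and 08:30 (bin 16), so the aggregated count for that OD pair and interval is 2.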
Chapter
Spatial analysis assists theoretical understanding and empirical testing in the social sciences, and rapidly expanding applications of geographic information technologies have advanced the spatial data-gathering needed for spatial analysis and model making. This much-needed volume covers outstanding examples of spatial thinking in the social sciences, with each chapter showing some aspect of how certain social processes can be understood by analyzing their spatial context. The audience for this work is as trans-disciplinary as its authorship because it contains approaches and methodologies useful to geography, anthropology, history, political science, economics, criminology, sociology, and statistics.
Chapter
Transportation systems are planned and designed to provide people with the ability to engage in activities at locations and times of their preference. When people cannot engage in activities at locations and times of their preference, the transportation system is deemed to provide a poor level of service. Transportation models are aimed at modeling and forecasting where and when the demand for travel will occur so that the transportation system can be planned and designed to meet the projected travel demand and ensure a high quality of life for the residents and visitors of a geographical region. Thus the analysis of travel behavior is inextricably linked to the concepts of space and time, and there is a growing body of literature that makes a strong case for the development of transportation models and planning methods that explicitly recognize the role of space and time dimensions in people’s travel behavior.
Article
Accessibility is a fundamental but often neglected concept in transportation analysis and planning. Three complementary views of accessibility have evolved in the literature. The first is the constraints‐oriented approach, best implemented by Hägerstrand's space‐time prisms. The second perspective follows a spatial interaction framework and derives “attraction‐accessibility measures” that compare destinations' attractiveness with the travel costs required. A third approach measures the benefit provided to individuals by the transportation/land‐use system. This paper reconciles the three complementary approaches by deriving space‐time accessibility and benefit measures that are consistent with the rigorous Weibull axiomatic framework for accessibility measures. This research also develops computational procedures for calculating these measures within network structures. This provides realistic accessibility measures that reflect the locations, distances, and travel velocities allowed by an urban transportation network. Since their computational burdens are reasonable, they can be applied at the urban scale using a GIS.
Article
This work investigated the effect of personal, household, and neighborhood characteristics on variations in activity space through the use of a shortest network path buffer approach. A special focus was on the comparison of older adults (age 65 years and older, sample size of 591) with working-age adults (age 25 to 59 years, sample size of 1,806) to understand better the changes in activity space with age. Because activity space was a limited measure of social activity dependent on assumptions, this work investigated relative differences in the geographic reach of activity space and factors that increased or decreased that reach. The data were from the 2006 Household Activity Survey conducted in the Puget Sound, Washington, region. Descriptive data analysis showed that older adults on average had a substantially smaller (23%) geographic reach of activity space compared with working-age adults and that older adults who did not drive had the smallest geographic reach of activity space, only 16.7% of the overall average. The regression model results showed that low household income, often correlated with reduced mobility, was associated with a reduced geographic reach of activity space for older adults. Activity frequency significantly increased the geographic reach of activity space, and the effect was larger for older adults. The geographic reach of activity space was associated with neighborhood characteristics. Living in suburban and exurban neighborhoods led to a larger geographic reach of activity space for both older and working-age adults, while living in mixed use neighborhoods led to a smaller geographic reach of activity space.