Zhiyuan Cheng’s research while affiliated with Texas A&M University and other places


Publications (19)


Exploring Millions of Footprints in Location Sharing Services
  • Article

August 2021 · 14 Reads · 56 Citations

Proceedings of the International AAAI Conference on Web and Social Media

Zhiyuan Cheng · James Caverlee · Kyumin Lee

Location sharing services (LSS) like Foursquare, Gowalla, and Facebook Places support hundreds of millions of user-driven footprints (i.e., "checkins"). Those global-scale footprints provide a unique opportunity to study the social and temporal characteristics of how people use these services and to model patterns of human mobility, which are significant factors for the design of future mobile+location-based services, traffic forecasting, urban planning, as well as epidemiological models of disease spread. In this paper, we investigate 22 million checkins across 220,000 users and report a quantitative assessment of human mobility patterns by analyzing the spatial, temporal, social, and textual aspects associated with these footprints. We find that: (i) LSS users follow the "Lévy flight" mobility pattern and adopt periodic behaviors; (ii) while geographic and economic constraints affect mobility patterns, so does individual social status; and (iii) content and sentiment-based analysis of posts associated with checkins can provide a rich source of context for better understanding how users engage with these services.
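A Lévy flight pattern is usually recognized by a heavy-tailed (power-law) distribution of displacement lengths between consecutive checkins. The following is a minimal sketch of how such a check could be set up, not the authors' actual procedure: the function names, the `d_min` cutoff, and the use of the standard continuous power-law maximum-likelihood estimator are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(h)))


def displacements(checkins):
    """Distances between consecutive checkins of one user (already time-ordered)."""
    return [haversine_miles(p, q) for p, q in zip(checkins, checkins[1:])]


def power_law_exponent(distances, d_min=1.0):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(d / d_min)) over d >= d_min.

    d_min is a hypothetical lower cutoff chosen by the analyst.
    """
    tail = [d for d in distances if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if not tail or log_sum == 0:
        raise ValueError("need at least one displacement strictly above d_min")
    return 1 + len(tail) / log_sum
```

In practice the displacements of all users would be pooled before fitting, and the fit would be compared against alternatives such as an exponential distribution before concluding that the data are heavy-tailed.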



Who is the barbecue king of texas?: a geo-spatial approach to finding local experts on twitter

July 2014 · 47 Reads · 48 Citations

This paper addresses the problem of identifying local experts in social media systems like Twitter. Local experts -- in contrast to general topic experts -- have specialized knowledge focused around a particular location, and are important for many applications including answering local information needs and interacting with community experts. And yet identifying these experts is difficult. Hence in this paper, we propose a geo-spatial-driven approach for identifying local experts that leverages the fine-grained GPS coordinates of millions of Twitter users. We propose a local expertise framework that integrates both users' topical expertise and their local authority. Concretely, we estimate a user's local authority via a novel spatial proximity expertise approach that leverages over 15 million geo-tagged Twitter lists. We estimate a user's topical expertise based on expertise propagation over 600 million geo-tagged social connections on Twitter. We evaluate the proposed approach across 56 queries coupled with over 11,000 individual judgments from Amazon Mechanical Turk. We find significant improvement over both general (non-local) expert approaches and comparable local expert finding approaches.


Finding local experts on twitter

April 2014 · 26 Reads · 8 Citations

We address the problem of identifying local experts on Twitter. Specifically, we propose a local expertise framework that integrates both users' topical expertise and their local authority by leveraging over 15 million geo-tagged Twitter lists. We evaluate the proposed approach across 16 queries coupled with over 2,000 individual judgments from Amazon Mechanical Turk. Our initial experiments find significant improvement over a naive local expert finding approach, suggesting the promise of exploiting geo-tagged Twitter lists for local expert finding.


Campaign Extraction from Social Media

December 2013 · 370 Reads · 47 Citations

ACM Transactions on Intelligent Systems and Technology

In this manuscript, we study the problem of detecting coordinated free text campaigns in large-scale social media. These campaigns -- ranging from coordinated spam messages to promotional and advertising campaigns to political astro-turfing -- are growing in significance and reach with the commensurate rise in massive-scale social systems. Specifically, we propose and evaluate a content-driven framework for effectively linking free text posts with common "talking points" and extracting campaigns from large-scale social media. Three of the salient features of the campaign extraction framework are: (i) we investigate graph mining techniques for isolating coherent campaigns from large message-based graphs; (ii) we conduct a comprehensive comparative study of text-based message correlation at the message and user levels; and (iii) we analyze temporal behaviors of various campaign types. Through an experimental study over millions of Twitter messages, we identify five major types of campaigns -- namely Spam, Promotion, Template, News, and Celebrity campaigns -- and we show how these campaigns may be extracted with high precision and recall.


Location prediction in social media based on tie strength

October 2013 · 169 Reads · 172 Citations

We propose a novel network-based approach for location estimation in social media that integrates evidence of the social tie strength between users for improved location estimation. Concretely, we propose a location estimator -- FriendlyLocation -- that leverages the relationship between the strength of the tie between a pair of users, and the distance between the pair. Based on an examination of over 100 million geo-encoded tweets and 73 million Twitter user profiles, we identify several factors such as the number of followers and how the users interact that can strongly reveal the distance between a pair of users. We use these factors to train a decision tree to distinguish between pairs of users who are likely to live nearby and pairs of users who are likely to live in different areas. We use the results of this decision tree as the input to a maximum likelihood estimator to predict a user's location. We find that this proposed method significantly improves the results of location estimation relative to a state-of-the-art technique. Our system reduces the average error distance for 80% of Twitter users from 40 miles to 21 miles using only information from the user's friends and friends-of-friends, which has great significance for augmenting traditional social media and enriching location-based services with more refined and accurate location estimates.


Spatio-temporal dynamics of online memes: A study of geo-tagged tweets

May 2013 · 115 Reads · 90 Citations

We conduct a study of the spatio-temporal dynamics of Twitter hashtags through a sample of 2 billion geo-tagged tweets. In our analysis, we (i) examine the impact of location, time, and distance on the adoption of hashtags, which is important for understanding meme diffusion and information propagation; (ii) examine the spatial propagation of hashtags through their focus, entropy, and spread; and (iii) present two methods that leverage the spatio-temporal propagation of hashtags to characterize locations. Based on this study, we find that although hashtags are a global phenomenon, the physical distance between locations is a strong constraint on the adoption of hashtags, both in terms of the hashtags shared between locations and in the timing of when these hashtags are adopted. We find both spatial and temporal locality as most hashtags spread over small geographical areas but at high speeds. We also find that hashtags are mostly a local phenomenon with long-tailed life spans. These (and other) findings have important implications for a variety of systems and applications, including targeted advertising, location-based services, social media search, and content delivery networks.
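The abstract names three spatial properties of a hashtag (focus, entropy, and spread) without defining them. The sketch below uses one plausible reading, stated here as an assumption: focus is the single location with the largest share of a hashtag's occurrences, entropy is the Shannon entropy of the occurrence distribution over locations, and spread is the mean distance of occurrences from their geographic midpoint. Each occurrence is represented by the (lat, lon) of a hypothetical grid cell; the function names are illustrative.

```python
import math
from collections import Counter

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(h)))


def focus(cells):
    """Return (most frequent cell, fraction of occurrences posted from it)."""
    cell, count = Counter(cells).most_common(1)[0]
    return cell, count / len(cells)


def entropy(cells):
    """Shannon entropy (bits) of the hashtag's distribution over cells."""
    total = len(cells)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(cells).values())


def midpoint(points):
    """Geographic midpoint: average the points as 3-D unit vectors."""
    x = y = z = 0.0
    for lat, lon in points:
        lat, lon = math.radians(lat), math.radians(lon)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def spread(points):
    """Mean distance in miles from each occurrence to the geographic midpoint."""
    center = midpoint(points)
    return sum(haversine_miles(p, center) for p in points) / len(points)
```

Under these definitions, a hashtag used almost entirely in one city has high focus, entropy near zero, and a small spread, while a globally adopted hashtag shows the opposite on all three.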


How big is the crowd?: event and location based population modeling in social media

May 2013 · 38 Reads · 27 Citations

In this paper, we address the challenge of modeling the size, duration, and temporal dynamics of short-lived crowds that manifest in social media. Successful population modeling for crowds is critical for many services including location recommendation, traffic prediction, and advertising. However, crowd modeling is challenging since 1) user-contributed data in social media is noisy and oftentimes incomplete, in the sense that users only reveal when they join a crowd through posts but not when they depart; and 2) the size of short-lived crowds typically changes rapidly, growing and shrinking in sharp bursts. Toward robust population modeling, we first propose a duration model to predict the time users spend in a particular crowd. We propose a time-evolving population model for estimating the number of people departing a crowd, which enables the prediction of the total population remaining in a crowd. Based on these population models, we further describe an approach that allows us to predict the number of posts generated from a crowd. We validate the crowd models through extensive experiments over 22 million geo-location based check-ins and 120,000 event-related tweets.
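Because posts reveal only when users join a crowd, the remaining population has to be inferred from a model of how long people stay. The sketch below illustrates that idea under a strong simplifying assumption that is not taken from the paper: stay durations are exponentially distributed with a known mean. The expected population at time `t` is then the sum, over observed arrivals, of the probability that each user is still present. All names and the choice of distribution are hypothetical.

```python
import math


def survival(elapsed, mean_stay):
    """P(a user stays longer than `elapsed`), assuming exponential durations."""
    if elapsed < 0:
        return 0.0  # the user has not arrived yet
    return math.exp(-elapsed / mean_stay)


def expected_population(arrival_times, t, mean_stay):
    """Expected number of users still in the crowd at time t.

    arrival_times: times at which users were observed joining (e.g. checkins).
    mean_stay: assumed mean stay duration, in the same time unit.
    """
    return sum(survival(t - a, mean_stay) for a in arrival_times)


def expected_departures(arrival_times, t, mean_stay):
    """Expected number of users who arrived by time t and have already left."""
    arrived = sum(1 for a in arrival_times if a <= t)
    return arrived - expected_population(arrival_times, t, mean_stay)
```

For example, with arrivals at minutes 0, 10, and 20 and an assumed mean stay of 30 minutes, `expected_population([0, 10, 20], 20, 30)` sums the three survival probabilities at minute 20. A fitted duration model would replace the exponential `survival` function.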


A Content-Driven Framework for Geolocating Microblog Users

February 2013 · 62 Reads · 45 Citations

ACM Transactions on Intelligent Systems and Technology

Highly dynamic real-time microblog systems have already published petabytes of real-time human sensor data in the form of status updates. However, the lack of user adoption of geo-based features per user or per post signals that the promise of microblog services as location-based sensing systems may have only limited reach and impact. Thus, in this article, we propose and evaluate a probabilistic framework for estimating a microblog user's location based purely on the content of the user's posts. Our framework can overcome the sparsity of geo-enabled features in these services and bring augmented scope and breadth to emerging location-based personalized information services. Three of the key features of the proposed approach are: (i) its reliance purely on publicly available content; (ii) a classification component for automatically identifying words in posts with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. On average we find that the location estimates converge quickly, placing 51% of users within 100 miles of their actual location.
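The reported result ("51% of users within 100 miles") is an accuracy-within-radius metric. As a minimal sketch of how such a figure could be computed from estimated and actual user locations, the following uses great-circle distance; the function names and the input format (parallel lists of (lat, lon) pairs) are assumptions, and the estimator itself is not reproduced here.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(h)))


def error_distances(estimates, truths):
    """Per-user error in miles between estimated and actual location."""
    return [haversine_miles(e, t) for e, t in zip(estimates, truths)]


def accuracy_within(estimates, truths, radius_miles=100.0):
    """Fraction of users whose estimate falls within radius_miles of the truth."""
    errors = error_distances(estimates, truths)
    return sum(err <= radius_miles for err in errors) / len(errors)


def average_error(estimates, truths):
    """Mean error distance in miles across all users."""
    errors = error_distances(estimates, truths)
    return sum(errors) / len(errors)
```

Reporting both the accuracy within a radius and the average error is useful because a few very distant mistakes can dominate the average while barely changing the accuracy.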


Public checkins versus private queries: measuring and evaluating spatial preference

November 2012 · 8 Reads

James Caverlee · Zhiyuan Cheng · Wai Gen Yee · [...] · Yuan Liang

Understanding the spatial preference of mobile and web users is of great significance to creating and improving location-based recommendation systems, travel planners, search engines, and other emerging mobile applications. However, traditional sources of spatial preference -- which reflect the patterns of geo-spatial interest of large numbers of users -- have typically been expensive to collect, proprietary, and unavailable for widespread use. In this paper, we investigate the viability of new publicly-available geospatial information to capture spatial preference. Concretely, we compare a set of 35 million publicly shared check-ins voluntarily generated by users of a popular location sharing service with a set of over 400 million private query logs recorded by a commercial hotel search engine. Although generated by users with fundamentally different intentions, we find common conclusions may be drawn from both data sources -- (i) that the relative geo-spatial "footprint" of different locations is surprisingly consistent across both; (ii) that methods to identify significant locations result in similar conclusions; and (iii) that similar performance may be achieved for automatically identifying groups of related locations. These results indicate the viability of publicly shared location information to complement (and replace, in some cases) privately held location information.


Citations (17)


... Other studies use large-scale data, including LBSN data, and data mining techniques to understand which factors may be associated with people's movement patterns. For example, Cheng et al. [2021] used geolocated data from Twitter to understand user movements. The authors associated this spatial information with the economic characteristics of users, the geographic aspects of the areas frequented, as well as their positioning within the social network and the language used in their check-ins. ...

Reference:

Modeling Interest Networks in Urban Areas: A Comparative Study of Google Places and Foursquare Across Countries
Exploring Millions of Footprints in Location Sharing Services
  • Citing Article
  • August 2021

Proceedings of the International AAAI Conference on Web and Social Media

... Such campaigns aiming to influence public opinion are a common issue in the social media ecosystem. Past studies studied user behavior (Cao and Caverlee 2015), content (Lee et al. 2011, 2014), strategies (Zannettou et al. 2019; Elmas, Overdorf, and Aberer 2023), and networks to understand and detect coordinated campaigns. Studies focusing on networks investigated the cases of accounts determined to be inauthentic by Twitter (Merhi, Rajtmajer, and Lee 2023), automated accounts (bot) (Minnich et al. 2017; Elmas, Overdorf, and Aberer 2022), follow back accounts (Beers et al. 2023; Elmas, Randl, and Attia 2024), accounts promoting sponsored topics (Varol et al. 2017), and cryptocurrencies (Tardelli et al. 2022). ...

Campaign Extraction from Social Media
  • Citing Article
  • December 2013

ACM Transactions on Intelligent Systems and Technology

... Finding experts for collaboration [22,47] Responses to factual questions [10] Providing recommendations upon products, people or places Performing generic tasks Finding local experts [13] No specific task [16] Online knowledge communities and Academic social networking: Among the 51 documents we utilized for summarizing research, these are the two domains with the least number of referenced articles. Within these domains, we identified key tasks associated with expert searching, including knowledge sharing and seeking, addressing technical issues in online knowledge communities, fostering collaboration and innovation in research, and facilitating the exchange of academic expertise in academic social networking. ...

Who is the barbecue king of texas?: a geo-spatial approach to finding local experts on twitter
  • Citing Article
  • July 2014

... Collecting pedestrian flow data is the basis for calibrating and validating the related models as well as to analyze pedestrian behaviors. Data can be obtained by means of tracking camera systems [11], [12], [1], [13], [14], [15], GPS sensors [16], [17], Bluetooth scans [18], wi-fi signals [19], laser imaging detection system [20], mobile phones and social media information [21], [22], [23], [24]. In the context of pedestrian monitoring, object detection [25] and tracking [26], [27], using high temporal resolution images from cameras [28] is one of the main challenges for applying computer vision to quantify pedestrian speed, direction and density distributions. ...

How big is the crowd?: event and location based population modeling in social media
  • Citing Conference Paper
  • May 2013

... Social media platforms (SMPs) are becoming more accessible and widespread day by day due to various factors, such as the increasing demand for easier access to information, the advancement of mobile devices, and the growing internet connectivity [1]. TikTok is a popular SMP where users can create and share videos up to 3 minutes long through personalized profiles or pages [2]. TikTok features shorter videos compared to other SMPs and presents them to billions of users worldwide by highlighting the key aspects of the content in an entertaining manner [3]. SMPs have become a resource that an increasing number of people use to access and share health information. ...

Spatial influence vs. community influence: Modeling the global spread of social media
  • Citing Conference Paper
  • October 2012

... Users rarely reveal their geographic locations on social platforms due to privacy protection restrictions, despite the diverse applications of user location data. For instance, only 5% of Twitter users include their home locations in their profile [3], and only 1% of tweets are geotagged [4]. Therefore, research on location-based inference methods for Twitter users is an urgent matter. ...

A Content-Driven Framework for Geolocating Microblog Users
  • Citing Article
  • February 2013

ACM Transactions on Intelligent Systems and Technology

... The location which maximizes the probability of becoming friends with other neighbors is selected as the user's location. Based on the research by Backstrom et al. [14], McGee et al. [24] further discriminate geographically close social relationships by utilizing a decision tree, which establishes connections between various properties of social relationships (such as direction, categories, etc.) and location proximity. The Spot-Tightness [22] analyzes the relation between distance and social closeness to construct a probability model. ...

Location prediction in social media based on tie strength
  • Citing Conference Paper
  • October 2013

... Examples include sentiment analysis (Thelwall, 2010;Kouloumpis et al., 2011;Tan et al., 2011;Bermingham et al., 2009). Further content analysis targeted the (geospatial) information spreading (Fink et al., 2016;Kamath et al., 2013;Yin et al., 2011), or Youtube (Brodersen et al., 2012). ...

Spatio-temporal dynamics of online memes: A study of geo-tagged tweets
  • Citing Conference Paper
  • May 2013

... Finding experts in Twitter has also attracted a lot of attentions of researchers, [17] mines the crowdsourcing wisdom hidden in Twitter Lists to get tags of users with high quality. Researches, such as [10], combines different user-related Twitter information together to enhance the effects of the expert finding, and they rank the expertise by using a semisupervised graph ranking method. ...

Finding local experts on twitter
  • Citing Conference Paper
  • April 2014

... Finally, by tracing the click trajectories of fraudulent dating sites and their pornographic pre-lander pages, we reflect on the heterosexually programmed characteristics of the 'attention spam' (Lee et al. 2012) initiated by the links in bot profile bios. Here, the heteronormative excess of 'mildly sexualized' female Instagram profiles plays into the realm of 'pornographic peekaboo' (Paasonen, Jarrett, and Light 2019, p. 52) in which explicit sexual displays are perpetually 'one click away', simultaneously visible and hidden. ...

Detecting collective attention spam
  • Citing Article
  • April 2012