
Random Twitter Project

Goal: various computational social science studies using randomly sampled Twitter data

Project log

Hai Liang
added a research item
Population-level national networks on social media are valuable and essential for network science and behavioural science. This study collected a population-level Twitter network based on both language and geolocation tags. We proposed a set of approaches to evaluate the validity of our datasets. Finally, we re-examined classical network and communication propositions (e.g., the 80/20 rule and six degrees of separation) on the national network. Our dataset and strategy would enrich the pool of population-level social network data and further advance network analysis research in digital media environments.
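The 80/20 proposition can be checked by asking what share of a resource (followers, tweets, retweets) is held by the top 20% of users. The following is a minimal Python sketch, assuming follower counts as the resource. The function `top_share` and the sample numbers are hypothetical illustrations, not the study's data or code.

```python
def top_share(values, top_fraction=0.2):
    """Return the share of the total held by the top `top_fraction` of items."""
    if not values:
        raise ValueError("values must be non-empty")
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # size of the "top" group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Hypothetical follower counts for ten users (illustrative, not study data).
followers = [5000, 1200, 300, 150, 90, 60, 40, 30, 20, 10]
print(f"Top 20% hold {top_share(followers):.1%} of all followers")
# prints: Top 20% hold 89.9% of all followers
```

A result well above 80% would indicate a more concentrated distribution than the rule suggests, and a result well below would indicate a less concentrated one.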
Hai Liang
added a research item
Social media provide gold mines of data for communication research. Collecting Big Data from social media requires a new set of methods that differ from traditional social science research methods such as surveys, experiments, and content analysis. In the current entry, we introduce the general procedure for collecting social media data on content, usage, and structure via direct scraping and application programming interfaces (APIs). We also discuss the sampling strategies and ethical issues involved in collecting data from social media.
Hai Liang
added a project goal
various computational social science studies using randomly sampled Twitter data
 
Hai Liang
added 2 research items
Modeling retweeting behaviors is important for understanding and predicting how information spreads on social media platforms. The present study contributes to the literature by examining the decreasing social contagion and increasing homophily effects with the depth of diffusion cascades. To test the hypotheses, the study proposes a matching-on-followers method by combining choice and cascade models. More specifically, the study examines the impacts of interaction frequency, multiple exposures, and interest similarity between parent users and potential retweeters on the likelihood of retweeting. The study also incorporates the depth of diffusion cascades and network structures into the model. By using a random sample of original tweets, their retweets, and potential retweeters (N = 87,139), the study found that cascade depth is negatively associated with social contagion effects (interaction and multiple exposures) and positively associated with the effect of interest similarity on message sharing. These results indicate that influence-based and homophily-driven diffusion operate differently in cascades with different diffusion structures.
Social media platforms are increasingly used as important sources of various types of information in the current digital age. While a growing number of studies have investigated the factors that influence users' news-sharing behavior, few have paid attention to the reposting latency of online news content. Reposting latency refers to the interval between the time an original post is published and the time it is reposted. Reposting is an important type of user feedback to a received message, and the speed of that response could reflect a user's processing efficiency and capacity. This study examined factors that may influence users' reposting latency for news content on social media. To do so, we employed a multilevel negative binomial model to examine the impacts of issue attention, temporal usage pattern, and information redundancy. Our findings show that multiple issues could distract users' attention, leading to slower reposting. We also found that a distributed temporal usage pattern could help shorten reposting time, while information redundancy and information overload could increase the reposting latency of news content on social media. These findings can advance the understanding of news consumption behavior on social media and have the potential to help explain and further predict the success of news diffusion.
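Reposting latency, as defined above, is the gap between the original post's publication time and the repost time. The abstract does not state the unit. This sketch assumes ISO 8601 timestamps and reports whole minutes, which yields the kind of non-negative count outcome that a negative binomial model handles. The function name is hypothetical.

```python
from datetime import datetime

def reposting_latency_minutes(published: str, reposted: str) -> int:
    """Whole minutes between an original post and its repost.

    Assumes both timestamps are ISO 8601 strings in the same time zone.
    """
    t0 = datetime.fromisoformat(published)
    t1 = datetime.fromisoformat(reposted)
    if t1 < t0:
        raise ValueError("a repost cannot precede the original post")
    # Floor to whole minutes so the outcome is an integer count.
    return int((t1 - t0).total_seconds() // 60)

print(reposting_latency_minutes("2020-03-01T08:15:00", "2020-03-01T10:47:30"))
# prints: 152
```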
Hai Liang
added a research item
The use of social media such as Twitter has changed our life routines. Previous studies have found consistent diurnal patterns of user activities on social media platforms. However, the temporal organization of human behaviors is partly socially constructed and is determined by numerous factors other than the diurnal cycle. The current study argues that peer influence incurred by social networks is one of these potential factors. To test our hypotheses, we collected a random sample of active Twitter users (N = 5,066), their followers and followees (N = 424,984), and all available tweets posted by these users. Results suggest that the temporal patterns between self-posting and interaction behavior differ across individuals. Users’ daily activity rhythms are more similar to their followees’ rhythms than to their followers’ rhythms. Despite the fact that the self-selection mechanism (homophily) cannot be ignored, peer influence seems to be an equally likely mechanism explaining such similarity.
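One way to compare daily activity rhythms is to turn each user's post times into a 24-bin hour-of-day distribution and measure how similar the ego's distribution is to those of followees and followers. The abstract does not specify the similarity measure. Cosine similarity is used here as an assumption, and hours are assumed to be already converted to local time. All names and sample data are hypothetical.

```python
import math
from collections import Counter

def hourly_rhythm(hours):
    """Normalized 24-bin distribution of activity by hour of day (0-23)."""
    counts = Counter(hours)
    total = sum(counts.values())
    return [counts.get(h, 0) / total for h in range(24)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_similarity(ego_hours, others_hours):
    """Average rhythm similarity between an ego and a group (e.g., followees)."""
    ego = hourly_rhythm(ego_hours)
    sims = [cosine(ego, hourly_rhythm(h)) for h in others_hours if h]  # skip users with no posts
    return sum(sims) / len(sims) if sims else float("nan")

# Hypothetical posting hours for one ego and a few neighbors.
ego = [9, 9, 10, 21, 22]
followees = [[9, 10, 21], [10, 22, 22]]
followers = [[2, 3, 3], [14, 15]]
print(mean_similarity(ego, followees), mean_similarity(ego, followers))
```

Comparing the two averages across many sampled users mirrors the followee-versus-follower contrast described above. Separating peer influence from homophily would still require a design beyond this descriptive comparison.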
Hai Liang
added 4 research items
Privacy is a culturally specific phenomenon. As social media platforms go global, questions concerning privacy practices in a cross-cultural context become increasingly important. The purpose of this study is to examine cultural variations in privacy settings and self-disclosure of geolocation on Twitter. We randomly selected 3.3 million Twitter accounts from more than 100 societies. Results revealed considerable cultural and societal differences. Privacy settings in collectivistic societies were more effective in encouraging self-disclosure, whereas they appeared to be less important for users in individualistic societies. Internet penetration was also a significant factor in predicting both the adoption of privacy settings and geolocation self-disclosure. However, we did not find any direct relationships between cultural values and self-disclosure.
Replication is an essential requirement for scientific discovery. The current study aims to generalize and replicate 10 propositions made in previous Twitter studies using a representative dataset. Our findings suggest that 6 out of 10 propositions could not be replicated, owing to variations in data collection, the analytic strategies employed, and inconsistent measurements. The study’s contributions are twofold. First, it systematically summarized and assessed some important claims in the field, which can inform future studies. Second, it proposed a feasible approach to generating a random sample of Twitter users and their associated ego networks, which might serve as a solution for answering social-scientific questions at the individual level without access to the complete data archive.
It remains controversial whether community structures in social networks are beneficial or not for information diffusion. This study examined the relationships among four core concepts in social network analysis—network redundancy, information redundancy, ego-alter similarity, and tie strength—and their impacts on information diffusion. By using more than 6,500 representative ego networks containing nearly 1 million following relationships from Twitter, the current study found that (1) network redundancy is positively associated with the probability of being retweeted even when competing variables are controlled for; (2) network redundancy is positively associated with information redundancy, which in turn decreases the probability of being retweeted; and (3) the inclusion of both ego-alter similarity and tie strength can attenuate the impact of network redundancy on the probability of being retweeted.
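Network redundancy in an ego network is often measured with Burt's redundancy. For an unweighted, undirected ego network, it reduces to 2t/n, where n is the number of alters and t is the number of ties among them. Effective size is n minus that value. The abstract does not give the study's exact operationalization, so this is a sketch of one common measure. Treating directed following ties as undirected (a tie exists if either user follows the other) is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

def redundancy(alters, ties):
    """Burt's redundancy for an unweighted, undirected ego network: 2t / n.

    alters: alter ids; ties: set of frozenset({a, b}) pairs among alters
    (ties involving ego itself are not included).
    """
    alters = list(dict.fromkeys(alters))  # drop duplicate ids, keep order
    n = len(alters)
    if n == 0:
        return 0.0
    # Count alter-alter ties present in the ego network.
    t = sum(1 for a, b in combinations(alters, 2) if frozenset((a, b)) in ties)
    return 2 * t / n

def effective_size(alters, ties):
    """Number of alters minus redundancy: the non-redundant part of the network."""
    return len(set(alters)) - redundancy(alters, ties)

alters = ["A", "B", "C", "D"]
ties = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "A")]}
print(redundancy(alters, ties), effective_size(alters, ties))  # prints: 1.5 2.5
```

Higher redundancy means the ego's alters are more interconnected, which is the structural property the study relates to information redundancy and retweet probability.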