Conference Paper

What Is Twitter, a Social Network or a News Media?

Authors:

Abstract

Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks (28). In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found the two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap between influence inferred from the number of followers and that inferred from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on their active period and tweets, and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet reaches an average of 1,000 users, regardless of how many followers the author of the original tweet has. Once retweeted, a tweet gets retweeted almost instantly on subsequent hops, signifying fast diffusion of information after the first retweet. To the best of our knowledge, this work is the first quantitative study on the entire Twittersphere and information diffusion on it.
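The follower-count ranking, the PageRank ranking, and the reciprocity measure mentioned in the abstract can be illustrated on a toy follow graph. The following is a minimal Python sketch, not the authors' code: the edge direction (follower to followee), the damping factor of 0.85, and the use of Kendall's tau to compare two rankings are assumptions made here for illustration.

```python
from itertools import combinations


def reciprocity(edges):
    """Fraction of directed follow edges (u, v) whose reverse (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank; an edge (u, v) means u follows v, so rank flows to v."""
    nodes = sorted({x for edge in edges for x in edge})
    out = {x: [] for x in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {x: 1.0 / n for x in nodes}
    for _ in range(iters):
        # Rank held by users who follow nobody (dangling nodes) is spread uniformly.
        dangling = sum(rank[x] for x in nodes if not out[x])
        new = {x: (1 - damping) / n + damping * dangling / n for x in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def kendall_tau(a, b):
    """Kendall's tau-a between two score dicts keyed by the same users."""
    keys = sorted(a)
    concordant = discordant = 0
    for x, y in combinations(keys, 2):
        s = (a[x] - a[y]) * (b[x] - b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (len(keys) * (len(keys) - 1) / 2)


# Toy follow graph of (follower, followee) pairs; not data from the paper.
follows = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a"), ("d", "b"), ("a", "b")]
followers = {x: 0 for edge in follows for x in edge}
for _, followee in follows:
    followers[followee] += 1
pr = pagerank(follows)
print(round(reciprocity(follows), 3))        # 0.333: only a <-> c is mutual
print(round(kendall_tau(followers, pr), 3))  # 0.667: rankings disagree only on a vs b
```

In this toy graph, user a has only one follower but receives all of c's rank, so PageRank places a above b even though b has more followers, a small-scale example of how the two rankings can diverge.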

... It refers to the belief that the lifespan of emerging online events is only 7 days (Zhang, 2019). Kwak et al. (2010) used Twitter data and concluded that the active period of any widespread public opinion lasts no more than a week, with 31% of the observed lifespans lasting only one day. ...
... Additionally, they combined these types and classified the combinations into four popularity patterns: exogenous subcritical, exogenous critical, endogenous critical, and endogenous subcritical. Using these patterns, Kwak et al. (2010) studied the duration of the four patterns on Twitter. Fujita et al. (2018) estimated the influence of exogenous and endogenous forces on events. ...
Article
Full-text available
Based on event history analysis, this study examined the survival distribution of the duration of online public opinions related to major health emergencies and its influencing factors. We analyzed the data of such emergencies (N = 125) that took place in China during a period of 10 years (2012–2021). The results of the Kaplan-Meier method and Cox proportional hazards regression analysis showed that the average duration of online public opinions regarding health emergencies is 43 days, and the median is 19 days, which dispels the myth of the “Seven-day Law of Propagation.” Furthermore, the duration of online public opinions can be divided into three stages: the rapid decline stage (0–50 days), the slowdown stage (51–200 days), and the disappearing stage (after 200 days). In addition, the type of event and the volume of both social media discussion and traditional media coverage all had significant impacts on the duration. Our findings have practical implications for carrying out targeted, stage-based governance of online public opinion.
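For readers unfamiliar with the Kaplan-Meier method named in this abstract, the product-limit estimator is S(t) = ∏ over event times t_i ≤ t of (1 − d_i / n_i), where d_i is the number of episodes ending at t_i and n_i is the number still ongoing just before t_i. A minimal Python sketch of that textbook definition follows; the durations are invented toy values, episodes still active at the end of observation are assumed to be right-censored, and the Cox regression step is not shown.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Product-limit estimate of S(t) at each distinct time with an observed end.

    durations: how long each episode was followed (e.g. days of online discussion).
    observed:  True if the episode's end was seen, False if right-censored.
    Returns (t, S(t)) pairs in increasing t.
    """
    ended = Counter(t for t, seen in zip(durations, observed) if seen)
    leaving = Counter(durations)  # ended and censored cases both leave the risk set
    at_risk, survival, curve = len(durations), 1.0, []
    for t in sorted(leaving):
        if ended[t]:
            survival *= 1 - ended[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival falls to 0.5 or below (None if never)."""
    return next((t for t, s in curve if s <= 0.5), None)


# Toy data, not the study's: five episodes, one of the two 5-day episodes is censored.
curve = kaplan_meier([3, 5, 5, 8, 12], [True, True, False, True, True])
print(median_survival(curve))  # 8
```

The median is read off where the curve first reaches 0.5, which is how a median duration such as the 19 days reported above is typically obtained from a Kaplan-Meier curve.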
... Previous research has shown that the number of tweets about a particular topic reflects users' attention to that topic 33,34. We thus estimated the popularity of tweets for each topic (grouped into 4 major themes; see previous section) to monitor temporal changes in users' interest (Fig. 3A). ...
... Part of the delay was associated with fears that the Japanese population could resist vaccination, given past experiences with HPV vaccines 9. Twitter provides a platform for monitoring, in real time, public debate and engagement on topics relevant to health and policy, not least the social and economic implications of government decisions 33. We leveraged the textual information in tweets and performed a topic analysis of 114,357,691 vaccine-related tweets to identify 15 topics, further grouped into four major themes during the vaccination campaign in Japan: 1) Personal issue, 2) Breaking news, 3) Politics, and 4) Conspiracy and humour. ...
Preprint
Full-text available
Vaccines are promising tools to control the spread of COVID-19. An effective vaccination campaign requires government policies and community engagement, the sharing of experiences for social support, and the voicing of concerns about vaccine safety and efficiency. The increasing use of online social platforms allows us to trace large-scale communication and infer public opinion in real time. We collected more than 100 million vaccine-related tweets posted by 8 million users and used the Latent Dirichlet Allocation model to perform automated topic modeling of tweet texts during the vaccination campaign in Japan. We identified 15 topics grouped into 4 themes: Personal issue, Breaking news, Politics, and Conspiracy and humour. The evolution of the themes' popularity revealed a shift in public opinion: attention was initially shared among personal issues (individual aspect), information gathered from the news (knowledge acquisition), and criticism of the government, and later moved towards personal experiences once confidence in the vaccination campaign was established. An interrupted time series regression analysis showed that the Tokyo Olympic Games affected public opinion more than other critical events, but did not affect the course of the vaccination. Public opinion on politics was significantly affected by various events, with attention shifting positively in the early stages of the vaccination campaign and negatively later. Tweets about personal issues were retweeted most when vaccination reached the younger population. The associations between the vaccination campaign stages and tweet themes suggest that public engagement on the social platform contributed to speeding up vaccine uptake by reducing anxiety via social learning and support.
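The interrupted time series regression mentioned in this abstract is commonly specified as a segmented regression, y_t = b0 + b1·t + b2·D_t + b3·(t − t0)·D_t, where D_t indicates the period after an event at t0. The sketch below assumes that standard specification (the preprint's exact model is not given here) and fits it by ordinary least squares on a noise-free toy series; real analyses would also need to handle noise, autocorrelation, and seasonality.

```python
def solve(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def its_fit(y, t0):
    """OLS fit of y_t = b0 + b1*t + b2*D_t + b3*(t - t0)*D_t, where D_t = 1 for t >= t0.

    b2 estimates the immediate level change at the event, b3 the change in slope.
    """
    rows = []
    for t in range(len(y)):
        d = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), d, (t - t0) * d])
    k = len(rows[0])
    # Normal equations (X^T X) b = X^T y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Noise-free toy series: the level jumps by 3 and the slope drops by 0.2 at t0 = 10.
true_coefs = [2.0, 0.5, 3.0, -0.2]
series = [
    true_coefs[0] + true_coefs[1] * t
    + (true_coefs[2] + true_coefs[3] * (t - 10) if t >= 10 else 0.0)
    for t in range(20)
]
print([round(b, 6) for b in its_fit(series, t0=10)])  # [2.0, 0.5, 3.0, -0.2]
```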
... Yet, Wharf can still support much larger graphs for PPR use cases where walks are shorter (around 5-15 vertices long). As explained in [2], the theoretical guarantees are preserved for = 10 and = 10, so Wharf can scale to graphs with up to (2^32 − 1)/100 ≈ 42.94M vertices, such as the Twitter dataset [25] as we show in our experiments. Datasets. ...
... Datasets. We used four real-world (Table 1) and eight synthetic graph datasets. Real graphs: com-Youtube is an undirected graph of the YouTube social network [56], soc-LiveJournal is a directed graph of the LiveJournal social network [1], com-Orkut is an undirected graph of the Orkut social network [56], and Twitter is a directed graph of the Twitter network, in which edges model the follower-followee relationship [25]. Synthetic graphs: we generated large-scale synthetic graphs sampled from the R-MAT model [7]. Specifically, we used the TrillionG [42] tool to generate Erdős–Rényi (er-…) graphs with 2^… nodes, uniformly distributed edges, and an average vertex degree of 100 by setting the R-MAT parameters to a = b = c = d = 0.25. ...
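R-MAT, as referenced in this snippet, places each edge by recursively choosing one of the four quadrants of the adjacency matrix with probabilities a, b, c, and d. With all four equal to 0.25, every cell is equally likely, which is why that setting yields uniformly distributed edges. A small pure-Python illustration follows. It is not TrillionG: the function names and the `scale` parameter are hypothetical, and duplicate edges are not removed.

```python
import random


def rmat_edge(scale, a, b, c, d, rng):
    """Sample one directed edge in a 2**scale-node graph by recursive quadrant choice.

    At each of `scale` levels the current block of the adjacency matrix is split into
    four quadrants, chosen with probabilities a (top-left), b (top-right),
    c (bottom-left) and d (bottom-right); each choice fixes one bit of src and dst.
    """
    assert abs(a + b + c + d - 1.0) < 1e-9
    src = dst = 0
    for _ in range(scale):
        src <<= 1
        dst <<= 1
        r = rng.random()
        if r < a:
            continue              # top-left: both new bits stay 0
        if r < a + b:
            dst |= 1              # top-right
        elif r < a + b + c:
            src |= 1              # bottom-left
        else:
            src |= 1              # bottom-right
            dst |= 1
    return src, dst


def rmat_graph(scale, avg_degree, a, b, c, d, seed=0):
    """Return (node count, edge list) with avg_degree edges per node on average."""
    rng = random.Random(seed)
    n = 2 ** scale
    return n, [rmat_edge(scale, a, b, c, d, rng) for _ in range(n * avg_degree)]


# a = b = c = d = 0.25 makes every adjacency-matrix cell equally likely.
n, edges = rmat_graph(scale=10, avg_degree=100, a=0.25, b=0.25, c=0.25, d=0.25)
print(n, len(edges))  # 1024 102400
```

Skewed settings (for example a larger a) concentrate edges in one corner of the matrix and produce the heavy-tailed degree distributions R-MAT is usually used for.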
Preprint
Graphs in many applications, such as social networks and IoT, are inherently streaming, involving continuous additions and deletions of vertices and edges at high rates. Constructing random walks in a graph, i.e., sequences of vertices selected with a specific probability distribution, is a prominent task in many of these graph applications as well as in machine learning (ML) on graph-structured data. In a streaming scenario, random walks need to constantly keep up with the graph updates to avoid stale walks and thus performance degradation in the downstream tasks. We present Wharf, a system that efficiently stores and updates random walks on streaming graphs. It avoids a potential size explosion by maintaining a compressed, high-throughput, and low-latency data structure. It achieves (i) a succinct representation by coupling compressed purely functional binary trees and pairing functions for storing the walks, and (ii) efficient walk updates by effectively pruning the walk search space. We evaluate Wharf, with real and synthetic graphs, in terms of throughput and latency when updating random walks. The results show that Wharf substantially outperforms inverted-index- and tree-based baselines.
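The abstract states that Wharf couples compressed trees with pairing functions to store walks. A pairing function maps two non-negative integers to a single one bijectively, so a pair such as (walk id, position) can be stored as one integer key and decoded later. The sketch below uses Szudzik's pairing function as an example; which pairing function Wharf actually uses, and exactly what it encodes, are assumptions not stated in this abstract.

```python
from math import isqrt


def szudzik_pair(x, y):
    """Map two non-negative integers to one non-negative integer, bijectively."""
    return y * y + x if x < y else x * x + x + y


def szudzik_unpair(z):
    """Inverse of szudzik_pair."""
    s = isqrt(z)
    r = z - s * s
    return (r, s) if r < s else (s, r - s)


# Hypothetical use: key one walk step by (walk_id, position) as a single integer.
walk_id, position = 7, 3
key = szudzik_pair(walk_id, position)
print(key, szudzik_unpair(key))  # 59 (7, 3)
```

One property of this choice is that all pairs with both components below n map exactly onto 0 … n² − 1, so the encoded keys stay as small as possible.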
... Hence, in the case of Twitter, it has become an essential and frequently used tool (Parmelee & Bichard, 2011). In fact, this tool is considered closer to being a news medium than a social network (Kwak et al., 2010). ...
Chapter
Full-text available
In the city of Latacunga, in the province of Cotopaxi, is Ecuador's first community television channel, Tv Micc Canal 47. On the small screen we find countless offerings to choose from; on national and regional television, educational products have a fragmented presence, especially those that promote preserving the native language. This work analyzes how the program Wawa kuna tv is perceived by the child population of the Latacunga canton; television production can be a strategy for reaching children with messages on a range of themes. By promoting the Quichua language through play on television, the program serves as a link between education and society, and the audiovisual medium is used to impart values and to educate. Through in-depth interviews, a focus group, and observation sheets, the planning of the segments is documented, and the drawing technique reveals the feelings of the children who take part in the channel and of the children who watched the broadcast for the first time, who drew what they had learned. Children in middle childhood in the Latacunga canton consider that Wawa kuna tv develops the content of each broadcast taking their views into account; as a segment with blocks for participation and learning that is easy to understand, it seeks to reinforce the knowledge children acquire at home or at school, promoting interculturality.
... Social media has dramatically changed the way in which people get and consume news [28,31]. Alas, this has also facilitated the dissemination of misleading information (i.e., misinformation) and of deliberate campaigns to spread false narratives (i.e., disinformation) [45,46,58,60]. ...
Conference Paper
Growing evidence points to recurring influence campaigns on social media, often sponsored by state actors aiming to manipulate public opinion on sensitive political topics. Typically, campaigns are performed through instrumented accounts, known as troll accounts; despite their prominence, however, little work has been done to detect these accounts in the wild. In this paper, we present TROLLMAGNIFIER, a detection system for troll accounts. Our key observation, based on analysis of known Russian-sponsored troll accounts identified by Reddit, is that they show loose coordination, often interacting with each other to further specific narratives. Therefore, troll accounts controlled by the same actor often show similarities that can be leveraged for detection. TROLLMAGNIFIER learns the typical behavior of known troll accounts and identifies more that behave similarly. We train TROLLMAGNIFIER on a set of 335 known troll accounts and run it on a large dataset of Reddit accounts. Our system identifies 1,248 potential troll accounts; we then provide a multi-faceted analysis to corroborate the correctness of our classification. In particular, 66% of the detected accounts show signs of being instrumented by malicious actors (e.g., they were created on the same exact day as a known troll, they have since been suspended by Reddit, etc.). They also discuss similar topics as the known troll accounts and exhibit temporal synchronization in their activity. Overall, we show that using TROLLMAGNIFIER, one can grow the initial knowledge of potential trolls provided by Reddit by over 300%.
... In this way, APIs can be leveraged to gather data and metadata using specific keywords (hashtags, mentions, cashtags, etc.), allowing the creation of rich datasets [61]. Given its open features, Twitter has become the preferred platform for news sharing and consumption and political discussions [49,111]. As such, it has become the perfect place for disinformation, misbehavior, opinion manipulation, and propaganda, which has been extensively investigated [12,72]. ...
... In contrast, between October 29 and November 6, only 869 posts, or 19.86% of the dataset's tweets, were posted. This dwindling in the number of tweets after the first few days of the event aligns with "the overall temporal pattern of information diffusion on Twitter" (Kwak et al., 2010;Kwon et al., 2019). In the days immediately following the event, the public tweeted questions concerning the tragedy such as who was responsible for the event, who the victims were, who their families were, and where they were from. ...
Article
Full-text available
In this article, we analyze anti- and pro-immigrant attitudes expressed following the Essex Lorry Deaths tragedy in October 2019 in Britain, in which 39 Vietnamese immigrants died in a sealed lorry on their way to their destination. We apply Structural Topic Modeling, an automated text analysis method, to a Twitter dataset (N = 4,376) to understand public responses to the Lorry Deaths incident. We find that Twitter users' posts were organized into two themes regarding attitudes toward immigrants: (1) migration narratives, stereotypes, and victim identities, and (2) border control. Within each theme, both pro- and anti-immigration attitudes were expressed. Pro-immigration posts reflected counter-narratives that challenged the mainstream media's coverage of the incident and critiqued the militarization of borders and the criminalization of immigration. Anti-immigration posts ranged from reproducing stereotypes about Vietnamese immigrants to explicitly blaming the victims themselves or their families for the deaths. This study demonstrates the uses and limitations of Twitter for public opinion research by offering a nuanced analysis of how pro- and anti-immigration attitudes are discussed in response to a tragic event. Our research also contributes to a growing literature on public opinion about an often-forgotten immigrant group in the UK, the Vietnamese.
... We use the Tox21 prediction benchmark (Huang et al., 2016), addressing toxicity prediction as a binary classification task. For social networks we use the Facebook dataset (McAuley & Leskovec, 2012) and the Twitter dataset (Kwak et al., 2010). This combination of social network and molecular datasets was chosen to show the framework's capability in very different domains. ...
Article
Full-text available
There have been many models made to achieve optimal results on classification tasks. We present a novel framework that is able to augment these models to achieve even higher levels of classification accuracy. Our framework is used in addition to and flexibly on top of other models and uses a reinforcement learning approach to learn and generate new difficult training data samples in order to further refine the classification model. By making new, harder, and more meaningful data samples our framework helps the model learn meaningful relationships in the data for its classification task. This allows our framework to augment models during training rather than working on pre-trained classifiers. Through our experimentation we show that our framework improves models’ classification accuracy. We also show the effectiveness of tuning our components through our ablation studies. Lastly, we discuss possible improvements to our framework and directions for future works.
... Twitter has become an important source of fresh information and has inspired recent research, such as influential Twitter user detection (Kwak et al. 2010), fresh link mining (Dong et al. 2010), and breaking news extraction (Sankaranarayanan et al. 2009) from tweets. Semantic Role Labeling (SRL) for tweets, which takes a tweet as input and identifies arguments with their semantic roles for every predicate, extends this line of research, representing a critical step towards fine-grained information extraction (e.g., events and opinions) from tweets. ...
Article
Semantic Role Labeling (SRL) for tweets is a meaningful task that can benefit a wide range of applications, such as fine-grained information extraction and retrieval from tweets. One main challenge of the task is the lack of annotated tweets, which are required to train a statistical model. We introduce self-training to SRL, leveraging abundant unlabeled tweets to reduce its dependence on annotated tweets. A novel tweet-selection strategy is presented that ensures the chosen tweets are both correct and informative. More specifically, correctness is estimated from the labeling confidences and the agreement of two Conditional Random Field-based labelers, which are trained on the labeled data split randomly into two equal halves, while informativeness is proportional to the maximum distance between the tweet and the already selected tweets. We evaluate our method on a human-annotated dataset and show that bootstrapping improves a baseline by 3.4% F1.
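The selection strategy in this abstract can be sketched as a correctness filter followed by a greedy informativeness choice. In the Python sketch below, the two CRF labelers are replaced by precomputed outputs, and the confidence threshold, Jaccard word distance, and role tuples are hypothetical. "Distance to the already selected tweets" is read here as the distance to the nearest selected tweet, maximized greedily (farthest-first); that reading is an interpretation, not the authors' stated formula.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the word sets of two tweets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def novelty(text, selected):
    """Distance from a tweet to its nearest already-selected tweet."""
    return min((jaccard_distance(text, s) for s in selected), default=1.0)


def select_tweets(candidates, selected, k, min_conf=0.8):
    """Pick up to k auto-labelled tweets for the next self-training round.

    candidates: (text, labels_1, conf_1, labels_2, conf_2) from two labelers.
    selected: texts of tweets already in the training set.
    """
    # "Correct": both labelers output the same roles and both are confident.
    pool = [c for c in candidates if c[1] == c[3] and min(c[2], c[4]) >= min_conf]
    chosen, selected = [], list(selected)
    while pool and len(chosen) < k:
        # "Informative": farthest from everything selected so far.
        best = max(pool, key=lambda c: novelty(c[0], selected))
        pool.remove(best)
        chosen.append(best)
        selected.append(best[0])
    return chosen


candidates = [
    ("obama visits china today", ("A0", "V"), 0.90, ("A0", "V"), 0.95),
    ("obama visits china", ("A0", "V"), 0.92, ("A0", "V"), 0.90),
    ("apple releases new phone", ("A0", "V", "A1"), 0.85, ("A0", "V", "A1"), 0.88),
    ("rain expected tomorrow", ("A1", "V"), 0.60, ("A1", "V"), 0.90),  # low confidence
    ("fans cheer team win", ("A0", "V"), 0.90, ("A1", "V"), 0.90),     # labelers disagree
]
picked = select_tweets(candidates, selected=["obama visits china"], k=2)
print([c[0] for c in picked])  # ['apple releases new phone', 'obama visits china today']
```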
... Thus retweeting is deemed a key mechanism of information diffusion on Twitter, since the original tweet is propagated to a new audience [1]. A number of studies have investigated the factors that affect retweetability [2], [1], [3], [4], retweet prediction [5], [6], [7], and the prediction of information diffusion by analyzing the properties of tweets and users [8], [9], [10]. ...
Article
Full-text available
A fundamental question in modeling information cascades is predicting the final size of a cascade, that is, how many reshares a given post will ultimately receive. A growing line of recent research has studied the spread prediction of content in online social networks (OSNs). Predicting the spread of such content is important for obtaining the latest information on different topics, for viral marketing, etc. Existing approaches to spread prediction mainly focus on content and on the past behavior of users; not enough attention is paid to the structural characteristics of the network. We apply the Latent Dirichlet Allocation (LDA) model to users' past tweets to learn their latent interests in different topics. We next identify the top-k topics relevant to a new tweet using the word-topic distribution from LDA. Finally, we predict the spread of the new tweet by considering its acceptance in the underlying social network, taking into account the possible effect of all propagation paths between the tweet owner and the recipient user. Our experimental results on a real dataset show the efficacy of the proposed approach.
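One plausible way to pick the top-k topics for a new tweet from an LDA word-topic distribution is to score each topic by the log-likelihood of the tweet's words under it. The sketch below assumes that scoring rule, a floor probability for words a topic never generates, and hypothetical topic probabilities; the abstract does not specify the exact rule, and the propagation-path step is not shown.

```python
import math


def top_k_topics(tweet, phi, k, floor=1e-6):
    """Rank topics by log P(words | topic) under a word-topic distribution.

    phi maps topic -> {word: P(word | topic)}, e.g. read off a trained LDA model.
    Words a topic never generates get a small floor probability instead of zero.
    """
    words = tweet.lower().split()
    scores = {
        topic: sum(math.log(dist.get(w, floor)) for w in words)
        for topic, dist in phi.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical word-topic probabilities for three topics.
phi = {
    "sports": {"game": 0.3, "team": 0.3, "win": 0.2, "score": 0.2},
    "politics": {"vote": 0.4, "election": 0.3, "team": 0.1, "win": 0.2},
    "tech": {"phone": 0.5, "app": 0.3, "release": 0.2},
}
print(top_k_topics("Team win game", phi, k=2))  # ['sports', 'politics']
```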
... However, other types of interaction, such as mentioning a user or replying to a tweet, can in a certain sense also trigger a transmission of information or opinions on a specific topic (Kwak et al., 2010). ...
Conference Paper
Full-text available
In this work, we reconstruct the tweet-retweet and tweet-reply relations of opinions about a trending topic on the Twitter platform. We propose a multi-step approach to derive a signed network expressing the spread of content and opinions. The first step reduces data dimensionality by means of a clustering procedure on tweets that identifies the concepts they convey. In the second step, focusing on message content, we adapt different sentiment analysis algorithms to determine the sign of the original tweet (with respect to the trending topic) and the sign of the edge connecting the original tweet to its replies, conditional on the replied tweet. Each tweet spreads its concepts by means of signed retweet and reply relations. The aim is to study how the signed networks related to each concept differ, in terms of both network structure and sentiment. A comparative analysis among the various identified signed networks will also be possible.
... There is a distinction between blogs and microblogs, the latter referring to online social networks such as Twitter (Bornmann and Haunschild, 2018). Surprisingly, because of the large amount of news shared on Twitter, the online social network was even considered a news medium in its own right (Kwak et al., 2010). Scholars have also investigated how news coverage of research outputs is related to citation rates, for example, in specific academic disciplines such as biomedicine (Dumas-Mallet et al., 2020). ...
Article
Full-text available
We present a brief review of literature related to blogs and news sites; our focus is on publications related to COVID-19. We primarily focus on the role of blogs and news sites in disseminating research on COVID-19 to the wider public, that is, as knowledge transfer channels. The review is for researchers and practitioners in scholarly communication and social media studies of science who would like to find out more about the role of blogs and news sites during the COVID-19 pandemic. From our review, we see that blogs and news sites are widely used as scholarly communication channels and are closely related to each other; that is, the same research might be reported in blogs and news sites at the same time. They both play a particular role in higher education and research systems, owing to the increasing blogging and science communication activity of researchers and higher education institutions (HEIs). We conclude that these two media types have long played an important role in disseminating research, a role that even grew during the COVID-19 pandemic. This can be verified, for example, through knowledge graphs on COVID-19 publications that contain a significant number of scientific publications mentioned in blogs and news sites.
Chapter
Sentiment classification, or opinion mining, is conducted to extract the core data representing opinions and emotions. Motivated by the role of social media in shaping the direction of discussions, a solution is proposed to classify users’ sentiments and identify the topics most commonly discussed by users on Twitter. The proposed technique classifies users’ sentiments by applying a semantic measure to build classes from user tweets. The classes vary according to the study domain. This research chose International Women’s Day as the study domain because of the availability of historical data and the recurring nature of the topic. The proposed technique builds the classes that define the topics of interest. The WuP (Wu and Palmer) measure is used to compute how similar the tweets’ words are to the classes’ concepts. In the topic domain “International Women’s Day”, 2,000 tweets were used to build the four classes (Activity, Women role, Time, and Surroundings).
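The Wu and Palmer measure scores two concepts in a taxonomy as 2·depth(LCS) / (depth(c1) + depth(c2)), where LCS is their deepest common ancestor. A minimal sketch follows. It assumes a hypothetical hand-made taxonomy (in practice a lexical resource such as WordNet might be used) and a root depth of 1. Assigning a word to the most similar class concept mirrors the classification step described above.

```python
def path_to_root(concept, parent):
    """Concept followed by its ancestors up to the root of the taxonomy."""
    path = [concept]
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path


def wu_palmer(c1, c2, parent):
    """WuP(c1, c2) = 2 * depth(lcs) / (depth(c1) + depth(c2)), with depth(root) = 1.

    Assumes a single-rooted taxonomy, so a common ancestor always exists.
    """
    p1, p2 = path_to_root(c1, parent), path_to_root(c2, parent)
    on_p2 = set(p2)
    lcs = next(c for c in p1 if c in on_p2)  # first shared node = deepest common ancestor
    return 2 * len(path_to_root(lcs, parent)) / (len(p1) + len(p2))


# Hypothetical mini-taxonomy (child -> parent).
parent = {
    "event": "entity", "activity": "event", "march": "activity",
    "celebration": "activity", "time": "entity", "day": "time",
}
print(wu_palmer("march", "celebration", parent))  # 0.75
classes = ["activity", "time"]
print(max(classes, key=lambda c: wu_palmer("march", c, parent)))  # activity
```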
Article
Full-text available
The rise of social media accompanied by the Covid-19 Pandemic has instigated a paradigm shift in presidential campaigns in Iran from the real world to social media. Unlike in previous presidential elections, there were fewer physical events and advertisements for the candidates; in turn, the online presence of presidential candidates increased significantly. Farsi Twitter played a specific role in this matter, as it became the platform for creating political content. In this study, we found traces of organizational activities in Farsi Twitter, and our investigations reveal that the discussion network of the 2021 election is heterogeneous and highly polarized. However, unlike many other documented election cases in Iran and around the globe, the communities of the candidates’ supporters sit very close together at one pole, while the other pole belongs to “Anti-voters” who endorse boycotting the election. With almost no reciprocal ties, these two poles form two echo chambers, one favoring the election and the other favoring voter suppression. Furthermore, a high presence of bot activity is observed among the most influential users in all of the involved communities.
Article
Twitter as a platform is used for news dissemination, with high volumes of campaigning and populism. This situation coincides with the growth of audiences who embrace social media as their primary news source. In general, effects like the deterioration of political education, misinformation, or ideological segregation then arguably represent a tremendous risk for democratic societies. We analyze a comprehensive data set of the German-speaking Twitter community – a concise, well-defined Twitter population – to understand the extent and form of consumption of controversial news. Our results affirm a high interest of German Twitter users in daily news and corresponding discussions. In-depth studies on the behavior, including tweeting- and grouping patterns, revealed the emergence of a new, more self-assured form of echo chambers.
Article
The continuous proliferation of social media platforms and the exponential increase in users’ engagement are impacting social behavior and leading to various challenges, including the detection and identification of key influencers. In fact the opinions of these influencers are at the core of decision-making strategies, and are leading trends on the virtual social media landscape. Moreover, influencers might play a crucial role when it comes to misinformation and conspiracy during sensitive, controversial and trending events. However, due to the dynamic and unrestricted nature of social media, and diversity of targeted topics and audiences, identifying and ranking key influencers that are impactful, credible, and knowledgeable about their specialist topic or event remains an evolving and open research paradigm. In this paper, we address the aforementioned problem by proposing a novel influence rating and ranking scheme to identify key and highly influential users for a certain event over Twitter using a mixed theme/event based approach while considering historical data and profile reputation. We further apply our approach to a global pandemic case study, the novel Coronavirus, and conduct performance analysis. The presented experimental results and theoretical analysis explore the relevance of our proposed scheme for identifying and ranking reputable and theme/event related influencers.
Chapter
Why and how more and more people get involved in and use social networking systems are critical topics in social network analysis (SNA). As a matter of fact, social networking systems bring a growing number of acquaintances online, for many different purposes. Both business interests and personal recreational goals are motivations for using online social networks (OSN) or other social networking systems. Participation in social networks is a phenomenon that has been studied with several theories, and SNA is useful for common business problems, e.g., launching distributed teams, retaining people with vital knowledge for the organization, improving access to knowledge, and spreading ideas and innovation. Nevertheless, there are some difficulties, such as anti-social behaviors of participants, lack of incentives, and organizational costs and risks. In this article, the basic features of SNA, participation theories, and models are surveyed, with emphasis on social capital, information spreading, motivations for participation, and anti-social behaviors of social network users.
Chapter
Twitter is one of the largest sources of real-time information on the Internet and is continuously fed by millions of users around the world. Each of these users publishes text messages with their opinions, concerns, information, or simply their daily happenings. Analyzing the massive data in the network is a challenge, just as finding ways to understand everything that data can offer today in terms of knowledge of society and the market is an objective. The science communication sector is still discovering everything that Web 2.0 and social networks can offer for reaching all audiences. This article uses machine learning techniques to develop a classification model for Spanish-language messages about science topics posted on Twitter. Training this type of model requires creating a specific Spanish corpus on science, which is one of the most laborious tasks. The classifier is able to predict the sentiment of a message on Twitter in real time, with a confidence interval greater than 80%. Its evaluation shows an accuracy of 72%.
Chapter
This study focuses on Twitter affordances and sense-making outcomes during a single emergency situation. By using an interpretive affordance lens, this study aims to assess rumors as influencers of sense-making during the 2017 Manchester terrorist attack. The authors combined a quantitative network analysis with a qualitative content analysis to assess the role of rumors during the emergency management after the attack. This study provides argumentative grounds for the notion of sense-making as a consequence of affording social media and builds on prior research to place sense-making as a cognitive process within the affordance concept. The authors emphasize new potentials to prevent or control rumors on social media for practitioners and contribute insights to rumor research. Namely, the authors contribute a novel perspective of rumors and their role during emergency management on social media.
Article
Internet-based technologies, which mark a revolutionary period in journalism, have revealed new understandings of journalism. This change in the news media has affected established forms of production and consumption and has transformed traditional relations and understandings. Expanding capacities for distributing information, together with reader expectations that keep growing, diversifying, and becoming more specialized, make it necessary to use new communication technologies more effectively. Social media platforms and online environments, which provide additional channels for reaching readers, both open new areas for news producers and reveal usage patterns among consumers that differ from those of traditional media. The aim of the study is to investigate trends in news production and consumption across different digital media platforms. In this context, Twitter, which functions more as a news-sharing application, Instagram, which has increased in popularity in recent years, and Telegram, which stands out for its bot and channel creation features, are discussed and evaluated in terms of news production, distribution, and consumption styles. The posts on the Cia Medya Telegram channel, the Cia Haber Instagram page, and the Cia Haber Twitter page, which constitute the research sample, were analyzed using content analysis, and it was concluded that the Telegram platform is used more actively for content production, distribution, and interaction.
Analysis of social networks with limited data access is challenging for third parties. To address this challenge, a number of studies have developed algorithms that estimate properties of social networks via a simple random walk. However, most existing algorithms do not account for private nodes, which, in empirical social networks, do not reveal their neighbors’ data when queried. Here we propose a practical framework for estimating properties via random walk-based sampling in social networks that include private nodes. First, we develop a sampling algorithm by extending the simple random walk to social networks with private nodes. Then, we propose estimators of the network size, average degree, and node-label density that reduce the biases induced by private nodes. Our results show that, on social network datasets containing private nodes, the proposed estimators reduce the private-node-induced biases of existing estimators by up to 92.6%.
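For context, the baseline that such estimators extend can be sketched briefly. A simple random walk on a connected, non-bipartite, undirected graph visits nodes with probability proportional to their degree, so the average degree can be estimated by the harmonic mean of the sampled degrees. This is a minimal sketch assuming no private nodes at all; it is not the authors' private-node-aware estimator, and the function names and toy graph are hypothetical.

```python
import random

def random_walk(adj, start, steps, rng):
    """Simple random walk: at each step, move to a uniformly random neighbor."""
    node = start
    visited = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited

def estimate_average_degree(sampled_degrees):
    """Harmonic-mean estimator: undoes the walk's bias toward high-degree nodes.

    Assumes every sampled node reveals its degree (no private nodes).
    """
    return len(sampled_degrees) / sum(1.0 / d for d in sampled_degrees)

if __name__ == "__main__":
    # Hypothetical toy graph: hub 0 linked to 1-4, plus a triangle 1-2-3
    adj = {
        0: [1, 2, 3, 4],
        1: [0, 2, 3],
        2: [0, 1, 3],
        3: [0, 1, 2],
        4: [0],
    }
    walk = random_walk(adj, start=0, steps=20000, rng=random.Random(42))
    estimate = estimate_average_degree([len(adj[v]) for v in walk])
    exact = sum(len(nbrs) for nbrs in adj.values()) / len(adj)  # 14 / 5 = 2.8
    print(f"estimated {estimate:.2f}, exact {exact:.2f}")
```

If some sampled nodes were private, their neighbor lists would be hidden, so a walk could not step through them; that gap is what the framework described above addresses.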
Article
Social media users hardly know who is reading their posts, but they form ideas about their readership. Researchers have coined the term imagined audience for the social groups that actors imagine seeing their public communication. However, social groups are not the only aspect that requires imagination: In the potentially borderless online environment, the geographical scope and locations of one’s audience are also unknown. Furthermore, research has demonstrated that imagined audiences vary between people and situations, but what explains these variations is unclear. In this article, we address these two gaps—the geographical scope and predictors of imagined audiences—using data from a mobile experience sampling method study of 105 active Twitter users from Berlin, Germany. Our results show that respondents mostly think of a geographically broad audience, which is spread out across the country or even globally. The imagined geographical scope and social groups depend on both the communicator and the usage situation. While the audience’s social composition especially depends on tweet content and respondents’ sociodemographic characteristics, the geographical scope is best explained by respondents’ biography and personal mobility, including their experience of living in other countries and local residential duration.
Article
People’s attitudes towards hydraulic fracturing (fracking) can be shaped by socio-demographics, economic development, social equity and politics, environmental impacts, and fracking-related information. Existing research typically conducts surveys and interviews to study public attitudes towards fracking among a small group of individuals in a specific geographic area, where limited samples may introduce bias. Here, we compiled geo-referenced social media big data from Twitter during 2018–2019 for the entire United States to present a more holistic picture of people’s attitudes towards fracking. We used a multiscale geographically weighted regression (MGWR) to investigate county-level relationships between the aforementioned factors and percentages of negative tweets concerning fracking. Results indicate spatial heterogeneity and varying scales of those associations. Counties with higher median household income, larger African American populations, and/or lower educational level are less likely to oppose fracking, and these associations show global stationarity in all contiguous US counties. Eastern and Central US counties with higher unemployment rates, counties east of the Great Plains with fewer fracking sites nearby, and Western and Gulf Coast region counties with higher health insurance enrolments are more likely to oppose fracking activities. These three variables show clear East-West geographical divides in influencing public perspective on fracking. In counties across the southern Great Plains, negative attitudes towards fracking are less often vocalized on Twitter as the share of Republican voters increases. These findings have implications for both predicting public perspectives and needed policy adjustments. The methodology can also be conveniently applied to investigate public perspectives on other controversial topics.
Article
Full-text available
Live streaming commerce is emerging as a new business model in e-commerce, with key opinion leaders (KOLs) flocking into the business as live streamers. Prior literature has noted the improved communication channels and product differentiation of live streaming commerce, but how the proliferation of live streaming commerce affects sales remains underexplored. In addition, how to select the right KOL streamer is poorly understood. This study examines how channel proliferation and stock-keeping unit proliferation affect live stream sales by increasing the conversion of consumers’ live streaming views into purchases, while the KOL’s popularity, professionalism, attractiveness to female fans, and quote moderate this mediation effect. This study contributes to the e-commerce literature by revealing the mechanisms through which proliferation affects sales performance and by providing practical guidance on selecting the right KOL in live streaming commerce.
Article
Human beings tend to organize themselves in groups. These groups need to be robust to enable effective cooperation among individuals. According to some researchers (Ostrom, 1990; Suárez et al., 2011), a collective group identity based on shared cultural symbols, a shared religion or a common language is key to fostering cooperation. To investigate this hypothesis, data were extracted from Twitter, and two network graphs (nodes were Twitter users and links were the relationships among them) were built around two Spanish political parties, Ciudadanos and Podemos, during the 2017 Catalan elections. On the one hand, members of Ciudadanos’ network shared ideological positioning and a collective cultural identity (they identified with Spanish cultural symbols). On the other hand, members of Podemos’ network shared ideological positioning but not a cultural identity (some Podemos users identified with Catalan symbols and others with Spanish symbols). The results of several network cohesion metrics (e.g., clustering coefficient and average distance) show that Ciudadanos’ network was more cohesive than Podemos’.
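The average distance metric mentioned above is usually the mean shortest-path length over all reachable pairs of nodes. A minimal sketch, assuming an unweighted, undirected graph stored as a Python adjacency dict (the study's actual tooling is not stated), runs one breadth-first search per node; the function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return the hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def average_distance(adj):
    """Mean shortest-path length over ordered pairs (u, v), u != v, v reachable from u."""
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

if __name__ == "__main__":
    # Hypothetical path graph a - b - c - d
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(average_distance(path))  # 20 total hops over 12 ordered pairs, about 1.667
```

Skipping unreachable pairs is one common convention for disconnected graphs; a lower value indicates a more tightly knit network.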
Article
Although a growing number of aggressive and polarized tweets about climate change are being observed, little is known about how they spread on Twitter. This study focuses on how different types of network gatekeepers use aggressive styles and how these styles affect their propagation. The current study employed a computational method and identified 951 influential accounts from 7.25 million tweets about climate change in 2019 and 2020. We analyzed their use of aggression and politicized cues, and the relationship of both with the volume of retweets. Results showed that even though aggressive tweets were a small portion of the overall tweets about climate change, aggressive tweets were more likely to be politicized and retweeted. Specifically, aggressive tweets from politicians received the most retweets, and news media amplified the aggression. The findings of this study build upon current knowledge of the use of aggression online and provide practical implications for environmental communicators.
Article
Online social networks such as Twitter, Facebook and Instagram are increasingly becoming the go-to medium for users to acquire information and discuss what is happening globally. Understanding large-scale, real-time conversations on social media platforms can provide rich insights into events, provided that there is a way to detect and characterise them. To this end, over the past twenty years many researchers have developed event detection methods based on data collected from various social media platforms. The resulting methods are generally modular in design and novel in scale and speed. To review research in this field, we line up existing work on event detection in online social networks and organise it into a comprehensive, in-depth survey. The survey comprises three major parts: research methodologies, a review of state-of-the-art literature, and the evolution of significant challenges. Each part is intended for readers with different motivations and expectations. For example, the methodologies part describes the life cycle for designing new event detection models, from data collection to model evaluation. A timeline and a taxonomy of existing methods are also introduced to show the development of the various technologies under the umbrella of event detection. These two parts benefit readers who have a background in event detection and want to explore existing models in depth, including their pros and cons. The third part traces the development of the major open issues in this field and indicates the milestones of each challenge in terms of typical models. Our survey can contribute to the community by highlighting possible new problem statements and opening new research directions.
Article
The clustering coefficient was introduced to capture the social phenomenon that a friend of a friend tends to be my friend. This metric has been widely studied and has been shown to be of great interest for describing the characteristics of a social graph. However, the clustering coefficient was originally defined for graphs in which links are undirected, such as friendship links (Facebook) or professional links (LinkedIn). For a graph in which links are directed from a source of information to a consumer of information, it is no longer adequate. We show that earlier studies have missed much of the information contained in the directed part of such graphs. In this article, we introduce a new metric, the interest clustering coefficient, to measure the clustering of directed social graphs with interest links. We compute it (exactly and using sampling methods) on a very large social graph, a Twitter snapshot with 505 million users and 23 billion links, as well as on various other datasets. We additionally provide the values of the previously introduced directed and undirected metrics, a first on such a large snapshot. We observe a higher value of the interest clustering coefficient than of classic directed clustering coefficients, showing the importance of this metric. By studying the bidirectional edges of the Twitter graph, we also show that the interest clustering coefficient is better suited to capturing the interest part of the graph, while classic ones are better suited to capturing the social part. We also introduce a new model able to build random networks with a high interest clustering coefficient. Finally, we discuss the usefulness of this new metric for link recommendation.
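As a reference point, here is a minimal sketch of the classic undirected clustering coefficient that the article starts from. It is not the interest clustering coefficient the article proposes. For a node with k neighbors, the local coefficient is the fraction of the k(k-1)/2 neighbor pairs that are themselves linked. The graph-level value below is the unweighted average over nodes, which is one common convention and is assumed here; the function names are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbors that are linked to each other.

    adj maps each node to a set of neighbors (undirected, no self-loops).
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: nodes with fewer than two neighbors score 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Unweighted mean of the local coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

if __name__ == "__main__":
    # Hypothetical graph: triangle 1-2-3 plus a pendant node 4 attached to 3
    g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(local_clustering(g, 3))  # 1 of 3 neighbor pairs linked
    print(average_clustering(g))   # (1 + 1 + 1/3 + 0) / 4
```

Applied to a follower graph, this definition would have to ignore edge direction, which is the information loss the article argues against.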
Article
Full-text available
Online social networks have emerged as useful tools for communicating and sharing information and news on a daily basis. One of the most popular networks is Twitter, where users connect to each other via directed follower relationships. Twitter follower graphs have been studied and described with various topological features. Collecting Twitter data, especially crawling users' followers, is a tedious and time-consuming process, and the data must be treated carefully because they contain sensitive personal user information. We therefore aim at the fast generation of directed social network graphs with reciprocal edges and high clustering. Our proposed method is based on a previously developed model, but relies on fewer hyperparameters and has a significantly lower runtime. Results show that our method not only replicates the crawled directed Twitter graphs well with respect to several topological features and the behavior of an epidemic spreading process, but is also highly scalable, allowing the fast creation of larger graphs that exhibit properties similar to those of real-world networks.
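Reciprocity, one of the topological features such generators aim to reproduce, is commonly measured as the fraction of directed edges whose reverse edge also exists. A minimal sketch, assuming the graph is given as (follower, followee) pairs with no self-loops (the paper's exact feature definitions are not stated in this abstract); the function name is hypothetical.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also present.

    Duplicate edges are collapsed; an empty graph is defined to have reciprocity 0.
    """
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

if __name__ == "__main__":
    # Hypothetical follower graph: a and b follow each other; a -> c and c -> d are one-way
    follows = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
    print(reciprocity(follows))  # 2 of 4 edges are reciprocated
```

Comparing this value between a crawled graph and a generated one is a cheap first check that the generator reproduces mutual-follow structure.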
Chapter
In recent years, state-backed troll accounts have been used extensively by many political parties, organizations, and governments to negatively influence political systems, persecute perceived opponents, and exacerbate divisions within societies. Thus, the need for an automatic state-backed troll classification system has increased. Various algorithms have been proposed in the literature to handle this problem, but most of them treat all types of trolls as a single type, which decreases classification performance. Our goal in this paper is to design a thorough method for detecting state-backed trolls on Twitter that works efficiently regardless of the language, location, and purpose of the troll account. For accurate classification, we propose a set of novel, effective features from various categories. To train our algorithm, we gathered a large and relevant dataset from Twitter. The results show that the proposed algorithm achieves high classification accuracy (approximately 99%) and can classify state-backed troll accounts regardless of the language or location of the account.
Keywords: Troll detection; State-backed trolls; Online antisocial behaviors; Social media; Twitter troll accounts
Article
Full-text available
The main aim of this paper is to create a friend suggestion algorithm that recommends new friends to a Twitter user, given their existing friends and other details. The information used to make these predictions includes the user's friends, tags, tweets, spoken language, ID, and so on. Based on these features, the authors trained their models using supervised learning. The machine learning approach used for this purpose is k-nearest neighbors (k-NN), which compares data points by their distance in the feature space. A k-NN classifier is commonly used in classification tasks to recognize and distinguish between classes. Using this approach, the features of the central user's non-friends were compared. Because a user's friends and communities are likely to be very different from those of any other user, the authors select a single user and compare the results obtained for that user to suggest friends.
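The nearest-neighbor idea described above can be sketched as ranking a user's non-friends by how close their feature vectors are to the user's own. This is a minimal illustration, not the authors' pipeline: the numeric feature encoding, the use of Euclidean distance, and the function name `suggest_friends` are all hypothetical assumptions.

```python
import math

def suggest_friends(features, user, friends, k=2):
    """Return the k non-friends whose feature vectors are closest to `user`'s.

    features: dict mapping user ID -> list of numeric features (hypothetical encoding)
    friends:  set of IDs the user is already connected to
    """
    target = features[user]
    candidates = [
        (math.dist(target, vec), uid)  # Euclidean distance in feature space
        for uid, vec in features.items()
        if uid != user and uid not in friends
    ]
    candidates.sort()  # ascending distance; ties broken by user ID
    return [uid for _, uid in candidates[:k]]

if __name__ == "__main__":
    feats = {
        "alice": [1.0, 0.0],
        "bob": [0.9, 0.1],
        "carol": [0.0, 1.0],
        "dave": [1.0, 0.2],
        "erin": [0.5, 0.5],
    }
    # alice already follows bob, so he is excluded from the suggestions
    print(suggest_friends(feats, "alice", {"bob"}, k=2))
```

In practice, text features such as tags or tweets would first need to be turned into numeric vectors; how the paper does this is not described in the abstract.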
Information diffusion, spreading of infectious diseases, and spreading of rumors are fundamental processes occurring in real-life networks. In many practical cases, one can observe when nodes become infected, but the underlying network, over which a contagion or information propagates, is hidden. Inferring properties of the underlying network is important since these properties can be used for constraining infections, forecasting, viral marketing, and so on. Moreover, for many applications, it is sufficient to recover only coarse high-level properties of this network rather than all its edges. This article conducts a systematic and extensive analysis of the following problem: Given only the infection times, find communities of highly interconnected nodes. This task significantly differs from the well-studied community detection problem since we do not observe a graph to be clustered. We carry out a thorough comparison between existing and new approaches on several large datasets and cover methodological challenges specific to this problem. One of the main conclusions is that the most stable performance and the most significant improvement on the current state-of-the-art are achieved by our proposed simple heuristic approaches agnostic to a particular graph structure and epidemic model. We also show that some well-known community detection algorithms can be enhanced by including edge weights based on the cascade data.
Article
Nowadays, research on social networks has attracted a great deal of attention from both academia and industry. To understand the diffusion process and guide viral marketing, it is important to model and then estimate the influence of a seed user on a target user, which this paper defines as target influence. In well-known diffusion models such as the independent cascade model and the linear threshold model, estimating influence probabilities through simulation usually incurs tremendous computational costs. In this paper, we adopt the duplicate forwarding model and propose two measures of target influence that can be analyzed theoretically: the first is the average number of duplicates the target user receives, and the second is the probability that the target user receives at least one duplicate. We further find that the first approaches infinity if the spread intensity exceeds some threshold, whereas the second can be used without this constraint. We also use the second to estimate the influence probability in the independent cascade model and find that it achieves much better accuracy than other heuristic metrics. All results are verified through simulations on real-world social networks, and we believe the approach proposed here can provide insights into problems such as target influence maximization and influence maximization.
Article
Full-text available
The purpose of topic popularity prediction is to predict whether a topic on the Internet will become popular. Various elegant models have been proposed for this problem. However, different datasets and evaluation metrics they use lead to low comparability. In this paper, we conduct a comprehensive survey, propose a modularized evaluation scheme for evaluating the models and apply it to existing methods. Our scheme has four modules: categorization; qualitative evaluation on several metrics; quantitative experiment on real world data; and final ranking with risk matrix and MinDis to reflect performances under different scenarios. Furthermore, we analyze the efficiency and contribution of features used in feature-oriented methods. Our work helps users compare models and select appropriate ones for different requirements.
Article
Online Social Networks (OSNs) have gained enormous popularity in recent years. They provide a dynamic platform for sharing content (text messages or multimedia) and for facilitating communication between friends and acquaintances. Microblogging services are a popular form of OSN. They allow users to send short messages in a one-to-many messaging model, so that users can communicate with their favorite celebrity, brand, politician, or other regular users without needing a pre-existing social relationship. A chain of privacy-related scandals linked to questionable data handling practices in microblogging services has arisen in the past few years. Most current microblogging service providers offer centralized services, and their business model is based on monitoring, analyzing, and selling users’ activity and patterns. In effect, the personal information that users share in order to benefit from these free services is the underlying payment in such systems. In this paper, we present Garlanet, a privacy-aware, censorship-resistant microblogging social network that does not rely on a centralized service provider, as all data is hosted on computers voluntarily contributed by the users of the system. Garlanet provides microblogging functionality while protecting privacy and preserving the confidentiality and integrity of users and data. It ensures that users’ identities and their social graphs are hidden from the system and from adversaries, and it provides availability and scalability of the services. We also evaluate the privacy level of Garlanet and compare it with the privacy level of eight other microblogging systems.
Article
Full-text available
The COVID-19 pandemic has created complex problems that require organizations to collaborate within and across sector lines. Social media data can provide insights into how nonprofits interact in the pandemic response from both social network and geographical perspectives. This study innovatively investigated the connection and interaction patterns among 74 National Voluntary Organizations Active in Disaster (NVOAD) nonprofits and three government agencies, based on structural analyses and content analyses of their Twitter communications during the long-term global COVID-19 pandemic. The daily tweeting volume of all nonprofits was generally consistent with pandemic severity in the United States before July 2020 and remained stable afterward. Nonprofits' tweets reflect their purposes of sharing information, building communities, and taking action for disaster response. Government agencies played leadership roles in providing COVID-19 guidelines and information. Human Services, International and Foreign Affairs, and Public and Societal Benefit nonprofits, especially the American Red Cross, played central roles in the nonprofit communication network. Possible explanations include: (1) Geographically, connections and interactions among nonprofits are more likely to happen within the same city or in neighboring states. (2) Both mission homophily and heterophily contribute to connections and interactions among nonprofits, depending on their subsectors. The findings not only help the public better understand how nonprofits are collaboratively fighting the pandemic, but also provide guidance for nonprofits to plan for better interactions and communications in future disaster response.
Chapter
The last decade has seen an increasing number of online social network (OSN) users. As OSNs grew more popular over the years, they also became more and more profitable. Indeed, users share a considerable amount of personal information on these sites, both intentionally and unintentionally. Thanks to this enormous user base, social networks are able to generate recommendations, attract numerous advertisers, and sell data to companies. This situation has sparked a lot of interest in the research community, as users grow more uncomfortable with the idea that they do not have full control over their own data. The lack of control can be even greater when a user holds accounts on several OSNs, because the data she shares is then spread over multiple platforms. This chapter addresses the notion of a portable profile, which could help users gain more control over, or more awareness of, the data collected about them. First, the authors discuss the advantages and drawbacks of a portable profile. Second, they propose a conceptual model for the data in this unified profile.
Article
Full-text available
We present a study of anonymized data capturing a month of high-level communication activities within the whole of the Microsoft Messenger instant-messaging system. We examine characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. The dataset contains summary properties of 30 billion conversations among 240 million people. From the data, we construct a communication graph with 180 million nodes and 1.3 billion undirected edges, creating the largest social network constructed and analyzed to date. We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal. We investigate, at planetary scale, the oft-cited report that people are separated by "six degrees of separation" and find that the average path length among Messenger users is 6.6. We also find that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.
Article
Full-text available
There has been rapid growth in the study of diffusion across organizations and social movements in recent years, fueled by interest in institutional arguments and in network and dynamic analysis. This research develops a sociologically grounded account of change emphasizing the channels along which practices flow. Our review focuses on characteristic lines of argument, emphasizing the structural and cultural logic of diffusion processes. We argue for closer theoretical attention to why practices diffuse at different rates and via different pathways in different settings. Three strategies for further development are proposed: broader comparative research designs, closer inspection of the content of social relations between collective actors, and more attention to diffusion industries run by the media and communities of experts.
Conference Paper
Full-text available
Understanding how users behave when they connect to social networking sites creates opportunities for better interface design, richer studies of social interactions, and improved design of content distribution systems. In this paper, we present a first-of-its-kind analysis of user workloads in online social networks. Our study is based on detailed clickstream data, collected over a 12-day period, summarizing HTTP sessions of 37,024 users who accessed four popular social networks: Orkut, MySpace, Hi5, and LinkedIn. The data were collected from a social network aggregator website in Brazil, which enables users to connect to multiple social networks with a single authentication. Our analysis of the clickstream data reveals key features of the social network workloads, such as how frequently people connect to social networks and for how long, as well as the types and sequences of activities that users conduct on these sites. Additionally, we crawled the social network topology of Orkut, so that we could analyze user interaction data in light of the social graph. Our data analysis suggests insights into how users interact with friends in Orkut, such as how frequently users visit their friends' or non-immediate friends' pages. In summary, our analysis demonstrates the power of using clickstream data in identifying patterns in social network workloads and social interactions. Our analysis shows that browsing, which cannot be inferred from crawling publicly available data, accounts for 92% of all user activities. Consequently, compared to using only crawled data, considering silent interactions like browsing friends' pages increases the measured level of interaction among users.
Conference Paper
Full-text available
In this paper, we report research results investigating micro-blogging as a form of online word of mouth branding. We analyzed 149,472 micro-blog postings containing branding comments, sentiments, and opinions. We investigated the overall structure of these micro-blog postings, types of expressions, and sentiment fluctuations. Of the branding micro-blogs, nearly 20 percent contained some expressions of branding sentiments. Of these tweets with sentiments, more than 50 percent were positive and 33 percent critical of the company or product. We discuss the implications for organizations in using micro-blogging as part of their overall marketing strategy and branding campaigns.
Conference Paper
Full-text available
Whether they are modeling bookmarking behavior in Flickr or cascades of failure in large networks, models of diffusion often start with the assumption that a few nodes start long chain reactions, resulting in large-scale cascades. While reasonable under some conditions, this assumption may not hold for social media networks, where user engagement is high and information may enter a system from multiple disconnected sources. Using a dataset of 262,985 Facebook Pages and their associated fans, this paper provides an empirical investigation of diffusion through a large social media network. Although Facebook diffusion chains are often extremely long (chains of up to 82 levels have been observed), they are not usually the result of a single chain-reaction event. Rather, these diffusion chains are typically started by a substantial number of users. Large clusters emerge when hundreds or even thousands of short diffusion chains merge together. This paper presents an analysis of these diffusion chains using zero-inflated negative binomial regressions. We show that after controlling for distribution effects, there is no meaningful evidence that a start node's maximum diffusion chain length can be predicted with the user's demographics or Facebook usage characteristics (including the user's number of Facebook friends). This may provide insight into future research on public opinion formation.
Conference Paper
Full-text available
Google, Yahoo and MSN all provide both web user interfaces (WUIs) and application programming interfaces (APIs) to their collections. Whether building collections of resources or studying the search engines themselves, the search engines request that researchers use their APIs and not "scrape" the WUIs. However, anecdotal evidence suggests the interfaces produce different results. We provide the first in-depth quantitative analysis of the results produced by the Google, MSN and Yahoo API and WUI interfaces. We have queried both interfaces for five months and found significant discrepancies between the interfaces in several categories. In general, we found MSN to produce the most consistent results between their two interfaces. Our findings suggest that the API indexes are not older, but they are probably smaller for Google and Yahoo. We also examined how search results decay over time and built predictive models based on the observed decay rates. Based on our findings, it can take over a year for half of the top 10 results to a popular query to be replaced in Google and Yahoo; for MSN it may take only 2-3 months.
Conference Paper
Full-text available
Online social networking services are among the most popular Internet services according to Alexa.com and have become a key feature in many Internet services. Users interact through various features of online social networking services: making friend relationships, sharing their photos, and writing comments. These friend relationships are expected to become a key to many other features in web services, such as recommendation engines, security measures, online search, and personalization issues. However, we have very limited knowledge on how much interaction actually takes place over friend relationships declared online. A friend relationship only marks the beginning of online interaction. Does the interaction between users follow the declaration of friend relationship? Does a user interact evenly or lopsidedly with friends? We venture to answer these questions in this work. We construct a
Conference Paper
Full-text available
Online social networking sites like MySpace, Facebook, and Flickr have become a popular way to share and disseminate content. Their massive popularity has led to viral marketing techniques that attempt to spread content, products, and ideas on these sites. However, there is little data publicly available on viral propagation in the real world, and few studies have characterized how information spreads over current online social networks. In this paper, we collect and analyze large-scale traces of information dissemination in the Flickr social network. Our analysis, based on crawls of the favorite markings of 2.5 million users on 11 million photos, aims at answering three key questions: (a) how widely does information propagate in the social network? (b) how quickly does information propagate? and (c) what is the role of word-of-mouth exchanges between friends in the overall propagation of information in the network? Contrary to viral marketing "intuition," we find that (a) even popular photos do not spread widely throughout the network, (b) even popular photos spread slowly through the network, and (c) information exchanged between friends is likely to account for over 50% of all favorite-markings, but with a significant delay at each hop. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
Article
Similarity breeds connection. This principle--the homophily principle--structures network ties of every type, including marriage, friendship, work, advice, support, information transfer, exchange, comembership, and other types of relationship. The result is that people's personal networks are homogeneous with regard to many sociodemographic, behavioral, and intrapersonal characteristics. Homophily limits people's social worlds in a way that has powerful implications for the information they receive, the attitudes they form, and the interactions they experience. Homophily in race and ethnicity creates the strongest divides in our personal environments, with age, religion, education, occupation, and gender following in roughly that order. Geographic propinquity, families, organizations, and isomorphic positions in social systems all create contexts in which homophilous relations form. Ties between nonsimilar individuals also dissolve at a higher rate, which sets the stage for the formation of niches (localized positions) within social space. We argue for more research on: (a) the basic ecological processes that link organizations, associations, cultural communities, social movements, and many other social forms; (b) the impact of multiplex ties on the patterns of homophily; and (c) the dynamics of network change over time through which networks and other social entities co-evolve.
Article
Scholars, advertisers and political activists see massive online social networks as a representation of social interactions that can be used to study the propagation of ideas, social bond dynamics and viral marketing, among others. But the linked structures of social networks do not reveal actual interactions among people. Scarcity of attention and the daily rhythms of life and work make people default to interacting with those few who matter and who reciprocate their attention. A study of social interactions within Twitter reveals that the driver of usage is a sparse and hidden network of connections underlying the declared set of friends and followers.
Article
Cellular metabolism, the integrated interconversion of thousands of metabolic substrates through enzyme-catalysed biochemical reactions, is the most investigated complex intracellular web of molecular interactions. Although the topological organization of individual reactions into metabolic networks is well understood, the principles that govern their global functional use under different growth conditions raise many unanswered questions. By implementing a flux balance analysis of the metabolism of Escherichia coli strain MG1655, here we show that network use is highly uneven. Whereas most metabolic reactions have low fluxes, the overall activity of the metabolism is dominated by several reactions with very high fluxes. E. coli responds to changes in growth conditions by reorganizing the rates of selected fluxes predominantly within this high-flux backbone. This behaviour probably represents a universal feature of metabolic activity in all cells, with potential implications for metabolic engineering.
Article
Motivated by several applications, we introduce various distance measures between “top k lists.” Some of these distance measures are metrics, while others are not. For each of these latter distance measures, we show that they are “almost” a metric in the following two seemingly unrelated aspects: (i) they satisfy a relaxed version of the polygonal (hence, triangle) inequality, and (ii) there is a metric with positive constant multiples that bound our measure above and below. This is not a coincidence: we show that these two notions of almost being a metric are the same. Based on the second notion, we define two distance measures to be equivalent if they are bounded above and below by constant multiples of each other. We thereby identify a large and robust equivalence class of distance measures. Besides the applications to the task of identifying good notions of (dis-)similarity between two top k lists, our results imply polynomial-time constant-factor approximation algorithms for the rank aggregation problem with respect to a large class of distance measures.
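As an illustration of the kind of measure discussed above, the sketch below computes a Spearman footrule distance between two top-k lists, treating every item missing from a list as if it were ranked at position k + 1. This follows one variant described in this line of work; the function name `footrule_top_k` and the requirement that both lists have the same length are assumptions made for illustration.

```python
def footrule_top_k(top_a: list, top_b: list) -> int:
    """Spearman footrule between two top-k lists (hypothetical helper).

    Items absent from a list are treated as if ranked at position k + 1.
    """
    if len(top_a) != len(top_b):
        raise ValueError("both lists must have the same length k")
    missing_rank = len(top_a) + 1
    # 1-based rank of each item in each list
    rank_a = {item: r for r, item in enumerate(top_a, start=1)}
    rank_b = {item: r for r, item in enumerate(top_b, start=1)}
    # Sum the rank displacements over the union of both lists
    return sum(
        abs(rank_a.get(x, missing_rank) - rank_b.get(x, missing_rank))
        for x in set(rank_a) | set(rank_b)
    )


print(footrule_top_k(["a", "b", "c"], ["a", "c", "d"]))  # 4
```

Identical lists give 0, and two disjoint lists of length k give the maximum value k(k + 1).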
Article
Models for the processes by which ideas and influence propagate through a social network have been studied in a number of domains, including the diffusion of medical and technological innovations, the sudden and widespread adoption of various strategies in game-theoretic settings, and the effects of "word of mouth" in the promotion of new products. Recently, motivated by the design of viral marketing strategies, Domingos and Richardson posed a fundamental algorithmic problem for such social network processes: if we can try to convince a subset of individuals to adopt a new product or innovation, and the goal is to trigger a large cascade of further adoptions, which set of individuals should we target? We consider this problem in several of the most widely studied models in social network analysis. The optimization problem of selecting the most influential nodes is NP-hard here, and we provide the first provable approximation guarantees for efficient algorithms. Using an analysis framework based on submodular functions, we show that a natural greedy strategy obtains a solution that is provably within 63% of optimal for several classes of models; our framework suggests a general approach for reasoning about the performance guarantees of algorithms for these types of influence problems in social networks.
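The greedy strategy mentioned above can be sketched for the independent cascade model, one of the models studied in this work: repeatedly add the node with the largest estimated marginal gain in expected spread. This is a minimal illustration, assuming a directed adjacency dict, a single uniform activation probability `p`, and Monte Carlo estimation of the spread; the function names are hypothetical. The 63% figure corresponds to 1 − 1/e and applies to exact spread values, so estimates from sampling only approximate that guarantee.

```python
import random


def simulate_cascade(graph, seeds, p, rng):
    """Run one independent cascade; return the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly active node gets one chance to activate each neighbor
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected cascade size."""
    return sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seeds greedily by estimated marginal gain in spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        base = expected_spread(graph, chosen, p, runs, rng)
        best, best_gain = None, float("-inf")
        # Sorted iteration gives deterministic tie-breaking
        for v in sorted(nodes - set(chosen)):
            gain = expected_spread(graph, chosen + [v], p, runs, rng) - base
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen


follows = {"hub": ["a", "b", "c"], "x": ["y"]}
print(greedy_seeds(follows, 2, p=1.0, runs=1))  # ['hub', 'x']
```

With `p=1.0` the cascade is deterministic, which makes the greedy choice easy to check by hand; realistic settings use small `p` and many more runs.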
Article
We present an analysis of a person-to-person recommendation network, consisting of 4 million people who made 16 million recommendations on half a million products. We observe the propagation of recommendations and the cascade sizes, which we explain by a simple stochastic model. We analyze how user behavior varies within user communities defined by a recommendation network. Product purchases follow a 'long tail' where a significant share of purchases belongs to rarely sold items. We establish how the recommendation network grows over time and how effective it is from the viewpoint of the sender and receiver of the recommendations. While on average recommendations are not very effective at inducing purchases and do not spread very far, we present a model that successfully identifies communities, product and pricing categories for which viral marketing seems to be very effective.
Article
Web 2.0 has brought about several new applications that have enabled arbitrary subsets of users to communicate with each other on a social basis. Such communication increasingly happens not just on Facebook and MySpace but on several smaller network applications such as Twitter and Dodgeball. We present a detailed characterization of Twitter, an application that allows users to send short messages. We gathered three datasets (covering nearly 100,000 users) including constrained crawls of the Twitter network using two different methodologies, and a sampled collection from the publicly available timeline. We identify distinct classes of Twitter users and their behaviors, geographic growth patterns and current size of the network, and compare crawl results obtained under rate limiting constraints.
Article
The importance of a Web page is an inherently subjective matter, which depends on the reader's interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And, we show how to apply PageRank to search and to user navigation.
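The random-surfer computation can be sketched with power iteration. This is a minimal illustration, not the paper's implementation: it assumes a damping factor of 0.85 (a commonly used value) and spreads the rank of pages with no out-links uniformly over all pages, which is one common convention. The same routine applies to any directed graph, for example a follower graph with edges pointing from follower to followee (an edge-direction assumption made here for illustration).

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a dict mapping node -> list of out-links."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Every other node splits its rank evenly among its out-links
        for u, outs in links.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


scores = pagerank({"b": ["a"], "c": ["a"], "a": []})
print(max(scores, key=scores.get))  # a
```

Because teleportation and dangling-node mass are both redistributed, the scores always sum to 1, so they can be read as the random surfer's long-run visit probabilities.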
Conference Paper
Tracking new topics, ideas, and "memes" across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events. We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis. As our principal domain of study, we show how such a meme-tracking approach can provide a coherent representation of the news cycle: the daily rhythms in the news media that have long been the subject of qualitative interpretation but have never been captured accurately enough to permit actual quantitative analysis. We tracked 1.6 million mainstream media sites and blogs over a period of three months with a total of 90 million articles, and we find a set of novel and persistent temporal patterns in the news cycle. In particular, we observe a typical lag of 2.5 hours between the peaks of attention to a phrase in the news media and in blogs respectively, with divergent behavior around the overall peak and a "heartbeat"-like pattern in the handoff between news and blogs. We also develop and analyze a mathematical model for the kinds of temporal variation that the system exhibits.
Conference Paper
Social networks are popular platforms for interaction, communication and collaboration between friends. Researchers have recently proposed an emerging class of applications that leverage relationships from social networks to improve security and performance in applications such as email, web browsing and overlay routing. While these applications often cite social network connectivity statistics to support their designs, researchers in psychology and sociology have repeatedly cast doubt on the practice of inferring meaningful relationships from social network connections alone. This leads to the question: Are social links valid indicators of real user interaction? If not, then how can we quantify these factors to form a more accurate model for evaluating socially-enhanced applications? In this paper, we address this question through a detailed study of user interactions in the Facebook social network. We propose the use of interaction graphs to impart meaning to online social links by quantifying user interactions. We analyze interaction graphs derived from Facebook user traces and show that they exhibit significantly lower levels of the "small-world" properties shown in their social graph counterparts. This means that these graphs have fewer "supernodes" with extremely high degree, and overall network diameter increases significantly as a result. To quantify the impact of our observations, we use both types of graphs to validate two well-known social-based applications (RE and SybilGuard). The results reveal new insights into both systems, and confirm our hypothesis that studies of social applications should use real indicators of user interactions in lieu of social graphs.
Conference Paper
Micro-blogs, a relatively new phenomenon, provide a new communication channel for people to broadcast information that they likely would not share otherwise using existing channels (e.g., email, phone, IM, or weblogs). Micro-blogging has become popular quite quickly, raising its potential for serving as a new informal communication medium at work, providing a variety of impacts on collaborative work (e.g., enhancing information sharing, building common ground, and sustaining a feeling of connectedness among colleagues). This exploratory research project is aimed at gaining an in-depth understanding of how and why people use Twitter - a popular micro-blogging tool - and exploring micro-blog's potential impacts on informal communication at work.
Conference Paper
In this paper, we consider the evolution of structure within large online social networks. We present a series of measurements of two such networks, together comprising in excess of five million people and ten million friendship links, annotated with metadata capturing the time of every event in the life of the network. Our measurements expose a surprising segmentation of these networks into three regions: singletons who do not participate in the network; isolated communities which overwhelmingly display star structure; and a giant component anchored by a well-connected core region which persists even in the absence of stars. We present a simple model of network growth which captures these aspects of component structure. The model follows our experimental results, characterizing users as either passive members of the network; inviters who encourage offline friends and acquaintances to migrate online; and linkers who fully participate in the social evolution of the network.
Conference Paper
How do real graphs evolve over time? What are "normal" growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing super-linearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
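The super-linear growth described above is usually summarized as a power law e(t) ∝ n(t)^a, where a > 1 indicates densification. A minimal way to estimate a from a series of (nodes, edges) snapshots is an ordinary least-squares fit on a log-log scale, sketched below; the function name and the choice of unweighted least squares are illustrative assumptions, not necessarily the fitting procedure used in the paper.

```python
import math


def densification_exponent(snapshots):
    """Estimate a in edges ~ nodes**a from (nodes, edges) pairs via log-log OLS."""
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the least-squares line through the points (log n, log e)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic snapshots that follow edges = nodes ** 1.5 exactly
print(round(densification_exponent([(100, 1000), (400, 8000), (900, 27000)]), 6))  # 1.5
```

A graph whose edge count grows in proportion to its node count would give a ≈ 1, i.e. constant average degree.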
Article
We study the relaxation response of a social system after endogenous and exogenous bursts of activity using the time series of daily views for nearly 5 million videos on YouTube. We find that most activity can be described accurately as a Poisson process. However, we also find hundreds of thousands of examples in which a burst of activity is followed by a ubiquitous power-law relaxation governing the timing of views. We find that these relaxation exponents cluster into three distinct classes and allow for the classification of collective human dynamics. This is consistent with an epidemic model on a social network containing two ingredients: a power-law distribution of waiting times between cause and action and an epidemic cascade of actions becoming the cause of future actions. This model is a conceptual extension of the fluctuation-dissipation theorem to social systems [Ruelle, D (2004) Phys Today 57:48–53] and [Roehner BM, et al., (2004) Int J Mod Phys C 15:809–834], and provides a unique framework for the investigation of timing in complex systems. Keywords: complex systems; human dynamics.
Article
Networks of coupled dynamical systems have been used to model biological oscillators, Josephson junction arrays, excitable media, neural networks, spatial games, genetic control networks and many other self-organizing systems. Ordinarily, the connection topology is assumed to be either completely regular or completely random. But many biological, technological and social networks lie somewhere between these two extremes. Here we explore simple models of networks that can be tuned through this middle ground: regular networks 'rewired' to introduce increasing amounts of disorder. We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them 'small-world' networks, by analogy with the small-world phenomenon (popularly known as six degrees of separation). The neural network of the worm Caenorhabditis elegans, the power grid of the western United States, and the collaboration graph of film actors are shown to be small-world networks. Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. In particular, infectious diseases spread more easily in small-world networks than in regular lattices.
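The rewiring procedure can be sketched as follows: start from a ring of n nodes, each joined to its k nearest neighbors, then move each lattice edge to a random new endpoint with probability p. This is a minimal illustration, assuming an even k and forbidding self-loops and duplicate edges; the helper names are hypothetical. The clustering function illustrates the "highly clustered" side of the claim; measuring characteristic path length would additionally require breadth-first search.

```python
import random


def watts_strogatz(n, k, p, seed=0):
    """Ring lattice of n nodes (k nearest neighbors each), rewired with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Regular ring lattice: link each node to k // 2 neighbors on each side
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    # Rewire each clockwise lattice edge (v, u) to (v, w) with probability p
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p:
                candidates = [w for w in range(n) if w != v and w not in adj[v]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        # Count links among the neighbors of v
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


print(average_clustering(watts_strogatz(20, 4, 0.0)))  # 0.5
```

Rewiring preserves the number of edges, so increasing p only changes how the n·k/2 edges are arranged; the clustering of the p = 0 lattice is the starting point from which it decays.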
Article
We argue that social networks differ from most other types of networks, including technological and biological networks, in two important ways. First, they have nontrivial clustering or network transitivity and second, they show positive correlations, also called assortative mixing, between the degrees of adjacent vertices. Social networks are often divided into groups or communities, and it has recently been suggested that this division could account for the observed clustering. We demonstrate that group structure in networks can also account for degree correlations. We show using a simple model that we should expect assortative mixing in such networks whenever there is variation in the sizes of the groups and that the predicted level of assortative mixing compares well with that observed in real-world networks.
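Degree assortativity is commonly quantified by the Pearson correlation coefficient r between the degrees at the two ends of each edge, with r > 0 for assortative and r < 0 for disassortative mixing. The sketch below computes r for an undirected simple graph given as an edge list, counting each edge in both directions; the function name is hypothetical, and this is one standard formulation rather than code from the paper.

```python
def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count each edge in both directions so the measure is symmetric
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)  # xs and ys contain the same values, so share a mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("undefined when all endpoint degrees are equal")
    return cov / var


star = [(0, 1), (0, 2), (0, 3)]
print(degree_assortativity(star))  # -1.0
```

A star is perfectly disassortative because its single high-degree hub links only to degree-1 leaves.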
Article
Although information, news, and opinions continuously circulate in the worldwide social network, the actual mechanics of how any single piece of information spreads on a global scale have been a mystery. Here, we trace such information-spreading processes at a person-by-person level using methods to reconstruct the propagation of massively circulated Internet chain letters. We find that rather than fanning out widely, reaching many people in very few steps according to “small-world” principles, the progress of these chain letters proceeds in a narrow but very deep tree-like pattern, continuing for several hundred steps. This suggests a new and more complex picture for the spread of information through a social network. We describe a probabilistic model based on network clustering and asynchronous response times that produces trees with this characteristic structure on social-network data. Keywords: social networks; algorithms; epidemics; diffusion in networks.
Article
Contents: Introduction; Computers and epidemiology; Epidemic spreading in homogeneous networks; Real data analysis; Epidemic spreading in scale-free networks (Analytic solution for the Barabási-Albert network; Finite size scale-free networks); Immunization of scale-free networks (Uniform immunization; Targeted immunization); Conclusions; References.
D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins. Information diffusion through blogspace. In Proc. of the 13th International Conference on World Wide Web. ACM, 2004.
B. A. Huberman, D. M. Romero, and F. Wu. Social networks that matter: Twitter under the microscope. arXiv:0812.1045v1, Dec 2008.
HubSpot. State of the twittersphere. http://bit.ly/sotwitter, June 2009.