Detecting and Tracking Political Abuse in Social Media
J. Ratkiewicz, M. D. Conover, M. Meiss, B. Gonçalves, A. Flammini, F. Menczer
Center for Complex Networks and Systems Research
School of Informatics and Computing
Indiana University, Bloomington, IN, USA

Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Abstract
We study astroturf political campaigns on microblogging
platforms: politically-motivated individuals and organiza-
tions that use multiple centrally-controlled accounts to create
the appearance of widespread support for a candidate or opin-
ion. We describe a machine learning framework that com-
bines topological, content-based and crowdsourced features
of information diffusion networks on Twitter to detect the
early stages of viral spreading of political misinformation. We
present promising preliminary results with better than 96%
accuracy in the detection of astroturf content in the run-up to
the 2010 U.S. midterm elections.
1 Introduction
Social networking and microblogging services reach hun-
dreds of millions of users and have become fertile ground
for a variety of research efforts. They offer a unique op-
portunity to study patterns of social interaction among far
larger populations than ever before. In particular, Twitter has
recently generated much attention in the research commu-
nity due to its peculiar features, open policy on data shar-
ing, and enormous popularity. The popularity of Twitter,
and of social media in general, is further enhanced by the
fact that traditional media pay close attention to the ebb
and flow of the communication that they support. With this
scrutiny comes the potential for the hosted discussions to
reach a far larger audience than simply the original social
media users. Along with the recent growth of social media
popularity, we are witnessing an increased usage of these
platforms to discuss issues of public interest, as they offer
unprecedented opportunities for increased participation and
information awareness among the Internet-connected pub-
lic (Adamic and Glance 2005). While some of the discus-
sions taking place on social media may seem banal and su-
perficial, the attention is not without merit. Social media of-
ten enjoy substantial user bases with participants drawn from
diverse geographic, social, and political backgrounds (Java
et al. 2007). Moreover, the user-as-information-producer
model provides researchers and news organizations alike
with a means of instrumenting and observing a represen-
tative sample of the population in real time. Indeed, it has
been recently demonstrated that useful information can be
mined from Twitter data streams (Asur and Huberman 2010;
Tumasjan et al. 2010; Bollen, Mao, and Zeng 2011).
With this increasing popularity, however, comes a dark
side — as social media grows in prominence, it is natural
that people find ways to abuse it. As a result, we observe
various types of illegitimate use; spam is a common exam-
ple (Grier et al. 2010; Wang 2010). Here we focus on a par-
ticular social media platform, Twitter, and on one particular
type of abuse, namely political astroturf — political cam-
paigns disguised as spontaneous “grassroots” behavior that
are in reality carried out by a single person or organization.
This is related to spam but with a more specific domain con-
text, and potentially larger consequences.
Online social media tools play a crucial role in the suc-
cesses and failures of numerous political campaigns and
causes. Examples range from the grassroots organizing
power of Barack Obama’s 2008 presidential campaign, to
Howard Dean’s failed 2004 presidential bid and the first-
ever Tea Party rally (Rasmussen and Schoen 2010; Wiese
and Gronbeck 2005).
The same structural and systemic properties that enable
social media such as Twitter to boost grassroots political
organization can also be leveraged, even inadvertently, to
spread less constructive information. For example, during
the political campaign for the 2010 midterm election, several
major news organizations picked up on the messaging frame
of a viral tweet relating to the allocation of stimulus funds,
succinctly describing a study of decision making in drug-
addicted macaques as “Stimulus $ for coke monkeys” (The
Fox Nation 2010).
While the “coke monkeys” meme developed organically
from the attention dynamics of thousands of users, it illus-
trates the powerful and potentially detrimental role that so-
cial media can play in shaping public discourse. As we will
demonstrate, a motivated attacker can easily orchestrate a
distributed effort to mimic or initiate this kind of organic
spreading behavior, and with the right choice of inflamma-
tory wording, influence a public well beyond the confines of
his or her own social network.
Unlike traditional news sources, social media provide lit-
tle in the way of individual accountability or fact-checking
mechanisms. Catchiness and repeatability, rather than truth-
fulness, can function as the primary drivers of information
diffusion. While flame wars and hyperbole are hardly new
phenomena online, Twitter’s 140-character sound bytes are
ready-made headline fodder for the 24-hour news cycle.
In the remainder of this paper we describe a system to an-
alyze the diffusion of information in social media, and, in
particular, to automatically identify and track orchestrated,
deceptive efforts to mimic the organic spread of information
through the Twitter network. The main contributions of this
paper are very encouraging preliminary results on the detec-
tion of suspicious memes via supervised learning (96% ac-
curacy) based on features extracted from the topology of the
diffusion networks, sentiment analysis, and crowdsourced
annotations. Because part of what distinguishes astroturf from true political dialogue is the way it spreads, our approach explicitly takes into account the diffusion patterns of messages across the social network.
2 Background and Related Work
2.1 Information Diffusion
The study of opinion dynamics and information diffusion
in social networks has a long tradition in the social, physi-
cal, and computational sciences (Castellano, Fortunato, and
Loreto 2009; Barrat, Barthelemy, and Vespignani 2008;
Leskovec, Adamic, and Huberman 2006; Leskovec, Back-
strom, and Kleinberg 2009). Twitter has recently been considered as a case study for information diffusion. For example,
Galuba et al. (2010) take into account user behavior, user-
user influence, and resource virulence to predict the spread
of URLs through the social network. While usually referred
to as ‘viral,’ the way in which information or rumors diffuse
in a network has important differences with respect to in-
fectious diseases (Morris 2000). Rumors gradually acquire
more credibility as more and more network neighbors ac-
quire them. After some time, a threshold is crossed and the
rumor is believed to be true within a community.
A serious obstacle in the modeling of information prop-
agation in the real world as well as in the blogosphere
is the fact that the structure of the underlying social net-
work is often unknown. When explicit information on the social network is available (e.g., Twitter's follower relations), the strength of the social links is hardly known, and their importance cannot be deemed uniform across the network (Huberman, Romero, and Wu 2008). Heuristic methods are being developed to address this issue. Gomez-
Rodriguez, Leskovec, and Krause (2010) propose an algo-
rithm that can efficiently approximate linkage information
based on the times at which specific URLs appear in a net-
work of news sites. For the purposes of our study, this problem can be, at least partially, ignored. Twitter provides an
explicit way to follow the diffusion of information via the
tracking of retweets. This metadata tells us which links in the
social network have actually played a role in the diffusion of
information. Retweets have already been considered, e.g., to
highlight the conversational aspects of online social inter-
action (Honeycutt and Herring 2008). The reliability of retweeted information has also been investigated. Mendoza, Poblete, and
Castillo (2010) found that false information is more likely
to be questioned by users than reliable accounts of an event.
Their work is distinct from our own in that it does not inves-
tigate the dynamics of misinformation propagation.
2.2 Mining Microblog Data
Several studies have demonstrated that information shared
on Twitter has some intrinsic value, facilitating, e.g., predic-
tions of box office success (Asur and Huberman 2010) and
the results of political elections (Tumasjan et al. 2010). Con-
tent has been further analyzed to study consumer reactions to
specific brands (Jansen et al. 2009), the use of tags to alter
content (Huang, Thornton, and Efthimiadis 2010), its rela-
tion to headline news (Kwak et al. 2010), and the factors that
influence the probability of a meme to be retweeted (Suh et
al. 2010). Romero et al. (2010) have focused on how passive
and active users influence the spreading paths.
Recent work has leveraged the collective behavior of
Twitter users to gain insight into a number of diverse phe-
nomena. Analysis of tweet content has shown that some
correlation exists between the global mood of its users and
important worldwide events, including stock market fluc-
tuations (Bollen, Mao, and Pepe 2010; Bollen, Mao, and
Zeng 2011). Similar techniques have been applied to in-
fer relationships between media events such as presiden-
tial debates and affective responses among social media
users (Diakopoulos and Shamma 2010). Sankaranarayanan
et al. (2009) developed an automated breaking news de-
tection system based on the linking behavior of Twitter
users, while Heer and boyd (2005) describe a system for
visualizing and exploring the relationships between users
in large-scale social media systems. Driven by practical
concerns, others have successfully approximated the epi-
center of earthquakes in Japan by treating Twitter users
as a geographically-distributed sensor network (Sakaki,
Okazaki, and Matsuo 2010).
2.3 Political Astroturf and Truthiness
In the remainder of this paper we describe the analysis of
data obtained by a system designed to detect astroturfing
campaigns on Twitter (Ratkiewicz et al. 2011). An illustrative example of such a campaign was recently documented by Mustafaraj and Metaxas (2010). They described
a concerted, deceitful attempt to cause a specific URL to
rise to prominence on Twitter through the use of a network
of nine fake user accounts. These accounts produced 929
tweets over the course of 138 minutes, all of which included
a link to a website smearing one of the candidates in the
2009 Massachusetts special election. The tweets injecting
this meme mentioned users who had previously expressed
interest in the election. The initiators sought not just to ex-
pose a finite audience to a specific URL, but to trigger an in-
formation cascade that would lend a sense of credibility and
grassroots enthusiasm to a specific political message. Within
hours, a substantial portion of the targeted users retweeted
the link, resulting in a rapid spread detected by Google’s
real-time search engine. This caused the URL in question
to be promoted to the top of the Google results page for a
query on the candidate’s name — a so-called Twitter bomb.
This case study demonstrates the ease with which a focused effort can initiate the viral spread of information on Twitter, and the serious consequences of such abuse.

Figure 1: Model of streaming social media events. Event 1: Bob tweets with memes #oilspill and bp.com (analysis may infer dashed edges). Event 2: Alice retweets Bob's message. Each edge carries a weight and a (source, target) pair; each event carries a timestamp.
Mass creation of accounts, impersonation of users, and
the posting of deceptive content are behaviors that are likely
common to both spam and political astroturfing. However,
political astroturf is not exactly the same as spam. While the
primary objective of a spammer is often to persuade users
to click a link, someone interested in promoting an astroturf
message wants to establish a false sense of group consen-
sus about a particular idea. Related to this process is the fact
that users are more likely to believe a message that they per-
ceive as coming from several independent sources, or from
an acquaintance (Jagatic et al. 2007). Spam detection sys-
tems often focus on the content of a potential spam mes-
sage — for instance, to see if the message contains a certain
link or set of tags. In detecting political astroturf, we focus
on how the message is delivered rather than on its content.
Further, many legitimate users may be unwittingly complicit
in the propagation of astroturf, having been themselves de-
ceived. Spam detection methods that focus solely on proper-
ties of user accounts, such as the number of URLs in tweets
from an account or the interval between successive tweets,
may therefore be unsuccessful in finding such abuse.
We adopt the term truthy to discriminate falsely-
propagated information from organic grassroots memes. The
term was coined by comedian Stephen Colbert to describe
something that a person believes based on emotion rather
than facts. We can then define our task as the detection of
truthy memes in the Twitter stream. Not every truthy meme
will result in a viral cascade like the one documented by
Mustafaraj and Metaxas, but we wish to test the hypothesis
that the initial stages exhibit identifiable signatures.
3 Analytical Framework
We developed a unified framework, which we call Klatsch,
that analyzes the behavior of users and diffusion of ideas in
a broad variety of data feeds. This framework is designed
to provide data interoperability for the real-time analysis of
massive social media data streams (millions of posts per
day) from sites with diverse structures and interfaces. To
this end, we model a generic stream of social networking
data as a series of events that represent interactions between
actors and memes, as shown in Fig. 1. Each event involves
some number of actors (entities that represent users), some
number of memes (entities that represent units of informa-
tion at the desired level of detail), and interactions among
them. For example, a single tweet event might involve three
or more actors: the poster, the user she is retweeting, and
the people she is addressing. The post might also involve a
set of memes consisting of ‘hashtags’ and URLs referenced
in the tweet. Each event can be thought of as contributing a
unit of weight to edges in a network structure, where nodes
are associated with either actors or memes. The timestamps
associated with the events allow us to observe the changing
structure of this network over time.
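To make this event model concrete, the following minimal sketch (in Python; it is illustrative and not the actual Klatsch implementation, and all names are our own) shows how events over actors and memes could be folded into a time-annotated, weighted edge list:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    """One social media event: an interaction among actors and memes."""
    timestamp: int  # Unix time of the event
    actors: list    # e.g., the poster, a retweeted user, addressed users
    memes: list     # e.g., hashtags and URLs referenced in the tweet

class EventGraph:
    """Accumulates events into a weighted network over actor/meme nodes."""
    def __init__(self):
        self.weight = defaultdict(float)  # (source, target) -> edge weight
        self.last_seen = {}               # (source, target) -> last timestamp

    def add(self, event):
        # Each event contributes a unit of weight to the edges it touches.
        targets = event.actors + event.memes
        for src in event.actors:
            for dst in targets:
                if src != dst:
                    self.weight[(src, dst)] += 1.0
                    self.last_seen[(src, dst)] = event.timestamp

g = EventGraph()
g.add(Event(1287000000, ["Bob"], ["#oilspill", "bp.com"]))           # Event 1
g.add(Event(1287000060, ["Alice", "Bob"], ["#oilspill", "bp.com"]))  # Event 2
```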
3.1 Meme Types
To study the diffusion of information on Twitter it is neces-
sary to identify a specific topic as it propagates through the
social substrate. While there exist sophisticated statistical
techniques for modeling the topics underlying bodies of text,
the small size of each tweet and the contextual drift present
in streaming data create significant complications (Wang et
al. 2003). Fortunately, several conventions shared by Twit-
ter users allow us to sidestep these issues. We focus on the
following features to identify different types of memes:
Hashtags The Twitter community uses tokens prefixed by
a hashmark (#) to label the topical content of tweets.
Some examples of popular tags are #gop, #obama, and
#desen, marking discussion about the Republican party,
President Obama, and the Delaware race for U.S. Senate,
respectively. These are often called hashtags.
Mentions A Twitter user can include another user’s screen
name in a post, prepended by the @ symbol. These men-
tions can be used to denote that a particular Twitter user
is being discussed.
URLs We extract URLs from tweets by matching strings of
valid URL characters that begin with ‘http://.’ Honey-
cutt and Herring (2008) suggest that URLs are associated
with the transmission of information on Twitter.
Phrases Finally, we consider the entire text of the tweet it-
self to be a meme, once all Twitter metadata, punctuation,
and URLs have been removed.
Relying on these conventions we are able to focus on the
ways in which a large number of memes propagate through
the Twitter social network. Note that a tweet may be in-
cluded in several of these categories. A tweet containing (for
instance) two hashtags and a URL would count as a member
of each of the three resulting memes.
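As a rough illustration, the four meme types could be extracted from a raw tweet along the following lines (the regular expressions are simplified stand-ins for the system's actual parsing rules):

```python
import re
import string

def extract_memes(text):
    """Extract hashtag, mention, URL, and phrase memes from a tweet."""
    hashtags = re.findall(r"#\w+", text.lower())
    mentions = re.findall(r"@\w+", text.lower())
    urls = re.findall(r"http://\S+", text)
    # The phrase meme is the tweet text with hashtags, mentions, URLs,
    # and punctuation removed (a simplification of the rule above).
    phrase = re.sub(r"(#\w+|@\w+|http://\S+)", " ", text)
    phrase = phrase.translate(str.maketrans("", "", string.punctuation))
    phrase = " ".join(phrase.lower().split())
    return {"hashtags": hashtags, "mentions": mentions,
            "urls": urls, "phrase": phrase}

print(extract_memes("Stimulus $ for coke monkeys #gop http://bit.ly/x @bob"))
```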
3.2 Network Edges
To represent the flow of information through the Twitter
community, we construct a directed graph in which nodes
are individual user accounts. An example diffusion network
involving three users is shown in Fig. 2. An edge is drawn
from node A to B when either B is observed to retweet a message from A, or A mentions B in a tweet. The weight of an edge is incremented each time we observe an event connecting two users. In this way, either type of edge can be understood to represent a flow of information from A to B.
Figure 2: Example of a meme diffusion network involving three users mentioning and retweeting each other. The values of various node statistics are shown next to each node. The strength s refers to weighted degree; k stands for degree.
Observing a retweet at node B provides implicit confirmation that information from A appeared in B's Twitter feed, while a mention of B originating at node A explicitly confirms that A's message appeared in B's Twitter feed. This may or may not be noticed by B; therefore, mention edges are less reliable indicators of information flow compared to retweet edges.
Retweet and reply/mention information parsed from the
text can be ambiguous, as in the case when a tweet is marked
as being a ‘retweet’ of multiple people. Instead, we rely
on Twitter metadata, which designates users replied to or
retweeted by each message. Thus, while the text of a tweet
may contain several mentions, we only draw an edge to the
user explicitly designated as the mentioned user by the meta-
data. In so doing, we may miss retweets that do not use the
explicit retweet feature and thus are not captured in the meta-
data. Note that this is separate from our use of mentions as
memes (§3.1), which we parse from the text of the tweet.
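The edge-construction rule can be sketched with the networkx library as follows (the tweet fields are simplified stand-ins for Twitter's retweet and reply metadata, not its actual schema):

```python
import networkx as nx

def add_tweet(G, tweet):
    """Add one tweet's information-flow edge(s) to diffusion network G.

    An edge A -> B is drawn when B retweets a message from A, or when
    A mentions B; its weight counts repeated events between the pair.
    """
    def bump(a, b):
        w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

    poster = tweet["user"]
    if tweet.get("retweeted_user"):            # poster retweets someone
        bump(tweet["retweeted_user"], poster)  # flow: author -> retweeter
    if tweet.get("mentioned_user"):            # poster mentions someone
        bump(poster, tweet["mentioned_user"])  # flow: poster -> mentioned

G = nx.DiGraph()
add_tweet(G, {"user": "Alice", "retweeted_user": "Bob"})
add_tweet(G, {"user": "Bob", "mentioned_user": "Carol"})
# Node statistics as in Fig. 2: degree k and strength s (weighted degree).
k_out = dict(G.out_degree())
s_out = dict(G.out_degree(weight="weight"))
```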
4 System Architecture
We implemented a system based on the data representation
described above to automatically monitor the data stream
from Twitter, detect relevant memes, collect the tweets that
match themes of interest, and produce basic statistical fea-
tures relative to patterns of diffusion. These features are
then passed to our meme classifier and/or visualized. We
called this system “Truthy.” The different stages that lead
to the identification of the truthy memes are described in the
following subsections. A screenshot of the meme overview
page of our website (truthy.indiana.edu) is shown
in Fig. 3. Upon clicking on any meme, the user is taken to
another page with more detailed statistics about that meme.
Users are also given an opportunity to label the meme as
‘truthy;’ the idea is to crowdsource the identification of
truthy memes, as an input to the classifier described in §5.
4.1 Data Collection
To collect meme diffusion data we rely on whitelisted ac-
cess to the Twitter ‘Gardenhose’ streaming API (dev.
twitter.com/pages/streaming_api). The Gar-
denhose provides detailed data on a sample of the Twitter
corpus at a rate that varied from roughly 4 million tweets per day near the beginning of our study to around 8 million tweets per day at the time of this writing.

Figure 3: Screenshot of the Meme Overview page of our website, displaying a number of vital statistics about tracked memes. Users can then select a particular meme for more detailed information.

While the
process of sampling edges (tweets between users) from a
network to investigate structural properties has been shown
to produce suboptimal approximations of true network char-
acteristics (Leskovec and Faloutsos 2006), we find that the
analyses described below are able to produce accurate clas-
sifications of truthy memes even in light of this shortcoming.
4.2 Meme Detection
A second component of our system is devoted to scanning
the collected tweets in real time. The task of this meme de-
tection component is to determine which of the collected
tweets are to be stored in our database for further analysis.
Our goal is to collect only tweets (a) with content related
to U.S. politics, and (b) of sufficiently general interest in
that context. Political relevance is determined by matching
against a manually compiled list of keywords. We consider a
meme to be of general interest if the number of tweets with
that meme observed in a sliding window of time exceeds a
given threshold. We implemented a filtering step for each of
these criteria, described elsewhere (Ratkiewicz et al. 2011).
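In spirit, the two filters can be sketched as follows; the keyword list, window length, and threshold shown here are placeholders, since the actual values and matching rules are given in Ratkiewicz et al. (2011):

```python
from collections import defaultdict, deque

POLITICAL_KEYWORDS = {"gop", "obama", "senate"}  # placeholder keyword list
WINDOW = 3600     # sliding-window length in seconds (assumed value)
THRESHOLD = 5     # sightings per window for "general interest" (assumed)

recent = defaultdict(deque)  # meme -> timestamps of recent sightings

def filter_tweet(timestamp, text, memes):
    """Return the memes in this tweet that pass both filtering criteria."""
    # Criterion (a): political relevance via keyword matching.
    if not any(kw in text.lower() for kw in POLITICAL_KEYWORDS):
        return []
    # Criterion (b): general interest via a sliding-window activity count.
    accepted = []
    for meme in memes:
        sightings = recent[meme]
        sightings.append(timestamp)
        while sightings and sightings[0] < timestamp - WINDOW:
            sightings.popleft()  # expire sightings outside the window
        if len(sightings) >= THRESHOLD:
            accepted.append(meme)
    return accepted
```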
Our system has tracked a total of approximately 305 mil-
lion tweets collected from September 14 until October 27,
2010. Of these, 1.2 million contain one or more of our polit-
ical keywords; the meme filtering step further reduced this
number to 600,000. Note that this number of tweets does not
directly correspond to the number of tracked memes, as each
tweet might contribute to several memes.
4.3 Network Analysis
To characterize the structure of each meme’s diffusion net-
work we compute several statistics based on the topology
of the largest connected component of the retweet/mention
graph.

Table 1: Features used in truthy classification.
nodes: Number of nodes
edges: Number of edges
mean k: Mean degree
mean s: Mean strength
mean w: Mean edge weight in largest connected component
max k(i,o): Maximum (in,out)-degree
max k(i,o) user: User with max. (in,out)-degree
max s(i,o): Maximum (in,out)-strength
max s(i,o) user: User with max. (in,out)-strength
std k(i,o): Std. dev. of (in,out)-degree
std s(i,o): Std. dev. of (in,out)-strength
skew k(i,o): Skew of (in,out)-degree distribution
skew s(i,o): Skew of (in,out)-strength distribution
mean cc: Mean size of connected components
max cc: Size of largest connected component
entry nodes: Number of unique injections
num truthy: Number of times the ‘truthy’ button was clicked
sentiment scores: Six GPOMS sentiment dimensions

These include the number of nodes and edges in the
graph, the mean degree and strength of nodes in the graph,
mean edge weight, mean clustering coefficient across nodes
in the largest connected component, and the standard devi-
ation and skew of each network’s in-degree, out-degree and
strength distributions (see Fig. 2). Additionally we track the
out-degree and out-strength of the most prolific broadcaster,
as well as the in-degree and in-strength of the most focused-
upon user. We also monitor the number of unique injection
points of the meme, reasoning that organic memes (such as
those relating to news events) will be associated with a larger number of originating users.
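A plausible reconstruction of part of this computation, using networkx and scipy, is sketched below; feature names follow Table 1, but this is our own approximation rather than the system's code:

```python
import networkx as nx
import numpy as np
from scipy.stats import skew

def network_features(G, injection_points):
    """Compute a subset of the Table 1 features for one meme's network G."""
    # Several statistics are computed on the largest (weakly) connected
    # component; G is assumed to be non-empty.
    components = list(nx.weakly_connected_components(G))
    lcc = G.subgraph(max(components, key=len))
    k_in = np.array([d for _, d in lcc.in_degree()])
    s_in = np.array([d for _, d in lcc.in_degree(weight="weight")])
    weights = np.array([d["weight"] for _, _, d in lcc.edges(data=True)])
    return {
        "nodes": lcc.number_of_nodes(),
        "edges": lcc.number_of_edges(),
        "mean_w": float(weights.mean()) if weights.size else 0.0,
        "max_ki": int(k_in.max()),
        "std_si": float(s_in.std()),
        "skew_ki": float(skew(k_in)),
        "mean_cc": float(np.mean([len(c) for c in components])),
        "max_cc": len(max(components, key=len)),
        "entry_nodes": len(injection_points),  # unique non-retweet origins
    }
```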
4.4 Sentiment Analysis
We also utilize a modified version of the Google-based
Profile of Mood States (GPOMS) sentiment analysis
method (Bollen, Mao, and Pepe 2010) in the analysis of
meme-specific sentiment on Twitter. The GPOMS tool as-
signs to a body of text a six-dimensional vector with bases
corresponding to different mood attributes (Calm, Alert,
Sure, Vital, Kind, and Happy). To produce scores for a meme
along each of the six dimensions, GPOMS relies on a vocab-
ulary taken from an established psychometric evaluation in-
strument extended with co-occurring terms from the Google
n-gram corpus. We applied the GPOMS methodology to the
collection of tweets, obtaining a six-dimensional mood vec-
tor for each meme.
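The GPOMS vocabulary and code are not reproduced here; the sketch below only conveys the general shape of a lexicon-based scorer over the six mood dimensions, with invented word lists and weights:

```python
MOOD_LEXICON = {  # invented sample entries; GPOMS uses a large vocabulary
    "calm":  {"peaceful": 1.0, "furious": -1.0},
    "alert": {"watch": 0.8, "asleep": -0.6},
    "sure":  {"certain": 1.0, "maybe": -0.5},
    "vital": {"energy": 0.7, "tired": -0.7},
    "kind":  {"friendly": 1.0, "smear": -0.9},
    "happy": {"great": 0.8, "sad": -1.0},
}

def mood_vector(tweets):
    """Average per-dimension lexicon scores over all tweets of a meme."""
    totals = {dim: 0.0 for dim in MOOD_LEXICON}
    counts = {dim: 0 for dim in MOOD_LEXICON}
    for text in tweets:
        for word in text.lower().split():
            for dim, lexicon in MOOD_LEXICON.items():
                if word in lexicon:
                    totals[dim] += lexicon[word]
                    counts[dim] += 1
    return {dim: totals[dim] / counts[dim] if counts[dim] else 0.0
            for dim in MOOD_LEXICON}
```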
5 Automatic Classification
As an application of the analyses performed by the Truthy
system, we trained a binary classifier to automatically label
legitimate and truthy memes.
We began by producing a hand-labeled corpus of train-
ing examples in three classes — ‘truthy,’ ‘legitimate,’ and
‘remove.’

Table 2: Performance of two classifiers with and without resampling training data to equalize class sizes. All results are averaged based on 10-fold cross-validation.
Classifier  Resampling?  Accuracy  AUC
AdaBoost    No           92.6%     0.91
AdaBoost    Yes          96.4%     0.99
SVM         No           88.3%     0.77
SVM         Yes          95.6%     0.95

Table 3: Confusion matrices for a boosted decision stump classifier with and without resampling. Row labels are true classes; column labels are predicted classes.
No resampling:     Truthy     Legitimate
T                  45 (12%)   16 (4%)
L                  11 (3%)    294 (80%)
With resampling:   Truthy     Legitimate
T                  165 (45%)  6 (1%)
L                  7 (2%)     188 (51%)

We labeled these by presenting random memes to several human reviewers (the authors of the paper and a few
additional volunteers), and asking them to place each meme
in one of the three categories. A meme was to be classified as
‘truthy’ if a significant portion of the users involved in that
meme appeared to be spreading it in misleading ways —
e.g., if a number of the accounts tweeting about the meme
appeared to be robots or sock puppets, the accounts appeared
to follow only other propagators of the meme (clique behav-
ior), or the users engaged in repeated reply/retweet exchanges exclusively with other users who had tweeted the meme. ‘Legit-
imate’ memes were described as memes representing nor-
mal use of Twitter — several non-automated users convers-
ing about a topic. The final category, ‘remove,’ was used for
memes in a non-English language or otherwise unrelated to
U.S. politics (#youth, for example). These memes were
not used in the training or evaluation of classifiers.
Upon gathering 252 annotated memes, we observed an
imbalance in our labeled data (231 legitimate and only 21
truthy). Rather than simply resampling from the smaller
class, as is common practice in the case of class imbal-
ance, we performed a second round of human annotations
on previously-unlabeled memes predicted to be ‘truthy’ by
the classifier trained in the previous round, gaining 114 more
annotations (74 legitimate and 40 truthy). We note that the
human classifiers knew that the additional memes were pos-
sibly more likely to be truthy, but that the classifier was not
very good at this point due to the paucity of training data
and indeed was often contradicted by the human classifica-
tion. This bootstrapping procedure allowed us to manually
label a larger portion of truthy memes with less bias than
resampling. Our final training dataset consisted of 366 train-
ing examples — 61 ‘truthy’ memes and 305 legitimate ones.
In a few cases where multiple reviewers disagreed on the la-
beling of a meme, we determined the final label by reaching
consensus in a group discussion among all reviewers. The
dataset is available online at cnets.indiana.edu/groups/nan/truthy.
Table 4: Top 10 most discriminative features, according to a χ² analysis under 10-fold cross-validation. Intervals represent the variation of the χ² value or rank across the folds.
Feature   χ²        Rank
mean w    230 ± 4   1.0 ± 0.0
mean s    204 ± 6   2.0 ± 0.0
edges     188 ± 4   4.3 ± 1.9
skew ko   185 ± 4   4.4 ± 1.1
std si    183 ± 5   5.1 ± 1.3
skew so   184 ± 4   5.1 ± 0.9
skew si   180 ± 4   6.7 ± 1.3
max cc    177 ± 4   8.1 ± 1.0
skew ki   174 ± 4   9.6 ± 0.9
std ko    168 ± 5   11.5 ± 0.9
We experimented with several classifiers, as implemented by Hall et al. (2009). Since comparing different learning
algorithms is not our goal, we report on the results ob-
tained with just two well-known classifiers: AdaBoost with
DecisionStump, and SVM. We provided each classifier with
31 features about each meme, as shown in Table 1. A few
of these features bear further explanation. Measures relating
to ‘degree’ and ‘strength’ refer to the nodes in the diffusion
network of the meme in question — that is, the number of
people that each user retweeted or mentioned, and the num-
ber of times these connections were made, respectively. We
defined an ‘injection point’ as a tweet containing the meme
which was not itself a retweet; our intuition was that memes
with a larger number of injection points were more likely to
be legitimate. No features were normalized.
As the number of truthy instances was still smaller than the number of legitimate ones, we also experimented with
resampling the training data to balance the classes prior to
classification. The performance of the classifiers is shown in
Table 2, as evaluated by their accuracy and the area under
their ROC curves (AUC). The latter is an appropriate evalu-
ation measure in the presence of class imbalance. In all cases
these preliminary results are quite encouraging, with accu-
racy around or above 90%. The best results are obtained by
AdaBoost with resampling: better than 96% accuracy and
0.99 AUC. Table 3 further shows the confusion matrices for
AdaBoost. In this task, false negatives (truthy memes incor-
rectly classified as legitimate, in the upper-right quadrant
of each matrix) are less desirable than false positives (the
lower-left quadrant). In the worst case, the false negative rate
is 4%. We did not perform any feature selection or other op-
timization; the classifiers were provided with all the features
computed for each meme (Table 1). Table 4 shows the 10
most discriminative features, as determined by χ2analysis.
Network features appear to be more discriminative than sen-
timent scores or the few user annotations that we collected.
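The experiments used the WEKA implementations (Hall et al. 2009); an analogous pipeline can be sketched in Python with scikit-learn, with minority-class oversampling standing in for WEKA's resampling filter (an illustrative analogue, not the original setup):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_validate
from sklearn.utils import resample

def train_and_evaluate(X, y, seed=0):
    """X: (n_memes, 31) features from Table 1; y: 1 = truthy, 0 = legitimate."""
    # Oversample the minority (truthy) class to balance the two classes;
    # as in Table 2, resampling precedes the 10-fold cross-validation.
    truthy, legit = np.where(y == 1)[0], np.where(y == 0)[0]
    truthy = resample(truthy, n_samples=len(legit), random_state=seed)
    Xb = np.vstack([X[truthy], X[legit]])
    yb = np.concatenate([np.ones(len(truthy)), np.zeros(len(legit))])
    # AdaBoost over decision stumps (depth-1 trees).
    clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, random_state=seed)
    scores = cross_validate(clf, Xb, yb, cv=10,
                            scoring=["accuracy", "roc_auc"])
    return scores["test_accuracy"].mean(), scores["test_roc_auc"].mean()
```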
6 Examples of Astroturf
The Truthy system allowed us to identify several egregious
instances of astroturf memes. Some of these cases caught
the attention of the popular press due to the sensitivity of the
topic in the run up to the 2010 U.S. midterm political elec-
tions, and subsequently many of the accounts involved were
suspended by Twitter. Let us illustrate a few representative
examples.
#ampat The #ampat hashtag is used by many conserva-
tive users. What makes this meme suspicious is that the
bursts of activity are driven by two accounts, @CSteven
and @CStevenTucker, which are controlled by the
same user, in an apparent effort to give the impression
that more people are tweeting about the same topics. This
user posts the same tweets using the two accounts and has
generated a total of over 41,000 tweets in this fashion.
See Fig. 4(A) for the #ampat diffusion network.
@PeaceKaren_25 This account did not disclose informa-
tion about the identity of its owner, and generated a very
large number of tweets (over 10,000 in four months). Al-
most all of these tweets supported several Republican can-
didates. Another account, @HopeMarie_25, had a similar behavior to @PeaceKaren_25 in retweeting the ac-
counts of the same candidates and boosting the same web-
sites. It did not produce any original tweets, and in addi-
tion it retweeted all of @PeaceKaren_25’s tweets, pro-
moting that account. These accounts had also succeeded
at creating a ‘Twitter bomb’: for a time, Google searches
for “gopleader” returned these tweets in the first page
of results. A visualization of the interaction between these
two accounts can be seen in Fig. 4(B). Both accounts were
suspended by Twitter by the time of this writing.
gopleader.gov This meme is the website of the Re-
publican Leader John Boehner. It looks truthy because
it is promoted by the two suspicious accounts described
above. The diffusion of this URL is shown in Fig. 4(C).
How Chris Coons budget works- uses tax $ 2 attend dinners and fashion shows
This is one of a set of truthy memes smearing Chris
Coons, the Democratic candidate for U.S. Senate from
Delaware. Looking at the injection points of these
memes, we uncovered a network of about ten bot ac-
counts. They inject thousands of tweets with links to
posts from the freedomist.com website. To avoid
detection by Twitter and increase visibility to different
users, duplicate tweets are disguised by adding different
hashtags and appending junk query parameters to the
URLs. To generate retweeting cascades, the bots also
coordinate mentioning a few popular users. When these
targets perceive that they are receiving the same news from several people, they are more likely to think it is true and spread
it to their followers. Most bot accounts in this network
can be traced back to a single person who runs the
freedomist.com website. The diffusion network
corresponding to this case is illustrated in Fig. 4(D).
These are just a few examples of truthy memes that our
system was able to identify. Two other networks of bots were
shut down by Twitter after being detected by Truthy.
Fig. 4 also shows the diffusion networks for four le-
gitimate memes. One, #Truthy, was injected as an ex-
periment by the NPR Science Friday radio program. An-
other, @senjohnmccain, displays two different commu-
nities in which the meme was propagated: one by retweets
from @ladygaga in the context of discussion on the repeal of the “Don’t ask, don’t tell” policy on gays in the military, and the other by mentions of @senjohnmccain. A gallery with detailed explanations about various truthy and legitimate memes can be found on our website (truthy.indiana.edu/gallery).

Figure 4: Diffusion networks of sample memes from our dataset. Edges are represented using the same notation as in Fig. 2. Four truthy memes are shown in the top row and four legitimate ones in the bottom row. (A) #ampat (B) @PeaceKaren_25 (C) gopleader.gov (D) “How Chris Coons budget works- uses tax $ 2 attend dinners and fashion shows” (E) #Truthy (F) @senjohnmccain (G) on.cnn.com/aVMu5y (H) “Obama said taxes have gone down during his administration. That’s ONE way to get rid of income tax — getting rid of income”
7 Discussion
Our simple classification system was able to accurately de-
tect ‘truthy’ memes based on features extracted from the
topology of the diffusion networks. Using this system we
have been able to identify a number of ‘truthy’ memes.
Though few of these exhibit the explosive growth charac-
teristic of true viral memes, they are nonetheless clear ex-
amples of coordinated attempts to deceive Twitter users.
Truthy memes are often spread initially by bots, causing
them to exhibit, when compared with organic memes, patho-
logical diffusion graphs. These graphs show a number of pe-
culiar features, including high numbers of unique injection points that form few or no connected components, strong star-like topologies characterized by high average degree, and, most tellingly, large edge weights between dyads.
In addition, we observed several other approaches to de-
ception that were not discoverable using graph-based prop-
erties only. One case was that of a bot network using unique
query string suffixes on otherwise identical URLs in an ef-
fort to make them look distinct. This works because many
URL-shortening services ignore query strings when process-
ing redirect requests. In another case we observed a number
of automated accounts that use text segments drawn from
newswire services to produce multiple legitimate-looking
tweets in between the injection of URLs. These instances
highlight several of the more general properties of truthy
memes detected by our system.
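One countermeasure for the query-string trick is to canonicalize URLs before aggregating them into memes, so that variants differing only in junk parameters collapse to a single key. A simple illustration (a production system would also resolve shortened URLs first):

```python
from urllib.parse import urlparse, urlunparse

def canonicalize(url):
    """Strip the query string and fragment so disguised duplicates collapse."""
    p = urlparse(url)
    return urlunparse((p.scheme, p.netloc.lower(), p.path, "", "", ""))

# Both disguised variants map to the same meme key:
assert canonicalize("http://example.com/post?junk=1") == \
       canonicalize("http://example.com/post?junk=2")
```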
The accuracy scores we obtain in the classification task
are surprisingly high. We hypothesize that this performance
is partially explained by the fact that a sizable proportion of the memes were failed attempts at starting a cascade.
In these cases the networks reduced to isolated injection
points or small components, resulting in network properties
amenable to easy classification.
Despite the fact that many of the memes discussed in this
paper are characterized by small diffusion networks, it is im-
portant to note that this is the stage at which such attempts
at deception must be identified. Once one of these attempts
is successful at gaining the attention of the community, the
meme spreading pattern becomes indistinguishable from an
organic one. Therefore, the early identification and termina-
tion of accounts associated with astroturf memes is critical.
Future work could explore further crowdsourcing the an-
notation of truthy memes. In our present system, we were
not able to collect sufficient crowdsourcing data (only 304
clicks of the ‘truthy’ button, mostly correlated with
meme popularity), but these annotations may well prove use-
ful with more data. Several other promising features could
be used as input to a classifier, such as the age of the ac-
counts involved in spreading a meme, the reputation of users
based on other memes they have contributed, and other fea-
tures from bot detection methods (Chu et al. 2010).
Acknowledgments. We are grateful to A. Vespignani, C. Cattuto, J. Ramasco, and J. Lehmann for helpful discussions, J.
Bollen for his GPOMS code, T. Metaxas and E. Mustafaraj for
inspiration and advice, and Y. Wang for Web design support. We
thank the Gephi toolkit for aid in our visualizations and the many
users who have provided feedback and annotations. We acknowl-
edge support from NSF (grant No. IIS-0811994), Lilly Foundation
(Data to Insight Center Research Grant), the Center for Complex
Networks and Systems Research, and the IUB School of Informat-
ics and Computing.
References
Adamic, L., and Glance, N. 2005. The political blogosphere and
the 2004 U.S. election: Divided they blog. In Proc. 3rd Intl. Work-
shop on Link Discovery (LinkKDD), 36–43.
Asur, S., and Huberman, B. A. 2010. Predicting the future with
social media. Technical Report arXiv:1003.5699, CoRR.
Barrat, A.; Barthelemy, M.; and Vespignani, A. 2008. Dynamical
Processes on Complex Networks. Cambridge University Press.
Bollen, J.; Mao, H.; and Pepe, A. 2010. Determining the public
mood state by analysis of microblogging posts. In Proc. of the Alife
XII Conf. MIT Press.
Bollen, J.; Mao, H.; and Zeng, X. 2011. Twitter mood predicts the
stock market. J. of Computational Science, in press.
Castellano, C.; Fortunato, S.; and Loreto, V. 2009. Statistical
physics of social dynamics. Rev. Mod. Phys. 81(2):591–646.
Chu, Z.; Gianvecchio, S.; Wang, H.; and Jajodia, S. 2010. Who is
tweeting on twitter: human, bot, or cyborg? In Proc. 26th Annual
Computer Security Applications Conf. (ACSAC), 21–30.
Diakopoulos, N. A., and Shamma, D. A. 2010. Characterizing
debate performance via aggregated twitter sentiment. In Proc. 28th
Intl. Conf. on Human Factors in Computing Systems (CHI), 1195–
1198.
Galuba, W.; Aberer, K.; Chakraborty, D.; Despotovic, Z.; and
Kellerer, W. 2010. Outtweeting the Twitterers - Predicting Infor-
mation Cascades in Microblogs. In 3rd Workshop on Online Social
Networks (WOSN).
Gomez-Rodriguez, M.; Leskovec, J.; and Krause, A. 2010. In-
ferring networks of diffusion and influence. In Proc. 16th ACM
SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining
(KDD), 1019–1028.
Grier, C.; Thomas, K.; Paxson, V.; and Zhang, M. 2010. @spam:
the underground on 140 characters or less. In Proc. 17th ACM
Conf. on Computer and Communications Security (CCS), 27–37.
Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; and
Witten, I. H. 2009. The WEKA data mining software: An update.
ACM SIGKDD Explorations 11(1):10–18.
Heer, J., and boyd, d. 2005. Vizster: Visualizing online social net-
works. In Proc. IEEE Symp. on Information Visualization (InfoVis).
Honeycutt, C., and Herring, S. C. 2008. Beyond microblogging:
Conversation and collaboration via Twitter. In Proc. 42nd Hawaii
Intl. Conf. on System Sciences.
Huang, J.; Thornton, K. M.; and Efthimiadis, E. N. 2010. Conver-
sational tagging in Twitter. In Proc. 21st ACM Conf. on Hypertext
and Hypermedia (HT).
Huberman, B. A.; Romero, D. M.; and Wu, F. 2008. Social net-
works that matter: Twitter under the microscope. Technical Report
arXiv:0812.1045, CoRR.
Jagatic, T.; Johnson, N.; Jakobsson, M.; and Menczer, F. 2007.
Social phishing. Communications of the ACM 50(10):94–100.
Jansen, B. J.; Zhang, M.; Sobel, K.; and Chowdury, A. 2009. Twit-
ter power: Tweets as electronic word of mouth. J. of the American
Society for Information Science 60:2169–2188.
Java, A.; Song, X.; Finin, T.; and Tseng, B. 2007. Why we Twitter:
understanding microblogging usage and communities. In Proc. 9th
WebKDD and 1st SNA-KDD Workshop on Web mining and social
network analysis, 56–65.
Kwak, H.; Lee, C.; Park, H.; and Moon, S. 2010. What is Twitter,
a social network or a news media? In Proc. 19th Intl. World Wide
Web Conf. (WWW), 591–600.
Leskovec, J.; Adamic, L. A.; and Huberman, B. A. 2006. Dynamics
of viral marketing. ACM Trans. Web 1(1):5.
Leskovec, J., and Faloutsos, C. 2006. Sampling from large graphs.
In Proc. 12th ACM SIGKDD Intl. Conf. on Knowledge Discovery
and Data Mining (KDD), 631–636.
Leskovec, J.; Backstrom, L.; and Kleinberg, J. 2009. Meme-
tracking and the dynamics of the news cycle. In Proc. 15th ACM
SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining
(KDD), 497–506.
Mendoza, M.; Poblete, B.; and Castillo, C. 2010. Twitter under
crisis: Can we trust what we RT? In Proc. 1st Workshop on Social
Media Analytics (SOMA).
Morris, S. 2000. Contagion. Rev. Economic Studies 67(1):57–78.
Mustafaraj, E., and Metaxas, P. 2010. From obscurity to promi-
nence in minutes: Political speech and real-time search. In Proc.
Web Science: Extending the Frontiers of Society On-Line (WebSci),
317.
Rasmussen, S., and Schoen, D. 2010. Mad as Hell: How the Tea
Party Movement Is Fundamentally Remaking Our Two-Party Sys-
tem. HarperCollins.
Ratkiewicz, J.; Conover, M.; Meiss, M.; Gonçalves, B.; Patil, S.; Flammini, A.; and Menczer, F. 2011. Truthy: Mapping the spread
of astroturf in microblog streams. In Proc. 20th Intl. World Wide
Web Conf. (WWW).
Romero, D. M.; Galuba, W.; Asur, S.; and Huberman, B. A.
2010. Influence and passivity in social media. Technical Report
arXiv:1008.1253, CoRR.
Sakaki, T.; Okazaki, M.; and Matsuo, Y. 2010. Earthquake shakes
twitter users: real-time event detection by social sensors. In Proc.
19th Intl. World Wide Web Conf. (WWW), 851–860.
Sankaranarayanan, J.; Samet, H.; Teitler, B.; Lieberman, M.; and
Sperling, J. 2009. Twitterstand: news in tweets. In Proc. 17th ACM
SIGSPATIAL Intl. Conf. on Advances in Geographic Information
Systems (GIS), 42–51.
Suh, B.; Hong, L.; Pirolli, P.; and Chi, E. H. 2010. Want to be
retweeted? Large scale analytics on factors impacting retweet in
Twitter network. In Proc. IEEE Intl. Conf. on Social Computing.
The Fox Nation. 2010. Stimulus $ for coke monkeys. politifi.com/news/Stimulus-for-Coke-Monkeys-267998.html.
Tumasjan, A.; Sprenger, T. O.; Sandner, P. G.; and Welpe, I. M.
2010. Predicting Elections with Twitter: What 140 Characters Re-
veal about Political Sentiment. In Proc. 4th Intl. AAAI Conf. on
Weblogs and Social Media (ICWSM).
Wang, H.; Fan, W.; Yu, P. S.; and Han, J. 2003. Mining concept-
drifting data streams using ensemble classifiers. In Proc. 9th ACM
SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining
(KDD), 226–235.
Wang, A. H. 2010. Don’t follow me: Twitter spam detection. In
Proc. 5th Intl. Conf. on Security and Cryptography (SECRYPT).
Wiese, D. R., and Gronbeck, B. E. 2005. Campaign 2004: Develop-
ments in cyberpolitics. In Denton, R. E., ed., The 2004 Presidential
Campaign: A Communication Perspective. Rowman & Littlefield.
217–240.
... There are also obvious incentives for the adoption of covert methods that enhance both perceived and actual popularity of promoted information. There are abundant recently reported examples of abuse: astroturf in political campaigns, or attempts to spread fake news through social bots under the pretense of grassroots conversations [64,30,9]; pervasive spreading of unsubstantiated rumors and conspiracy theories [8]; orchestrated boosting of perceived consensus on relevant social issues performed by governments [67]; propaganda and recruitment by terrorist organizations, like ISIS [6,32]; and actions involving social media and stock market manipulation [73]. ...
... There are at least three questions about information campaigns that present scientific challenges: what, how, and who. The first concerns the subtle notion of trustworthiness of information, ranging from verified facts [18], to rumors and exaggerated, biased, unverified or fabricated news [64,85,8]. The second considers the tools employed for the propaganda. ...
... The network structure carries crucial information for the characterization of different types of communication. In fact, the usage of network features significantly helps in tasks like astroturf detection [64]. Our system reconstructs three types of networks: retweet, mention, and hashtag co-occurrence networks. ...
Preprint
Social media expose millions of users every day to information campaigns --- some emerging organically from grassroots activity, others sustained by advertising or other coordinated efforts. These campaigns contribute to the shaping of collective opinions. While most information campaigns are benign, some may be deployed for nefarious purposes. It is therefore important to be able to detect whether a meme is being artificially promoted at the very moment it becomes wildly popular. This problem has important social implications and poses numerous technical challenges. As a first step, here we focus on discriminating between trending memes that are either organic or promoted by means of advertisement. The classification is not trivial: ads cause bursts of attention that can be easily mistaken for those of organic trends. We designed a machine learning framework to classify memes that have been labeled as trending on Twitter.After trending, we can rely on a large volume of activity data. Early detection, occurring immediately at trending time, is a more challenging problem due to the minimal volume of activity data that is available prior to trending.Our supervised learning framework exploits hundreds of time-varying features to capture changing network and diffusion patterns, content and sentiment information, timing signals, and user meta-data. We explore different methods for encoding feature time series. Using millions of tweets containing trending hashtags, we achieve 75% AUC score for early detection, increasing to above 95% after trending. We evaluate the robustness of the algorithms by introducing random temporal shifts on the trend time series. Feature selection analysis reveals that content cues provide consistently useful signals; user features are more informative for early detection, while network and timing features are more helpful once more data is available.
... Leskovec et al. [16] develop a framework for tracking the spread of misinformation and observe a set of persistent temporal patterns in the news cycle. Ratkiewicz et al. [17] build a machine learning framework to detect the early stages of viral spreading of political misinformation. In [18], Qazvinian et al. address the rumor detection problem by exploring the effectiveness of three categories of features: content-based, network-based, and microblogspecific memes. ...
... rand ← a random number from 0 to 1 generated in uniform; 16: if rand ≤ p (u2,u1) then 17: ...
Preprint
Social networks allow rapid spread of ideas and innovations while the negative information can also propagate widely. When the cascades with different opinions reaching the same user, the cascade arriving first is the most likely to be taken by the user. Therefore, once misinformation or rumor is detected, a natural containment method is to introduce a positive cascade competing against the rumor. Given a budget k, the rumor blocking problem asks for k seed users to trigger the spread of the positive cascade such that the number of the users who are not influenced by rumor can be maximized. The prior works have shown that the rumor blocking problem can be approximated within a factor of (11/eδ)(1-1/e-\delta) by a classic greedy algorithm combined with Monte Carlo simulation with the running time of O(k3mnlnnδ2)O(\frac{k^3mn\ln n}{\delta^2}), where n and m are the number of users and edges, respectively. Unfortunately, the Monte-Carlo-simulation-based methods are extremely time consuming and the existing algorithms either trade performance guarantees for practical efficiency or vice versa. In this paper, we present a randomized algorithm which runs in O(kmlnnδ2)O(\frac{km\ln n}{\delta^2}) expected time and provides a (11/eδ)(1-1/e-\delta)-approximation with a high probability. The experimentally results on both the real-world and synthetic social networks have shown that the proposed randomized rumor blocking algorithm is much more efficient than the state-of-the-art method and it is able to find the seed nodes which are effective in limiting the spread of rumor.
... A wide literature branch is also devoted to understanding the spread of rumors and behaviors by focusing on structural properties of social networks to determine the way in which news spread in social networks, what makes messages go viral, and what are the characteristics of users who help spread such information [15,21,13,48]. Several works investigated how social media can shape and influence the public sphere [1,9,17,18], and efforts to contrast misinformation spreading range from algorithmic-based solutions up to tailored communication strategies [5,16,25,42,43,44]. ...
Preprint
Social media are pervaded by unsubstantiated or untruthful rumors, that contribute to the alarming phenomenon of misinformation. The widespread presence of a heterogeneous mass of information sources may affect the mechanisms behind the formation of public opinion. Such a scenario is a florid environment for digital wildfires when combined with functional illiteracy, information overload, and confirmation bias. In this essay, we focus on a collection of works aiming at providing quantitative evidence about the cognitive determinants behind misinformation and rumor spreading. We account for users' behavior with respect to two distinct narratives: a) conspiracy and b) scientific information sources. In particular, we analyze Facebook data on a time span of five years in both the Italian and the US context, and measure users' response to i) information consistent with one's narrative, ii) troll contents, and iii) dissenting information e.g., debunking attempts. Our findings suggest that users tend to a) join polarized communities sharing a common narrative (echo chambers), b) acquire information confirming their beliefs (confirmation bias) even if containing false claims, and c) ignore dissenting information.
... Misinformation is an instance of the broader issue of abuse of social media platforms, which has received a lot of attention in the recent literature [4][5][6][7][8][9][10][11][12][13][14][15]. The traditional method to cope with misinformation is to fact-check claims. ...
Preprint
Massive amounts of fake news and conspiratorial content have spread over social media before and after the 2016 US Presidential Elections despite intense fact-checking efforts. How do the spread of misinformation and fact-checking compete? What are the structural and dynamic characteristics of the core of the misinformation diffusion network, and who are its main purveyors? How to reduce the overall amount of misinformation? To explore these questions we built Hoaxy, an open platform that enables large-scale, systematic studies of how misinformation and fact-checking spread and compete on Twitter. Hoaxy filters public tweets that include links to unverified claims or fact-checking articles. We perform k-core decomposition on a diffusion network obtained from two million retweets produced by several hundred thousand accounts over the six months before the election. As we move from the periphery to the core of the network, fact-checking nearly disappears, while social bots proliferate. The number of users in the main core reaches equilibrium around the time of the election, with limited churn and increasingly dense connections. We conclude by quantifying how effectively the network can be disrupted by penalizing the most central nodes. These findings provide a first look at the anatomy of a massive online misinformation diffusion network.
... Tracking abuse of social media has been a topic of intense research in recent years. Beginning with the detection of simple instances of political abuse like astroturfing [31], researchers noted the need for automated tools for monitoring social media streams. Several such systems have been proposed in recent years, each with a particular focus or a different approach. ...
Preprint
Massive amounts of misinformation have been observed to spread in uncontrolled fashion across social media. Examples include rumors, hoaxes, fake news, and conspiracy theories. At the same time, several journalistic organizations devote significant efforts to high-quality fact checking of online claims. The resulting information cascades contain instances of both accurate and inaccurate information, unfold over multiple time scales, and often reach audiences of considerable size. All these factors pose challenges for the study of the social dynamics of online news sharing. Here we introduce Hoaxy, a platform for the collection, detection, and analysis of online misinformation and its related fact-checking efforts. We discuss the design of the platform and present a preliminary analysis of a sample of public tweets containing both fake news and fact checking. We find that, in the aggregate, the sharing of fact-checking content typically lags that of misinformation by 10--20 hours. Moreover, fake news are dominated by very active users, while fact checking is a more grass-roots activity. With the increasing risks connected to massive online misinformation, social news observatories have the potential to help researchers, journalists, and the general public understand the dynamics of real and fake news sharing.
... We expect that the emerging network structure carries important information to characterize different types of communication. Prior work shows that using network features significantly helps prediction tasks like social bot detection [8], [13], [15], and campaign detection [16], [17]. Our framework focuses on two types of networks: (i) retweet, and (ii) mention networks. ...
Preprint
We present a machine learning framework that leverages a mixture of metadata, network, and temporal features to detect extremist users, and predict content adopters and interaction reciprocity in social media. We exploit a unique dataset containing millions of tweets generated by more than 25 thousand users who have been manually identified, reported, and suspended by Twitter due to their involvement with extremist campaigns. We also leverage millions of tweets generated by a random sample of 25 thousand regular users who were exposed to, or consumed, extremist content. We carry out three forecasting tasks, (i) to detect extremist users, (ii) to estimate whether regular users will adopt extremist content, and finally (iii) to predict whether users will reciprocate contacts initiated by extremists. All forecasting tasks are set up in two scenarios: a post hoc (time independent) prediction task on aggregated data, and a simulated real-time prediction task. The performance of our framework is extremely promising, yielding in the different forecasting scenarios up to 93% AUC for extremist user detection, up to 80% AUC for content adoption prediction, and finally up to 72% AUC for interaction reciprocity forecasting. We conclude by providing a thorough feature analysis that helps determine which are the emerging signals that provide predictive power in different scenarios.
... Since the World Economic Forum listed massive digital misinformation as one of the main threats to our society [16], community-driven [17] and algorithmicdriven [18,19,20,21,22,23,24] solutions have been proposed to counteract the pervasiveness of online misinformation. However, a part of the scientific community is skeptical about the real effectiveness of such solutions. ...
Preprint
The massive diffusion of online social media allows for the rapid and uncontrolled spreading of conspiracy theories, hoaxes, unsubstantiated claims, and false news. Such an impressive amount of misinformation can influence policy preferences and encourage behaviors strongly divergent from recommended practices. In this paper, we study the statistical properties of viral misinformation in online social media. By means of methods belonging to Extreme Value Theory, we show that the number of extremely viral posts over time follows a homogeneous Poisson process, and that the interarrival times between such posts are independent and identically distributed, following an exponential distribution. Moreover, we characterize the uncertainty around the rate parameter of the Poisson process through Bayesian methods. Finally, we are able to derive the predictive posterior probability distribution of the number of posts exceeding a certain threshold of shares over a finite interval of time.
... By understanding these factors, this study can provide a better understanding of political dynamics and voter preferences in the context of Indonesia's 2024 presidential election. Previous research has shown that factors such as political events and important issues can influence the interest and popularity of presidential candidates (Kenski & Stroud, 2006; Ratkiewicz et al., 2011; Yang & DeHart, 2016). However, this study will offer a fresh perspective by focusing on how these factors are reflected in digital search trends, thus bridging the gap in understanding the connection between online behavior and electoral outcomes (Gayo-Avello, 2013; Ariel et al., 2024). ...
Article
Google Trends is an alternative and effective tool for predicting candidate popularity and election results with a simple method. This research aims to analyze and compare the popularity of Indonesia's 2024 presidential candidates using Google Trends. Data were taken from December 2022 to December 2023 with the keywords 'Anies Baswedan', 'Ganjar Pranowo', and 'Prabowo Subianto'. The collected data are visualized using facilities provided by Google Trends, Canva, and Flourish Studio's data visualization software, with three focus analyses: interest over time, interest by region, and related queries. The findings show that searches for information about Indonesia's 2024 presidential candidates have been frequent since October 2022 and increased significantly through December 2023. The popularity of Anies Baswedan and Ganjar Pranowo on Google Trends peaked when each candidate made a declaration: Anies reached the maximum score of 100, as did Ganjar Pranowo when he was declared by Megawati Soekarnoputri. The picture was very different on the day the Gerindra Party declared Prabowo Subianto: Prabowo's Google Trends score sat at only 30 while the other candidates ranked above him, with Ganjar at 100 and Anies at 78. Prabowo's score rose to 100 on August 13, 2022, while Anies rose to 79 and Ganjar dropped to 57, even though the measurements were only one day apart. The three candidates have different voter bases in the 2024 presidential election: searches for Anies Baswedan are spread across almost every province; Ganjar Pranowo leads in Central Java; and West Papua, Maluku, and Sulawesi show the most interest in Prabowo Subianto.
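As an aside, this kind of query can be scripted; a sketch using pytrends, an unofficial third-party Python client for Google Trends (subject to rate limits and interface changes), might look like:

    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="en-US", tz=360)
    candidates = ["Anies Baswedan", "Ganjar Pranowo", "Prabowo Subianto"]
    pytrends.build_payload(candidates, timeframe="2022-12-01 2023-12-31", geo="ID")

    interest = pytrends.interest_over_time()    # relative search volume, 0-100
    by_region = pytrends.interest_by_region()   # breakdown by Indonesian province
    related = pytrends.related_queries()        # top and rising related queries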
Chapter
Policies related to food have, on the one hand, been at the core of the developmental narratives of nations, and on the other, they have been the cause of various socio-political upheavals and movements across the world. Food policies have resulted in generational changes in the production and consumption patterns and practices of people across the globe, including Asia. Discourses are also rife around how various economic and developmental factors, such as urbanization and globalization and the resultant emergence of supermarket culture and big food chains, have significantly impacted consumption patterns and practices across various regions of Asia.
Article
The ever-increasing amount of information flowing through Social Media forces the members of these networks to compete for attention and influence by relying on other people to spread their message. A large study of information propagation within Twitter reveals that the majority of users act as passive information consumers and do not forward the content to the network. Therefore, in order for individuals to become influential they must not only obtain attention and thus be popular, but also overcome user passivity. We propose an algorithm that determines the influence and passivity of users based on their information forwarding activity. An evaluation performed with a 2.5 million user dataset shows that our influence measure is a good predictor of URL clicks, outperforming several other measures that do not explicitly take user passivity into account. We demonstrate that high popularity does not necessarily imply high influence and vice-versa.
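The influence-passivity algorithm described here is an HITS-style iteration over acceptance and rejection rates. A schematic sketch (the acceptance-rate matrix S and the uniform initialization are simplifying assumptions, and a non-trivial weighted network is assumed so the normalizations are well defined):

    import numpy as np

    def influence_passivity(S, n_iter=100):
        """HITS-style influence-passivity iteration.

        S[i, j] is assumed to be the acceptance rate: the fraction of
        information arriving at user j that j forwarded from user i.
        The rejection rate is 1 - S[i, j] wherever an edge exists.
        """
        n = S.shape[0]
        R = np.where(S > 0, 1.0 - S, 0.0)   # rejection rates
        I = np.ones(n) / n                  # influence scores
        P = np.ones(n) / n                  # passivity scores
        for _ in range(n_iter):
            P = R.T @ I                     # passivity: rejecting influential users
            I = S @ P                       # influence: reaching passive audiences
            P /= P.sum()                    # normalize at each iteration
            I /= I.sum()
        return I, P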
Conference Paper
In this article we explore the behavior of Twitter users under an emergency situation. In particular, we analyze the activity related to the 2010 earthquake in Chile and characterize Twitter in the hours and days following this disaster. Furthermore, we perform a preliminary study of certain social phenomena, such as the dissemination of false rumors and confirmed news. We analyze how this information propagated through the Twitter network, with the purpose of assessing the reliability of Twitter as an information source under extreme circumstances. Our analysis shows that the propagation of tweets that correspond to rumors differs from that of tweets that spread news, because rumors tend to be questioned more than news by the Twitter community. This result shows that it is possible to detect rumors by using aggregate analysis of tweets.
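A toy version of the aggregate signal described here: score each topic by the fraction of its tweets that question the claim (the marker list and threshold below are invented for illustration, not the study's actual criteria):

    import re

    # Invented markers of doubt; the study infers questioning from tweet content.
    QUESTIONING = re.compile(r"\?|is (this|it) true|rumor|confirm", re.IGNORECASE)

    def questioning_ratio(tweets):
        """Fraction of tweets about one topic that question its veracity."""
        hits = sum(bool(QUESTIONING.search(t)) for t in tweets)
        return hits / max(len(tweets), 1)

    def flag_rumors(topics, threshold=0.2):
        """Flag topics whose tweets are questioned unusually often
        (the threshold is purely illustrative)."""
        return [name for name, tweets in topics.items()
                if questioning_ratio(tweets) > threshold]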
Article
Recently, all major search engines introduced a new feature: real-time search results, embedded in the first page of organic search results. The content appearing in these results is pulled within minutes of its generation from the so-called "real-time Web", such as Twitter, blogs, and news websites. In this paper, we argue that in the context of political speech, this feature provides disproportionate exposure to personal opinions, fabricated content, unverified events, lies, and misrepresentations that otherwise would not find their way onto the first page, giving them the opportunity to spread virally. To support our argument we provide concrete evidence from the recent Massachusetts (MA) senate race between Martha Coakley and Scott Brown, analyzing political community behavior on Twitter. In the process, we analyze the Twitter activity of those involved in exchanging messages, and we find that it is possible to predict their political orientation and detect attacks launched on Twitter based on behavioral patterns of activity.
Conference Paper
The microblogging service Twitter is in the process of being appropriated for conversational interaction and is starting to be used for collaboration, as well. In an attempt to determine how well Twitter supports user-to-user exchanges, what people are using Twitter for, and what usage or design modifications would make it (more) usable as a tool for collaboration, this study analyzes a corpus of naturally-occurring public Twitter messages (tweets), focusing on the functions and uses of the @ sign and the coherence of exchanges. The findings reveal a surprising degree of conversationality, facilitated especially by the use of @ as a marker of addressivity, and shed light on the limitations of Twitter's current design for collaborative use.
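A crude heuristic in the spirit of this analysis (not the authors' coding scheme): a tweet-initial @mention typically marks addressivity, while a mid-tweet @mention typically marks reference to a third party:

    import re

    def classify_at_usage(tweet):
        """Toy heuristic: a leading @mention addresses a user directly
        (addressivity); an @mention elsewhere refers to a user (reference)."""
        if not re.search(r"@\w+", tweet):
            return "none"
        return "addressivity" if tweet.lstrip().startswith("@") else "reference"

    print(classify_at_usage("@alice are you coming?"))    # addressivity
    print(classify_at_usage("great talk by @bob today"))  # reference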
Conference Paper
Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.), but several of them become impractical for large graphs; thus graph sampling is essential. The natural questions to ask are (a) which sampling method to use, (b) how small the sample size can be, and (c) how to scale up the measurements of the sample (e.g., the diameter) to get estimates for the large graph. The deeper, underlying question is subtle: how do we measure success? We answer the above questions, and test our answers by thorough experiments on several diverse datasets spanning thousands of nodes and edges. We consider several sampling methods, propose novel methods to check the goodness of sampling, and develop a set of scaling laws that describe relations between the properties of the original graph and the sample. In addition to the theoretical contributions, the practical conclusions from our work are: sampling strategies based on edge selection do not perform well; simple uniform random node selection performs surprisingly well. Overall, the best-performing methods are the ones based on random walks and "forest fire"; they match very accurately both static and evolutionary graph patterns, with sample sizes down to about 15% of the original graph.
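A minimal sketch of one of the best-performing strategies, random-walk sampling with a fly-back (restart) probability; the 15% sample size follows the figure mentioned above, while the 15% fly-back probability and other implementation details are assumptions:

    import random
    import networkx as nx

    def random_walk_sample(G, sample_frac=0.15, fly_back=0.15, seed=0):
        """Sample roughly sample_frac of G's nodes with a random walk that
        returns to its starting node with probability fly_back at each step."""
        rng = random.Random(seed)
        target = max(1, int(sample_frac * G.number_of_nodes()))
        start = rng.choice(list(G.nodes))
        current, visited = start, {start}
        for _ in range(100 * G.number_of_nodes()):  # guard against small components
            if len(visited) >= target:
                break
            if rng.random() < fly_back:
                current = start                     # fly back to the start node
            else:
                nbrs = list(G.neighbors(current))
                current = rng.choice(nbrs) if nbrs else start
            visited.add(current)
        return G.subgraph(visited).copy()

    sample = random_walk_sample(nx.barabasi_albert_graph(10_000, 3))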
Article
Microblogging is a new form of communication in which users describe their current status in short posts distributed by instant messages, mobile phones, email, or the Web. Twitter, a popular microblogging tool, has seen considerable growth since it launched in October 2006. In this paper, we present our observations of the microblogging phenomenon by studying the topological and geographical properties of Twitter's social network. We find that people use microblogging to talk about their daily activities and to seek or share information. Finally, we analyze user intentions at the community level and show how users with similar intentions connect with each other.
Article
Extended abstract. Microblogging is a form of online communication by which users broadcast brief text updates, also known as tweets, to the public or a selected circle of contacts. A variegated mosaic of microblogging uses has emerged since the launch of Twitter in 2006: daily chatter, conversation, information sharing, and news commentary, among others (Java et al., 2007). Regardless of their content and intended use, tweets often convey pertinent information about their author's mood. As such, tweets can be regarded as temporally authentic microscopic instantiations of public mood state (O'Connor et al., 2010). Here we perform a sentiment analysis of all public tweets broadcast by Twitter users between August 1 and December 20, 2008. For every day in the timeline, we extract six dimensions of mood (tension, depression, anger, vigor, fatigue, confusion) using an extended version (Pepe and Bollen, 2008) of the Profile of Mood States (POMS), a well-established psychometric instrument (Norcross et al., 2006; McNair et al., 2003). We compare our results to fluctuations recorded by stock market and crude oil price indices and to major events in media and popular culture, such as the U.S. presidential election of November 4, 2008 and Thanksgiving Day. We find that events in the social, political, cultural, and economic sphere do have a significant, immediate, and highly specific effect on the various dimensions of public mood. In addition, we find long-term changes in public mood that may reflect the cumulative effect of various underlying socio-economic indicators. With the present investigation (Bollen et al., 2010), we make the following methodological contributions: we argue that sentiment analysis of minute text corpora (such as tweets) is efficiently obtained via a syntactic, term-based approach that requires no training or machine learning. Moreover, we stress the importance of measuring mood and emotion using well-established instruments rooted in decades of empirical psychometric research. Finally, we speculate that collective emotive trends can be modeled and predicted using large-scale analyses of user-generated content, but results should be discussed in terms of the social, economic, and cultural spheres in which the users are embedded.
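The term-based approach requiring no machine learning can be illustrated with a toy profiler; the mini-lexicon below is invented, whereas the real instrument relies on the validated POMS term lists:

    from collections import Counter

    # Invented mini-lexicon; the actual instrument uses validated POMS term lists.
    POMS_TERMS = {
        "tension":    {"tense", "nervous", "anxious"},
        "depression": {"sad", "hopeless", "unhappy"},
        "anger":      {"angry", "furious", "annoyed"},
        "vigor":      {"lively", "energetic", "active"},
        "fatigue":    {"tired", "exhausted", "weary"},
        "confusion":  {"confused", "uncertain", "bewildered"},
    }

    def daily_mood_profile(tweets):
        """Score one day of tweets on six mood dimensions by simple term
        matching; no training or machine learning involved."""
        counts, n_words = Counter(), 0
        for tweet in tweets:
            for word in tweet.lower().split():
                n_words += 1
                for mood, terms in POMS_TERMS.items():
                    if word in terms:
                        counts[mood] += 1
        return {mood: counts[mood] / max(n_words, 1) for mood in POMS_TERMS}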
Article
In this paper, we study the linking patterns and discussion topics of political bloggers. Our aim is to measure the degree of interaction between liberal and conservative blogs, and to uncover any differences in the structure of the two communities. Specifically, we analyze the posts of 40 "A-list" blogs over the period of two months preceding the U.S. Presidential Election of 2004, to study how often they referred to one another and to quantify the overlap in the topics they discussed, both within the liberal and conservative communities, and also across communities. We also study a single day snapshot of over 1,000 political blogs. This snapshot captures blogrolls (the list of links to other blogs frequently found in sidebars), and presents a more static picture of a broader blogosphere. Most significantly, we find differences in the behavior of liberal and conservative blogs, with conservative blogs linking to each other more frequently and in a denser pattern.
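The degree of interaction between the two communities reduces to counting within- versus cross-community links; a minimal sketch (the 'side' labeling of blogs is a hypothetical input, not the paper's dataset):

    import networkx as nx

    def community_link_stats(G, side):
        """Count within- vs. cross-community links in a directed blog graph.
        `side` maps each blog to 'liberal' or 'conservative'."""
        within = {"liberal": 0, "conservative": 0}
        cross = 0
        for u, v in G.edges():
            if side[u] == side[v]:
                within[side[u]] += 1
            else:
                cross += 1
        return within, cross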