From Obscurity to Prominence in Minutes:
Political Speech and Real-Time Search
Panagiotis Takis Metaxas
Wellesley College
Wellesley, MA02481, USA
pmetaxas@wellesley.edu
Eni Mustafaraj
Wellesley College
Wellesley, MA02481, USA
emustafa@wellesley.edu
ABSTRACT
Recently, all major search engines introduced a new fea-
ture: real-time search results, embedded in the first page
of organic search results. The content appearing in these
results is pulled within minutes of its generation from the
so-called “real-time Web” such as Twitter, blogs, and news
websites. In this paper, we argue that in the context of
political speech, this feature provides disproportionate ex-
posure to personal opinions, fabricated content, unverified
events, lies and misrepresentations that otherwise would not
find their way in the first page, giving them the opportunity
to spread virally. To support our argument we provide con-
crete evidence from the recent Massachusetts (MA) senate
race between Martha Coakley and Scott Brown, analyzing
political community behavior on Twitter. In the process, we
analyze the Twitter activity of those involved in exchanging
messages, and we find that it is possible to predict their po-
litical orientation and detect attacks launched on Twitter,
based on behavioral patterns of activity.
Keywords
Social Web, Real-Time Web, US elections, Twitter, Twitter-
bomb, Google
1. INTRODUCTION
The web has become a primary source of information for
most decision-making situations. In particular, 55% of
all American adults went online in 2008 to get involved in
the political process or to get news and information about
the election, up from 37% in 2004 [8]. Though just 1% of
Americans used Twitter to post their thoughts about the
campaign, the vast majority used search engines to be in-
formed on any issue. It is well established that people trust
search engine results, usually consulting only the first page
of results ranked by the search engine. The belief that they
are receiving trustworthy results is expressed through their
consistent use of their favorite search engine.
Search engines, throughout their evolution, have struggled
with the burden of having to deliver results that are both
relevant to the query and trustworthy. And they have to
wage an everyday war against spammers who use all kinds of
tricks to bypass the barriers and land in the first page of the
search results [6]. Being in the first page is widely viewed as
Copyright is held by the authors.
Web Science Conf. 2010, April 26-27, 2010, Raleigh, NC, USA.
a strong indicator of reputation and popularity. It takes time
to reach ranking levels that will allow a web site’s link to
appear in the first page of the search results. But spammers,
scammers, or defamatory trouble-makers are in the business
of reaping the rewards of ephemeral success. When they can
trick a search engine into showing them, even for a short time,
in its first-page search results, they have succeeded in their goal.
By incorporating real-time search results about timely
popular queries in their first page of results, search engines
have introduced a new opportunity for success to tricksters
of all trades. This is especially troublesome in the context
of political speech, where defamation of a candidate, once it
catches the attention of the public, might have far-reaching
consequences (e.g., the “Swift Boat campaign” [10] against
Senator John Kerry).
In this paper, we argue that in the context of political
speech, this feature provides disproportionate exposure to
personal opinions, fabricated content, unverified events, lies
and misrepresentations that otherwise would not find their
way in the first page, giving them the opportunity to spread
virally. To support our argument we provide concrete ev-
idence from the recent Massachusetts senate race between
Martha Coakley and Scott Brown, analyzing the activity of
users on Twitter. In the process, we find that it is possi-
ble to predict their political orientation and detect attacks
launched on Twitter, based on graph-theoretic properties,
statistical properties and behavioral patterns of activity.
The rest of the paper is organized as follows: in Section 2
we show how Twitter messages (known as tweets) are dis-
played in the Google results page. In Section 3 we describe
the data collected during the MA senate race in January
2010 and provide a detailed analysis of the political orien-
tation of users, community behavior, and patterns
of interaction. In Section 4, we discuss in detail spamming
attacks from within Twitter. Section 5 summarizes our find-
ings and offers some proposals to alleviate some of the con-
cerns raised in the paper.
2. REAL-TIME SEARCH
Google announced on Dec 7, 2009 the introduction of real-
time search [1], which provides fresh results relevant to a
timely query. As people start generating new content, for
example related to a sudden earthquake or a live event on
TV, other people searching around the same time for such
events on the Google search engine will see a box of the
latest results, with the fresh content dynamically scrolling.
The content appearing in these results is pulled from the
so-called “real-time Web” such as Twitter, blogs, and news
Table 1: Number and percent of tweets by message
type. Repetitions is a separate category.
Type of tweet Number Percent
replies (@) 13,866 7.47
retweets (RT @) 75,407 40.63
other 96,311 51.91
TOTAL 185,584 100
repetitions 59,412 32.01
websites, within minutes of its generation.
Elections have always increased the public’s interest in fol-
lowing the candidates. It is not surprising, then, that there
was similar public interest in the January 19, 2010, MA
special election to replace the late Senator Ted Kennedy, who
had died the year before. The election was contested mainly
between two candidates, the Republican Scott Brown and the
Democrat Martha Coakley. One of the ways that the public
sought information about these two candidates was using
search engines, as Figure 1 shows. Given the recent addi-
tion of the “real-time” content in Google’s search results,
the people who searched for “Martha Coakley” or “Scott
Brown” saw the posts that were displayed around the same
time on Twitter, like those shown in Figures 2 and 3. (Since
the name “Coakley” is not common, searching for it brings
up the same results as searching for “Martha Coakley”.
This is not the case for “Brown”, and so we included the
whole string “Scott Brown” in our searches.)
3. TWEETS DURING THE 2010 MASS.
SPECIAL ELECTION
During the period of January 13 to January 20, 2010, we
monitored and collected the stream of more than 185,000
messages¹ containing the keywords “Coakley” and “Scott
Brown” using the Streaming Twitter API [9]. About 41%
of these messages (see Table 1 and Figure 5) were retweets,
or messages that users had received and posted on their
own account for their followers to see. A small percentage
(7.47%) were replies, or messages directed towards another
user. Interestingly, one out of three tweets was a repetition
of another, identical message.
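The paper does not include code, but the categorization in Table 1 can be illustrated by matching the leading markers of each tweet. The following is a minimal sketch (not from the paper), assuming tweets are available as plain strings:

```python
def classify_tweet(text):
    """Categorize a tweet by its leading marker: retweet, reply, or other."""
    t = text.lstrip()
    if t.startswith("RT @"):
        return "retweet"
    if t.startswith("@"):
        return "reply"
    return "other"

# Hypothetical sample tweets for illustration.
tweets = [
    "RT @someuser Big rally in Boston today",
    "@someuser do you have a source for that?",
    "Polls open until 8pm in MA",
]
counts = {}
for tw in tweets:
    kind = classify_tweet(tw)
    counts[kind] = counts.get(kind, 0) + 1
```

Repetitions (identical texts) are counted separately, since a tweet can be both a retweet and a repetition.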
These messages were posted by almost 40,000 users in
the period of 7 days, but not all users were equally active.
The number of posts follows a power-law-like distribution,
as can be seen in Table 2. Based on their activity levels,
we divided the users into three broad categories: those who
sent at least 100 messages (there were 205 such users; we
refer to them as top200 ); those who sent between 30 and
100 messages (there were 765 such users; we refer to them
as topK ); and the remaining users, who sent fewer than 30
messages (we refer to them as the low39K ).
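The tiering described above can be sketched as a simple bucketing of per-user message counts; this is an illustrative reconstruction, not the authors' code, and the boundary values follow the text:

```python
from collections import Counter

def bucket_users(messages):
    """Split senders into activity tiers by message count.
    messages: iterable of (user, text) pairs."""
    per_user = Counter(user for user, _ in messages)
    top200 = {u for u, n in per_user.items() if n >= 100}
    topk = {u for u, n in per_user.items() if 30 <= n < 100}
    low39k = {u for u, n in per_user.items() if n < 30}
    return top200, topk, low39k
```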
3.1 Show me your friends, and
I tell you who you are
For the top200 users we also retrieved the friend and
¹ We have recently discovered about 50 thousand more tweets recorded during this period, but due to time constraints we have not included them in the analysis of this paper.
Table 2: Number of messages posted by users fol-
lows a power law-like distribution.
Number of messages Number of users
1 22482
2-3 9121
4-7 4090
8-15 2002
16-31 1093
32-63 524
64-127 227
128-255 88
256-511 36
512-1024 10
TOTAL 39,673
follower networks. (The Twitter API provides two social
graph methods that return the list of all followers or friends
of a user; for privacy reasons, the list contains user IDs in-
stead of account names.) Using graph-theoretic techniques,
we drew their follower connections using a force-directed al-
gorithm (see, e.g., [4]), and we found that the group clearly
separated itself into two major components, as evident in Fig-
ure 4. The larger group is composed of 175 users leaning
conservative, 29 users leaning liberal, one neutral, displayed
as a light blue node at the top of the graph (who is on a
mission to end the use of robo-calls by both candidates)
and one spammer displayed as a light-colored node in the
middle on the figure’s right margin (who is likely trying to
monitor the twitter trends over time). The figure reveals a
number of other unconnected users, most of whom no longer
had a Twitter account at the time of this writing. We
suspect that Twitter deleted them as spammers, due
to their unusual activity (high volume in a short period of
time, no friends or followers). We report on the activity of
most of them in Section 4.
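The paper separates the two communities visually with a force-directed layout. As a rough programmatic analogue (not the authors' method), one can compute connected components of the follower graph once cross-group links are sparse or removed; a minimal sketch:

```python
from collections import deque

def components(adj):
    """Connected components of an undirected graph given as a
    dict mapping each node to a set of neighbor nodes."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in comp:
                continue
            comp.add(node)
            # Enqueue unvisited neighbors only.
            queue.extend(adj.get(node, set()) - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

On a toy graph with two clusters, this returns two components, mirroring the two political camps visible in Figure 4.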
The determination of the political orientation of the
top200 group was done both manually (by reviewing the
users’ self-descriptions or some of their messages) and auto-
matically (by searching for obvious sentiment-revealing
short phrases, such as “Go Scott Brown!”). Perhaps as a
clear indication of the validity of the well-known proverb
“show me your friends, and I tell you who you are” in so-
cial networks, the graph algorithm accurately guessed 98%
of users’ political orientation from the top200 group. In a
later paper, we report on the success of determining political
orientation for all the users using a combination of graph
theoretic and automatic mining methods.
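The automatic phrase-matching step can be sketched as a small lexicon-based scorer. The phrase lists below are hypothetical placeholders; the paper's actual lexicon is not published:

```python
# Hypothetical sentiment-revealing phrases (illustrative only).
PRO_BROWN = ("go scott brown", "vote brown")
PRO_COAKLEY = ("go coakley", "vote coakley")

def guess_orientation(user_tweets):
    """Score a user's tweets against phrase lists and return a label."""
    score = 0
    for tweet in user_tweets:
        text = tweet.lower()
        score += sum(p in text for p in PRO_BROWN)
        score -= sum(p in text for p in PRO_COAKLEY)
    if score > 0:
        return "conservative"
    if score < 0:
        return "liberal"
    return "unknown"
```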
3.2 Whose message would you RT?
As we mentioned earlier, a large percentage of messages
in our corpus were retweets of other messages. In Twitter
vocabulary, a retweet is recognized by the initial phrase “RT
@originalSender”, where originalSender is the user name of
the person who sent the original tweet. Interestingly, many
of these messages were not simple RTs, but sequences of up
to 7 RTs, as Table 3 shows. This fact made us wonder
about the purpose of such activity. We formulated the
following hypothesis:
Figure 1: Google Trends for keywords “Coakley” and “Scott Brown” in January, 2010, shows a huge increase
in searches during the week leading to the special election (January 19, 2010) and for a couple days after
that.
Figure 2: Real-time results for the phrase “Scott Brown” displayed on the first page of Google’s search
results. Retrieved on January 15, 2010, four days before the election.
Figure 3: Real-time results for the phrase “Martha Coakley” displayed on the first page of Google’s search
results. Retrieved on January 15, 2010.
One is much more likely to retweet a message coming from
an original sender with whom one agrees (shares political
orientation).
We tested this hypothesis on the fully-
characterized top200 group. Members of this group sent
10,008 RTs. The results, shown in Figure 6, show this to
be largely true: 96% of liberals’ and 99% of conservatives’
retweets came from like-minded senders. The few messages
that did not follow the overall trend
are retweets with a negative commentary. By and large,
users were very unlikely to retweet a message that they did
not agree with. We should note, however, that the results
are skewed by the fact that many users may not see the
messages of users they do not follow, unless someone who
does follow them retweets that message.
About 57% of the retweeted messages were between mem-
bers of the low39K group which included those that sent a
small number of messages overall. We are currently analyz-
ing whether this hypothesis is able to distinguish as clearly
the political orientation of members of the low39K group as
well.
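The chain depths tallied in Table 3 can be counted by matching repeated “RT @user” markers in a tweet's text; a minimal sketch (not from the paper):

```python
import re

# One hop per "RT @username" occurrence.
RT_MARKER = re.compile(r"RT @\w+")

def rt_chain_depth(text):
    """Number of 'RT @user' hops a message carries."""
    return len(RT_MARKER.findall(text))
```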
Figure 4: Two groups of users based on the followers
graph. The graph is created using a force-directed
drawing algorithm which draws nodes sharing many
neighbors closer to those who do not.
Figure 5: Overall characterization of corpus.
Table 3: Number of chain retweet messages.
RTs/msg Number of messages
1 47730
2 21090
3 5349
4 939
5 149
6 47
7 18
TOTAL 75322
3.3 Repeating the same message
Since many users were aware that Google’s search results
were featuring Twitter trends, it made sense that they would
repeat the same message in the hope that it would show up
in the first page of the search results. In fact, a surprisingly
high number of tweets in our corpus (one out of three, or
59,412 messages) are repetitions of 16,453 distinct messages.
Moreover, our data
show that the top200 group was far more likely to repeat
messages (see Figure 7). We believe that this fact shows
awareness of the new role that the real-time web plays, since it
Figure 6: Both liberal-leaning and conservative-
leaning users did not retweet messages they clearly
did not agree with, though they retweeted 40% of
all the messages.
Figure 7: The members of the top200 group, both
liberals and conservatives, were far more likely to
repeat a message (about 70 times) compared to the
members of the other groups. This behavior reveals
a highly motivated group who try to influence their
followers and dominate search results on a topic.
does not make sense to bombard your followers, with whom
you greatly agree, with the same message.
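Detecting such repetitions reduces to counting identical message texts; a minimal sketch (an illustration, not the authors' pipeline):

```python
from collections import Counter

def repetition_stats(tweets):
    """Identify identical messages posted more than once.
    Returns (number of distinct repeated messages,
             total copies of those messages)."""
    freq = Counter(tweets)
    repeated = {msg: n for msg, n in freq.items() if n > 1}
    total_copies = sum(repeated.values())
    return len(repeated), total_copies
```

In the corpus above, this style of count yields 16,453 distinct messages accounting for 59,412 tweets.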
We discovered several threads of conversation that reveal
the interest of the involved communities in following the
real-time web, by discussing how certain phrases are trend-
ing in Google or Twitter, as well as encouraging others to
google for a certain phrase they would like to see trending.
Additionally, users are aware that by googling often for a
person or topic, they produce spikes in Google searches that
attract media reports attributing predictive power to such
spikes, as noticed in previous political races [3]. Because metrics such
as Google searches, number of views in YouTube, number of
followers in Facebook or Twitter, or Twitter trending top-
ics are being publicized as indicators that show advantage
of one candidate over the other (because of greater public
interest in them), we see a tendency among communities to
skew these numbers toward their desired outcome.
3.4 Why would you reply?
If retweeting indicates agreement with the message, and
repeating the same message multiple times indicates an ef-
fort to motivate the community and influence the Google
search, what does it mean when you choose to reply to a
message? We hypothesized that this direct engagement with
the person who sent the message indicates that you are will-
Figure 8: Despite their overall high activity, the
top200 users spent very little time replying to oth-
ers. The majority of such messages were directed
towards users of topK and low39K groups.
Figure 9: The reply activity of the top200 users shows
a topology of closer engagement.
ing to be involved in an argument with the sender over some
issue, in our case the special election.
While retweeting and repeating involve low levels of hu-
man activity (the press of a button or maybe the action of
a computer program), truly replying requires time and en-
ergy. Not surprisingly, therefore, only 7.4% of all the mes-
sages were replies. Interestingly, the vast majority of the
replies did not come from the top200 users, despite their
large message volume. Only 28.7% of replies were sent by
the top200 group, and a meager 7.4% of their replies were
directed to members of the top200 group. We present the
following data (Figure 8) with the note that they are drawn
from a very small part of our corpus (1016 messages).
Another way to visualize the reply-activity of the top200
users is offered in Figure 9. This is also drawn with the force-
directed algorithm. Note that the two groups are not sepa-
rable based on their reply behavior. We observed, however,
that a small number of top200 accounts were responsible for
many of the replies, in an attempt to flood the network with
spam, as Section 4 describes.
4. REPLYING AS A SPAM ATTACK
The common way in which spam works (independently of
the distribution channel: email, a web ad, or a tweet) is to
provide a link to a website that a user would likely not visit
otherwise. Until recently, the best-known method of political
spam on the Web involved the involuntary help of search en-
gines. It has been widely reported in the news that, in 2006,
political blogs had been actively trying to influence the US
elections by pushing web pages carrying negative content to
the top of the relevant search results of the major search
engines. This practice of “gaming” the search engines was
implemented with link bombing techniques (also known as
Googlebombing), in which web site masters and bloggers
use the anchor text to associate an obscure, negative term
with a public entity [5]. In particular, during the 2006 US
midterm congressional election, a concerted effort to manip-
ulate ranking results in order to bring to public attention
negative stories about Republican incumbents running for
Congress openly took place under the solicitation of the lib-
eral blog MyDD.com (My Direct Democracy) [11]. Google
took steps to curb such activity by promoting uncontrover-
sial results in the first page, and it was found that political
spammers were not very successful in the 2008 Congressional
elections [7].
Thus, our search for spammers started with the analysis
of tweets containing links. We extracted links and ranked
them by their frequency in the corpus. Some of the links
were expected, such as mybarackobama or the two cam-
paign websites of the candidates, brownforussenate and
marthacoakley. However, there were some unexpected links
as well. One of them was coakleysaidit, which appeared
1088 times. Analyzing the content of the tweets containing
this link, we discovered a concentrated spam attack. The
tweets containing the links originated from 9 Twitter ac-
counts, created within a 13 minutes interval, as shown in
Table 4. The names of the accounts are related to the name
of the website and are similar to each other. A domain
lookup for coakleysaidit reveals that the website was also
registered on the same day the accounts were created, January 15,
2010, using a service that hides the identity of the domain’s
owner.
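The link analysis described above amounts to extracting URLs from tweets and ranking them by frequency; a minimal sketch (an illustration, not the authors' code):

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")

def top_links(tweets, n=5):
    """Rank URLs found in a tweet corpus by frequency."""
    urls = Counter(u for t in tweets for u in URL_RE.findall(t))
    return urls.most_common(n)
```

Unexpectedly frequent domains, like coakleysaidit here, then surface at the top of the ranking for manual inspection.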
It turns out that two months later, this website was even-
tually signed. The group that signed it is a Repub-
lican group from Iowa that has been accused in the past
of being behind several other attacks on Democratic candi-
dates, including the “Swift Boat” attack [2].
An analysis of the spam attack shows that these 9 ac-
counts sent 929 tweets addressed to 573 unique users in the
course of 138 minutes. All tweets have the identical signa-
ture @account Message URL. Some examples of such tweets
are shown in Table 5. We discovered that there are 10 unique
text messages and 2 unique shortened URLs, both point-
ing to the same website. When treating the whole volume of
tweets as coming from one spammer, the median interval
between two tweets is 1 second. Our assumption is that the
attacker used an automatic script that randomly picked a
user account, a text message, and a URL; packaged them in
a tweet; and sent it by randomly choosing as sender one of
the 9 spam accounts. While this seems like a good strategy
to circumvent Twitter’s spam detectors, and may qualify
as the first example of a Twitter-bomb, the attack was nev-
ertheless discovered and all the spam accounts suspended.
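The 1-second median interval cited above is easy to compute once the tweets are pooled; a minimal sketch, assuming epoch-second timestamps:

```python
import statistics

def median_interval(timestamps):
    """Median gap (in seconds) between consecutive tweets.
    Timestamps are epoch seconds, in any order."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return statistics.median(gaps)
```

A median near one second across hundreds of tweets is a strong hint of scripted, rather than human, activity.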
The success of a Twitter-bomb relies on two factors: tar-
geting users interested in the spam topic and relying on those
users to spread the spam further. The second fac-
tor is especially important, since spam accounts created only a few
Table 4: Accounts created for a spam attack
Account Name Creation Time (EDT) Nr. of tweets
CoakleySaidWhat Jan 15 18:43:46 2010 28
CoakleyWhat Jan 15 18:44:55 2010 127
CoakleySaidThat Jan 15 18:46:12 2010 125
CoakleyAgainstU Jan 15 18:48:21 2010 127
CoakleyCatholic Jan 15 18:50:22 2010 127
CoakleyER Jan 15 18:52:05 2010 127
CoakleyAG Jan 15 18:53:17 2010 32
CoakleyMass Jan 15 18:54:31 2010 109
CoakleyAndU Jan 15 18:56:02 2010 127
hours before an attack have 0 followers; thus, no one would
read their messages. The strategy used to find users inter-
ested in the topic is a common spamming technique on
Twitter: collect tweets that contain some desired keywords
and find the users who sent them. Then, send a
reply to those users and hope they will act upon it. There
was a 4-hour interval between the creation of the accounts
and the timestamps of the sent messages; during that time,
the attacker collected accounts that were tweeting about the
senate race. In fact, 96% of the targeted accounts also appear
in our corpus, posting in that time interval.
The attack was successful in terms of reaching the Twit-
ter accounts of many users. We found 143 retweets in our
corpus, the first appearing 5 minutes after the attack and
the last 24 hours after it. To estimate the audience of these messages,
we calculated the set of all unique followers of the users that
retweeted the original tweets. The audience size amounts to
61,732 Twitter users.
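The audience estimate above is the size of the union of the retweeters' follower sets; a minimal sketch, assuming follower lists have already been fetched:

```python
def audience_size(retweeters, followers_of):
    """Count unique followers reached via a set of retweeting accounts.
    followers_of maps an account name to its set of follower IDs."""
    reached = set()
    for account in retweeters:
        reached |= followers_of.get(account, set())
    return len(reached)
```

Taking the union (rather than summing list lengths) avoids double-counting users who follow several of the retweeters.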
On the other hand, the effect of this attack could be seen
as “preaching to the choir:” If the networks of friends and
followers of the people following this campaign are as sepa-
rate as the ones we observed in the top200 group (Figure 4),
far fewer undecided potential voters would have seen the
message. But the attack would certainly have the effect of
exciting the anti-Coakley conservatives.
While we cannot know how many of these users either
read or acted upon these tweets (by clicking on the pro-
vided URL), the fact remains that a few minutes of work,
using automated scripts and exploiting the open architecture
of social networks such as Twitter, makes it possible to reach
a large audience for free (compared to TV and radio ads,
which cost several thousands of dollars). This raises concerns
about the deliberate exploitation of the medium.
Therefore, analyzing the signature of such spam attacks is
important, because it helps in building mechanisms that will
automatically detect such attacks in the future. An example
is shown in Figure 10, which depicts the hourly rate of sent
tweets during the 26 hours that include the attack timeline
for the top 10 most active users. Accounts U5 to U10 be-
long to the spam attackers, and one can notice that they
have an identical signature (going from 0 to almost 60 tweets
per hour). Thus, an average hourly sending rate would be
a good distinguishing feature, though not a sufficient one. Cur-
rently, we are investigating a combination of features that
take into account data on the source of the tweet (web, API,
mobile web, etc.), the number of followers of the sender, the
number of total tweets, the life of the account, etc.
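The hourly-rate feature can be sketched as bucketing timestamps into hours and flagging accounts whose peak rate is anomalously high. The cutoff of 50 tweets per hour below is an illustrative choice, not a value from the paper:

```python
from collections import Counter

def hourly_rates(timestamps):
    """Tweets-per-hour profile from epoch-second timestamps."""
    return Counter(int(t) // 3600 for t in timestamps)

def looks_bursty(timestamps, threshold=50):
    """Flag an account whose peak hourly rate reaches the threshold
    (hypothetical cutoff; would be tuned on labeled data)."""
    return max(hourly_rates(timestamps).values()) >= threshold
```

In practice such a rate feature would be combined with the other signals mentioned in the text (tweet source, follower count, account age).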
Our experiments with Google real-time search have shown
that, even though Google does not display tweets from users
that have a spammer signature, it does display tweets from
non-suspected users, even when these are retweets coming
from spammers. Thus, simply suspending spamming accounts
is not sufficient. There should be some mechanism that
allows for retroactively deleting retweets of spam and some
mechanism that labels some Twitter users as enablers of
spam.
5. CONCLUSION
The introduction of real-time search results gives a search
engine an aspect of social network communication, which
recently has seen dramatic growth. But, as currently im-
plemented by search engines, it also opens the door to
exploitation and easy spamming. Currently, users have no
way of evaluating the trustworthiness of the real-time results,
and the vast majority of the population, not familiar with
the way Twitter and blogs operate, is likely to be fooled.
In the political arena, real-time search makes it possible
for a small fraction of the population to hijack the
trustworthiness of a search engine and propagate
their messages to a huge audience for free, with little effort,
and without a trace. We expect that, unless addressed by the
search engines, this practice will intensify during the next
Congressional elections in 2010.
6. ACKNOWLEDGEMENTS
Part of this research was funded by a Brachman-Hoffman
grant.
7. REFERENCES
[1] A. Singhal. Relevance meets the real-time web. http://googleblog.blogspot.com/2009/12/relevance-meets-real-time-web.html, Dec. 7, 2009.
[2] J. Hancock. Secrets of the American Future Fund. http://iowaindependent.com/4203/secrets-of-the-american-future-fund, 2008.
[3] R. Klein. Is Scott Brown closing the GOP technology gap? http://blogs.abcnews.com/thenote/2010/01/is-scott-brown-closing-the-gop-technology-gap.html, Jan. 18, 2010.
[4] S. G. Kobourov. Force-directed drawing algorithms. In R. Tamassia (ed.), Handbook of Graph Drawing and Visualization, CRC Press, 2010.
Figure 10: Hourly rate of tweets sent by the top 10 most active users during the 26 hours that include the attack timeline.
Table 5: Tweets from spamming accounts
@account Message URL
@theRQ AG Coakley thinks Catholics shouldn’t be in the ER, take action now! http://bit.ly/8gDSp5
@Leann az Tell AG Coakley not to discriminate against Catholics in medicine! http://bit.ly/8gDSp5
@mabvet Catholics can practice medicine too! Tell AG Coakley today. http://bit.ly/7yXbTd
@BrianD82 Sign the petition to AG Coakley today. We won’t tolerate discrimination of any kind! http://bit.ly/8gDSp5
[5] T. McNichol. Engineering Google results to make a point. New York Times, January 22, 2004.
[6] P. T. Metaxas. On the evolution of search engine rankings. In Proceedings of the 2009 WEBIST Conference, March 2009.
[7] P. T. Metaxas and E. Mustafaraj. The battle for the 2008 US Congressional elections on the Web. In Proceedings of the 2009 WebScience: Society On-Line Conference, March 2009.
[8] The Pew Foundation. The Internet's Role in Campaign 2008. http://www.pewinternet.org/Reports/2009/6–The-Internets-Role-in-Campaign-2008.aspx, New York, 2010.
[9] Twitter. Streaming API documentation. http://apiwiki.twitter.com/Streaming-API-Documentation, 2010.
[10] Wikipedia. Swift Vets and POWs for Truth. http://en.wikipedia.org/wiki/Swift_Vets_and_POWs_for_Truth, retrieved March 25, 2010.
[11] T. Zeller Jr. Gaming the search engine, in a political season. New York Times, November 6, 2006.
... Studies examining the spread of fake news have largely focused on the manner in which it spreads (e.g., Metaxas & Mustafaraj, 2010;Mustafaraj & Metaxas, 2017) and the characteristics of fake news that encourage its sharing (for a review, see Zhou & Zafarani, 2020). For example, fake news spreads by having fictitious user accounts infiltrate into a community of social media users who are engaged in conversations on a particular topic; these accounts then hijack discussions with fake news, which then spread organically via sharing and are consequently disseminated into the extended networks of users (Mustafaraj & Metaxas, 2017). ...
Article
Full-text available
Research on the sharing of fake news has primarily focused on the manner in which fake news spreads and the literary style of fake news. These studies, however, do not explain how characteristics of fake news could affect people's inclination toward sharing these news articles. Drawing on the Terror Management Theory, we proposed that fake news is more likely to elicit death-related thoughts than real news. Consequently, to manage the existential anxiety that had been produced, people share the news articles to feel connected to close others as a way of resolving the existential anxiety. Across three experimental studies (total N = 416), we found that it was not news type per se (i.e., real versus fake news) that influenced news-sharing intentions; instead, it was the increased accessibility to death-related thoughts elicited from the content of news articles that motivated news-sharing. The findings support the Terror Management framework and contribute to the existing literature by providing an empirical examination of the underlying psychological motive behind fake news-sharing tendencies.
... There is proof that social bots are crucial in the propagation of fake news and misinformation [26] [45][42] [66]. Moreover, as the bots improve how to simulate the human behavior, the line between the human user and this socio-technical entity becomes less clear [28], causing concern in the participation of bots in political events because of the negative effect on the quality of democracy [63]. This fact has motivated the development of many bot detection techniques during the last few years [27], not always being successful in completely solving the problem [28]. ...
Preprint
Full-text available
Bot Detection is an essential asset in a period where Online Social Networks(OSN) is a part of our lives. This task becomes more relevant in crises, as the Covid-19 pandemic, where there is an incipient risk of proliferation of social bots, producing a possible source of misinformation. In order to address this issue, it has been compared different methods to detect automatically social bots on Twitter using Data Selection. The techniques utilized to elaborate the bot detection models include the utilization of features as the tweets metadata or the Digital Fingerprint of the Twitter accounts. In addition, it was analyzed the presence of bots in tweets from different periods of the first months of the Covid-19 pandemic, using the bot detection technique which best fits the scope of the task. Moreover, this work includes also analysis over aspects regarding the discourse of bots and humans, such as sentiment or hashtag utilization.
... An interesting study [68] analyzed the real-time search option implemented by the real-time websites, such as Twitter. According to the article, when considering political topics, the search provides results that would not be found on the first page while surfing the Web. ...
Book
Full-text available
This book revisits the strategic objectives of Information Warfare, interpreting them according to the canons of the modern information age and focusing on the fabric of society, the economy, and critical infrastructures. The authors build plausible, detailed real-world scenarios for each entity, showing the possible threats from the Information Warfare point of view. In addition, the authors describe the still-open problems, especially where critical infrastructures are concerned, and the countermeasures that can be implemented, possibly inspiring further research in the domain. This book intends to provide a conceptual framework and a methodological guide, enriched with vivid and compelling use cases, for readers (e.g., technologists, academicians, military, government) interested in what Information Warfare really means when its lens is applied to current technology. Without sacrificing accuracy, rigor, and, most importantly, the big picture of Information Warfare, this book dives into several relevant and up-to-date critical domains. The authors illustrate how finance (an evergreen target of Information Warfare) is intertwined with social media, and how an opponent could exploit the latter to reach its objectives. They also examine how cryptocurrencies are going to reshape the economy, and the risks entailed by this paradigm shift. Even more compelling is how the very fabric of society is going to be reshaped by technology; for instance, our democratic elections are exposed to risks that are even greater than what appears in current public discussions, and our critical infrastructure is becoming exposed to a series of novel threats, ranging from state-supported malware to drones. A detailed discussion of possible countermeasures, and of the open issues for each of the highlighted threats, completes the book.
This book targets a broad audience that includes researchers and advanced-level students studying and working in computer science with a focus on security. Military officers, government officials, and professionals working in this field will also find it useful as a reference.
Chapter
Full-text available
Since the dawn of humanity, the machine of progress has tirelessly introduced tools and resources that facilitate our everyday tasks. Over the years, new technologies have continually changed society with discoveries and inventions capable of greatly improving human life. Historically, many of the processes that radically changed the human lifestyle occurred gradually. In the past few decades, however, modern technology has enabled a fast and radical transformation of our society, modifying our habits, our means of production, and in some cases the very essence of work, through the widespread adoption of a plethora of new devices, including smartphones, voice assistants, chatbots, and smartwatches, that have made our lives faster, easier, and more fun. Technology is also introducing new habits and addictions, changing every aspect of our society, such as personal interactions, education, communication, financial services, physical goods production, logistics, and entertainment. This is happening in parallel with a headlong race toward the digitization of information.
Chapter
Full-text available
Technology has, to different degrees, always been part of the financial world, starting in the 1950s with the introduction of credit cards and ATMs, passing through electronic trading floors and personal finance apps, and arriving at the present day, where technologies such as Artificial Intelligence (AI), High-Frequency Trading (HFT), and cryptocurrencies are widespread. The role of technology in finance has become so prominent as to earn a specific term describing the intersection of the two: FinTech. A portmanteau of "financial technology," FinTech refers to the application of new technological advancements to products and services in the financial industry. The definition is rather broad and also encompasses "innovative ideas that improve financial service processes by proposing technological solutions according to different business situations, while the ideas could also lead to new business models or even new businesses." By these definitions, FinTech is not a brand-new industry but rather one that has evolved at an extremely rapid pace.
Article
Bot detection is crucial in a world where Online Social Networks (OSNs) play a pivotal role in our lives as public communication channels. The task becomes especially relevant in crises like the Covid-19 pandemic, when there is a growing risk of proliferation of automated accounts designed to produce misinformation. To address this issue, we first introduce a comparison between supervised bot-detection models using data selection. The techniques used to develop these models rely on features such as tweet metadata or the accounts' Digital Fingerprint, and they proved effective in detecting bots with different behaviors. Social-fingerprint-based methods, in particular, were found effective against bots that behave in a coordinated manner. Furthermore, all of these approaches produced excellent results compared to Botometer v3. Second, we present and discuss a case study related to the Covid-19 pandemic that analyzes the differences in discourse between bots and humans on Twitter, a platform used worldwide to express opinions and engage in dialogue in a public arena. While bots and humans generally express themselves alike, content and sentiment analysis of the tweets reveals some dissimilarities, especially in tweets concerning President Trump. When the discourse turns to Trump's management of the pandemic, sentiment values diverge drastically, showing that tweets generated by bots have a predominantly negative attitude. However, according to our findings, while automated accounts are numerous and active in discussing controversial issues, they do not generally seem to increase human users' exposure to negative and inflammatory content.
Conference Paper
We use a dynamical systems perspective to analyze a collection of 2.4 million tweets known to originate from ISIS and ISIS-related users. From those users active over a long period of time (i.e., 2+ years), we derive sequences of behaviors and show that the top users cluster into behavioral classes, which naturally describe roles within the ISIS communication structure. We then correlate these classes to the retweet network of the top users showing the relationship between dynamic behavior and retweet network centrality. We use the underlying model to formulate informed hypotheses about the role each user plays. Finally, we show that this model can be used to detect outliers, i.e. accounts that are thought to be outside the ISIS organization but seem to be playing a key communications role and have dynamic behavior consistent with ISIS members.
Article
For more than a decade now, academics and online platform administrators have been studying solutions to the problem of bot detection. Bots are computer algorithms whose use is far from benign: malicious bots are purposely created to distribute spam, promote public figures, and, ultimately, bias public opinion. To fight the bot invasion of our online ecosystem, several approaches have been implemented, mostly based on (supervised and unsupervised) classifiers, which adopt the most varied account features, from the simplest to the most expensive to extract from the raw data obtainable through the Twitter public APIs. In this exploratory study, using Twitter as a benchmark, we compare the performance of four state-of-the-art feature sets in detecting novel bots: one of the output scores of the popular bot detector Botometer, which considers more than 1,000 features of an account to reach a decision; two feature sets based on the account profile and timeline; and information about the Twitter client from which the user tweets. The results of our analysis, conducted on six recently released datasets of Twitter accounts, hint at the possible use of general-purpose classifiers and cheap-to-compute account features for the detection of evolved bots.
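The idea of classifying accounts from cheap-to-compute profile features can be illustrated with a minimal sketch. Note that the data, the feature choices, and the nearest-centroid classifier below are invented for illustration; they are not the actual feature sets or classifiers compared in the article.

```python
import math

# Hypothetical account tuples: (followers, friends, statuses, account_age_days).
def features(acct):
    """Derive cheap ratio features from raw profile counts."""
    followers, friends, statuses, age = acct
    return [
        followers / (friends + 1),  # follower/friend ratio (low for spam bots)
        statuses / (age + 1),       # tweets per day (high for automated posting)
        math.log1p(followers),      # audience size, log-scaled
    ]

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]

def train(bots, humans):
    """Nearest-centroid 'classifier': one prototype feature vector per class."""
    return centroid([features(a) for a in bots]), centroid([features(a) for a in humans])

def predict(model, acct):
    """Assign the class whose centroid is closest in feature space."""
    bot_c, human_c = model
    x = features(acct)
    return "bot" if math.dist(x, bot_c) < math.dist(x, human_c) else "human"

# Toy labeled examples (invented): bots follow many, post constantly, are young.
bots = [(10, 2000, 9000, 30), (5, 1500, 7000, 20)]
humans = [(300, 280, 1200, 900), (150, 200, 800, 700)]
model = train(bots, humans)
```

A usage example: `predict(model, (8, 1800, 8000, 25))` classifies a young, high-volume account as `"bot"`, while a balanced, older account comes out `"human"`. Real systems replace the centroid rule with trained classifiers over far richer feature sets, as the study describes.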
Chapter
Full-text available
The rise of new technologies, including Online Social Networks (OSNs), media sharing services, online discussion boards, and online instant messaging applications, makes information production and propagation increasingly fast.
Article
Full-text available
Search engines have greatly influenced the way we experience the web. Since the early days of the web, users have relied on them to get informed and make decisions. When the web was relatively small, web directories were built and maintained by human experts who screened and categorized pages according to their characteristics. By the mid-1990s, however, it was apparent that the human-expert model of categorizing web pages does not scale. The first search engines appeared, and they have been evolving ever since, taking over the role that web directories used to play. But what need makes a search engine evolve? Beyond the financial objectives, there is a need for quality in search results. Users interact with search engines through search query results, and search engines know that the quality of their ranking will determine how successful they are: if users perceive the results as valuable and reliable, they will return; otherwise, it is easy for them to switch to another search engine. Search results, however, are not simply based on well-designed scientific principles; they are influenced by web spammers. Web spamming, the practice of introducing artificial text and links into web pages to affect the results of web searches, has been recognized as a major problem for search engines. It is also a serious problem for users, who are generally unaware of it and tend to confuse trusting the search engine with trusting the results of a search. In this paper, we analyze the influence that web spam has had on the evolution of search engines, and we identify the strong relationship of web spamming methods to propagandistic techniques in society. Our analysis provides a foundation for understanding why spamming works and offers new insight into how to address it. In particular, it suggests that one could use social anti-propagandistic techniques to recognize web spam.
Article
Full-text available
It has been reported that, in the past, political activists have tried to influence web search results. They did so using link-bombing techniques to raise negative web pages, with content matching their agendas, into the top-10 search results. Google has admitted that this happened in the 2006 US elections, but did it still happen in the all-important 2008 US Congressional elections? In this paper we evaluate whether "gaming" the search engines during an election period is a widespread problem, how serious it is, and how search engines have tried to maintain the integrity of their search results.
Article
Full-text available
Force-directed algorithms are among the most flexible methods for calculating layouts of simple undirected graphs. Also known as spring embedders, such algorithms calculate the layout of a graph using only information contained within the structure of the graph itself, rather than relying on domain-specific knowledge. Graphs drawn with these algorithms tend to be aesthetically pleasing, exhibit symmetries, and produce crossing-free layouts for planar graphs. In this survey we consider several classical algorithms, starting from Tutte's 1963 barycentric method, and including recent scalable multiscale methods for large and dynamic graphs.
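The spring-embedder idea summarized above can be sketched in a few dozen lines: all node pairs repel, connected nodes attract, and a cooling "temperature" caps per-step movement. This is a minimal Fruchterman-Reingold-style sketch; the constants, cooling schedule, and example graph are illustrative choices, not taken from the survey.

```python
import math
import random

def spring_layout(nodes, edges, iterations=50, width=1.0, height=1.0):
    """Minimal force-directed (spring embedder) layout for an undirected graph."""
    k = math.sqrt(width * height / len(nodes))  # ideal pairwise distance
    pos = {v: [random.uniform(0, width), random.uniform(0, height)] for v in nodes}
    t = width / 10  # "temperature": caps how far a node moves per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsive force k^2/d between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f; disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f; disp[v][1] -= dy / d * f
        # Attractive force d^2/k along each edge.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f; disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f; disp[v][1] += dy / d * f
        # Move each node along its net displacement, capped by t, then cool.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            pos[v][0] += dx / d * min(d, t)
            pos[v][1] += dy / d * min(d, t)
        t *= 0.95
    return pos

layout = spring_layout(["a", "b", "c", "d"],
                       [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
```

Run on the 4-cycle above, the layout tends toward a symmetric square-like arrangement, illustrating why these methods expose graph symmetries. Production implementations add the multiscale and scalability refinements the survey discusses.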
Twitter. Streaming API documentation. http://apiwiki.twitter.com/Streaming-API-Documentation, 2010.
T. Zeller Jr. Gaming the search engine, in a political season. New York Times, November 6, 2006.
J. Hancock. Secrets of the American Future Fund. http://iowaindependent.com/4203/secrets-of-the-american-future-fund, 2008.
R. Klein. Is Scott Brown closing the GOP technology gap? http://blogs.abcnews.com/thenote/2010/01/is-scott-brown-closing-the-gop-technology-gap.html, Jan. 18, 2010.
Wikipedia. Swift Vets and POWs for Truth. http://en.wikipedia.org/wiki/Swift_Vets_and_POWs_for_Truth, retrieved March 25, 2010.