A Review on Social Bot Detection Techniques and
Research Directions
Arzum Karataş
Department of Computer Engineering
Izmir Institute of Technology
Izmir, Turkey
Serap Şahin
Department of Computer Engineering
Izmir Institute of Technology
Izmir, Turkey
Abstract: The rise of web services and the popularity of online social networks (OSN) such as Facebook, Twitter, and LinkedIn have led to the rise of unwelcome social bots as automated social actors. These actors can play many malicious roles, including infiltrators of human conversations, scammers, impersonators, misinformation disseminators, stock market manipulators, astroturfers, and content polluters (spammers, malware spreaders). It is undeniable that social bots have a major impact on social networks. Therefore, this paper reveals the potential hazards of malicious social bots, reviews the detection techniques within a methodological categorization, and proposes avenues for future research.
Index Terms: Social bots, OSN, Sybils, social bot detection.
I. INTRODUCTION
Our world is dominated by online social networks (OSN) such as Facebook, Twitter, and LinkedIn. They play a pivotal role in our lives as public communication channels, providing a platform for their users to engage, interact, and share information. As a result they gather large communities that are attractive to advertisers. Due to the popularity and rich APIs of OSN, they are attractive targets for exploitation by social bots [1] as well.
A social bot is software that automates user activities. These activities can be (i) generating pseudo posts that look human-generated in order to interact with humans on a social network, (ii) reposting others' posts, photographs, or statuses, (iii) adding comments or likes to posts, and (iv) building connections with other accounts. The level of sophistication of the bots therefore varies. A social bot [2, 3] can be a simple dummy bot that aggregates information from news sites, weather reports, or blog posts and reposts it on the social network; on the other hand, bots can be extremely sophisticated, for example infiltrating human conversations. These capabilities have pros and cons for users of OSN, and they can be used for good or bad.
(i) On the one hand, bots can be designed with good intentions. They can be used to protect the anonymity of members, as mentioned in related work, or to automate tasks and perform them much faster than humans, such as automatically pushing news or weather updates, adding a template to all Wikipedia pages in a specific category [4], or sending a thank-you message to new followers out of courtesy. They can be designed to be helpful, like virtual assistants for individuals (e.g., Siri), user-friendly customer service for companies [5], and chatbots like Microsoft's Tay [6] artificial intelligence bot.
(ii) On the other hand, social bots can be designed to perform malicious activities such as spamming, malware dissemination, impersonation, Sybil attack launching, and so on.
One malicious functionality of social bots is their power to disseminate misinformation. For example, in 2013 the Syrian Electronic Army hacked the Twitter account of the Associated Press and announced that the White House was under attack and Obama was injured. This fake news led to panic and a huge loss in the stock market [7].
Another malicious functionality of social bots is that they are a convenient vehicle for propaganda. This malicious activity is so-called astroturfing: an attempt to create a fake impression of real grassroots support for a policy, individual, or product campaign [8]. Concerning this, Ratkiewicz et al.'s study [9] dissects how Twitter was exploited by astroturfing campaigns during the 2010 U.S. midterm elections. According to Boshmaf et al. [10], OSN, as democratic communication platforms, were one of the key enablers of the Arab Spring in the Middle East in 2011. Additionally, there is concern over whether automated propaganda swayed the 2016 election between Trump and Clinton [11]. Given these possibilities, we can assume that social bots can be very powerful tools for igniting social movements.
Another problem is that bots can be leveraged to obtain fake ratings and reviews. For example, there are influence bots that serve this purpose, and it is possible to find many web pages offering fake followers and likes, even for free, by simply searching on any search engine. Subrahmanian et al. [3] state that some politicians have been accused of buying influence on social media.
In addition, a social bot can be malicious by impersonating an actual person or an organization, i.e., identity fraud. One evil purpose of impersonation is to promote ideologies. Via such promotion, attackers have the power to mislead individuals on the networks or to create real-looking fake identities, which they can then use in malicious activities such as follower fraud [3, 13] or Sybil attacks consisting of large-scale bot armies (botnets) built from simple OSN accounts.
Hence, the main question is: how can we detect malicious activities on OSN? Many techniques have been proposed in the literature to detect social bots on OSN. We review these techniques within a methodological categorization and unveil possible research avenues for each category of social bot detection. For this purpose, Section II is reserved for the literature review of the detection techniques. Then, open problems for social bot detection techniques are presented in Section III to envision and motivate possible research. Finally, the work concludes with a short discussion of current research directions in Section IV.
II. SOCIAL BOT DETECTION TECHNIQUES
For all the reasons outlined above (malicious usages of social bots), the computing community has been developing advanced techniques to detect social bots accurately. Broadly, these detection techniques can be classified into three classes: (A) bot detection systems based on social network topology (i.e., structure-based) information, (B) systems based on crowdsourcing of user post and profile analysis, and (C) systems based on feature-based machine learning methods.
A. Structure-Based (Social Network-Based) Bot Detection
Sybil accounts are multiple accounts controlled by a single adversary. The term "Sybil" comes from the subject of the book Sybil (a woman diagnosed with dissociative identity disorder [14]). Structure-based detection techniques focus on detecting Sybil accounts, which are used to infiltrate OSN, steal private data, and disseminate misinformation and malware. That is why Sybil attacks are a fundamental threat to social networks [15-17]. For instance, it was reported in 2015 that around 170 million fake Facebook accounts were detected as Sybil accounts and then deleted [18]. Although Sybils can be created intentionally by users for benign purposes such as preserving anonymity, we consider only malicious accounts as Sybils from this point on.
Knowing how Sybil accounts spread over the network is crucial for detecting them, especially for this type of detection technique. The fundamental assumption underlying structure-based Sybil detection is that social networks generally show a homophily tendency [17]; that is, two connected accounts in an OSN tend to have similar attributes. This assumption grounds the intuition in Fig. 1: the honest and Sybil regions of the graph are sparsely connected, and Sybils have only a small number of connections to legitimate (honest) users. Through these connections the Sybil communities try to create a fake trustworthy impression on honest members of the OSN. It may be useful to note that the trusted nodes in Fig. 1 are known honest accounts specified at initialization as reference members.

Figure 1. The social network with honest, trusted and Sybil nodes
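The sparse-cut intuition above can be made concrete with a toy calculation. The following sketch uses an invented eight-node graph (two cliques joined by a single "attack edge") and measures the conductance of the cut between the honest and Sybil regions; the low value reflects the sparse connection that structure-based detectors exploit.

```python
# Toy illustration (invented graph) of the sparse-cut intuition behind
# structure-based Sybil detection: the honest and Sybil regions are each
# well connected internally, but only a few "attack edges" join them.

def volume(graph, nodes):
    """Sum of degrees of `nodes`."""
    return sum(len(graph[n]) for n in nodes)

def cut_conductance(graph, region_a, region_b):
    """Attack edges across the cut divided by the smaller region's volume."""
    attack_edges = sum(1 for u in region_a for v in graph[u] if v in region_b)
    return attack_edges / min(volume(graph, region_a), volume(graph, region_b))

# Honest region: a 4-clique; Sybil region: a 4-clique; one attack edge (3-4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),   # honest clique
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),   # Sybil clique
         (3, 4)]                                           # single attack edge
graph = {n: set() for n in range(8)}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

honest, sybil = {0, 1, 2, 3}, {4, 5, 6, 7}
print(cut_conductance(graph, honest, sybil))  # low value -> sparse cut
```

Here the cut contains one attack edge against a region volume of 13, so the conductance is about 0.077; a well-mixed graph without a Sybil region would have no comparably sparse cut.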
Many works solve the Sybil detection problem by using the topology (structure) of the network; analysis of the network topology is one way to detect local communities. We summarize these works with respect to the methods they employ:
Random Walk [19]
Generally, the intuition behind leveraging random walks is that social networks are fast mixing, which helps to distinguish Sybils from honest accounts. Fast mixing in this context implies that short random walks starting from an honest account quickly reach other honest accounts, whereas it is hard for random walks starting from Sybils to reach honest accounts [20]. At a high level, the works that employ random walks label the nodes in the network as Sybil or honest from the perspective of a trusted node.
As one of the random walk-based methods, SybilInfer [21] uses a combination of Bayesian inference and Monte-Carlo sampling techniques to estimate the sets of honest and Sybil users; it detects a bottleneck cut between the honest and Sybil regions. SybilGuard [22] adopts the assumption that a malicious user can create many Sybils, but that the Sybils can have only a few connections to honest accounts, as in Fig. 1; that is, the number and size of the Sybil groups accepted by honest accounts are bounded. Similarly, SybilLimit [23] attempts to isolate Sybils based on random walks; it adopts the same insight as SybilGuard but offers improved, near-optimal guarantees. SybilRank [24] ranks accounts according to their perceived likelihood of being Sybil (the landing probability of short random walks), because a short random walk starting from a trusted node has only a limited probability of escaping into the Sybil region.
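The trust-propagation idea behind SybilRank can be sketched in a few lines. The following is a simplified, hedged reading of [24] on an invented toy graph: trust is seeded at known trusted nodes, spread by O(log n) power iterations of a random walk, and the landing probabilities are degree-normalized; honest nodes then rank above Sybils.

```python
import math

# A minimal SybilRank-style sketch (simplified from [24]; graph invented):
# seed trust at trusted nodes, spread it with O(log n) power iterations of
# a random walk, then degree-normalize the landing probabilities.

def sybilrank(graph, trusted, total_trust=100.0):
    n = len(graph)
    steps = int(math.ceil(math.log2(n)))          # O(log n) iterations
    trust = {v: (total_trust / len(trusted) if v in trusted else 0.0)
             for v in graph}
    for _ in range(steps):
        nxt = {v: 0.0 for v in graph}
        for v, t in trust.items():
            share = t / len(graph[v])             # split trust over neighbours
            for u in graph[v]:
                nxt[u] += share
        trust = nxt
    # Degree-normalize so high-degree nodes are not unfairly favoured.
    return {v: trust[v] / len(graph[v]) for v in graph}

# Toy network: honest clique {0..3}, Sybil clique {4..7}, one attack edge.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
graph = {v: set() for v in range(8)}
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

ranks = sybilrank(graph, trusted={0})
print(sorted(ranks, key=ranks.get, reverse=True))  # honest nodes rank first
```

Because only one attack edge leads into the Sybil clique, very little trust escapes the honest region in the few iterations performed, which is exactly the escape-probability argument in the text.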
Markov Random Field [25] and Loopy Belief Propagation [26]
The assumption that social networks are fast mixing presumes one big community or cluster. However, Mohaisen et al. [27] show that OSN are generally not fast mixing. Similarly, Leskovec et al. [28] demonstrate that OSN have many small periphery communities rather than one big cluster. Therefore, Viswanath et al. [29] state that the Sybil detection problem can be regarded as a community detection problem. Besides, Boshmaf et al. [30] point out that, for better detection performance, structure-based Sybil detection algorithms should be designed to find local community structures around known honest (non-Sybil) identities while incrementally tracking changes in the network, as nodes and edges are added or deleted dynamically over time.
Additionally, Viswanath et al. [29] discover that dependency on community detection makes schemes more vulnerable to Sybil attacks when honest identities form strong communities, because Sybils can infiltrate honest communities by carefully targeting honest accounts. That is, Sybils can hide as just another community on the OSN by setting up a small number of targeted links, i.e., links into the community that contains the trusted node. The authors conducted an experiment allowing Sybils to place their links close to the trusted node instead of to random nodes, where closeness is defined by the ranking used by the community detection algorithm they employ. Hence, Sybil nodes are ranked high in the defence scheme; naturally, this makes Sybils less likely to be detected under that attack model because they appear to be part of the local community of the trusted node.
Due to the limitations of the fast-mixing assumption, other studies were done to handle it. SybilBelief [17] and SybilFrame [15] do not use random walks; instead, they rely on Markov Random Fields and Loopy Belief Propagation to estimate the probability of each user being honest. While SybilBelief can incorporate information about both known honest and known Sybil nodes, SybilFrame uses a multi-stage classification mechanism that combines local information about users and edges with the global graph structure. In this category, SybilFrame shows the best social bot detection rate, at most 68.2% [15].
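The MRF-plus-belief-propagation approach can be illustrated with a small sketch. This is a hedged, much-simplified reading of the SybilBelief idea [17], not the published system: the OSN is modelled as a pairwise Markov random field with a homophily edge potential, and a few rounds of loopy belief propagation spread the labels of known honest and known Sybil seeds. The graph, priors, and homophily strength below are invented for illustration.

```python
# A hedged sketch of SybilBelief-style inference [17]: pairwise MRF with a
# homophily edge potential, solved approximately by loopy belief propagation.
# All numbers and the toy graph are illustrative, not from the paper.

W = 0.9          # homophily strength: P(neighbours share a label)
STATES = (0, 1)  # 0 = honest, 1 = Sybil

def edge_potential(x, y):
    return W if x == y else 1.0 - W

def loopy_bp(graph, priors, rounds=10):
    # One message per directed edge, initialised uniformly.
    msgs = {(v, u): {s: 1.0 for s in STATES} for v in graph for u in graph[v]}
    for _ in range(rounds):
        new = {}
        for (v, u) in msgs:
            out = {}
            for xu in STATES:
                total = 0.0
                for xv in STATES:
                    prod = priors[v][xv] * edge_potential(xv, xu)
                    for k in graph[v]:
                        if k != u:
                            prod *= msgs[(k, v)][xv]
                    total += prod
                out[xu] = total
            z = sum(out.values())
            new[(v, u)] = {s: out[s] / z for s in STATES}
        msgs = new
    beliefs = {}
    for v in graph:
        b = {s: priors[v][s] for s in STATES}
        for k in graph[v]:
            for s in STATES:
                b[s] *= msgs[(k, v)][s]
        z = sum(b.values())
        beliefs[v] = {s: b[s] / z for s in STATES}
    return beliefs

# Toy network: honest triangle {0,1,2}, Sybil triangle {3,4,5}, one attack edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
graph = {v: set() for v in range(6)}
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

priors = {v: {0: 0.5, 1: 0.5} for v in graph}  # unknown accounts
priors[0] = {0: 0.9, 1: 0.1}                   # known honest seed
priors[5] = {0: 0.1, 1: 0.9}                   # known Sybil seed
beliefs = loopy_bp(graph, priors)
print({v: round(beliefs[v][1], 2) for v in graph})  # P(Sybil) per node
```

Note how, unlike the random-walk schemes, both kinds of seed (honest and Sybil) contribute evidence, which is the property the text attributes to SybilBelief.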
Additionally, some structure-based Sybil detection systems like SybilRank employ the "innocent by association" paradigm [31]: if an identity interacts with an innocent identity, then it is considered innocent as well. This approach is vulnerable to a smart attacker who mimics the structure of a legitimate community. The effectiveness of the paradigm is also limited by the refusal of innocent users to interact with unknown identities, as in the case of LinkedIn. Moreover, some real-world social networks such as Twitter and Renren (the largest OSN in China) do not represent strong trust networks; therefore, detection schemes employing this paradigm produce high false-negative rates.
B. Crowdsourcing-Based Bot Detection
The success of structure-based Sybil detection schemes has decreased over time, as Sybils exploit the vulnerability revealed by Viswanath et al. (the dependency on community detection) stated above. For example, Jiang et al. [32] show that Sybils only occasionally connect to other Sybils; instead, they aim to infiltrate communities of trusted users [33].
Wang et al. [34] propose a new approach: applying human effort (crowdsourcing), for example via Amazon's Mechanical Turk [35], to label accounts. Their insight is that careful users can detect even slight inconsistencies in account profiles and posts. They propose a two-layer system consisting of a filtering layer and a crowdsourcing layer. The filtering layer uses prior automation techniques, such as community detection, network-based feature selection, and user reports, to obtain suspicious profiles. Crowdsourcing is then applied for the final decision on classifying each account as either legitimate or Sybil. According to the authors, their strategy exhibits false positive and false negative rates both below 1% on a simulated system of 2000 profiles (1000 legitimate and 1000 Sybil).
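The two-layer design can be sketched as follows. This is a hedged illustration of the pipeline shape described by Wang et al. [34], not their implementation: the profile fields, filter thresholds, and worker behaviours are all invented.

```python
from collections import Counter

# A hedged sketch of a two-layer crowdsourced detection pipeline in the
# spirit of Wang et al. [34]: a cheap automated filter selects suspicious
# profiles, and only those are sent to crowd workers, whose labels are
# aggregated by majority vote. All fields and thresholds are invented.

def filter_layer(profiles, max_follow_ratio=20.0):
    """Keep only profiles that look suspicious to the automated filter."""
    suspicious = []
    for p in profiles:
        ratio = p["following"] / max(p["followers"], 1)
        if ratio > max_follow_ratio or p["posts"] == 0:
            suspicious.append(p)
    return suspicious

def crowdsourcing_layer(profile, workers):
    """Final decision: majority vote over the crowd workers' labels."""
    votes = Counter(worker(profile) for worker in workers)
    return votes.most_common(1)[0][0]

profiles = [
    {"name": "alice",  "followers": 250, "following": 300, "posts": 80},
    {"name": "bot_17", "followers": 3,   "following": 900, "posts": 0},
]

# Three invented workers: two careful, one careless.
workers = [
    lambda p: "sybil" if p["posts"] == 0 else "legitimate",
    lambda p: "sybil" if p["following"] > 100 * p["followers"] else "legitimate",
    lambda p: "legitimate",
]

for p in filter_layer(profiles):        # only "bot_17" passes the filter
    print(p["name"], "->", crowdsourcing_layer(p, workers))
```

The filter keeps the crowd's workload (and cost) down, which is the point of the first layer; the majority vote tolerates an occasional careless worker.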
There are three fundamental issues with leveraging this strategy. First, the privacy of the users should be considered, and personal information of OSN users should be hidden before sharing it with the crowd. Second, large OSN companies would need to hire additional expert analysts, which small companies cannot afford. Third, the strategy is hard to implement for large OSN because of their huge number of members; the crowd may need too much time to label each account as either bot or human.
C. Machine Learning-Based Bot Detection
The more sophisticated social bots become with the rise of Artificial Intelligence (AI), the more risk they pose, even to political issues. That is why detecting bots on OSN has become a challenge. For this reason, DARPA organized a competition, in which none of the contestants found purely social structure-based detection techniques useful [3]. The rise of AI also promotes machine learning methods as social bot detection techniques. The main idea behind them is to find key characteristics of social bots that draw the border between a human actor and a machine actor. A summary of selected works follows.
Chu et al. [2] profile human, bot, and cyborg (human-assisted bot or bot-assisted human) accounts. They observe differences among them in terms of tweet content, tweeting behaviour, and account properties such as external URL ratio. Lee et al. [36] present a study of social honeypots for profiling and filtering content polluters in social media using their profile features. Yang et al. [33] collect Sybil accounts from Renren as a ground-truth data set, then analyse it using network-based and structure-based features such as the network clustering coefficient and incoming and outgoing request rates.
SentiBot [37] is a framework addressing the classification of humans versus social bots. It relies on tweet syntax (e.g., average number of hashtags), semantics (e.g., average topic sentiment), user behaviour (e.g., tweet spread), and network-centric user features (e.g., in-degree). The authors regard a number of sentiment-related features as key to identifying bots; therefore, they also employ sophisticated sentiment analysis techniques.
“Bot or Not?” [38] is the first publicly available social bot detection framework for Twitter. Its first release, published in 2014, is similar to other feature-based detection systems; however, it analyses more than 1000 features, grouped into six classes: network, user, friends, temporal, content, and sentiment. The authors implement a detection algorithm that depends heavily on these core features. They state that the overall accuracy of “Bot or Not?” was 86% for simple and sophisticated social bots in 2017 [39]. It is useful to note that machine learning and the structural information of the OSN together give this detection result. The best detection rate in this category, 86%, is achieved by “Bot or Not?”.
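A feature-based classifier in the spirit of the systems above can be sketched briefly. This is a hedged toy, not any published system: it extracts three account features (external URL ratio, posting rate, follow ratio, all of which appear in the works cited above) and trains a tiny logistic-regression classifier; the accounts, labels, and thresholds are invented, and real systems use hundreds of features with large labelled data sets.

```python
import math

# A hedged, minimal sketch of feature-based bot detection: a few invented
# per-account features and a tiny logistic-regression classifier trained
# by stochastic gradient descent. All data below is illustrative.

def sigmoid(z):
    z = max(-30.0, min(30.0, z))     # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def features(account):
    return [
        account["urls"] / max(account["posts"], 1),           # external URL ratio
        account["posts_per_day"],                             # posting rate
        account["following"] / max(account["followers"], 1),  # follow ratio
    ]

def train(samples, labels, lr=0.05, epochs=2000):
    w = [0.0] * (len(samples[0]) + 1)          # bias + one weight per feature
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            g = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) - y
            w[0] -= lr * g                     # gradient of the log-loss
            for i, xi in enumerate(x):
                w[i + 1] -= lr * g * xi
    return w

def predict(w, x):
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

humans = [{"urls": 2,  "posts": 50, "posts_per_day": 3,  "following": 200,  "followers": 180},
          {"urls": 0,  "posts": 30, "posts_per_day": 1,  "following": 90,   "followers": 120}]
bots   = [{"urls": 45, "posts": 50, "posts_per_day": 80, "following": 2000, "followers": 10},
          {"urls": 30, "posts": 40, "posts_per_day": 60, "following": 1500, "followers": 5}]

X = [features(a) for a in humans + bots]
y = [0, 0, 1, 1]                               # 0 = human, 1 = bot
w = train(X, y)
print([round(predict(w, x), 2) for x in X])    # low for humans, high for bots
```

The point of the sketch is the shape of the pipeline, feature extraction followed by a supervised classifier, which is common to all the feature-based systems reviewed in this subsection.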
III. OPEN PROBLEMS
Detecting bots on OSN is a challenging issue. Accordingly, there are research avenues peculiar to each category reviewed above.
Social networks contain big data and grow dynamically by nature. Structure-based detection schemes usually have a high running-time cost even in a static (i.e., non-real-time) environment; the best known computational cost for leveraging random walks is O(n log n). Developing computationally more efficient real-time graph algorithms for big data processing is therefore a good research avenue. Another issue is that the schemes still give high false positive rates with relatively low accuracy. For example, SybilFrame gives 4.2% false positives (FP) with a classification accuracy of 95.4%, while its social bot detection rate is at most 68.2%, as mentioned in the previous section. False positives are detrimental to the user experience because real users can respond very negatively. New learning algorithms can therefore be employed to decrease the FP and false negative (FN) rates on the graph topology. As for community-based schemes, new approaches for determining trusted nodes on-the-fly are another open area for researchers, since the structure of the network and the trusted nodes are at the heart of the success of structure-based approaches.
Crowdsourcing-based detection schemes leverage human intelligence against sophisticated social bots equipped with AI power. Protecting user privacy is a challenge for crowdsourced detection techniques; work can therefore be done on increasing the ethical awareness of the crowd, and privacy-preserving data mining techniques can be employed to protect user privacy. However, the schemes are neither effective nor applicable in terms of time and money costs for the crowd when we consider that OSN are dynamic environments containing big data.
Bots are continuously evolving, gaining new human-like behaviours with the rise of AI. As for feature-based machine learning schemes, additional features can be explored by employing state-of-the-art machine learning techniques, such as deep learning, to distinguish a human from a bot. Another issue is that if the Sybils are just bots controlled by an adversary, who is the master? That is the big question: what is the source of these Sybils? Source detection of Sybils is thus one of the big open problems.
IV. DISCUSSION AND CONCLUSION
Social networks are powerful tools that connect millions of people all over the world; therefore, they are attractive to social bots as well. Given the possible harms of social bots, such as identity theft, astroturfing, content pollution, follower fraud, and misinformation dissemination, bots and humans need to be able to recognize each other in order to avoid undesirable situations based on false assumptions.
Table 1 summarizes the detection techniques, related works, limitations, and contingent research areas for each technique. Since OSN are dynamic environments containing big data, possible future solutions need to handle both big data processing and dynamic detection efficiently. Besides, the solutions should decrease the FN and FP rates of the existing solutions while increasing accuracy as much as possible. Since structure-based techniques need at least one trusted node, determining trusted nodes on-the-fly is a possible research direction. The best success rate for these techniques is at most 68.2%, achieved by SybilFrame [15].
As seen from the table, no research avenue is proposed for crowdsourcing-based techniques, since they are expensive in both the time and the cost of the crowd workforce. As for machine learning-based techniques, their main limitation is AI-boosted bots; however, the remedy for this limitation is also advanced AI techniques, like deep learning, to determine the features that draw a line between innocent accounts and Sybils. These techniques can also be applied to source detection of Sybils as a research direction. The best overall success rate for these techniques is 86%, achieved by “Bot or Not?”.
With the progress of social bot detection techniques, it is seen that the higher social bot detection rates (over 80%) are obtained by combining the structure-based properties of OSN with machine learning methods. It is useful to conduct research on possible approaches to increase the detection rate further. Two such approaches are (i) autonomous intelligent-agent-based and (ii) identification-based approaches, as future research directions.
Table 1. Summary of detection techniques and research directions

Structure-based (best detection rate: 68.2%)
  Limitations: OSN contain big data; OSN are a dynamic environment; need to decrease FN and FP rates; high running-time cost; need at least one trusted node.
  Open research areas: developing more efficient big data processing algorithms; dynamic or real-time detection methods; new methods to decrease FN and FP rates; determining trusted nodes on-the-fly.

Crowdsourcing-based (Wang et al.'s work [34])
  Limitations: limited analysis size with the sampling method; privacy issues; high running-time cost of the crowd decision process; cost of the crowd workforce.
  Open research areas: privacy can be preserved via privacy-preserving data mining algorithms.

Machine learning-based (Chu et al.'s work [2], Lee et al.'s work [36], Yang et al.'s work [33], "Bot or Not?" [38]; best detection rate: 86%)
  Limitations: OSN contain big data; OSN are a dynamic environment; AI-powered bots; need to decrease FN and FP rates; unknown source of Sybils.
  Open research areas: dynamic or real-time detection methods; employing popular AI techniques (like deep learning) to find features that distinguish a bot from an innocent account holder and to handle big data; new methods to decrease FN and FP rates; source detection of Sybils.

i. Use of autonomous intelligent-agent-based approaches: For example, detection and identification of community members within a community should be performed in a decentralized environment; that is, the detection and analysis tasks are distributed over the community according to the topology of the OSN. In this environment, intelligent agents that are aware of their community boundaries and members should be present to monitor community activities based on identified characteristics.
ii. Identification-based approaches: If an identification component for any type of entity is not present, any technologically and methodologically developed solution to this problem can be exploited by attackers. For example, intelligent agents developed for automatic and dynamic detection of Sybils should be supported by trusted identification mechanisms. Otherwise, the intelligent agents themselves will be targeted, the attacker will not be detected, and innocent accounts might be declared Sybil, possibly resulting in another attack problem.
In this paper, three classes of social bot detection techniques on OSN (i.e., structure-based, crowdsourcing-based, and feature-based machine learning detection techniques), their limitations, and their detection rates are reviewed. After this examination, feature-based machine learning techniques appear to be the most effective and popular among them. However, the rise of AI enabling the creation of sophisticated bots, the bottlenecks of real-time big data processing, and the need for source detection within a global identification system call for novel solutions. Therefore, research avenues on social bot detection techniques are reviewed, and prospective methods to increase social bot detection rates are proposed, with the intention of opening the doors for researchers.
REFERENCES
[1] S. K. Dehade and A. M. Bagade, "A review on detecting
automation on Twitter accounts," Eur. J. Adv. Eng. Technol,
vol. 2, pp. 69-72, 2015.
[2] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia, "Detecting
automation of twitter accounts: Are you a human, bot, or
cyborg?," IEEE Transactions on Dependable and Secure
Computing, vol. 9, pp. 811-824, 2012.
[3] V. Subrahmanian, A. Azaria, S. Durst, V. Kagan, A. Galstyan,
K. Lerman, et al., "The darpa twitter bot challenge," arXiv
preprint arXiv:1601.05140, 2016.
[4] (2016, September 12). Wikipedia:Creating a bot. Available:
[5] C. Freitas, F. Benevenuto, S. Ghosh, and A. Veloso, "Reverse
engineering socialbot infiltration strategies in twitter," in
Proceedings of the 2015 IEEE/ACM International Conference
on Advances in Social Networks Analysis and Mining 2015,
2015, pp. 25-32.
[6] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini, "The rise of social bots," Communications of the ACM, vol. 59, no. 7, pp. 96-104, 2016.
[7] Daily Mail, "Syrian Electronic Army linked to hack attack on AP Twitter feed that 'broke news' Obama had been injured in White House blast and sent Dow Jones plunging," 2013.
[8] A. Bienkov, "Astroturfing: what is it and why does it matter?,"
in The Guardian, ed, 2012.
[9] J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, A.
Flammini, and F. Menczer, "Detecting and Tracking Political
Abuse in Social Media," ICWSM, vol. 11, pp. 297-304, 2011.
[10] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu,
"The socialbot network: when bots socialize for fame and
money," in Proceedings of the 27th Annual Computer Security
Applications Conference, 2011, pp. 93-102.
[11] B. Schreckinger. (2016, September 30,2016) Inside Trump's
'cyborg' Twitter army. Available:
[12] Abokhodair, N., Yoo, D., & McDonald, D. W. (2015,
February). Dissecting a social botnet: Growth, content and
influence in Twitter. In Proceedings of the 18th ACM
Conference on Computer Supported Cooperative Work &
Social Computing (pp. 839-851). ACM.
[13] O. Goga, G. Venkatadri, and K. P. Gummadi, "The
doppelgänger bot attack: Exploring identity impersonation in
online social networks," in Proceedings of the 2015 ACM
Conference on Internet Measurement Conference, 2015, pp.
[14] Sybil attack. Available:
[15] P. Gao, N. Z. Gong, S. Kulkarni, K. Thomas, and P. Mittal, "Sybilframe: A defense-in-depth framework for structure-based sybil detection," arXiv preprint arXiv:1503.02985, 2015.
[16] D. Mulamba, I. Ray, and I. Ray, "SybilRadar: A Graph-
Structure Based Framework for Sybil Detection in On-line
Social Networks," in IFIP International Information Security
and Privacy Conference, 2016, pp. 179-193.
[17] N. Z. Gong, M. Frank, and P. Mittal, "Sybilbelief: A semi-
supervised learning approach for structure-based sybil
detection," IEEE Transactions on Information Forensics and
Security, vol. 9, pp. 976-987, 2014.
[18] J. Parsons. (2015, September 12). Facebook’s War Continues
Against Fake Profiles and Bots. Available:
[19] K. Pearson, "The problem of the random walk," Nature, vol.
72, p. 294, 1905.
[20] B. Carminati, E. Ferrari, and M. Viviani, "Security and trust in
online social networks," Synthesis Lectures on Information
Security, Privacy, & Trust, vol. 4, pp. 1-120, 2013.
[21] G. Danezis and P. Mittal, "SybilInfer: Detecting Sybil Nodes
using Social Networks," in NDSS, 2009.
[22] H. Yu, M. Kaminsky, P. B. Gibbons, and A. Flaxman,
"Sybilguard: defending against sybil attacks via social
networks," in ACM SIGCOMM Computer Communication
Review, 2006, pp. 267-278.
[23] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao, "Sybillimit:
A near-optimal social network defense against sybil attacks,"
in 2008 IEEE Symposium on Security and Privacy (sp 2008),
2008, pp. 3-17.
[24] Q. Cao, M. Sirivianos, X. Yang, and T. Pregueiro. (2016).
SybilRank. Available:
[25] G. R. Cross and A. K. Jain, "Markov random field texture
models," IEEE Transactions on Pattern Analysis and Machine
Intelligence, pp. 25-39, 1983.
[26] K. P. Murphy, Y. Weiss, and M. I. Jordan, "Loopy belief
propagation for approximate inference: An empirical study,"
in Proceedings of the Fifteenth conference on Uncertainty in
artificial intelligence, 1999, pp. 467-475.
[27] A. Mohaisen, A. Yun, and Y. Kim, "Measuring the mixing
time of social graphs," in Proceedings of the 10th ACM
SIGCOMM conference on Internet measurement, 2010, pp.
[28] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,
"Community structure in large networks: Natural cluster sizes
and the absence of large well-defined clusters," Internet
Mathematics, vol. 6, pp. 29-123, 2009.
[29] B. Viswanath, A. Post, K. P. Gummadi, and A. Mislove, "An
analysis of social network-based sybil defenses," ACM
SIGCOMM Computer Communication Review, vol. 40, pp.
363-374, 2010.
[30] Y. Boshmaf, K. Beznosov, and M. Ripeanu, "Graph-based
sybil detection in social and information systems," in Advances
in Social Networks Analysis and Mining (ASONAM), 2013
IEEE/ACM International Conference on, 2013, pp. 466-473.
[31] Y. Xie, F. Yu, Q. Ke, M. Abadi, E. Gillum, K. Vitaldevaria, et
al., "Innocent by association: early recognition of legitimate
users," in Proceedings of the 2012 ACM conference on
Computer and communications security, 2012, pp. 353-364.
[32] J. Jiang, C. Wilson, X. Wang, W. Sha, P. Huang, Y. Dai, et al.,
"Understanding latent interactions in online social networks,"
ACM Transactions on the Web (TWEB), vol. 7, p. 18, 2013.
[33] Z. Yang, C. Wilson, X. Wang, T. Gao, B. Y. Zhao, and Y. Dai,
"Uncovering social network sybils in the wild," ACM
Transactions on Knowledge Discovery from Data (TKDD),
vol. 8, p. 2, 2014.
[34] G. Wang, M. Mohanlal, C. Wilson, X. Wang, M. Metzger, H.
Zheng, et al., "Social turing tests: Crowdsourcing sybil
detection," arXiv preprint arXiv:1205.3856, 2012.
[35] (2016). Overview of Mechanical Turk. Available:
[36] K. Lee, B. D. Eoff, and J. Caverlee, "Seven Months with the
Devils: A Long-Term Study of Content Polluters on Twitter,"
in ICWSM, 2011.
[37] J. P. Dickerson, V. Kagan, and V. Subrahmanian, "Using
sentiment to detect bots on Twitter: Are humans more
opinionated than bots?," in Advances in Social Networks
Analysis and Mining (ASONAM), 2014 IEEE/ACM
International Conference on, 2014, pp. 620-627.
[38] C. A. Davis, O. Varol, E. Ferrara, A. Flammini, and F.
Menczer, "Botornot: A system to evaluate social bots," in
Proceedings of the 25th International Conference Companion
on World Wide Web, 2016, pp. 273-274.
[39] O. Varol, E. Ferrara, C. A. Davis, F. Menczer, and A.
Flammini, "Online human-bot interactions: Detection,
estimation, and characterization," arXiv preprint
arXiv:1703.03107, 2017.
... The tasks of social bots include the automatic sharing of news such as the weather or current events [7]. But also, the simple (re)posting of information is part of their work [31]. This kind of bots always try to perform their tasks in a human-like way, so that they are not recognized as a social bot [7]. ...
... Bots of this kind always try to perform their tasks in a human-like way so that they are not recognized as social bots [7]. This allows them to interact with people within a network and to create comments and likes [31]. ...
... Interactions of social bots on social network sites have been observed for years [31,32]. The main goal of those bots is to distribute ideological content on a specific platform to the largest possible audience [33]. ...
Conference Paper
Full-text available
Social media have become major platforms of commerce and changed the way we communicate and consume. Phenomena such as social bots add new dynamics to discussions and the spreading of information with the possible aim to influence or shape opinions and decisions. This study examines the requirements under which organizations would use social bots for commercial purposes. Interviews with 12 experts yielded a collection of requirements, including limitations, ethical considerations, and potentials for possible uses in marketing, social commerce, and customer service. It can be concluded that using social bots can be beneficial for commercial organizations, but that there is still a need for clarification of legalities.
... It is difficult to predict which particular algorithm is best to use, but as follows from the reviews [37,20], the most popular is Random Forest. Other methods include AdaBoost, Support Vector Machine, K-Nearest Neighbor search, and various types of neural networks (Long Short-Term Memory, Convolutional Neural Networks, Back Propagation Neural Networks). ...
... These results are comparable to those obtained by other researchers. As follows from the reviews [37,20], most of the classifiers developed by researchers achieve an AUC ROC of about 0.9 or higher. The evaluation shows that it is possible to achieve very good performance (AUC ROC above 0.9 and FPR below 0.1 for the neural network) relying only on data about the friends graph. ...
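A hedged sketch of the Random Forest setup these reviews describe. The feature names and toy values below are hypothetical, and scikit-learn stands in for whatever implementation a given study used:

```python
# Hedged sketch: Random Forest bot classification on hypothetical per-account
# features [posts_per_day, follower_friend_ratio, mean_gap_sec, has_photo].
# Labels: 1 = bot, 0 = human. Training data invented for illustration only.
from sklearn.ensemble import RandomForestClassifier

X = [
    [480.0, 0.01, 180.0, 0],    # hyperactive, barely followed -> labelled bot
    [350.0, 0.05, 240.0, 0],
    [300.0, 0.02, 300.0, 0],
    [5.0,   1.2,  14400.0, 1],  # moderate, balanced graph -> labelled human
    [12.0,  0.9,  7200.0,  1],
    [3.0,   2.5,  28800.0, 1],
]
y = [1, 1, 1, 0, 0, 0]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
pred = clf.predict([[400.0, 0.03, 200.0, 0], [4.0, 1.0, 20000.0, 1]])
print(pred)
```

In practice the feature vectors come from profile metadata and activity logs, and the forest size and split criteria are tuned per dataset.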
Full-text available
In this paper, we propose a machine learning approach to detecting malicious bots on the VKontakte social network. The advantage of this approach is that only the friends graph is used as the source data to detect bots. Thus, there is no need to download a large amount of text and media data, which are highly language-dependent. Besides, the approach allows one to detect bots that are hidden by privacy settings or blocked, since the graph data can be obtained indirectly. To do this, we calculate graph metrics using a number of graph algorithms. These metrics are used as features in classifier training. The paper evaluates the effectiveness of the proposed approach for detecting bots of various quality, from low-quality bots to paid users. The resistance of the approach to changes in the bot management strategy is also evaluated. Estimates are given for two classifiers: a random forest and a neural network. The study showed that using only information about the graph it is possible to recognize bots with high efficiency (AUC-ROC greater than 0.9). At the same time, the proposed solution is quite resistant to the emergence of bots having a new management strategy.
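A stdlib-only sketch of the graph-metrics idea described above. The exact metrics the study uses are not listed here, so degree and local clustering coefficient are assumptions chosen for illustration:

```python
# Sketch: derive per-node features from a friends graph alone, the kind of
# input the VKontakte approach relies on. Graph given as an adjacency dict;
# node names are hypothetical.

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i, u in enumerate(nbrs)
                for v in nbrs[i + 1:] if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def node_features(adj, node):
    """Feature vector (as a dict) fed to a downstream classifier."""
    return {
        "degree": len(adj[node]),
        "clustering": clustering_coefficient(adj, node),
    }

adj = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
print(node_features(adj, "a"))
```

Bot accounts often show anomalous combinations of these metrics, e.g. high degree with near-zero clustering, because purchased friends rarely know each other.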
... Deep learning appears to be the most promising approach for textual content classification due to its automatic feature extraction and holistic language representation as demonstrated by empirical results in NLP [9]. The fact that the most advanced cyber bots have also used and benefited from the fast-paced improvements in machine learning [17] further supports our decision to focus on deep learning techniques to detect and defeat them, as they can leverage the same benefits. We employ a set of neural network architectures for distinguishing deep fake bot-generated Tweets from human-written Tweets. ...
... Their performance results are directly comparable with ours. For a more detailed review of social media bot detection techniques and related work, we refer the reader to [3] and [17]. ...
Full-text available
Social networks are playing an increasingly important role in modern society. Social media bots are also on the rise. Bots can propagate misinformation and spam, thereby influencing economy, politics, and healthcare. The progress in Natural Language Processing (NLP) techniques makes bots more deceptive and harder to detect. Easy availability of readily deployable bots empowers the attacker to perform malicious activities; this makes bot detection an important problem in social networks. Researchers have worked on the problem of bot detection. Most research focuses on identifying bot accounts in social media; however, the meta-data needed for bot account detection is unavailable in many cases. Moreover, if the account is controlled by a cyborg (a bot-assisted human or human-assisted bot), such detection mechanisms will fail. Consequently, we focus on identifying bots on the basis of the textual contents of posts they make in social media, which we refer to as fake posts. NLP techniques based on Deep Learning appear to be the most promising approach for fake text detection. We employ an end-to-end neural network architecture for deep fake text detection on a real-world Twitter dataset containing deceptive Tweets. Our experiments achieve state-of-the-art performance and improve the classification accuracy by 2% compared to previously tested models. Moreover, our content-level approach can be used for fake post detection in social media in real time. Detecting fake texts before they are propagated will help curb the spread of misinformation.
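As a hedged, stdlib-only stand-in for the deep models above (which need large corpora and GPUs), a tiny bag-of-words Naive Bayes shows the same content-level framing: classify a post as bot- or human-written from its words alone. All training posts are invented:

```python
# Classical baseline swapped in for the neural architecture: multinomial
# Naive Bayes with Laplace smoothing over a hypothetical toy corpus.
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label) pairs; returns word counts and doc totals."""
    counts = {"bot": Counter(), "human": Counter()}
    totals = Counter()
    for text, label in docs:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    vocab = set(counts["bot"]) | set(counts["human"])
    best, best_lp = None, -math.inf
    for label in counts:
        lp = math.log(totals[label] / sum(totals.values()))  # class prior
        n = sum(counts[label].values())
        for w in text.lower().split():
            lp += math.log((counts[label][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [
    ("win free followers click link now", "bot"),
    ("free crypto giveaway click here", "bot"),
    ("had a lovely walk in the park today", "human"),
    ("great dinner with friends tonight", "human"),
]
model = train(docs)
print(classify("click for free followers", *model))
```

The deep models replace the hand-counted word statistics with learned representations, but the decision framing — post text in, bot/human label out — is the same.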
... The results are comparable to those obtained by other researchers. As follows from the reviews [OMAAK20,KŞ17], most of the classifiers developed by researchers achieve an AUC ROC of about 0.9 or higher. A standard test shows that it is possible to achieve very good performance (AUC ROC and precision greater than 0.9 for the neural network) relying only on data about the friends graph and the profile statistics of friends. ...
Full-text available
In this thesis, we propose machine learning techniques for detecting and characterizing malicious bots in social networks. The novelty of these techniques is that only the interaction patterns of the analysed accounts' friends are used as the source data to detect bots. The proposed techniques have a number of novel advantages. There is no need to download a large amount of text and media data, which are highly language-dependent. Furthermore, they allow one to detect bots that are hidden by privacy settings or blocked, to detect camouflaged bots that mimic real people, to detect groups of bots, and to estimate the quality and price of a bot. In the developed solution, we propose to extract the input data for the analysis in the form of a social graph, using a hierarchical social network model. Then, to construct features from this graph, we use statistical methods, graph algorithms, and methods that analyze graph embeddings. Finally, we make a decision using a random forest model or a neural network. Based on this schema we propose four techniques that together cover the full-cycle attack detection pipeline: two techniques for bot detection (individual bot detection and group bot detection) and two techniques for characterization of bots (estimation of bot quality and estimation of bot price). The thesis also presents experiments that evaluate the proposed solutions on the example of bot detection in the VKontakte social network. For this, we developed a software prototype that implements the entire chain of analysis, from data collection to decision making. In order to train the models, we collected data about bots of different quality, price, and camouflage strategies directly from the bot sellers. The study showed that using only information about the graphs of friends it is possible to recognize and characterize bots with high efficiency (AUC-ROC ≈ 0.9). At the same time, the proposed solution is quite resistant to the emergence of new types of bots, and to bots of various types, from automatically generated and hacked accounts to users who perform malicious activity for money.
... (I). On the one hand, bots can be designed with good intentions (Karatas and Sahin, 2017). Table 1 presents a 2016 report on the magnitude of bots which, counting both those with good and those with bad intentions, reach 51.8% of the traffic on social networks such as Twitter (Barabasi, 2018). Likewise, in the 2018 electoral process in Mexico, 53% of the followers of the 4 candidates for the Presidency of the Republic were classified as bots (Barabasi, 2018) ... exerted by digital platforms, and given their characteristics no determination of what is real is ever consolidated, so they can be considered to alter trends in public perception. In the 2018 presidential elections in Mexico, the data show that 97% of users sought information about the electoral campaign on social networks (Table 1). ...
Full-text available
This text aims to expose the social, economic, political, and cultural consequences of the power that algorithmic models for controlling human behavior exert on social networks: consequences such as the rise of conspiracy theories, political trends, disinformation, and economic chaos, among others. It also discusses the possibilities of counter-power and of reprogramming those models from within society.
In recent times, social networks have been actively emerging as an encouraging platform for democratic discussion of social, media, and political issues. However, the growing popularity of social networks leads to a surge of concerns about the quality of the shared data. Initially, bots were created with good intentions. Nowadays, bad bots are created and powered by AI to perform malicious activities on social network sites. Statistics indicate that 94.2% of all websites in the world are affected by bots. They are sophisticated, intelligent enough to mimic humans, and can change their signatures dynamically. It is hard to detect such bots with existing techniques, as new bots keep evolving. Social networks are growing exponentially, and as a result, big data is growing unconditionally. So, we need more computationally efficient algorithms running at a higher detection rate to handle big data. For our study, we used the real-time dataset from the bot repository released by Indiana University. In our work, we propose a new custom bot classifier algorithm to predict bot accounts. The algorithm detects bots with an accuracy rate of 96.13%. Furthermore, we made a comparative analysis of our algorithm against existing machine learning algorithms using various classification metrics.
Community detection is an important problem in network science that discovers highly clustered groups of nodes with similar properties. The label propagation algorithm (LPA) is one of the popular clustering techniques that has attracted much attention due to its efficiency and independence from parameters. Despite its advantages, an important limitation of LPA is the randomness in grouping nodes, which leads to instability and the formation of large communities. In this study, we propose four variants based on LPA to overcome the random community allocation problem, thereby generating better clusters. These variants utilize link strength and node attribute information to enhance the quality of detected communities. Furthermore, the proposed variants require no parameter for categorical attributes and only one parameter (constant across all networks) for continuous attributes. Finally, the best results are obtained by Variant I, achieving maximum Normalized Mutual Information scores of 0.86 and 0.88 on the two synthetic networks.
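A minimal synchronous LPA sketch illustrates both the base algorithm and the determinism issue the variants address. The smallest-label tie-break below is an assumption standing in for LPA's usual random choice:

```python
# Synchronous label propagation: each node adopts the most frequent label
# among its neighbours; ties broken by the smallest label for determinism
# (plain LPA breaks ties randomly, the instability the variants target).
from collections import Counter

def label_propagation(adj, max_iter=100):
    labels = {v: v for v in adj}  # each node starts in its own community
    for _ in range(max_iter):
        new = {}
        for v in adj:
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new[v] = min(l for l, c in counts.items() if c == best)
        if new == labels:  # converged
            break
        labels = new
    return labels

# Two triangles joined by a single edge: two communities expected.
adj = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
print(label_propagation(adj))
```

The `max_iter` guard matters because synchronous updates can oscillate on some graphs; the published variants instead change the tie-breaking rule itself using link strengths and node attributes.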
In today’s world, social networks play a major role in worldwide communication. Popular social networking platforms include Twitter and Facebook; on both, users can communicate with any other user irrespective of location and time. These platforms, however, also have a disadvantage: the spreading of false information and false communication between users. False information is spread by fake users and bot users. To overcome this drawback in social networks, several machine learning algorithms have been proposed for identifying malicious users. In this work, spam users are detected on the Twitter social network platform. To classify users as genuine or fake, user and social attribute features are extracted. The features are then reduced using an optimization approach called glowworm optimization, and the reduced features are used to train a generalized regression neural network. The reduction of features is based on minimizing the error rate of the classifier. This approach improves the classification compared to the deep Q-learning algorithm. The whole process is performed on the social honeypot data set and implemented using MATLAB R2018a. The proposed approach outperforms the conventional algorithm in terms of accuracy, sensitivity, specificity, and F-measure.
This article aims to summarize the recent advancement of fake account detection methodology on social networking websites. Over the past decade, social networking websites have received huge attention from users all around the world. As a result, popular websites such as Facebook, Twitter, LinkedIn, Instagram, and others saw an unexpected rise in registered users. However, researchers claim that not all registered accounts are real; many of them are fake and created for specific purposes. The primary purpose of fake accounts is to spread spam content, rumors, and other unauthentic messages on the platform. Hence, fake accounts need to be filtered out, but doing so poses many challenges. In the past few years, researchers have applied many advanced technologies to identify fake accounts. In the survey presented in this article, we summarize the recent development of fake account detection technologies. We discuss the challenges and limitations of the existing models in brief. The survey may help future researchers to identify the gaps in the current literature and develop a generalized framework for fake profile detection on social networking websites.
Full-text available
Increasing evidence suggests that a growing amount of social media content is generated by autonomous entities known as social bots. In this work we present a framework to detect such entities on Twitter. We leverage more than a thousand features extracted from public data and meta-data about users: friends, tweet content and sentiment, network patterns, and activity time series. We benchmark the classification framework by using a publicly available dataset of Twitter bots. This training data is enriched by a manually annotated collection of active Twitter users that include both humans and bots of varying sophistication. Our models yield high accuracy and agreement with each other and can detect bots of different nature. Our estimates suggest that between 9% and 15% of active Twitter accounts are bots. Characterizing ties among accounts, we observe that simple bots tend to interact with bots that exhibit more human-like behaviors. Analysis of content flows reveals retweet and mention strategies adopted by bots to interact with different target groups. Using clustering analysis, we characterize several subclasses of accounts, including spammers, self promoters, and accounts that post content from connected applications.
Conference Paper
Full-text available
While most online social media accounts are controlled by humans, these platforms also host automated agents called social bots or sybil accounts. Recent literature reported on cases of social bots imitating humans to manipulate discussions, alter the popularity of users, pollute content and spread misinformation, and even perform terrorist propaganda and recruitment actions. Here we present BotOrNot, a publicly-available service that leverages more than one thousand features to evaluate the extent to which a Twitter account exhibits similarity to the known characteristics of social bots. Since its release in May 2014, BotOrNot has served over one million requests via our website and APIs.
Full-text available
Twitter is a web application playing the dual roles of online social networking and microblogging. Users communicate with each other by publishing text-based posts. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which appear to be a double-edged sword to Twitter. Legitimate bots generate a large amount of benign tweets delivering news and updating feeds, while malicious bots spread spam or malicious content. More interestingly, in the middle between human and bot there has emerged the cyborg, referring to either a bot-assisted human or a human-assisted bot. To assist human users in identifying who they are interacting with, this paper focuses on the classification of human, bot, and cyborg accounts on Twitter. We first conduct a set of large-scale measurements with a collection of over 500,000 accounts. We observe the differences among human, bot, and cyborg in terms of tweeting behavior, tweet content, and account properties. Based on the measurement results, we propose a classification system that includes the following four parts: 1) an entropy-based component, 2) a spam detection component, 3) an account properties component, and 4) a decision maker. It uses the combination of features extracted from an unknown user to determine the likelihood of being a human, bot, or cyborg. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
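The entropy-based component can be sketched as follows. Binning inter-post gaps and the 60-second bin width are assumptions for illustration, not the paper's exact design:

```python
# Shannon entropy of binned inter-post intervals: near-zero entropy means
# metronome-like posting, a classic automation signal; humans post at
# irregular intervals and score higher.
import math
from collections import Counter

def timing_entropy(timestamps, bin_seconds=60):
    """Entropy (bits) of the distribution of gaps between consecutive posts."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    bins = Counter(g // bin_seconds for g in gaps)
    n = len(gaps)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())

bot_like = [i * 300 for i in range(20)]  # a post exactly every 5 minutes
human_like = [0, 410, 2000, 2150, 9000, 9100, 20000, 26000]
print(timing_entropy(bot_like), timing_entropy(human_like))
```

In the full system this timing signal is only one of four components; the decision maker combines it with spam, content, and account-property features.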
Full-text available
Sybil attacks are a fundamental threat to the security of distributed systems. Recently, there has been a growing interest in leveraging social networks to mitigate Sybil attacks. However, the existing approaches suffer from one or more drawbacks, including bootstrapping from either only known benign or known Sybil nodes, failing to tolerate noise in their prior knowledge about known benign or Sybil nodes, and being not scalable. In this work, we aim to overcome these drawbacks. Towards this goal, we introduce SybilBelief, a semi-supervised learning framework, to detect Sybil nodes. SybilBelief takes a social network of the nodes in the system, a small set of known benign nodes, and, optionally, a small set of known Sybils as input. Then SybilBelief propagates the label information from the known benign and/or Sybil nodes to the remaining nodes in the system. We evaluate SybilBelief using both synthetic and real world social network topologies. We show that SybilBelief is able to accurately identify Sybil nodes with low false positive rates and low false negative rates. SybilBelief is resilient to noise in our prior knowledge about known benign and Sybil nodes. Moreover, SybilBelief performs orders of magnitudes better than existing Sybil classification mechanisms and significantly better than existing Sybil ranking mechanisms.
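A heavily simplified, stdlib-only stand-in for SybilBelief's propagation step. The real system models the network as a Markov Random Field and runs loopy belief propagation; plain score averaging from pinned seed nodes is used here only to show how label information flows from known benign and Sybil nodes to the rest:

```python
# Semi-supervised score propagation: seeds keep their labels, every other
# node repeatedly takes the mean of its neighbours' scores (0 = Sybil-like,
# 1 = benign-like). Graph and seed choices are hypothetical.

def propagate(adj, benign, sybil, iters=200):
    score = {v: 0.5 for v in adj}  # unknown nodes start undecided
    for v in benign:
        score[v] = 1.0
    for v in sybil:
        score[v] = 0.0
    for _ in range(iters):
        nxt = dict(score)
        for v in adj:
            if v in benign or v in sybil:
                continue  # seed labels stay pinned
            nxt[v] = sum(score[u] for u in adj[v]) / len(adj[v])
        score = nxt
    return score

# Two dense regions joined by a single "attack edge" c-x; one seed per side.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
scores = propagate(adj, benign={"a"}, sybil={"z"})
print(scores)
```

Because only one attack edge crosses the regions, benign scores stay high on one side and low on the other, which is the structural intuition SybilBelief formalizes and makes robust to label noise.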
Full-text available
As popular tools for spreading spam and malware, Sybils (or fake accounts) pose a serious threat to online communities such as Online Social Networks (OSNs). Today, sophisticated attackers are creating realistic Sybils that effectively befriend legitimate users, rendering most automated Sybil detection techniques ineffective. In this paper, we explore the feasibility of a crowdsourced Sybil detection system for OSNs. We conduct a large user study on the ability of humans to detect today's Sybil accounts, using a large corpus of ground-truth Sybil accounts from the Facebook and Renren networks. We analyze detection accuracy by both "experts" and "turkers" under a variety of conditions, and find that while turkers vary significantly in their effectiveness, experts consistently produce near-optimal results. We use these results to drive the design of a multi-tier crowdsourcing Sybil detection system. Using our user study data, we show that this system is scalable, and can be highly effective either as a standalone system or as a complementary technique to current tools.
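One way to sketch the multi-tier aggregation idea from the crowdsourcing study: weight each worker's vote by the log-odds of their historical accuracy, so a few consistent experts can outweigh many mediocre turkers. This is a standard weighting trick, not necessarily the paper's exact scheme, and the accuracies below are hypothetical:

```python
# Accuracy-weighted crowd vote: each (label, accuracy) judgment contributes
# log(acc / (1 - acc)) toward its label; the signed sum decides.
import math

def weighted_vote(judgments):
    """judgments: list of (label, worker_accuracy) pairs, labels 'sybil'/'legit'."""
    score = 0.0
    for label, acc in judgments:
        w = math.log(acc / (1 - acc))
        score += w if label == "sybil" else -w
    return "sybil" if score > 0 else "legit"

# Two accurate experts outvote three mediocre turkers.
judgments = [("legit", 0.95), ("legit", 0.95),
             ("sybil", 0.6), ("sybil", 0.6), ("sybil", 0.6)]
print(weighted_vote(judgments))
```

This mirrors the paper's finding: turkers vary widely in effectiveness while experts are near-optimal, so an aggregation layer that accounts for per-worker reliability is what makes the crowdsourced system viable.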
Conference Paper
Online Social Networks (OSN) are increasingly becoming victims of Sybil attacks. These attacks involve the creation of multiple colluding fake accounts (called Sybils) with the goal of compromising the trust underpinnings of the OSN, in turn leading to security and privacy violations. Existing mechanisms to detect Sybils are based either on analyzing user attributes and activities, which are often incomplete or inaccurate or raise privacy concerns, or on analyzing the topological structure of the OSN. Two major assumptions that the latter category of works makes, namely that the OSN can be partitioned into a Sybil and a non-Sybil region and that the so-called “attack edges” between Sybil nodes and non-Sybil nodes are only a handful, often do not hold in real-life scenarios. Consequently, when attackers engineer Sybils to behave like real user accounts, these mechanisms perform poorly. In this work, we propose SybilRadar, a robust Sybil detection framework based on graph-based structural properties of an OSN that does not rely on the traditional non-realistic assumptions that similar structure-based frameworks make. We run SybilRadar on both synthetic and real-world OSN data. Our results demonstrate that SybilRadar has a very high detection rate even when the network is not fast mixing and the so-called “attack edges” between Sybils and non-Sybils are in the tens of thousands.
Conference Paper
Social botnets have become an important phenomenon on social media. There are many ways in which social bots can disrupt or influence online discourse, such as spam hashtags, scamming Twitter users, and astroturfing. In this paper we consider one specific social botnet on Twitter to understand how it grows over time, how the content of its tweets differs from that of regular users in the same dataset, and lastly, how the social botnet may have influenced the relevant discussions. Our analysis is based on a qualitative coding of approximately 3000 tweets in Arabic and English from the Syrian social bot that was active for 35 weeks on Twitter before it was shut down. We find that the growth, behavior, and content of this particular botnet did not specifically align with common conceptions of botnets. Further, we identify interesting aspects of the botnet that distinguish it from regular users.
Conference Paper
Sybil attacks in social and information systems have serious security implications. Of the many defence schemes, Graph-based Sybil Detection (GSD) has received the greatest attention from both academia and industry. Even though many GSD algorithms exist, there is no analytical framework to reason about their design, especially as they make different assumptions about the adversary and graph models used. In this paper, we bridge this knowledge gap and present a unified framework for the systematic evaluation of GSD algorithms. We used this framework to show that GSD algorithms should be designed to find local community structures around known non-Sybil identities, while incrementally tracking changes in the graph as it evolves over time.