Analysis of the Malicious Bots Market
Maxim Kolomeets, Andrey Chechulin
St. Petersburg Federal Research Center of the Russian Academy of Sciences
Saint-Petersburg, Russia
{kolomeec, chechulin}@comsec.spb.ru
Abstract—Social media bots can pose a serious threat by manipulating public opinion. Attempts to detect bots on social networks have resulted in bots becoming more sophisticated. A wide variety of bot types has appeared, which must be taken into account when developing methods for detecting them. In this paper, we present a classification of bot types built by analyzing the offers of bot traders on the market. We studied 1657 offers from 7 companies for 5 social networks: VKontakte, Instagram, Telegram, YouTube, and TikTok. Based on this information and the descriptions provided by bot traders, we aggregate the types of bots and their key features. We also perform a price analysis for different types of bots in the Russian Internet segment. The results show that the main pricing factors are the bot's action and the bot's quality, although they affect pricing differently in different social networks. For messengers and social networks whose recommendation algorithms weigh complex actions more heavily, higher quality bots tend to perform more complex actions, while in other social networks the complexity of the action and the quality of the bots do not correlate. The results of this study can be useful for developers of bot detection tools and for determining the cost of an attack.
I. INTRODUCTION
Social networks have changed the internet and our society.
As platforms for information exchange, they have brought the
whole world together so much that even a distant event is
perceived as something personal. According to the latest polls
[1], [2], a significant part of society draws information from
social networks, preferring them to television, newspapers, and
other classical media. This trend is growing from year to year
[1] and is not expected to reverse. Along with the popularity,
the trust in social networks as a source of information is
growing too. This phenomenon can be explained by the "all-permeability" and "speed of distribution" of information: anyone with a smartphone can become a source of information at the click of a camera, and dissemination takes only minutes. This is an opportunity to receive news first-hand, bypassing media companies, newspaper editors, and censors.
Social media has influenced our world and the Internet
for the better by making it more transparent. But the same
information dissemination mechanisms can be used to manip-
ulate opinion, spread misinformation [3], spread rumors and conspiracy theories [4], create fake reputations, commit fraud, and even suppress political competitors [5], [6]. The world has
not yet developed universal mechanisms for the dissemination
and identification of malicious information on social networks.
Damage can be done to anyone who acts on a social network platform: the platform itself, a third-party company, civil society, or a government.
At the same time, social networks have a simple, built-in, and self-organizing defense mechanism: institutional reputation, which is based on social network metrics (views, likes, etc.). Low-trust accounts cannot effectively disseminate
information. Therefore, to effectively spread misinformation,
the attacker either needs to enlist the support of someone with
a huge following (influencer) or use bots to simulate metrics
of social networks.
That’s why bot detection on social networks is one of the
most requested security functions from commercial companies
and law enforcement agencies.
Existing bot detection approaches are based on machine
learning [7], [8]. Machine learning models are trained on the
features that are extracted from bots and real users: informa-
tion from a profile; graph structures of friends; written texts;
uploaded photos and videos; etc. As a result, such bot detection
methods are highly dependent on the quality of the training
datasets. After all, different bots can use different strategies for
generating features. Therefore, for the development of high-
quality bot detection models, the training dataset must include
a variety of bot types.
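To make the feature-based detection pipeline concrete, the following is a minimal sketch (not taken from this paper) of a supervised classifier trained on a handful of illustrative profile features. The feature names and the synthetic data are assumptions made only for demonstration; a real detector would extract profile, friend-graph, text, and media features as described above.

```python
# Hypothetical sketch of feature-based bot detection; data and features are synthetic.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

rng = np.random.default_rng(0)

# Synthetic accounts: [friends_count, posts_per_week, profile_fields_filled, account_age_days]
humans = rng.normal(loc=[150, 5, 8, 1200], scale=[60, 2, 2, 400], size=(200, 4))
bots = rng.normal(loc=[40, 30, 2, 90], scale=[30, 10, 1, 60], size=(200, 4))

X = np.vstack([humans, bots])
y = np.array([0] * 200 + [1] * 200)  # 0 = human, 1 = bot

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

As the paper argues, the quality of such a model depends less on the classifier itself than on whether the training data covers the whole variety of bot types.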
Attempts to classify bots into types have been made earlier. Bot typing has already been done in several papers on bot detection [8]. For example, paper [9] describes strategies for controlling bots via software, human, and hybrid approaches. A classification of bots by types of threats is presented in [10]. An analysis of bot prices is presented in the report [11].
In this paper, we propose a classification of bots based on an analysis of the market. We have collected information on 1657 offers of bot services from 7 companies for 5 social networks. Based on this information and the descriptions provided by bot traders, we aggregate the types of bots and their key features. We also perform a price analysis for different types of bots.
Thus, the goal of the paper is to build a classification of bot types by investigating offers from bot traders. This classification helps to understand the diversity of bots, which in turn is useful for obtaining high-quality training datasets.
The paper consists of the following sections. In Classification of bot threats we describe the types of threats that bots can pose. In Classification of bot types we describe the types of bots, their characteristics, and the strategies for bot creation and management. In Methodology of data collection, pricing analysis and implementation we explore bot pricing for 5
social networks: VKontakte, Instagram, Telegram, YouTube
and TikTok. In Discussion we discuss the correlations of bot
characteristics. In Conclusion we summarize the results and present plans for future work.
We built the proposed classifications of bot threats and bot types based on papers [8], [10], [11] and, to a greater extent, on our own analysis of the bot market.
II. CLASSIFICATION OF BOT THREATS
As part of our research, we consider bots that pose security
threats. Of course, not all bots are malicious. For example,
bots that provide weather forecasts, generate memes, provide
store services and so on are harmless.
In this paper, we consider only malicious bots. To do this, we define a malicious bot through the types of threats it can implement.
There are many threats on social media, including password
leaks, use of private data by third-party companies, compro-
mise of personal correspondence, etc. But in this paper, we
focus only on those threats that can be implemented using
bots. We distinguish 3 classes of threats:
1) Fraud: deceiving social media users to obtain money or private information. Fraud occurs through correspondence with the user. For example, bots can collect private photos on dating services for blackmail, or the data needed to bypass bank and mail security systems based on security questions. If such bots have AI and can automatically conduct conversations, this can cause serious damage to tens of thousands of users. Even if the success rate is low, they can defraud a large number of people simply by scaling the botnet.
2) Promotion of harmful or censored information: propaganda of information that is prohibited by the social network platform (hate speech, trolling, etc.) or by the government (terrorism, incitement to violence, etc.). Depending on the attacker's goal, this is an integrity threat (if the goal is to exacerbate a conflict and violate the integrity of the community) or an availability threat (if the goal is to drown alternative viewpoints in spam).
3) Rating manipulation: inflating a rating to increase user confidence. For this, social network metrics are faked: the number of likes, friends, reviews, etc.
The implementation of these threats by bots requires one key quality from them: malicious bots must pretend to be real people, because this similarity is a key component of user trust.
A person will not believe a fraudster if the account does
not look like a real person. Real users will be skeptical about
rumors spread by bots. And, of course, the customer will be
suspicious of a store that has a lot of positive reviews from
bots.
As we will show below, the similarity of a bot to a real person is the main property that bot management aims to achieve. Besides, it is one of the main criteria for bot pricing.
The complexity of creating bots that look like real people, as well as the differences in strategies for creating and managing them, gives rise to many types of bots. We therefore propose an analysis of the bot market, an understanding of which is necessary for those who develop tools for detecting bots.
III. CLASSIFICATION OF BOT TYPES
To classify bots by type, we analyzed a variety of sites that trade bots. Bot traders provide two options: buy an account and manage the bot yourself, or rent an account (buy bot activity) and let the bot trader perform the necessary actions.
We analyzed offers from 7 bot trader companies that provide a bot rental service and 1 forum where bots are sold. The companies were selected from the Russian-speaking segment of the Internet, as we believe this market is more saturated. We analyzed rental offers on 5 platforms: VKontakte, Instagram, Telegram, YouTube, and TikTok. We analyzed account sales only for one platform, VKontakte, because there is not enough sales data on other platforms compared to the number of rental offers. A total of 1,657 rental offers and 45 sales offers were studied. We also bought 9 offers to analyze bot metrics. More information about the dataset is available on its page [12].
We propose several classification systems that will describe
the whole variety of commercial offers.
A. Characteristics of malicious bots
Each bot trader defines bot classes differently, over multiple parameters.
Bot stores introduced their own parameter systems, many of which differed terminologically. For example, one store had a quality scale with terms such as start, premium+, ultima, etc.; another store used low, medium, high; and a third store used standard, good, best. Some stores specified the speed of bots' actions in quantitative form, others in qualitative form, and some did not mention speed at all.
However, the bot traders published detailed instructions on their sites explaining what these characteristics mean. We aggregated all of these instructions to develop a common terminology system.
For both buying and renting an account, the parameters are:
1) Activity. Does the real owner of this account (if applicable) still use it? If yes, the real owner can quickly notice suspicious activity. This can happen if an account with legal activity was compromised but the attacker was unable to change the password.
2) Registration date. When was the account registered? Older accounts are more valuable and are less likely to be blocked than newer ones. Also, people's trust in new accounts is lower, as they are relatively easy to register.
3) Phone number. What phone number is the account bound to? If the account is not tied to a phone number, then the bot is often forced to enter a captcha (which can be a problem for an attacker if the bot is controlled by a program).
4) Location. What country/region is the account tied to, and what is the country/region of the person it imitates? Bots imitating people from one location may seem suspicious to real users from another one.
5) Owner. Is the owner a real person (e.g., if the account was stolen) or a virtual phone number generated by a program? Another important parameter is the number of owners: how many people can buy and use this account at the same time. An account can be obtained by hacking or fraud through:
• a mail service (if the attacker gains access to the mail and requests restoration of access to the social network);
• malware on a PC or mobile phone;
• spoofing through unsecured public networks (metro, hotels, etc.);
• a brute-force password attack on the social network or mail service;
• fraud that involves the use of social engineering through correspondence with the user.
For renting an account, bot traders also provide the following parameters:
1) Quality: an integral ranking of the bot that expresses how much it looks like a human. It is usually expressed as:
a) "Low": the user can easily recognize the bot. Usually the profile is empty, with no photos and few friends. This class also includes active users who may notice that they have been hacked and that the account is being used.
b) "Middle": it is difficult for the user to recognize the bot. Usually the profile is not empty, with some photos and an average number of friends.
c) "High": the bot cannot be distinguished by profile analysis. Usually, these are accounts of real people who have lost or shared access to them (due to hacking or sale). But the bot can still be recognized by characteristics that change due to unnatural activity (illogical messages or reposts, an unusual distribution of friends, etc.).
d) "Live": accounts of real people (hacked, or people who act for money). Unnatural activity is the only way to recognize a bot of this type.
2) Type of action: what action the bot needs to perform. We divided the types of activity according to the degree of attracting attention:
• leaving no public digital footprints, e.g., views;
• leaving public digital footprints that cannot be seen by visiting the bot page, e.g., likes;
• leaving public digital footprints that can be seen by visiting the bot page, e.g., friends;
• leaving a digital footprint of direct user interaction that is difficult to implement in automatic mode, e.g., comments.
3) Speed of action: how quickly bots can take the required action, for example, how quickly 100 bots can write a comment. It is usually measured in the number of activities per day. A spike in activity can trigger social media protection algorithms.
Fig. 1. The structure of friends forms a small world for a real user, and a bot simulates a similar small world to disguise itself
B. Strategies for management and creation
Different bot stores provide different management strategies. Bots can be controlled by:
1) Software: bot actions are performed automatically by algorithms. Used for low, middle, and high quality bots, usually for actions that can be implemented in automatic mode.
2) Operator: the bot is manually controlled by an operator. Used for high and live quality bots, usually for actions that cannot be implemented in automatic mode.
3) Exchange platform: the bot is controlled by a real account owner who agrees to perform some actions for money. Used for high quality bots, usually for actions that cannot be implemented in automatic mode.
4) Troll factory: SMM agencies employing professionals. The services of such companies are not public; therefore, we did not find their offers. But we believe they should be included in the list, as, according to much evidence, they are responsible for many attacks.
Of course, the exact characteristics of an account can be known only after its purchase. Nothing stops a bot trader from deceiving the buyer, since this market is not entirely legal. Besides, different bot traders understand quality differently. But most of the parameters describe various aspects of the bot's similarity to a real person, i.e., its quality.
By investigating bots and comparing them to real people on social media, we have identified the main ways bots try to disguise themselves as real people:
1) Use privacy settings. An attacker can fill in just a few fields (profile photo, name, and surname) and hide the rest with privacy settings. Since many users prefer to hide their pages on social networks, this does not raise suspicion. This technique is used by a wide variety of bots, from low to high quality. According to our observations, live bots use privacy settings much less often.
2) Use the account of a real person. An attacker can use a real person's account by hacking it or buying/renting it. The bot has live quality if the person is inactive and low quality if the person is active.
3) Generate a profile. This is a difficult task that can include three techniques:
• The attacker can try to generate profile fields. To do this, the attacker can fill the account with photos from another user who has this data open and randomly generate numeric and string parameters.
• The attacker can generate photo and text content using neural networks. This approach allows the bot to pass a duplicate check (when we look for another account with the same photos). A neural network for writing text allows attackers to automate bots that work in chats. Depending on the filling of the account, the quality can vary from middle to live.
• The attacker can generate the graph structure of the friend list. Real users add people they already know as friends, forming a small world. The likelihood that a real person will add an unfamiliar account is small. Therefore, it is difficult for a bot to form a friend list of real users in which everyone is connected. The bot can add random users, which are unlikely to form even a sparsely connected graph, or it can try to form a small-world friend-list structure from other bots (see Fig. 1), as illustrated in the sketch below.
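The difference between a naive random friend list and an imitated small world can be shown with a short sketch. It is not part of the original study; the graph generators and parameters are arbitrary assumptions used only to show that the two structures differ sharply in clustering and connectivity.

```python
# Hypothetical sketch: small-world friend list (what a sophisticated bot imitates)
# versus a friend list built from random accounts.
import networkx as nx

n_friends = 100

# A real user's friends tend to know each other: high clustering, one connected cluster.
small_world = nx.watts_strogatz_graph(n=n_friends, k=6, p=0.1, seed=1)

# A naive bot that befriends random users: same number of edges, but weak clustering.
random_friends = nx.gnm_random_graph(n=n_friends, m=small_world.number_of_edges(), seed=1)

for name, graph in [("small-world friend list", small_world),
                    ("random friend list", random_friends)]:
    print(name,
          "| avg clustering:", round(nx.average_clustering(graph), 3),
          "| connected components:", nx.number_connected_components(graph))
```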
IV. METHODOLOGY OF DATA COLLECTION, PRICING ANALYSIS AND IMPLEMENTATION
We carried out a pricing analysis and checked several correlations between parameters. The purpose of this analysis is to see how price, quality, and type of bot action correlate for various social networks. Information about the prices and parameters of commercial offers is sufficient for valid conclusions, because the bot market obeys the same laws of supply and demand as a regular market.
To perform the analysis, we parsed HTML pages with bot rental offers from the 7 companies. In each rental offer, we determined:
1) the lot size: bot activity is sold in batches, for example, 1, 100, 1000, etc.;
2) the cost per unit: in cases where the price was indicated for a lot, we calculated the price for one bot;
3) the social network: VKontakte, Instagram, Telegram, YouTube, or TikTok;
4) the action: view (viewing a post, photo, video, etc.), like, poll (cheating votes in a poll), repost, participate (subscribing to a channel, group, etc.), friend, alert (massive alerts for blocking content by a social network), or comment;
5) the quality: Low, Middle, High, or Live.
Fig. 2. Correlation between the bot's quality, action type and price for different social networks
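As an illustration of how a single rental offer is reduced to these five fields and a per-unit price, a short sketch follows; the sample offers, field names, and prices are invented for demonstration and do not reproduce the collected dataset.

```python
# Hypothetical sketch of the offer record used in the analysis; values are invented.
from dataclasses import dataclass

@dataclass
class Offer:
    network: str      # VKontakte, Instagram, Telegram, YouTube, or TikTok
    action: str       # view, like, poll, repost, participate, friend, alert, comment
    quality: str      # Low, Middle, High, or Live
    lot_size: int     # bot activity is sold in batches (1, 100, 1000, ...)
    lot_price: float  # price of the whole lot, in rubles

    @property
    def price_per_unit(self) -> float:
        # when the price is listed per lot, convert it to the price of one bot action
        return self.lot_price / self.lot_size

# invented sample offers, only to show the record structure
offers = [
    Offer("VKontakte", "like", "Middle", lot_size=1000, lot_price=350.0),
    Offer("YouTube", "comment", "High", lot_size=100, lot_price=900.0),
]
for offer in offers:
    print(offer.network, offer.action, offer.quality,
          round(offer.price_per_unit, 3), "RUB per unit")
```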
To mark up the dataset by action and quality, we looked
for keywords in the description of the offer. For example, to
include an offer in the low quality class, we looked for the
words: ”low”, ”no avatars”, ”no guarantee”, ”slow”, ”possible
activity”, etc. We did the same for the rest of the quality and
action classes.
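The keyword markup can be sketched as follows. The keyword lists here are abbreviated, hypothetical examples; the real markup used analogous (and longer) Russian and English keyword lists for every quality and action class.

```python
# Hypothetical sketch of keyword-based labelling of offer descriptions.
from typing import Optional

QUALITY_KEYWORDS = {
    "Low": ["low", "no avatars", "no guarantee", "slow", "possible activity"],
    "Middle": ["medium", "standard", "with avatars"],
    "High": ["high", "premium", "best"],
    "Live": ["live", "real people"],
}

def label_quality(description: str) -> Optional[str]:
    """Return the first quality class whose keywords occur in the offer description."""
    text = description.lower()
    for quality, keywords in QUALITY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return quality
    return None  # offers that match no class are left unlabelled

print(label_quality("1000 likes, no avatars, no guarantee"))  # -> Low
```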
As a result, we built three charts:
1) Figure 3 shows the bubble chart of bot pricing over
quality and actions for different social networks. The
size of the bubble indicates the price.
2) Figure 4 shows the scatter plot of bot pricing over
actions for different social networks.
3) Figure 5 shows the scatter plot of bot pricing over quality
for different social networks.
We also performed a correlation analysis.
To do this, we converted the qualitative labels to quantitative scores for actions (see Table I) and quality (see Table II). The conversion logic is based on the classifications presented above: actions that leave a more visible digital footprint receive higher values, and the quality score increases linearly from low to live.
For each group of offers (the same social network, bot
trader, quality, and action), we calculated the median price
value. This is necessary because the same bot store usually sells services in lots; such lots contain the same set of bots, and the price difference is due only to the discount for buying a larger lot.
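A possible implementation of this step, assuming a tabular representation of the offers with hypothetical column names, is sketched below: qualitative labels are mapped to the scores of Tables I and II, and the median price is taken within each (network, trader, quality, action) group.

```python
# Hypothetical sketch of the scoring and grouping step; sample rows are invented.
import pandas as pd

ACTION_SCORE = {"view": 1, "like": 2, "poll": 2, "repost": 3, "participate": 3,
                "friend": 3, "alert": 4, "comment": 4}
QUALITY_SCORE = {"Low": 1, "Middle": 2, "High": 3, "Live": 4}

offers = pd.DataFrame([
    {"network": "VKontakte", "trader": "A", "quality": "Middle", "action": "like", "price": 0.35},
    {"network": "VKontakte", "trader": "A", "quality": "Middle", "action": "like", "price": 0.30},
    {"network": "YouTube",   "trader": "B", "quality": "High",   "action": "comment", "price": 9.0},
])
offers["action_score"] = offers["action"].map(ACTION_SCORE)
offers["quality_score"] = offers["quality"].map(QUALITY_SCORE)

# the median collapses lots of the same service that differ only by the bulk discount
grouped = (offers
           .groupby(["network", "trader", "quality_score", "action_score"], as_index=False)
           ["price"].median())
print(grouped)
```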
The results of the correlation between these groups of offers
are shown in Figure 2. We scaled the color scale from 0 to
0.7 (maximum correlation value excluding diagonal) for better
perception of results.
Correlation shows how social networks’ features affect bot
trading and bot diversity.
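The correlation step can be sketched as follows. The data frame below is random placeholder data standing in for the grouped offers; only the clipping of the color scale at 0.7 reproduces the setting described above.

```python
# Hypothetical sketch of per-network correlation between price, quality and action scores.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# random placeholder data standing in for the grouped offers of the real dataset
grouped = pd.DataFrame({
    "network": rng.choice(["VKontakte", "Telegram", "YouTube"], size=90),
    "quality_score": rng.integers(1, 5, size=90),
    "action_score": rng.integers(1, 5, size=90),
})
# toy pricing rule: more complex actions and better quality cost more, plus noise
grouped["price"] = grouped["quality_score"] * grouped["action_score"] + rng.normal(0, 2, size=90)

for network, part in grouped.groupby("network"):
    corr = part[["price", "quality_score", "action_score"]].corr()
    fig, ax = plt.subplots()
    im = ax.imshow(corr.to_numpy(), vmin=0.0, vmax=0.7, cmap="viridis")  # color scale clipped at 0.7
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    ax.set_title(f"Offer-parameter correlations: {network}")
    fig.colorbar(im)
plt.show()
```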
V. DISCUSSION
As expected, for all social networks the price of bots
depends on the quality and actions, but this dependence is
not the same for different social networks.
Comments and alerts, being the most complex actions and the ones that draw the most attention to the bot's profile, have the highest price tag (Fig. 3 and Fig. 4). Views are the least expensive because they do not
leave digital traces and are easily automated. This confirms
the validity of the proposed classification.
For all social networks (except Instagram), prices grow with quality (Fig. 5). This trend is also noticeable in the correlation results.
TABLE I
QUALITATIVE TO QUANTITATIVE SCORES FOR BOT ACTIONS

action                      | quantitative score | action's type (qualitative score)
comment, alert              | 4                  | leaving public footprints that are difficult to implement in automatic mode
friend, participate, repost | 3                  | leaving public footprints that can be seen by visiting the bot page
poll, like                  | 2                  | leaving public footprints that cannot be seen by visiting the bot page
view                        | 1                  | leaving no public footprints
TABLE II
QUALITATIVE TO QUANTITATIVE SCORES FOR BOT QUALITY

quality (qualitative score) | quality (quantitative score) | description
Live                        | 4                            | accounts of real people
High                        | 3                            | the bot cannot be distinguished by profile
Middle                      | 2                            | it is difficult for the user to recognize the bot
Low                         | 1                            | the user can easily recognize the bot
It can be seen that the dependence of price on quality differs across social networks. We attribute this to the effectiveness of the algorithms and measures used to combat bots: the more efficient the algorithms of a social network, the more valuable the difference between low and high quality bots becomes.
For Telegram and YouTube, there is also a noticeable correlation indicating that better quality bots are used for more complex actions; for all other social networks, this correlation is almost zero.
For Telegram, this is explained by the fact that Telegram is a
messenger, where the main function is participating in chats,
discussion, and comments. Therefore, Telegram has a clear
demand for human-controlled bots that can write complex text.
For YouTube, this can be explained by changes in the
promotion algorithms. YouTube has significantly increased the
role of comment activity to promote videos with its recommen-
dation systems. Thus, the demand for human-controlled bots
that can write complex text has also increased.
TikTok and Instagram also have commenting functionality. But on TikTok and Instagram, comments are meant to express emotions rather than discussion. Thus, comments can be limited to emojis or stock phrases from a sentiment dictionary, which even low and middle quality bots can produce.
This analysis allows one to better understand which bots are widespread in which networks and to speculate about their likely features. This can be taken into account in the development
of bot detection tools. For example, to detect bots on YouTube,
it must be taken into account that comments are likely to
be written by human-controlled bots. At the same time, on
Instagram or TikTok it will be a more mixed group of bots.
VI. CONCLUSION
In this paper, we presented a classification of bot types, built by analyzing the offers of bot traders on the market.
We have presented the parameters that the bot traders
indicate and which affect bot pricing. We have presented a
classification by type of action and by quality. At the same
time, market analysis showed that the price of bots depends
on the complexity of the action performed by the bot and the
quality (similarity of the bot to a real person). But for some
social networks, this dependence may be stronger than for
others.
We also demonstrated that on some social networks, better quality bots perform more complex actions, while on others there is no such correlation.
This study makes it possible to obtain better datasets for
training machine learning models that are used to detect
malicious bots. To do this, training datasets must contain all
the variety of bots, as well as their characteristics.
We plan to continue our research and consider what specific
features (which are already used to train models) are most
useful for detecting bots of various classes.
The dataset collected and marked up for this paper is
available via the link [12].
ACKNOWLEDGMENT
This research was supported by the Russian Science Foun-
dation under grant number 18-71-10094 in SPC RAS.
REFERENCES
[1] Levada-Center, "Channels of information," 2017. [Online]. Available: https://www.levada.ru/en/2018/10/12/channels-of-information/
[2] E. Shearer and A. Mitchell, "News use across social media platforms in 2020," 2021. [Online]. Available: https://www.journalism.org/2021/01/12/news-use-across-social-media-platforms-in-2020
[3] C. Shao, G. L. Ciampaglia, O. Varol, A. Flammini, and F. Menczer, "The spread of fake news by social bots," arXiv preprint arXiv:1707.07592, vol. 96, p. 104, 2017.
[4] E. Ferrara, "#COVID-19 on Twitter: Bots, conspiracies, and social media activism," arXiv preprint arXiv:2004.09531, 2020.
[5] F. Pierri, A. Artoni, and S. Ceri, "Investigating Italian disinformation spreading on Twitter in the context of 2019 European elections," PLoS ONE, vol. 15, no. 1, p. e0227821, 2020.
[6] R. Faris, H. Roberts, B. Etling, N. Bourassa, E. Zuckerman, and Y. Benkler, "Partisanship, propaganda, and disinformation: Online media and the 2016 US presidential election," Berkman Klein Center Research Publication, vol. 6, 2017.
[7] C. A. Davis, O. Varol, E. Ferrara, A. Flammini, and F. Menczer, "BotOrNot: A system to evaluate social bots," in Proceedings of the 25th International Conference Companion on World Wide Web, 2016, pp. 273–274.
[8] M. Orabi, D. Mouheb, Z. Al Aghbari, and I. Kamel, "Detection of bots in social media: A systematic review," Information Processing & Management, vol. 57, no. 4, p. 102250, 2020.
[9] C. Grimme, M. Preuss, L. Adam, and H. Trautmann, "Social bots: Human-like by means of human control?" Big Data, vol. 5, no. 4, pp. 279–293, 2017.
[10] B. Oberer, A. Erkollar, and A. Stein, "Social bots – act like a human, think like a bot," in Digitalisierung und Kommunikation. Springer, 2019, pp. 311–327.
[11] S. Bay et al., "The black market for social media manipulation," NATO StratCom COE, 2018.
[12] M. Kolomeets, "Security datasets: Bot market," 2021. [Online]. Available: https://github.com/guardeec/datasets
Fig. 3. Offers for renting bots with a certain quality and a certain action. Price expressed as bubble size.
Fig. 4. Dependence of the action of bots on the price in rubles
Fig. 5. Dependence of the quality of bots on the price in rubles