From Detection to Dissection: Unraveling Bot
Characteristics in the VKontakte Social Network
1st Andrey Chechulin
International Digital Forensics Centre
St. Petersburg Federal Research Center of the Russian Academy of Sciences
St. Petersburg, Russia
0000-0001-7056-6972
2nd Maxim Kolomeets
School of Computing
Newcastle University
Newcastle upon Tyne, UK
0000-0002-7873-2733
Abstract—The increasing sophistication of social bots and their ability to mimic human behaviour online presents a significant challenge in distinguishing them from genuine users. This paper proposes a novel approach for detecting and estimating the parameters of such bots within the VKontakte social network. This method involves creating a dataset by using controlled attacks on “honeypot” accounts to measure bot activity. This process allows us to assess these bots’ cost, quality, and speed. Additionally, we evaluate how well users trust bots by using a Turing test, which tests users’ ability to identify bots. This dataset is then used within conventional machine learning techniques, leveraging features extracted from interaction graphs, text content, and statistical distributions.
The evaluation of the proposed approach shows that it effectively detects bots and predicts their behaviour with considerable accuracy. It can work well even with datasets with a skewed balance of bot and human data, can identify the majority of bot networks, and shows only a minor correlation with the primary characteristics of the bots.
Our approach has significant implications for enhancing the ability to select countermeasures against bots by providing a deeper understanding of the capabilities of attackers. Furthermore, it offers important insights for the forensic analysis of botnet attacks, enabling us not just to confirm the presence of a botnet but also to characterise the attack’s specifics.
Index Terms—social bot detection, VKontakte, machine learning, bot characteristics, social network attack
I. INTRODUCTION
Detecting social bots has emerged as a significant area of research, given the conspicuous impact these entities have on processes such as elections, product promotions, and reviews, among others. Moreover, the complexity of differentiating between genuine users and bots has been exacerbated by the “bot evolution” phenomenon [1]. This evolution has been particularly accelerated by the advent of Generative Adversarial Networks (GANs), which enable bots to create images [2], and Large Language Models (LLMs), which facilitate the generation of textual content [3]. As a result, these sophisticated bots can often mimic human behaviour to a degree that renders them indistinguishable from real users [4].
Within the scholarly community, the majority of research on bot detection has been centred around Twitter [5]. In contrast, this paper introduces a novel solution tailored explicitly for identifying bots within the VKontakte social network, a Russia-based online social media and social networking service with over 70 million monthly active users. Furthermore, we present an approach that transcends mere detection: it also facilitates the estimation of various bot parameters [6]. These parameters yield insights into the capabilities of the attacker, including the price of the attack, the quality of the bots, the level of user trust in the bots, the speed with which the attack is executed, and the nature of the bot trader orchestrating the attack.
Funding note: The work of Andrey Chechulin was partially supported by the budget project FFZF-2022-0007. Maxim Kolomeets did not receive specific funding for this study.
The novelty of this paper lies in the approach, which is able to estimate bot parameters, as well as in the feature construction schema for VKontakte bot detection, which involves analysing numerical distributions, network graphs, and textual content. The proposed approach facilitates the detection of bot attacks, distinguishing not merely the “binary presence” of bots but characterising the bot intrusion through an analysis of bot parameters. This detailed profiling could significantly influence the selection of countermeasures and enhance the qualitative monitoring of bot activity on VKontakte.
The structure of this paper is as follows: Section II presents an overview of the current state of VKontakte bot detection research. In Section III, we present our methodology, including the formation of datasets featuring bot metrics and the training of models. In Section IV, we evaluate the prototype’s detection capabilities, its performance variability across different bot types, and the efficiency with which it can predict bot parameters. Section V discusses the outcomes of these experiments, and the final section provides our conclusions.
II. STATE OF THE ART & PREVIOUS RESEARCH
The majority of extant research on VKontakte social bot
detection utilises machine learning techniques, leveraging
features extracted from user profiles and their friends’ or
subscribers’ graphs.
In [10], researchers employed a feedforward neural network, utilising features that represent general account information (such as age, number of photos, etc.) in conjunction with feature construction based on the analysis of blocklists comprising URLs and phrases mentioned in account descriptions.
Another study [11] focused on the analysis of the informativeness of categorical features of bot accounts, employing a CatBoost classifier.
Research detailed in [12] utilised information regarding the “completeness” of an account’s profile: the extent to which the analysed account’s fields are filled. This information, together with data extracted from friends’ and subscribers’ lists, served as input for a bot detector based on a random forest algorithm.
In our previous research [13], we attempted to detect bots
based solely on friend lists, bypassing the analysis of the
profile itself. This approach could be beneficial for analysing
accounts that restrict access to their profile, as their friend list
can be indirectly established by searching the target account
in the friend lists of other users. However, such an approach
requires collecting information about the friend lists of all
VKontakte users.
A more sophisticated approach is presented in [9], where researchers proposed a stacked ensemble of several classifiers, each leveraging different features, such as text, graph centrality measures, and graph embeddings. The authors noted the exceptional efficiency of friendship graph features, arguing that “the creation of a bot with a friendship graph similar to a normal user is a complex and time-consuming task, and most bot makers do not undertake it.”
A notable limitation of the outlined research lies in its reliance on account suspension-based labelling methods. This approach involves sourcing information regarding which accounts have been blocked by the social network and subsequently labelling them as bots. However, this approach cannot exceed the effectiveness of VKontakte’s built-in bot detection, as it does not represent a ground-truth detection mechanism. Moreover, it potentially compounds errors, integrating any inaccuracies from VKontakte’s detection system into the developed classifiers.
From another perspective, these approaches present binary
detection outcomes (bot / no bot), lacking any qualitative
analysis of the identified bots. In this paper, we present an
extended approach that is capable of not only effectively
detecting bots but also describing their characteristics and,
consequently, the specifics of the attack.
III. PROPOSED APPROACH
The objective of the proposed approach extends beyond the mere detection of bots’ malicious activities: it also aims to characterise the threat level of these accounts using quantitative metrics. This section describes the process of dataset formation utilised for training, which encompasses (a) labelling based on bot purchases and (b) labelling derived from Turing tests, and then details the subsequent model training process.
A. Dataset formation
The methodology for dataset formation utilised the approach we previously presented in [6]; here, we give a short description of it. The core concept of this technique involves the formation of datasets for model training through the orchestration of a controlled attack (malicious bot activity) on accounts explicitly created for this experiment: “honeypots”. These “honeypot” accounts serve as trap victims that attract bot activity, thereby allowing the measurement of bot metrics during the attack. Subsequently, a Turing test is employed to determine a metric indicative of the users’ proficiency in distinguishing these bots. Two distinct strategies were adopted for dataset creation:
1) Purchase method with expert labelling. The method involves setting up a honeypot in the form of a fake social media community that looks real but is unattractive to genuine users to prevent their interaction (e.g., a taxi service in a non-existent city). This community is then used as a victim: we purchased bots from a curated list of bot traders offering different bot qualities and services while tracking the bots’ IDs and information related to the seller (bot trader type, bot quality, and price). As the purchased bot activity, we asked the bot traders to like posts (from 100 to 300 likes) in the community, and the order fulfilment speed was recorded to categorise the speed of the bots. After evaluating the bots, their activities were removed from the community, and the process was repeated for different sellers and quality levels to compare results. Using this method, we obtained bot identifiers in sets, where each set corresponds to one attack and all bots in a set share the following parameters: bot quality, bot seller type, price, and speed.
2) The assessment of the human ability to identify bots was conducted by calculating a “Trust” metric through a Turing test. Annotators were tasked with marking the bots, and the difference between their responses and the actual classifications served as an indicator of human detection capability. In the experiment, the real-user accounts consisted of randomly selected VKontakte users, individuals active across various communities, and authenticated student accounts. As bots, we used the accounts collected in the previous step. Annotators were presented with a collection of 101 accounts, which they were instructed to categorise as either a bot, a genuine user, or indeterminate. After that, we observed how many bots from each set were recognised correctly and formed a “Trust” metric as a true-positive rate: the ratio of correctly identified bots to the overall number of bots in a set (a minimal sketch of this computation is given after this list).
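Formally, for one bot set, Trust is simply the share of its bots that annotators identified correctly. The following Python sketch illustrates the computation; the per-account vote layout is a hypothetical simplification of the annotation data, not the released tooling:

def trust_metric(annotator_votes):
    """annotator_votes: one label per bot account in an attack set,
    e.g. the annotators' majority vote: 'bot', 'user' or 'unknown'."""
    correctly_identified = sum(1 for v in annotator_votes if v == "bot")
    return correctly_identified / len(annotator_votes)

# e.g. 7 of 10 bots in a set recognised -> Trust = 0.7
print(trust_metric(["bot"] * 7 + ["user"] * 2 + ["unknown"]))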
This dataset is available via GitHub [7] and includes 18,444 unique bot identifiers from 69 offers by 29 bot traders, together with the described bot metrics (see Table I).
TABLE I
BOT METRICS DESCRIPTION
Metric  Description
Price   The price of a bot, [0, ∞)
NBQ     Normalised bot quality [LOW=0, MID=1, HIGH=2]
BTT     Bot trader type [SHOP=0, EXCHANGE=1]
Speed   How fast the attack is [MINUTE=0, HOUR=1, DAY=2]
Trust   How well users recognise the bot, [0, 1]
Trust How well users recognize the bot [0,1]
B. Models training
The model training follows a conventional data analysis pipeline, which includes a substantial feature engineering step. We utilise a variety of data categories to extract features from bot accounts: (a) numerical and temporal distributions, (b) interaction graphs, and (c) generated content data.
Regarding numerical distributions, we examined publicly accessible data from accounts that can be represented as distributions. This includes the distribution of likes, comments, and the number of friends, subscriptions, posts, and likes among friends and subscribers. After this analysis, we employed the methods delineated in Table II A.
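To make the Table II A methods concrete, the sketch below computes base statistics, the Gini index, and Benford leading-digit frequencies for one such distribution. The numpy/scipy choices are our illustrative assumption rather than the paper's stated implementation, and the Benford p-value is omitted for brevity:

import numpy as np
from scipy import stats

def distribution_features(values):
    """values: a non-negative distribution, e.g. the like counts of
    an account's friends (input layout assumed for illustration)."""
    v = np.asarray(values, dtype=float)
    vals, counts = np.unique(v, return_counts=True)
    feats = {
        "size": v.size, "min": v.min(), "max": v.max(),
        "mean": v.mean(),
        "gmean": stats.gmean(v + 1),   # +1 shift guards against zeros
        "hmean": stats.hmean(v + 1),
        "mode": vals[counts.argmax()],
        "std": v.std(), "var": v.var(),
    }
    for q in range(1, 10):             # deciles q1..q9
        feats[f"q{q}"] = np.percentile(v, q * 10)
    s = np.sort(v)                     # Gini index of activity spread
    n = s.size
    feats["gini"] = (2 * np.arange(1, n + 1) - n - 1) @ s / (n * s.sum())
    lead = np.array([int(str(int(x))[0]) for x in v if x >= 1])
    for d in range(1, 10):             # Benford leading-digit profile
        feats[f"benford_{d}"] = float(np.mean(lead == d)) if lead.size else 0.0
    return feats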
For time distributions, which pertain to data with a timestamp (e.g., the distribution of posts over time), we applied the methods listed in Table II B.
In constructing features from interaction graphs, we examine the lists of accounts that have interacted with the bot account (graph outlined in red in Figure 1 A). These “interacting accounts” serve as vertices and are derived from the bot account’s associated data, such as the (a) friend list, (b) subscriber list, (c) accounts that gave likes, or (d) accounts that left comments. The edges between these accounts can represent one of the following interactions: (a) mutual friendships, (b) subscriptions, (c) likes exchanged, and (d) comments made to one another. Through various combinations, we can generate 16 distinct types of graphs (4 types of vertices × 4 types of edges).
We also utilise expanded graphs, which incorporate not only the “interacting accounts” but also any account that shares a connection with at least two “interacting accounts” (red in Figure 1 B). Expanded graphs offer an alternative view of the dynamics surrounding the bot account, providing more strongly connected graphs with different topologies.
Fig. 1. Construction of two versions of interaction graphs.
Subsequently, we employed graph algorithms detailed in
Table II C to extract features from the graphs.
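As an illustrative sketch only, the snippet below assembles one of the 16 graph variants (here: likers as vertices, friendship as edges) and extracts several of the Table II C coefficients. The networkx library is our choice for illustration and is not named in the paper:

import networkx as nx
from networkx.algorithms import community

def graph_features(vertices, edges):
    """vertices: accounts that interacted with the bot (e.g. likers);
    edges: pairs of them with a mutual tie (e.g. friendship)."""
    g = nx.Graph()
    g.add_nodes_from(vertices)
    g.add_edges_from(edges)
    comms = list(community.label_propagation_communities(g))
    feats = {
        "pct_isolates": nx.number_of_isolates(g) / g.number_of_nodes(),
        "core_size": nx.k_core(g).number_of_nodes(),
        "degree_assortativity": nx.degree_assortativity_coefficient(g),
        "avg_clustering": nx.average_clustering(g),
        "n_bridges": sum(1 for _ in nx.bridges(g)),
        "global_efficiency": nx.global_efficiency(g),
        "n_communities": len(comms),
        "modularity": community.modularity(g, comms) if g.number_of_edges() else 0.0,
    }
    # centrality distributions, summarised later with Table II A statistics
    feats["degree_dist"] = [d for _, d in g.degree()]
    feats["pagerank_dist"] = list(nx.pagerank(g).values())
    return feats

Repeating such an extraction over the four vertex sources and four edge types, and again over the expanded variants, yields the per-graph feature vectors.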
Regarding media data, we incorporated both text and
account photographs. For the text, we collated lists of posts
authored by the bot as well as comments made by it. We
employed the algorithms specified in Table II D to derive
text features. It is important to note that we abandoned the use of NLP/Transformer methods that could interpret the thematic
or semantic content of the text. This is because attackers frequently use bots within particular contexts, such as targeting a specific product, company, or person. Consequently, had we used NLP/Transformers, there would be a risk of the model flagging all accounts discussing a specific topic as bots. Therefore, we opted for methods founded on statistics and syntactic analysis, which do not engage with the semantic content and thus introduce less bias [8]. The text feature analysis methods
are utilised in tandem with distribution analysis methods, as
depicted in Figure 2. Given that text on social networks is
manifested as discrete entities (posts, comments, publications,
etc.), a vector representing text features is extracted from the
analysis of each entity. This procedure generates a matrix
where the rows correspond to the items and the columns to
the text features (illustrated in Figure 2, yellow section). To
compute account-level features instead of item-level features,
distribution-based features are calculated for each column of
this matrix (pertaining to a text feature) (refer to Figure 2,
grey section). These computed features constitute the finalised
vector of text features for an account, thereby converting the
entire matrix into a one-dimensional vector.
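A compact sketch of this Figure 2 pipeline is given below; the particular syntactic counts stand in for the fuller Table II D feature set, and the layout is illustrative rather than the paper's implementation:

import numpy as np

def text_item_features(text):
    # a few Table II D style syntactic counts for one post/comment
    words = text.split()
    return [
        len(text),                                 # characters
        len(words),                                # words
        sum(ch in ".,!?;:" for ch in text),        # punctuation marks
        sum(w.startswith("#") for w in words),     # hashtags
        sum(w.startswith("http") for w in words),  # URLs
    ]

def account_text_vector(posts):
    # rows = posts/comments, columns = text features (Fig. 2, yellow)
    matrix = np.array([text_item_features(p) for p in posts], dtype=float)
    # column-wise distribution statistics (Fig. 2, grey); only a subset
    # of the Table II A base statistics is used here for brevity
    parts = [matrix.mean(axis=0), matrix.std(axis=0), matrix.min(axis=0),
             matrix.max(axis=0), np.percentile(matrix, 50, axis=0)]
    return np.concatenate(parts)    # one flat vector per account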
For the account photo, we utilised a single feature that
indicates whether the account employs the default photo.
The summarised framework for feature extraction is detailed in Table II. To minimise the complexity of the model, we employed Pearson’s correlation to discern features that exhibited the strongest predictive relationship with the labels (as discussed in Section IV-A). The resulting set of features was then used as input for a feed-forward neural network tasked with binary classification (bot/user) or regression analysis to predict specific bot metrics.
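Since the paper does not specify the network architecture, the following scikit-learn sketch is only an illustrative stand-in for the classification and regression heads, with assumed layer sizes:

from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: the selected account features (top correlated, collinearity removed)
detector = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500),
)
# a separate regression head per bot metric, e.g. "Trust"
trust_model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500),
)
# detector.fit(X_train, y_bot)            # y_bot: 0 = user, 1 = bot
# trust_model.fit(X_train_bots, y_trust)  # fitted on bot accounts only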
Fig. 2. Schema of text feature construction in conjunction with distribution analysis.
TABLE II
FEATURE EXTRACTION METHODS
Method’s group: Method (N features)

A. Numerical distributions:
  Base statistics (BS): size, min/max, means (standard, geometric, harmonic), mode, stddev, variance, q1-q9 (19)
  BS with removed tails: size, min/max, means (standard, geometric, harmonic), mode, stddev, variance, q1-q9 (19)
  Other: Gini index, Benford’s law (q1-q9) & p-value (11)
B. Time distributions:
  IP of activity time: 0-4h ... 20-24h (6)
  IP of activity day: Monday ... Sunday (7)
  Distribution’s BS of activity time: BS(0-4h ... 20-24h) (19)
  Distribution’s BS of non-activity: BS(non-activity length) (19)
C. Graphs:
  Coefficients: % isolates, K-shell size, size of core, S-metric, communities (number/max/modularity) based on modularity/label propagation (LP), % domination/independent set, degree assortativity, average clustering, N bridges, global efficiency (15)
  Centrality measures: distributions’ BS of vertex id, degree, PageRank, VoteRank, modularity/LP community sizes, k-core sizes (152)
D. Text & photo:
  Emoji: Unicode count, grapheme count (2)
  Sentiment: negative, neutral, skip, positive, speech (5)
  Frequency of parts of speech: NOUN, ADJF, ..., INTJ (18)
  Frequency of symbols and words: Latin/Cyrillic (characters/sentences/words/abbreviations/alphanumeric words) count, punctuation/hashtags/mentions/urls/emails/phones count (18)
  Photo: default photo or not (1)
IV. EVALUATION
In this section, we evaluate the proposed approach from four perspectives: (a) the informativeness of features, (b) the efficiency of bot detection, (c) the efficiency of bot metrics prediction, and (d) the capability of specific bots to evade detection.
A. Analysis of Feature Informativeness
Following the application of feature construction methods,
we identified 234,467 features. The feature selection process
was twofold: (a) elimination of multicollinear features where
the Spearman correlation between features exceeded 0.5, and
(b) identification of the top 1,000 features exhibiting the
strongest correlation with the label. The multicollinearity
analysis reduced the feature count to 21,574. We assessed
the informativeness of these features by calculating their
correlation with the label, where 0 denotes a genuine user,
and 1 represents a bot. Figure 3 illustrates the distribution of
feature informativeness by the data source for the collated bot
sets compared with 100,000 random VKontakte users.
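A minimal sketch of this two-step selection is given below; the greedy elimination order is our assumption, as the paper does not state how one member of each correlated pair is chosen, and at the paper's feature scale the correlation pass would in practice need to be chunked:

import numpy as np
from scipy.stats import spearmanr

def select_features(X, y, collinear_thr=0.5, top_k=1000):
    """X: samples x features matrix (assumes more than two feature
    columns); y: labels, 0 = genuine user, 1 = bot."""
    rho, _ = spearmanr(X)            # feature-feature rank correlations
    rho = np.abs(rho)
    keep = []
    for j in range(X.shape[1]):      # greedy multicollinearity filter
        if all(rho[j, k] <= collinear_thr for k in keep):
            keep.append(j)
    # rank the survivors by |correlation with the label|, keep the top_k
    label_rho = [abs(spearmanr(X[:, j], y)[0]) for j in keep]
    order = np.argsort(label_rho)[::-1][:top_k]
    return [keep[i] for i in order]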
It is evident that the most efficacious features for detecting
bots were those derived from graphs and extended graphs.
Conversely, features based on distributions were the least
informative, which aligns with empirical observations. This
is because bots frequently imitate the numerical data of their
profiles in an attempt to mislead users.
Fig. 3. Feature informativeness distributions. The Y axis represents the Spearman correlation of each feature with the label.

B. Bot detection efficiency
To evaluate the classifier’s overall efficacy in identifying individual bots, we employed active accounts from a sample of 100,000 random VKontakte social network users as negatives (True Negatives [TN] and False Positives [FP]). For the positives (True Positives [TP] and False Negatives [FN]), we utilised active accounts from among the 18,444 bots in the collected datasets. The model underwent training with 70% of the accounts and was tested on the remaining 30%.

TABLE III
BOT DETECTION EFFICIENCY
TP    TN    FP   FN   Prec.  Rec./TPR  TNR    AUC
2500  7966  160  819  0.940  0.753     0.980  0.949
The metrics describing the efficiency of bot detection are
listed in Table III, with the AUC-ROC depicted in Figure 4 A.
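As a consistency check, the derived rates in Table III follow directly from the four confusion counts:

Prec.    = TP/(TP+FP) = 2500/2660 ≈ 0.940
Rec./TPR = TP/(TP+FN) = 2500/3319 ≈ 0.753
TNR      = TN/(TN+FP) = 7966/8126 ≈ 0.980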
To assess the variation in the efficiency of the bot detector
across different bot types, we also computed the True Positive
Rate (TPR) for each bot set. The distribution of TPR per set,
along with its mean and standard deviation, is presented in
Figure 4 B.
Fig. 4. (A) AUC-ROC where each data example is an account; (B) TPR distribution, where each bar is a set of accounts.
C. Bot metrics prediction efficiency
Regarding the bot metrics, we evaluated the Mean Absolute
Error (MAE) of our model’s predictions (as detailed in Table
IV), as well as the error distribution across various sets of bot
types (illustrated in Figure 5).
TABLE IV
MEAN ABSOLUTE ERROR OF METRICS PREDICTION
Metric  NBQ    Trust  Speed  Price  BTT
MAE     0.297  0.126  0.226  0.186  0.136
D. Ability of bots to bypass the detector
Furthermore, we investigated the relationship between the True Positive Rate (TPR) and bot metrics to identify which bots possess a greater capacity to bypass our detection system. The correlation between the TPR and the attributes of the bot sets is depicted in Figure 6. Concurrently, Spearman’s correlation coefficient, which elucidates the association between TPR and the various metrics, is enumerated in Table V.
TABLE V
SPEARMAN’S CORR. BETWEEN TPR AND METRICS FROM FIGURE 6
Metric            NBQ     Trust  Speed   Price   BTT
Spearman’s corr.  -0.172  0.477  -0.140  -0.021  -0.245
V. DISCUSSION
The evaluation results suggest that employing traditional
machine learning techniques, in conjunction with a judicious
selection of features from graphs, text, and distributions, as
well as dataset formation predicated on the purchase method,
enables the effective detection of bots and the estimation of
their parameters. The key findings from the evaluation can be
summarised as follows:
1) The detection model’s AUC-ROC is 0.949, with a TPR
of 0.753, as illustrated in Table III and Figure 4 A.
Furthermore, the model exhibits a high TNR of 0.980,
suggesting that the detector maintains accuracy even
with imbalanced inputs when there are significantly
more genuine users in comparison to bots.
2) An examination of the average TPR across different bot sets, as shown in Figure 4 B, reveals an average TPR of 0.738. This indicates variability in detection rates, with only 10 bot sets having a TPR around 0.2, signifying that certain bot types predominantly evade detection. Conversely, for the majority of bot sets, the TPR approximates 0.8.
Fig. 5. Efficiency of bot metric prediction: distribution of MAE and true/predicted plots.
Fig. 6. TPR dependency on bot metrics, describing the ability of bots with specific properties to bypass the detector.
3) As per Table IV, the prediction of bot metrics, and consequently the characterisation of attack parameters, is generally accurate. Figure 5 A shows that the main classification challenge lies between low- and medium-quality bots; however, high-quality bots can still be distinguished from the rest.
4) Trust exhibits the most notable correlation (approximately 0.5) with bot detection difficulty, as indicated in Figure 6 and Table V. This suggests that the more challenging a bot is for a human to identify, the more complex it is for the detection prototype to recognise. The type of bot trader (BTT) also exerts influence (correlation approximately 0.2), with bots originating from exchange platforms being more challenging to detect. The correlation with the bot’s price, quality, and speed is considered insignificant.
Consequently, the proposed model is suitable for detecting bots, although the detection efficiency is weakly correlated with the main bot characteristics: price, quality, speed, and the type of bot trader. These results indicate that the proposed model is capable not only of accurately detecting a significant proportion of bots but also of predicting their characteristics, thereby profiling an attacker. This includes determining the rapidity of a bot attack, the associated costs, the quality of the bots, the degree of user trust in these bots, and the nature of the bot trader. Furthermore, the evaluation suggests that the detector’s efficacy is not highly dependent on the type of bot: it can identify diverse bot types regardless of their specific parameters. Nonetheless, the detection rates for some sets of bots remain low, as shown in Figure 4 B, hinting at the possibility of unaccounted “hidden” parameters that may affect the detector’s efficiency.
VI. CONCLUSIONS
In this paper, we have introduced a methodology for detecting bots on VKontakte and estimating their parameters. By leveraging traditional machine learning techniques, coupled with feature selection from graphs, text, and distributions, and the formation of datasets via a purchase method, we have achieved effective bot detection and parameter estimation. Our experiments have demonstrated that the proposed detection system can accurately identify bots even with imbalanced datasets, and it can detect the majority of bot sets and predict bot metrics with a reasonable degree of accuracy. Moreover, the efficiency of detection exhibits only a weak correlation with the primary characteristics of bots. The fusion of our bot detection method with the assessment of bot metrics holds promise for enhancing the selection of countermeasures, thereby granting defenders a deeper insight into attackers and their strategies.
REFERENCES
[1] S. Cresci, “A decade of social bot detection,” Commun. ACM, vol. 63, no. 10, pp. 72–83, 2020.
[2] S. A. Samoilenko and I. Suvorova, “Artificial intelligence and deepfakes in strategic deception campaigns: The US and Russian experiences,” in The Palgrave Handbook of Malicious Use of AI and Psychological Security, Cham: Springer International Publishing, 2023, pp. 507–529.
[3] K. C. Yang and F. Menczer, “Anatomy of an AI-powered malicious social botnet,” arXiv preprint arXiv:2307.16336, 2023.
[4] Z. Gilani, R. Farahbakhsh, G. Tyson, L. Wang, and J. Crowcroft, “Of bots and humans (on Twitter),” in Proc. 2017 IEEE/ACM Int. Conf. Adv. Soc. Netw. Anal. Mining, 2017.
[5] M. Orabi, D. Mouheb, Z. Al Aghbari, and I. Kamel, “Detection of bots in social media: a systematic review,” Inf. Process. Manag., vol. 57, no. 4, 2020, Art. no. 102250.
[6] M. Kolomeets and A. Chechulin, “Social bot metrics,” Soc. Netw. Anal. Min., vol. 13, no. 1, 2023.
[7] M. Kolomeets, “MKMETRIC2022 dataset with VKontakte bot identifiers and their metrics,” 2022. [Online]. Available: https://github.com/guardeec/datasetsmkmetric2022
[8] Z. Zhou, H. Guan, M. Bhat, and J. Hsu, “Detecting Fake News with NLP: Challenges and Possible Directions,” preprint, 2018.
[9] K. Skorniakov, D. Turdakov, and A. Zhabotinsky, “Make Social Networks Clean Again: Graph Embedding and Stacking Classifiers for Bot Detection,” in CIKM Workshops, 2018.
[10] P. D. Zegzhda, E. V. Malyshev, and E. Yu. Pavlenko, “The use of an artificial neural network to detect automatically managed accounts in social networks,” Autom. Control Comput. Sci., vol. 51, no. 8, pp. 874–880, 2017.
[11] D. I. Samokhvalov, “Machine learning-based malicious users’ detection in the VKontakte social network,” Proc. Inst. Syst. Program. RAS, vol. 32, no. 3, pp. 109–117, 2020.
[12] A. D. Kaveeva and K. E. Gurin, “Artificial VKontakte profiles and their impact on the social network of users,” J. Sociol. Soc. Anthropol., vol. 21, no. 2, pp. 214–231, 2018.
[13] M. Kolomeets, O. Tushkanova, D. Levshun, and A. Chechulin, “Camouflaged bot detection using the friend list,” in Proc. 29th Euromicro Int. Conf. Parallel, Distributed and Network-Based Processing (PDP), IEEE, 2021.