DeBot: Twitter Bot Detection via Warped Correlation
Nikan Chavoshi, Hossein Hamooni, Abdullah Mueen
Department of Computer Science, University of New Mexico
{chavoshi, hamooni, mueen}@unm.edu
Abstract—We develop a warped correlation finder to identify
correlated user accounts in social media websites such as Twitter.
The key observation is that humans cannot be highly synchronous
for a long duration; thus, highly synchronous user accounts
are most likely bots. Existing bot detection methods are mostly
supervised, which requires a large amount of labeled data to
train, and do not consider cross-user features. In contrast, our bot
detection system works on activity correlation without requiring
labeled data. We develop a novel lag-sensitive hashing technique
to cluster user accounts into correlated sets in near real-time.
Our method, named DeBot, detects thousands of bots per day
with 94% precision and publishes reports online every day. As of
September 2016, DeBot had accumulated about 544,868 unique
bots over the previous year.
We compare our detection technique with per-user techniques
and with Twitter’s suspension system. We observe that some bots
can avoid Twitter’s suspension mechanism and remain active for
months, and, more alarmingly, we show that DeBot detects bots
at a rate higher than the rate Twitter is suspending them.
I. INTRODUCTION
Social media websites allow users to communicate and share
ideas. Automated accounts, called bots, abuse social media by
posting unethical content [1], supporting sponsored activities
[12], and selling accounts [17]. Social media sites, like Twitter,
frequently suspend abusive bots [19]. However, current bot
detection methods consider accounts independently of each other
(i.e., per-user detection) and are mostly supervised [20], [8].
We propose a novel unsupervised approach that identifies bots
using correlated user activities. Figure 1 shows two Twitter
accounts that do not follow each other but are correlated
in their tweeting activity. More examples of correlated accounts
and a video capture of two correlated accounts are available
in [2]. An analysis of the significance of correlation in detecting
bots is published in [4]. In this paper, we describe the DeBot
system architecture and experimentally compare its bot detection
performance with existing per-user methods.
On the data mining front, we develop a system called
DeBot which correlates millions of users in near real-time
to identify bot accounts. Traditional correlation coefficients
such as Pearson’s are non-elastic and are not suitable for
Twitter activity time series because of warping induced by
various factors: bot controllers, network delays, and internal
processing delays in Twitter. An example of warping in activity
time series is shown in Figure 1(bottom), where two users,
alan26offical and FiIosofei, tweeted and retweeted
many identical pairs of tweets with exactly ten seconds of lag
and occasional warping. We allow time-warping by calculating
warped correlation using the Dynamic Time Warping (DTW)
distance for time series [10]. In our example, the warped
correlation between the two users is 0.99, which is much higher
than cross-correlation (0.72) and Pearson's correlation (0.07).

Fig. 1. (top) Two highly correlated Twitter accounts: Alan (left) and Filosofei
(right). (bottom) Six minutes of correlated activities from Alan and Filosofei.
To perform a large number of pair-wise warped correlation
calculations efficiently, we develop a lag-sensitive hashing
technique which hashes the users of the tweets into buckets
of suspiciously correlated users. We show that lag-sensitive
hashing is advantageous over regular random projection-based
hashing in capturing warping correlations. The system then
validates the correlations between the suspected users with
account-specific listeners and outputs the valid bots. Our
system collects tweets from the Twitter API at 48 tweets
per second, which is the maximum rate allowed, and reports
correlated accounts in daily batches.
Our contribution in this work is mainly twofold: developing
the warped correlation finder and using this finder to detect
bots. Specifically:
• We develop a near real-time system, DeBot, which is the
first (to our knowledge) unsupervised method to detect
bots in social media. Our system detects more bots than
existing supervised techniques.
• We develop a novel lag-sensitive hashing technique to
quickly group correlated users based on their warping
correlations. This allows us to cross-match millions of
activity series under time warping, which has never been
attempted before.
• We empirically show that DeBot has 94% precision and
is able to detect bots that other methods fail to spot. We
find that bots can be functionally grouped and that their
number is growing at a high rate.
The rest of the paper is organized as follows: We start with a
quick background on correlation computation in Section II. We
describe our core techniques in Section III, including the never-
ending correlation tracker and bot clustering algorithm. We
perform a comprehensive evaluation of our method in Section
IV and finally conclude in Section V. An expanded version of
our paper is available in [2] containing more experiments and
discussion of related work.
II. BACKGROUND AND NOTATION
The activity signal of a user in social media consists of all
the actions the user performs in a sequence of observations.
Actions include posting, sharing, liking, tweeting, retweeting,
and deleting. The sequence of timestamps of the activities
of a user-account (or simply, a user) typically forms a time
series with zero values indicating no action, and occasional
spikes representing the number of actions in that specific second.
Throughout the paper, we assume a one-second sampling rate,
though the method does not have any specific sparsity or
sampling rate requirements. We define the problem we solve
as follows.
Problem: Find warped-correlated groups of users from
activity signals every T hours.
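As a concrete illustration, a minimal sketch of building one user's activity signal (assuming NumPy and epoch-second timestamps; the function name and data layout are ours, not the paper's):

import numpy as np

def activity_series(timestamps, start, end):
    # Per-second activity signal: the value at second t is the number
    # of actions (posts, retweets, deletions, ...) in that second.
    series = np.zeros(end - start)
    for t in timestamps:
        if start <= t < end:
            series[t - start] += 1
    return series

# Two tweets in the same second produce a spike of height 2.
x = activity_series([1000, 1000, 1007], start=1000, end=1010)
# x -> [2. 0. 0. 0. 0. 0. 0. 1. 0. 0.]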
The core of the above problem is comparing pairs of users
to determine correlated groups, which is an unsupervised,
pair-wise (i.e. quadratic) matching process. We first define
terms and functions that provide necessary background for the
algorithm.
Correlation: To capture users with highly synchronous posting
activities, the correlation coefficient between two activity
time series is a strong indicator. There are several measures of
correlation. The most commonly used coefficient is Pearson's
coefficient. For time series $x$ and $y$ of length $m$, Pearson's
correlation coefficient can be defined using the Euclidean
distance $d$ between the z-normalized time series $\hat{x}$ and $\hat{y}$ [13]:

$$C(x, y) = 1 - \frac{d^2(\hat{x}, \hat{y})}{2m} = \frac{\sum_{i=1}^{m} x_i y_i - m\mu_x\mu_y}{m\sigma_x\sigma_y}$$
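The identity between the z-normalized Euclidean distance and Pearson's coefficient can be checked numerically; a minimal sketch assuming NumPy:

import numpy as np

def znorm(x):
    return (x - x.mean()) / x.std()

def pearson_via_distance(x, y):
    # C(x, y) = 1 - d^2(x_hat, y_hat) / (2m)
    m = len(x)
    d2 = np.sum((znorm(x) - znorm(y)) ** 2)
    return 1.0 - d2 / (2 * m)

rng = np.random.default_rng(0)
x, y = rng.normal(size=128), rng.normal(size=128)
assert np.isclose(pearson_via_distance(x, y), np.corrcoef(x, y)[0, 1])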
Cross-correlation: Cross-correlation between two signals
produces the correlation coefficients at all possible lags. For
two signals $x$ and $y$ of length $m$, cross-correlation takes
$O(m \log m)$ time to compute the $2m - 1$ coefficients at all lags.
For $\tau \in [-m, m]$, a discrete version of cross-correlation $\rho_{xy}$
is defined for integer lag $\tau$ as follows:

$$\rho_{xy}(\tau) = \begin{cases} C(x_{1:m-\tau},\; y_{\tau+1:m}) & \tau \geq 0 \\ C(x_{|\tau|+1:m},\; y_{1:m-|\tau|}) & \tau < 0 \end{cases}$$

Here the $:$ operator is used to represent an increment-by-one
sequence. Note that $\rho_{xy}(\tau) = \rho_{yx}(-\tau)$.

Typically, for large lags $\tau$, cross-correlation is meaningless
for lack of data. In reality, every domain has a range of
interesting lags.
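A direct transcription of this definition, looping over lags for readability rather than using the O(m log m) FFT formulation (a sketch assuming NumPy):

import numpy as np

def cross_correlation(x, y, w):
    # rho_xy(tau) for integer lags tau in [-w, w]: Pearson
    # correlation of the overlapping parts of x and shifted y.
    m = len(x)
    rho = {}
    for tau in range(-w, w + 1):
        if tau >= 0:
            a, b = x[:m - tau], y[tau:]    # C(x_{1:m-tau}, y_{tau+1:m})
        else:
            a, b = x[-tau:], y[:m + tau]   # C(x_{|tau|+1:m}, y_{1:m-|tau|})
        rho[tau] = np.corrcoef(a, b)[0, 1]
    return rho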
Dynamic Time Warping: Correlation can capture high
synchronicity among users; however, real bots often show
warping. Dynamic time warping (DTW) distance is a well-
studied distance measure for time series data mining. DTW
allows signals to warp against one another, and constrained
DTW allows warping within a window of $w$ samples. Similar
to the lag in cross-correlation, the constraint window ($w$) for bot
detection should not be more than a few seconds.
Warped Correlation: We extend the notion of warping to
correlation. If $x$ and $y$ are z-normalized, the DTW distance can be
converted to a warped correlation measure with a range of
$[-1, 1]$. If the number of squared errors that are added to obtain the
distance (i.e., the warping path length) is $P$, then the warped correlation
is defined as:

$$wC(x, y) = 1 - \frac{DTW^2(\hat{x}, \hat{y})}{2P}$$
The finite range of warped correlation is useful in measuring
the significance of a match. A very strong warped correlation
of 0.995 indicates almost identical behavior from two users.
In this paper, we use a threshold of 0.995 warped correlation
to identify bots.
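A minimal dynamic-programming sketch of constrained DTW and the resulting warped correlation (inputs assumed z-normalized; carrying the path length P through the recursion alongside the cost is our own simplification, not necessarily how the system computes it):

def warped_correlation(xh, yh, w):
    # wC(x, y) = 1 - DTW^2(x_hat, y_hat) / (2P), with a Sakoe-Chiba
    # band of w samples; P is the length of the optimal warping path.
    m = len(xh)
    INF = float("inf")
    cost = [[(INF, 0)] * (m + 1) for _ in range(m + 1)]
    cost[0][0] = (0.0, 0)
    for i in range(1, m + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            prev = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = (prev[0] + (xh[i - 1] - yh[j - 1]) ** 2, prev[1] + 1)
    d2, P = cost[m][m]
    return 1.0 - d2 / (2 * P)

With w = 0 the band collapses to the diagonal and the measure reduces to the Pearson-style correlation above.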
Random Projection: Random projection has been used
in high dimensional K-nearest neighbor search for over a
decade [3]. Random projection has also been shown to work
for time series similarity search in real time [6]. The key
idea is to project each high-dimensional time series onto $k$
random directions. By the Johnson–Lindenstrauss lemma, it is
probabilistically guaranteed that points that are similar
in the high-dimensional space will be close/similar in the
projected space, and that dissimilar points will be far/dissimilar
[3]. In this project, we use cross-correlation-based random
projection. Simply put, we generate one random vector and
rotate its dimensions in both clockwise and anti-clockwise
manners to produce the remaining random vectors.
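A sketch of this rotation trick, assuming NumPy (circular shifts via np.roll; in the actual system the same 2w + 1 dot products are obtained in one O(m log m) cross-correlation with r, as Section III-B describes):

import numpy as np

def lagged_directions(m, w, seed=0):
    # One random direction and its 2w rotations, clockwise and
    # anti-clockwise: the projection directions for lag-sensitive hashing.
    r = np.random.default_rng(seed).normal(size=m)
    return np.stack([np.roll(r, lag) for lag in range(-w, w + 1)])

# projections = lagged_directions(len(x), w) @ x_hat   # x_hat: z-normalized x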
III. DEBOT CORRELATION FINDER
A. DeBot Architecture
In this section, we describe the architecture of the DeBot
system, which detects bots every T hours. The system consists
of four components which are shown in Figure 2.
The four components of the process are: collector, indexer,
listener, and validator. The collector collects tweets that
match a certain set of keywords for T hours using the
filter method in the API. The matching process in the
Twitter API is quoted from Twitter's developer guide
for clarity. “The text of the Tweet and some entity fields are
considered for matches. Specifically, the text attribute of the
Tweet, expanded url and display url for links and media, text
for hashtags, and screen name for user mentions are checked
for matches.” The collector forms the time series of the number
of activities at every second for all of the user-accounts. The
collector filters out users with just one activity, as correlating
one activity is meaningless. The collector then passes the time
series to the indexer.
Note that, as we are using the filter method, we may
not receive all the activities of a given user in the T-hour
period. This clearly challenges the efficacy of our method, as
subsampled time series may add false negatives. Even though
Fig. 2. Four phases of our bot detection process. The system takes a stream of activities (e.g., Twitter Firehose) as input and produces groups of correlated
users in a pipelined manner. Collision scenario: assume w = 12; then each user is hashed into buckets (a to h) 25 times. The number of occurrences of a user
is denoted by the superscript. We need ⌊w/4⌋ = 3 occurrences of a user in the same bucket to qualify; e.g., U2 is a qualified user in bucket d. Qualified users are
marked with green ellipses. However, bucket d is not a qualified bucket as it does not have three qualified users. Buckets a and e are qualified. Thus, from the
hash table, we extract the suspicious users U1, U5, U9, and U2, which are circled with a solid line.
we may have false negatives, our method outperforms existing
bot detection techniques by far (see Section IV). Moreover,
this issue disappears when site-owners use our method on the
complete set of user activities.
The indexer takes the activity time series of all the users
as input, hashes each of them into multiple hash buckets, and
reports sets of suspicious users that collide in the same hash
buckets. In order to calculate the hash buckets for a given set
of time series, the indexer uses a pre-generated random time
series r, calculates the cross-correlation between each time
series and r, and finally computes 2w + 1 hash indexes for
different lags. Here, w is a user-given parameter representing
the maximum allowable lag. Once hashed, the indexer finds a
list of suspicious users, which are the qualified users in qualified
buckets. Qualified users are those who have at least ⌊w/4⌋
occurrences in a specific bucket. Similarly, qualified buckets
have at least ⌊w/4⌋ qualified users. We go through each
qualified bucket and pick the qualified users in them to report
as suspicious users. The minimum number of occurrences
(⌊w/4⌋) needed to qualify is made dependent on w to avoid an explicit
parameter setting. We test the sensitivity of the parameter w
in the experiment section.
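A compact sketch of this qualification rule (the data layout is our assumption: a hash table mapping each bucket id to the list of users hashed into it, with one entry per (user, lag) hash):

from collections import Counter

def suspicious_users(hash_table, w):
    # A user qualifies in a bucket with >= floor(w/4) occurrences;
    # a bucket qualifies with >= floor(w/4) qualified users.
    threshold = w // 4
    suspicious = set()
    for users in hash_table.values():
        counts = Counter(users)
        qualified = {u for u, c in counts.items() if c >= threshold}
        if len(qualified) >= threshold:
            suspicious |= qualified
    return suspicious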
The listener listens to the suspicious users exclusively. In
this step, instead of using keywords, the Twitter stream is
filtered on suspicious user accounts. The listener is different
from the collector in a principled way. The listener receives
all the activities of a suspicious user over a period of T hours,
while the collector obtains only a sample of the activities. The
listener will form the activity time series of the suspicious
users and send them to the validator. The listener filters out
users with fewer than forty activities, as discussed in [4].
The validator reads the suspicious time series from the
listener and verifies their validity. The validator calculates a
pairwise warped correlation matrix over the set of users and
clusters the users hierarchically up to a very restrictive distance
cutoff. A sample of hierarchical clusters is shown in Figure
2. After clustering, every singleton user is ignored as a false
positive, and the tightly connected clusters are reported as bots.
For clarity, we describe the clustering process separately in the
next subsection.
B. Lag-sensitive Hashing
In this section, we describe our novel lag-sensitive hashing
technique. For this technique, we adopt the concept of structured
random projection. We project activity signals in 2w + 1
directions, which are lagged vectors of a random vector r.
We achieve this by simply calculating the cross-correlation
between r and a given signal and picking the 2w + 1 values
around the point of symmetry. The following theorem describes the
best-case hashing scenario.
Theorem 1. If two infinitely long time series $x$ and $y$ are
exactly correlated at a lag $l \leq w$, then they must collide in
exactly $2w - l$ buckets.

Proof: Let us assume $r$ is the reference object of the same
length as $x$ and $y$. Without loss of generality, let us assume
$\rho_{xy}(l) = 1.0$ and $l \geq 0$ (if $l < 0$, we can swap $x$ and $y$).
Every alignment of $r$ with $x$ has a corresponding alignment of
$r$ with $y$ at lag $l$. Both of these alignments produce the same
correlation and result in a collision in the hash structure.
Formally, $\rho_{xr}(i) = \rho_{yr}(i - l)$ for any $i \in [-w, w]$. There are
exactly three ways that this can happen.
• If $i < 0$, $\rho_{xr}(i) = \rho_{rx}(-i)$ and $\rho_{yr}(i - l) = \rho_{ry}(-i + l)$
are equal because $r_i$ is aligned with $x_i$ and $y_{i+l}$.
• If $0 < i < l$, $\rho_{xr}(i)$ and $\rho_{yr}(i - l) = \rho_{ry}(l - i)$ are equal
because $r_i$ is aligned with $x_1$ and $y_l$.
• If $i > l$, $\rho_{xr}(i) = \rho_{yr}(i - l)$ is trivially true because $r_i$ is
aligned with $x_1$ and $y_l$.
For $i < -(w - l)$, $\rho_{yr}(i - l)$ is not calculated by our hash
function. Therefore, the only valid range for $i$ is $[-(w - l), w]$,
which gives us $2w - l$ collisions.
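A toy end-to-end sketch of the indexing step under stated simplifications (circular shifts of the reference stand in for the true lagged alignments, and the uniform quantization of coefficients into buckets is our own choice; the paper does not specify the bucketing function):

import numpy as np
from collections import defaultdict

def lag_sensitive_hash(x, r, w, n_buckets):
    # One bucket per lag in [-w, w]: quantize the correlation of x
    # with each lagged copy of the random reference r.
    buckets = []
    for lag in range(-w, w + 1):
        rho = np.corrcoef(x, np.roll(r, lag))[0, 1]
        buckets.append(min(int((rho + 1) / 2 * n_buckets), n_buckets - 1))
    return buckets

def build_hash_table(signals, r, w, n_buckets):
    # signals: dict of user id -> z-normalized activity series
    table = defaultdict(list)
    for user, x in signals.items():
        for b in lag_sensitive_hash(x, r, w, n_buckets):
            table[b].append(user)   # up to 2w+1 entries per user
    return table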
How well do cross-correlation coefficients capture the DTW
distance, or warped correlation? We calculate the DTW distances
and cross-correlations between 5,000 pairs of random
walks. We use the same w as both the constraint window size and
the maximum allowable lag. We plot the maximum cross-correlation
(over lags in $[-w, w]$) against the DTW distance
in Figure 3 (left), which shows the reciprocal relationship that
we exploit in our lag-sensitive hashing scheme.
We analyze the goodness of lagged projection compared to
classic random projection when hashing signals. There are two
Fig. 3. (left) Maximum cross-correlation shows reciprocity with DTW
distance. (right) Comparison with classic random projection based on pruning
and neighbor rates.
important metrics that we consider. The pruning rate is the
percentage of similarity comparisons, relative to a search
without the hash structure, that we can avoid while still finding
any of the top-5 nearest neighbors. The neighbor rate is the
percentage of times we can retrieve any of the top-5 nearest
neighbors under warped correlation using the hash structure.
Ideally, we want both metrics to be close to 100%.
We evaluate the neighbor rate and pruning rate on several
datasets from the UCR archive [11] for various bin counts.
We show a representative chart for the Trace dataset in Figure
3 (right). This chart illustrates the trade-off between the two
techniques. Classic random projection is better at pruning (i.e.,
speed), while our proposed lagged projection is better at finding
the neighbors (i.e., accuracy). This is an understandable trade-off
between structured and truly random projections. Since we
would like to find highly correlated groups, the more accurate
lagged projection is our method of choice.
C. Clustering
The validator calculates the pairwise constrained warped
correlation for all of the suspicious users. We use the maximum
allowable lag (i.e. w) from the indexer as the constraint size.
The validator then performs a hierarchical clustering on the
pairwise DTW distances using the “single” linkage technique,
which merges the closest pairs of clusters iteratively. A sample
dendrogram is shown in Figure 4, which shows the strong
clusters and the numerous false positives that we extract from
the time series.
We use a very strict cutoff threshold to extract highly dense
clusters and ignore all the remaining singleton users. For
example, in Figure 2, U1, U5, and U9 are clustered together
and U2 is left out as a false positive. The cutoff we use is 0.995
warped correlation. The extracted clusters, therefore, contain
significant bot accounts.
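A sketch of the validator's clustering step using SciPy (converting warped correlation to a distance as 1 − wC is our assumption; the paper clusters pairwise DTW distances with a cutoff equivalent to 0.995 warped correlation):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def validate(users, wcorr):
    # Single-linkage hierarchical clustering with a strict cutoff;
    # singletons are discarded as false positives.
    dist = 1.0 - wcorr                 # warped correlation -> distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="single")
    labels = fcluster(Z, t=1.0 - 0.995, criterion="distance")
    clusters = {}
    for user, label in zip(users, labels):
        clusters.setdefault(label, []).append(user)
    return [c for c in clusters.values() if len(c) > 1]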
As more rounds of T hours pass, we can merge these
clusters to form bigger clusters. This is an important step,
because bots form correlated groups and may disband them
dynamically. Therefore, an already detected bot can reveal a
new set of bots in the next round of T hours. While merging
these clusters, we use a simple friend-of-friend technique, as
sketched below: if two clusters share one user in common, we
merge them. Although it may sound very simple, we see that
such a simple method retains high precision because of the
overwhelming number of existing bots.
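A minimal union-find sketch of this merge step (our own formulation of the friend-of-friend rule):

def merge_clusters(clusters):
    # Clusters that share at least one user are unioned together.
    parent = {}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for cluster in clusters:
        members = list(cluster)
        for u in members:
            parent.setdefault(u, u)
        for u, v in zip(members, members[1:]):
            parent[find(u)] = find(v)      # union consecutive members

    merged = {}
    for u in parent:
        merged.setdefault(find(u), set()).add(u)
    return list(merged.values())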
Fig. 4. A sample dendrogram of suspicious users’ activities. Only a few
users fall below the restrictive cutoff; the rest of the users are cleared as false
positives.
IV. EMPIRICAL EVALUATION
All of the experiments in this section are exactly reproducible
with the code and data provided on the supporting page
[2]. DeBot produces a daily report of the bot accounts by
analyzing the activities of the previous day. The daily reports
and detected bots are available at [2]. We have three
interdependent parameters: the number of buckets (B = 5000) in the
hash table, the base window (T = 2 hours), and the maximum
lag (w = 20 seconds). Unless otherwise specified, the default
parameters are used for all experiments.
A. Bot Quality: Precision
Our method produces a set of clusters of highly correlated
users based solely on their temporal similarity. As mentioned
earlier, we find correlated users (warped correlation > 0.995) who have more than
forty synchronous activities in T hours. In this subsection, we
empirically validate the quality of the bots that we detect using
several methods.
1) Comparison with existing methods: Typically, there are
three approaches to evaluating the detected bots. The first
approach is to sample and evaluate the accounts manually
[9]. The second approach is to set up a “honeypot” in order to
produce labeled data by attracting bots, and then to evaluate
a method by cross-validation [15]. The last approach is to
check whether or not the accounts are suspended by Twitter
at a later time [16]. The first two approaches are suitable for
supervised methods and only produce static measurements at
one instance of time. Our major evaluation is done against
Twitter over three months. We also compare DeBot with two
other static techniques from the literature.
Comparison with Twitter and Bot or Not?: Twitter
suspends accounts that do not follow its rules [18]. To compare
the results of DeBot with Twitter’s suspension process, we
design two segments: static and dynamic. In the static segment,
we ran DeBot every 4 hours for sixteen days (May 18 - June
3, 2015) and linked all the clusters into one integrated set of
clusters using the friend-of-friend technique. We picked the top
ten clusters (9,134 bot accounts) to form our base set. From
June to August 2015, we probed the Twitter API every few
days to check if these detected accounts were suspended. As
shown in Figure 5 (left), the number of bots suspended
by Twitter increased over time, and by the end of 12 weeks
45% of the bot accounts identified by DeBot were suspended.
In addition to this static segment, where we kept the set of
bots detected by DeBot fixed and probed Twitter over time,
we also performed dynamic detection. On August 28, 2015,
we started running DeBot every week and added the newly
Fig. 5. Comparison between the number of bots detected by DeBot, Twitter,
and Bot or Not? project over time. (Note that we probed Twitter and Bot or
Not? only for the accounts in the base set.)
discovered bots to the base set of bots. In every run, we
listened to Twitter for 7 successive days. The results are shown
in Figure 5 (right). DeBot consistently found new bots every
week. We then continuously probed Twitter to check the status
of the newly detected bots. The results clearly show that the
number of bots we detect is increasing at a higher rate than
Twitter's suspension rate. It is very likely that Twitter detects
more bots than it suspends, and that several relevant factors,
such as country-specific laws and the need for graceful action,
may constrain Twitter's suspension system. The outcome
of our experiments is a reminder that Twitter may need to be
more aggressive in its suspension process. At the time of
writing, DeBot has accumulated a set of close to 500,000
bots (at a rate of close to 1,500 bots per day!). The identities
of these bots are available on our supporting page [2].
An obvious question one might ask is: how many bots that
are not suspended by Twitter are worth detecting? We answer
this question by comparing our method with a successful
existing technique developed in the Truthy project [8], Bot
or Not?. Bot or Not? is a supervised method that estimates the
probability of “being a bot” for a given account. Using 50% or
more as the threshold, we got 59% relative support from Bot
or Not? in June 2015. We probed Bot or Not? two more times
in the static segment (see Figure 5) and noticed no significant
change in detection performance. We also probed Bot or Not?
twice in the dynamic segment and observed that Bot or Not?
detected increasingly more bots as DeBot was growing the
base set. This supports our original argument that Twitter is
falling behind in detecting bots.
The reason why Bot or Not? is only half as accurate as
DeBot is that the method was trained on English-language
tweets, while DeBot catches all languages based purely on
temporal synchronicity. Another reason is that Bot or Not? is
a supervised technique trained periodically. In contrast, DeBot
detects bots every day in a completely unsupervised manner.
Bot or Not? probably misses some recent dynamics of bots,
resulting in a smaller overlap.
A complementary question is: which method (DeBot or Bot
or Not?) produces bots that Twitter preferentially suspends?
We calculate the fraction of accounts that Twitter suspends
for Bot or Not? and DeBot exclusively. We see that Twitter
suspends more of the bots that are supported by Bot or Not?
(37.43%) than of those supported by DeBot (21.06%). This bias
toward a feature-based supervised method possibly indicates that
temporal synchronicity should be incorporated into Twitter's detection
and suspension mechanism.
Fig. 6. (left) Four bots showing different patterns in the minutes-of-hour
vs. seconds-of-minute plot. (right) Effect of the base window on the detection
performance.
Comparison with per-user method: Per-user methods are
being developed actively by researchers. We compare DeBot
to a per-user method in [21], which tests the independence
of the minute-of-hour and second-of-minute of postings with a χ² test.
Figure 6 (left) shows a set of bots and their seconds-of-minute
vs. minutes-of-hour plots. The test cannot detect bot accounts
with uniformly distributed activities. On average, 76% of the bots
detected by DeBot are supported by the χ² test. There are
other per-user methods [5], [15], [7] that use machine-learned
classifiers to detect bots. The method in [7] is similar to
ours in considering temporal behavior. However, the method
is a supervised per-user method trained on a small dataset of
around a few thousand accounts. We do not compare DeBot
with this method since DeBot is unsupervised, works in real
time, and identifies several hundred bots every day.
2) Contextual Validation: One-quarter of the bots detected
by DeBot are not yet supported by Twitter, Bot or Not?, or
the χ² test. Are they worth finding? An exact answer to this
question does not exist, due to the lack of ground truth.
To alleviate this concern, we evaluate the bots using contextual
information, such as tweet content, and get an average of
78.5% relative support. We also employ human judges to
compare the content of our bots against each other and achieve
94% support. The details of these experiments, along with a
recall estimation, can be found in [4].
B. Parameter Sensitivity
We have three interdependent parameters that we analyze in
this section. We iterate over each parameter, while keeping the
remaining parameters fixed. For the experiments in this section,
we use the keywords (swarmapp | youtube | instagram)
as our filter strings.
Base Window (T): We change the size of the base window,
T, to observe the change in detection performance. We see
consistent growth in the number of clusters and bot accounts as T
increases. A larger base window ensures that more correlated
users can show up and be hashed. The end effect is that we
have higher-quality clusters at the cost of a longer wait. Figure
6 (right) shows the results.
Number of Buckets (B): We change the number of buckets
in the hash structure. Too few buckets will induce unnecessary
collisions, while too many buckets spread users sparsely.
Figure 7 (left) shows that the maximum number of clusters and bot
accounts is achieved by using 2000 to 4000 buckets.
Fig. 7. Effect of parameters on the detection performance: (left) number of
buckets and (right) maximum lag in seconds.
Maximum Lag (w): We check the impact of the maximum
lag on detection performance. As previously described, user
activities require lag- and warping-sensitive correlation measures.
For zero lag (essentially Euclidean distance), we obtain
significantly fewer clusters and bot accounts. For a lag of 30
seconds, the number of clusters is again low because the hash
structure is crowded with copies of each user, resulting in many
spurious collisions. Results are shown in Figure 7 (right).
C. Scalability
Online methods depend on several degrees of freedom.
This makes analyzing and comparing scalability difficult. Two
quantities are most important: data rate and window size. The
Twitter streaming API has a hard limit on the data rate; we
receive tweets at 48 tweets per second at the most. Even
if we increase the generality of the filter string, we cannot
increase the data rate.
Therefore, scalability depends on how much of a user's history
we can store and analyze. This is exactly the parameter T
in our problem definition. We set our largest experiment to
collect 1 million user accounts. This is a massive number of
time series over which to calculate the warping-invariant correlation for
all pairs. Note that it is easier to perform trillions of subsequence
matches [14] in a streaming fashion at a very high data rate by
exploiting overlapping segments of successive subsequences.
Calculating pairwise DTW distances for a million users is
equivalent to a trillion distance calculations without overlapping
substructures. We exploit the efficiency of cross-correlation,
which enables our hashing mechanism to compute the clusters
and identify bots.
It takes T = 9.5 hours to collect 1 million users. The indexer
then takes 40 minutes to hash all the users; 24,000 users
qualify for the listener, and the validator detects 93 clusters
comprising 1,485 accounts.
V. CONCLUSION
We illustrate that the presence of highly synchronous cross-
user activities reveals abnormalities and is a key to detecting
automated accounts. We develop an unsupervised method
which calculates cross-user activity correlations to detect bot
accounts in Twitter. We evaluate our method with per-user
method and Twitter suspension process. The evaluation shows
that Twitter suspends automated accounts with lower rate than
our method finds them. DeBot also detects more bots when
compared to per-user methods. DeBot is running and detecting
thousands of bot daily. Our future goal is to extend this work
to further understand bot behavior in social media to improve
trustworthiness and reliability of online data.
REFERENCES
[1] How Twitter bots fool you into thinking they are real people. http://www.fastcompany.com/3031500/how-twitter-bots-fool-you-into-thinking-they-are-real-people.
[2] Supporting web page containing video, data, code and daily report.
www.cs.unm.edu/chavoshi/debot.
[3] E. Bingham and H. Mannila. Random projection in dimensionality
reduction: applications to image and text data. ACM SIGKDD 2001.
[4] N. Chavoshi, H. Hamooni, and A. Mueen. Identifying Correlated Bots
In Twitter. SocInfo 2016.
[5] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. Detecting Automation
of Twitter Accounts: Are You a Human, Bot, or Cyborg? IEEE
Transactions on Dependable and Secure Computing, 9(6):811–824,
Nov. 2012.
[6] R. Cole, D. Shasha, and X. Zhao. Fast window correlations over
uncooperative time series. ACM SIGKDD 2005.
[7] A. F. Costa, Y. Yamaguchi, A. J. M. Traina, C. Traina Jr., and C. Faloutsos.
RSC: mining and modeling temporal activity in social media. ACM
SIGKDD 2015.
[8] C. A. Davis, O. Varol, E. Ferrara, A. Flammini, and F. Menczer.
Botornot: A system to evaluate social bots. In Proceedings of the 25th
International Conference Companion on World Wide Web, 2016.
[9] J. Jiang, C. Wilson, X. Wang, P. Huang, W. Sha, Y. Dai, and B. Y.
Zhao. Understanding latent interactions in online social networks. IMC
2010.
[10] E. Keogh. Exact indexing of dynamic time warping. In Proceedings
of the 28th international conference on Very Large Data Bases, VLDB
2002.
[11] E. Keogh, X. Xi, L. Wei, C. A. Ratanamahatana, T. Folias, Q. Zhu,
B. Hu, and H. Y. The UCR time series classification/clustering
homepage, 2011.
[12] H. Li, A. Mukherjee, B. Liu, R. Kornfield, and S. Emery. Detecting
Campaign Promoters on Twitter Using Markov Random Fields. ICDM
2014.
[13] A. Mueen and E. Keogh. Online discovery and maintenance of time
series motifs. ACM SIGKDD 2010.
[14] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover,
Q. Zhu, J. Zakaria, and E. Keogh. Searching and mining trillions of
time series subsequences under dynamic time warping. ACM SIGKDD
2012.
[15] G. Stringhini. Stepping Up the Cybersecurity Game: Protecting Online
Services from Malicious Activity. PhD thesis, University of California,
Santa Barbara, 2014.
[16] K. Thomas, C. Grier, D. Song, and V. Paxson. Suspended accounts in
retrospect: an analysis of twitter spam. IMC 2011.
[17] K. Thomas, V. Paxson, D. McCoy, and C. Grier. Trafficking fraudulent
accounts: The role of the underground market in Twitter spam
and abuse. In USENIX Security Symposium, 2013.
[18] Twitter. About suspended accounts. https://support.twitter.com/articles/
15790.
[19] Twitter. The Twitter Rules. https://support.twitter.com/articles/18311.
[20] A. Wang. Detecting Spam Bots in Online Social Networking Sites: A
Machine Learning Approach. In Data and Applications Security and
Privacy XXIV, volume 6166 of Lecture Notes in Computer Science,
pages 335–342. Springer Berlin Heidelberg, 2010.
[21] C. M. Zhang and V. Paxson. Detecting and analyzing automated
activity on Twitter. In Passive and Active Measurement (PAM),
volume 6579 of Lecture Notes in Computer Science, 2011.