
Under the Shadow of Sunshine: Characterizing Spam Campaigns Abusing Phone Numbers Across Online Social Networks


Abstract

Cybercriminals abuse Online Social Networks (OSNs) to lure victims into a variety of spam. Among different spam types, a less explored area is OSN abuse that leverages the telephony channel to defraud users. Phone numbers are advertised via OSNs, and users are tricked into calling these numbers. To expand the reach of such scam / spam campaigns, phone numbers are advertised across multiple platforms like Facebook, Twitter, GooglePlus, Flickr, and YouTube. In this paper, we present the first data-driven characterization of cross-platform campaigns that use multiple OSN platforms to reach their victims and use phone numbers for monetization. We collect 23M posts containing 1.8M unique phone numbers from Twitter, Facebook, GooglePlus, YouTube, and Flickr over a period of six months. Clustering these posts helps us identify 202 campaigns operating across the globe, with Indonesia, United States, India, and United Arab Emirates being the most prominent originators. We find that even though Indonesian campaigns generate the highest volume (3.2M posts), only 1.6% of the accounts propagating Indonesian campaigns have been suspended so far. By examining campaigns running across multiple OSNs, we discover that Twitter detects and suspends 93% more accounts than Facebook. Therefore, sharing intelligence about abuse-related user accounts across OSNs can aid in spam detection. According to our dataset, around 35K victims and 8.8M USD could have been saved if intelligence had been shared across the OSNs. By analyzing phone-number-based spam campaigns running on OSNs, we highlight the unexplored variety of phone-based attacks surfacing on OSNs.
Srishti Gupta
Dhruv Kuchhal
Payas Gupta
Mustaque Ahamad
Georgia Institute of Technology
Manish Gupta
Microsoft, India
Ponnurangam Kumaraguru
Information systems
Security and privacy → Human and societal aspects of security and
privacy;
ACM Reference Format:
Srishti Gupta, Dhruv Kuchhal, Payas Gupta, Mustaque Ahamad, Manish
Gupta, and Ponnurangam Kumaraguru. 2018. Under the Shadow of Sunshine:
Characterizing Spam Campaigns Abusing Phone Numbers Across Online
Social Networks. In WebSci ’18: 10th ACM Conference on Web Science, May
27–30, 2018, Amsterdam, Netherlands. ACM, New York, NY, USA, 10 pages.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from
WebSci ’18, May 27–30, 2018, Amsterdam, Netherlands
©2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-5563-6/18/05. . . $15.00
The increasing popularity of Online Social Networks (OSNs) has
attracted a cadre of criminals who craft large-scale phishing and
spam campaigns targeted against OSN users. Traditionally, spammers
have driven traffic to their websites by luring users to click on
URLs in their posts on OSNs [ ]. A significant fraction of OSN spam
research has looked at solutions driven by URL blacklists [ ],
manual classification [ ], and honeypots [ ]. Since defence
mechanisms against malicious / spam URLs have already matured,
cybercriminals are looking for other ways to engage with users.
Telephony has become a cost-effective medium for such engagement,
and phone numbers are now being used to drive call traffic to
spammer-operated resources (e.g., call centers and Over-The-Top
messaging applications like WhatsApp).
In this paper, we explore a data-driven approach to understand
OSN abuse that makes use of phone numbers as action tokens in
the realization / monetization phase of spam campaigns. Telephony
has turned out to be an effective tool for spammers: Internet
crime reports suggest that phone scams cost Americans alone $7.4B
in 2015. Specifically, in the phone-based abuse of OSNs, spammers
advertise phone numbers under their control via OSN posts and lure
OSN users into calling these numbers. Since spammers use phone
calls to trap victims, it is safe to assume that they provide real
phone numbers under their control. In addition, advertising phone
numbers reduces spammers’ overhead of finding the set of potential
victims who can be targeted via the phone. Over phone conversations,
they try to convince the victims that their services are genuine,
and deceive them into making payments [ ]. We observe that, to
maximize their reach and impact, spammers disseminate similar
content across multiple OSNs.
While URLs help spammers attract victims to websites that
host malicious content, phone numbers provide more leverage
to spammers. Due to the inherent trust associated with the
telephony medium and the impact of human touch over phone calls,
spammers using phone numbers stand a better chance of convincing
victims and hence are likely to make more impact. Besides, they can
use fewer phone numbers as compared to URLs; a large number
of URLs are required to evade filtering mechanisms incorporated
by OSNs. Moreover, phone-based campaigns use different channels
for monetization (Phone) and advertising (Web), whereas URL-based
campaigns use a single channel (Web) for both. Hence, phone-based
spam requires correlation of abuse information across channels,
which makes it harder for OSN service providers to build effective
solutions. Since the modus operandi in URL-based and phone-based
spam campaigns is different, leaving phone-based spam unexplored
can limit OSN service providers’ ability to defend their users from
spam. While extensive solutions have been built to educate users
about URL-based spam [ ], limited education is available for
phone-based attacks. This is evident from several well-publicized
and long-running Tech Support spam campaigns (since 2008) that use
phone numbers to lure victims, leading to huge financial losses in
the past, as reported by the Federal Bureau of Investigation [ ].
Although detecting and avoiding OSN abuse using phone numbers is
critical, to the best of our knowledge, this space is largely
unexplored.
In this paper, we address this gap by taking the first step in
identifying and characterizing spam campaigns that abuse phone numbers
across multiple OSNs. Studying phone-based spam across multiple
OSNs provides a new perspective and helps in understanding how
spammers work in coordination to increase their impact. From 22M
posts collected from Twitter, Facebook, GooglePlus, YouTube, and
Flickr, we identify 202 campaigns running across different countries,
leveraging 806 unique abusive phone numbers. Studying these
campaigns, we make the following key observations:
We find that the cross-platform phone-based spam campaigns
originate from more than 16 countries, but most of them
come from Indonesia, United States of America (USA), India,
and United Arab Emirates (UAE). These campaigns are
supported by fewer phone numbers than URLs, perhaps due to
(a) the high cost of acquiring a phone number, and (b) weak
defense mechanisms against phone-based spam. Victims who fall
prey to these campaigns are offered banned filmography, personal
products and a variety of other services; but the services are
not delivered even after successful payment.
As reported in earlier research [ ], we also find evidence
that suggests spammers collude to maximize their reach either
by creating multiple accounts or promoting other spammers’
content. To evade suspension strategies of each OSN,
spammers keep the volume per account low. Our results
show that accounts are suspended after being active for 33
days (on average), whereas the literature suggests that spammers
involved in URL-based spam campaigns survive only three days
after their first post [ ]. In addition, 68.7% of spammer
accounts are never suspended. Again, this suggests a crucial
need to build effective solutions to combat phone-based spam.
Our analysis also suggests that OSN service providers should
work together in the fight against phone-based spam campaigns.
By examining phone numbers involved in campaigns
across OSNs, we find that although all OSNs are consistently
being abused, Twitter is the most preferred OSN for propagating
a phone campaign. By analyzing spammers’ multiple
identities across OSNs, we find that Twitter is able to suspend
93.3% more accounts than Facebook. Thus, cross-platform
intelligence can be useful in preventing the onset and reducing
the lifetime of a campaign on a particular network with
good accuracy. We estimate that cross-platform intelligence
can help protect 35,407 victims across OSNs, resulting in
potential savings of $8.8M.
Altogether, our results shed light on phone-based spam campaigns
where spammers are using one channel (OSN) to spread
their content, and the other channel (voice / SMS / message via
phone) to convince their victims to fall prey to their campaigns.
Given that no timely and effective filters exist on either channel to
combat such spam, there is an imperative need to build one.
Spam is a growing problem for OSNs, and several researchers have
looked at different ways to combat it. In this section, we present
prior research in detecting spam campaigns on OSNs.
Handling non-phone based spam: There has been a large
body of work that reports the existence of spam on multiple OSNs
like YouTube [ ], Twitter [ ], and Facebook [ ]. Thomas et al.
studied the characteristics of suspended accounts on Twitter [ ].
With an in-depth analysis of several spam campaigns, they reported
that 77% of spam accounts suspended by Twitter were taken down on
the day of their first tweet. Apart from this, there has been work
on differentiating a spammer from a non-spammer [2, 4, 21, 34, ].
Lumezanu et al. studied the spread of URL campaigns on email
and Twitter and found that spam domains receive better coverage
when they appear both on Twitter and email [ ]. In addition to
characterizing URL-based spam, methods have been proposed for
detecting [ ] and preventing [ ] such campaigns. While
a lot of work has been done on characterizing and detecting
URL-based spam campaigns, campaigns abusing phone numbers have
been largely ignored.
Handling phone based spam: A large fraction of phone spam
includes robocalling and spoofing, wherein spammers call the
victims and trick them into giving personal or financial information.
Studies have shown that, in spam activities, phone numbers are
more stable over time than email, and hence can be more helpful in
identifying spammers [ ]. Christin et al. analyzed a type of scam
targeting Japanese users, in which scammers threaten to reveal the
users’ browsing history unless they are paid [ ]. In the studies
mentioned above, the authors relied on publicly available datasets to
perform their analyses. In contrast, we develop an infrastructure to
collect millions of posts from OSNs, cluster them into campaigns,
and conduct our analyses. Researchers have investigated phone
number abuse by analyzing cross-application features in
Over-The-Top applications [ ], cross-channel SMS abuse [ ], characterizing
spam campaigns on Twitter [ ], and by characterizing honeypot
numbers [ ]. Recently, Miramirkhani et al. studied the
Tech Support campaigns that abuse phone numbers, from the
perspective of domains that were used to host malicious content [ ].
The authors also interacted with spammers to understand their
social engineering tactics. While they focused on URLs and domains
abused by spammers, we study the cross-platform spread of
phone-based spam campaigns across OSNs, along with strategies
adopted by spammers for sustainability and visibility. Besides, we
highlight how cross-platform intelligence about spam accounts can
be shared across OSNs to aid in spam detection.
In this section, we discuss our methodology for collecting phone
numbers, posts and other metadata, which we use later to find
campaigns on OSNs. These campaigns are then tagged as benign
or spam. Figure 1 shows the architecture of our data collection
subsystem that is used to collect phone numbers across multiple OSNs.
We picked Twitter as the starting point to find phone numbers,
as it provides easier access to large amounts of data as compared
to other online social networks [ ]. We set up a framework to
collect a stream of tweets containing phone numbers. Some of the
keywords used in data collection and the regular expressions used to
extract phone numbers from text are listed in Appendix 9.1. For
each unique phone number received every day, a query was made
to other OSNs viz. Facebook, GooglePlus, Flickr, and YouTube,
and for every search, we stored the following details: user details
(user ID, screen name, number of followers and friends), post details
(time of publication, text, URL, number of retweets, likes, shares,
and reactions), and whether the ID was suspended. The data
collection ran over a period of six months, between April 25, 2016 and
October 26, 2016. Our system collected 22,690,601 posts containing
1,845,150 unique phone numbers, posted by 3,365,017 unique user
accounts on five different OSNs. After removing noise (i.e., the
posts which do not contain a phone number), the filtered set was
used for finding campaigns.
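As a rough illustration of the extraction and noise-removal steps, the
sketch below pulls phone-number-like strings out of post text with a
regular expression. The pattern and the helper names
`extract_phone_numbers` and `has_phone_number` are assumptions made for
illustration only; the expressions actually used are those listed in
Appendix 9.1.

```python
import re

# Hypothetical pattern (not the one from Appendix 9.1): an optional "+",
# then digits separated by spaces, dots, hyphens, or parentheses.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_phone_numbers(text):
    """Return digit-only strings for phone-like substrings found in text."""
    numbers = []
    for match in PHONE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        # Keep only plausible lengths (E.164 allows at most 15 digits).
        if 7 <= len(digits) <= 15:
            numbers.append(digits)
    return numbers


def has_phone_number(text):
    """Noise filter: True if the post contains at least one phone number."""
    return bool(extract_phone_numbers(text))
```

A post for which `has_phone_number` returns False would be treated as
noise and dropped before campaign identification.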
Figure 1: System Architecture for Data Collection across
Multiple OSNs. (Component labels: Sample Feed, Data Expansion,
Spam Campaigns.)
We acknowledge that our dataset may contain two kinds of bias:
(1) Only a 1% sample of all public tweets is available from the Twitter
Streaming API, so we may underestimate the spam campaigns observed
on Twitter. (2) Since we treat Twitter as the starting point, we may
miss some campaigns which are popular on other social networks,
but not on Twitter. However, Twitter provides the best access to user
posts, justifying our choice.
Collecting data from Facebook was challenging. In April 2015, Facebook deprecated
their post-search API end-point, so we used an Android mobile OAuth token to
search content using the Graph API [17].
A campaign is defined as a collection of posts made
by a set of users sharing similar text and phone numbers. To make
sure that we do not tag any benign campaign as spam, we filtered
out the phone numbers used by even one Twitter verified account.
Every phone number, say ph1, is represented by a set of frequent
unigram tokens which occur around the phone number. All posts
that contain at least 33% of the tokens from the representative token
set are put together in a cluster, indicating posts related to the phone
number. Different phone numbers, say ph1 and ph2, are put together
in the same cluster if the average Jaccard coefficient between the
corresponding sets of posts is greater than 0.7. We evaluated
different values of the Jaccard coefficient against average silhouette
scores to measure the quality of clusters [ ], and found 0.7 to be the
knee point, with a corresponding silhouette score of 0.8. All users
that post about any phone number in the clustered set are put
together. A cluster thus formed is marked as a campaign. Using this
method, we found 22,390 campaigns in the dataset, collectively
amounting to 10.9M posts.
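The clustering rule above can be sketched as follows. This is one
possible reading, not the authors' implementation: posts are assumed to
be sets of unigram tokens, "average Jaccard coefficient" is interpreted
as the mean over all cross pairs of posts, and the function names are
hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two token sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def belongs_to_number(post_tokens, representative_tokens, min_share=0.33):
    """True if the post holds at least 33% of the representative tokens."""
    rep = set(representative_tokens)
    if not rep:
        return False
    return len(rep & set(post_tokens)) / len(rep) >= min_share


def average_jaccard(posts_a, posts_b):
    """Mean pairwise Jaccard coefficient between two lists of token sets."""
    pairs = list(product(posts_a, posts_b))
    if not pairs:
        return 0.0
    return sum(jaccard(p, q) for p, q in pairs) / len(pairs)


def merge_numbers(posts_by_number, threshold=0.7):
    """Group phone numbers whose posts have average Jaccard > threshold."""
    parent = {n: n for n in posts_by_number}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    numbers = list(posts_by_number)
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            score = average_jaccard(posts_by_number[a], posts_by_number[b])
            if score > threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in numbers:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())
```

The pairwise comparison is quadratic in the number of phone numbers, so
a dataset of the size described here would need blocking or sampling in
practice; the sketch only shows the decision rule.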
Spam Campaigns: We flag a campaign as spam if it meets either of the
following criteria: (a) a phone number involved in the campaign is
present in the United States Federal Trade Commission’s Do Not
Call (DNC) dataset, or (b) at least one OSN account involved in the
campaign is suspended. Further, to be able to characterize the spam
campaigns in detail, we focused only on campaigns with at least
5000 posts. With this, we identified 6,171 out of 22,390 campaigns
as spam. We then manually inspected this set of campaigns
to verify whether each campaign is indeed spam. This results in a
working dataset of 202 campaigns comprising 4.9M posts. During manual
inspection, we also assigned topics to the 202 campaigns, where
multiple campaigns could be assigned the same topic. For instance, a
campaign selling shoes and another selling jackets would both be
assigned the topic “Product Marketing”.
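A minimal sketch of the automatic part of this flagging step is shown
below. The dictionary layout (`num_posts`, `phone_numbers`, `suspended`)
and the function name are hypothetical; the manual verification that
follows in the pipeline is not modeled.

```python
def is_spam_candidate(campaign, dnc_numbers, min_posts=5000):
    """Apply criteria (a) or (b) together with the minimum-volume cut-off.

    campaign is assumed to be a dict with:
      "num_posts":     total number of posts in the campaign
      "phone_numbers": iterable of phone numbers used by the campaign
      "suspended":     dict mapping account ID -> True if suspended
    """
    if campaign["num_posts"] < min_posts:
        return False
    # (a) any phone number appears in the Do Not Call dataset
    in_dnc = any(n in dnc_numbers for n in campaign["phone_numbers"])
    # (b) at least one participating account is suspended
    any_suspended = any(campaign["suspended"].values())
    return in_dnc or any_suspended
```

Campaigns passing this check are only candidates; they would still go
through manual inspection before entering the working dataset.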
In this section, we focus on the following research questions. Where
do spam campaigns originate from? Do spammers use automation
when posting phone numbers or answering “phone calls”? What
does the suspension of a spammer’s OSN account depend on? What is
the typical modus operandi of the spammers?
4.1 Where does Phone-based Spam Originate?
It is important to know which countries spam originates from, as
this can inform the development of anti-spam filtering solutions. We
assume that the country associated with a phone number is the
source country. For the analysis, we need to extract the country
of the spam phone number. This is done by identifying either (a)
the language of the post containing the spam phone number via
the ‘lang’ field in the tweet object, or (b) the country code using
Google’s phone number library. These two methods helped in
identifying countries for 127 campaigns. For the rest of the campaigns,
we called the two most frequently occurring phone numbers in the
campaign using Tropo, a VoIP software that can be used to make
spoofed calls. We recorded all the calls and used Google’s Speech
API to detect the language and country of the campaign. We could
identify the origin country for 26 more campaigns; for the remaining 49,
the country is unknown. Table 1 presents the topic distribution across
various campaigns originating from different countries along with
the average number of posts made in each campaign. While the
majority of the spam was similar to advanced-fee scams, where
spammers trick victims into making payments in advance, certain
different types of campaigns were also observed in the dataset:
Hacking (Tech Support) and Alternating Beliefs (Love Guru). In
the LoveGuru campaign, astrologers promise victims to fix their
love and marriage related problems. In the Tech Support campaign,
spammers pose as technical support representatives or claim to be
associated with big technological companies (like Amazon, Google,
Microsoft, Quebec, Norton, Yahoo, Mcafee, Dell, HP, Apple, Adobe,
TrendMicro, and Comcast) and offer technical support fixes.
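Method (b) can be illustrated with a toy prefix lookup. This is only a
stand-in for Google's phone number library, which handles far more
cases; the table below covers just the four most prominent source
countries, and the function name is hypothetical.

```python
# Toy table covering only the four most prominent source countries.
# "+1" is shared across the North American Numbering Plan; mapping it to
# the USA is a simplification made for this sketch.
CALLING_CODES = {"62": "Indonesia", "1": "USA", "91": "India", "971": "UAE"}


def country_from_number(number):
    """Return the country for a number in international format, else None.

    Numbers written in national format (without a country calling code)
    would be misread, so callers are assumed to pass "+<code>..." strings.
    """
    digits = "".join(ch for ch in number if ch.isdigit())
    # Country calling codes are one to three digits; try the longest first.
    for length in (3, 2, 1):
        country = CALLING_CODES.get(digits[:length])
        if country:
            return country
    return None
```

Numbers whose code is not in the table return None, which mirrors the
campaigns whose country had to be resolved by other means.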
The top four source countries by volume of campaigns,
viz. Indonesia, United States of America (USA), India, and United
Arab Emirates (UAE), show interesting characteristics. From Table 1,
we observe that there is a good overlap of campaign categories
across countries, while some countries have specific categories of
campaigns running. Across all the campaign categories, the volume
generated by Indonesian campaigns is significantly higher than that
of any other country.
4.2 Do Spammers use Automation?
While investigating further, we found that 99.3% of pairs of
consecutive posts related to the same campaign appeared on Twitter
less than 10 minutes apart. Given that a major fraction of content
appeared within a few minutes, it is likely that content generation is
automated. To ascertain this, we looked at the information of the
client (provided by the Twitter API) used by spammers to interact
with the Twitter API or their web portal. We found that most of the
content was generated using ‘’, a popular bot service,
known to be used by spammers [ ]. Apart from the bot service,
several other clients like RoundTeam (0.25%), IFTTT (0.03%), Buffer
(0.017%), and Botize (0.016%) were used for Twitter. Besides, we
found that volume per phone number was also high in Indonesian
campaigns; 80% of phone numbers had more than 1000 posts. One
would assume that volume per phone number would be low since
there are humans at the other end to service the requests. However,
by processing the text in the posts created in this campaign, we
found that spammers requested users to communicate via SMS or
WhatsApp (71% of posts). This explains why spammers would be
able to handle the load of interacting with victims. There are many
other advantages of using these messaging services: spammers
can further send phishing messages to victims, communicate with
them unmonitored, and potentially use automated bots to reply to
SMSs or WhatsApp messages.
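The inter-arrival measurement behind the 99.3% figure can be sketched as
below. The function name and the input format (a list of `datetime`
objects for one campaign on one OSN) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def share_of_fast_pairs(timestamps, limit=timedelta(minutes=10)):
    """Fraction of consecutive post pairs that are less than `limit` apart."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    return sum(gap < limit for gap in gaps) / len(gaps)
```

For example, four posts at 12:00, 12:05, 12:07 and 13:00 give gaps of 5,
2 and 53 minutes, so two of the three consecutive pairs fall under the
10-minute limit.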
4.3 What Factors Govern Spammers’
As expected, we find that the visibility (number of likes, shares,
and retweets) of a post is positively correlated with the number of
posts (Pearson correlation coefficient = 0.97). While this may sound
intuitive, the number of accounts that were suspended within a
campaign was not positively correlated with the number of posts.
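For reference, the Pearson correlation coefficient used above is the
covariance of the two series divided by the product of their standard
deviations. A small self-contained sketch follows; the input series
(for instance, posts per campaign and visibility per campaign) are
assumptions about how the measurement would be set up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

A value near 1 indicates that the two quantities rise together, which is
the relationship reported between post volume and visibility.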
Table 1: Distribution of Campaigns across Topics and Source
Countries. (#C denotes the number of campaigns).
Country Campaign Topics #C #Posts
Argentina Party Reservations
Chile Delivering Goods 1 6,691
Colombia Hotel Booking
Ghana Alternating Beliefs (Marriage, Anxiety) 2 12,825
Guatemala Product Marketing 1 8,821
Hotel Booking
Alternating Beliefs (Marriage, Anxiety)
Hacking(Tech Support)
Hotel Booking
Product Marketing
Alternating Beliefs (Marriage, Anxiety)
Purchasing Followers
Finance, Real Estate
Selling Adult Products
Kuwait Charity (Donation) 1 46,494
Mexico Pornography 1 8,204
Nigeria Alternating Beliefs (Marriage, Anxiety) 1 29,226
Pakistan Finance, Real Estate 1 16,058
Spain Charity (Donation) 1 14,311
UAE Escorts 5 69,263
Party Reservations
Product Marketing
Alternating Beliefs (Marriage, Anxiety)
UK Escorts
Charity (Donation)
Venezuela Hotel Booking
Free Games, Downloads
Party Reservations
Hotel Booking
Product Marketing
Free Games, Books, Downloads
Alternating Beliefs (Marriage, Anxiety)
Finance, Loans, Real Estate
Charity (donation)
We noticed that even though the volume generated by Indonesian
campaigns was 98.2% higher than Indian campaigns, the fraction of
users suspended in Indian campaigns was 85.6% higher. Further, we
observed that account suspension is dependent on the nature
of campaigns; campaigns providing escort services or technical
support services had more accounts suspended.
Surprisingly, for similar escort service campaigns running in two
different countries, USA and UAE, there was a significant difference
in the number of accounts suspended. Before concluding that the
country plays a major role in account suspension, we performed
a detailed analysis as follows.
Figure 2: Comparison of campaigns running in the top 4
countries – Indonesia, USA, India, and UAE across different
campaign categories. While visibility that a post receives is
positively correlated with volume, account suspension in a
campaign is not. Escort service and Tech Support campaigns
had the largest percentage of suspended accounts. The number
of users suspended is represented by * and # denotes the
fraction of posts getting visibility.
The number of posts generated by the escort campaign running in
the USA (9,652) was lower than that running in UAE (69,263), but
55.6% of user accounts were suspended in the USA in comparison to
only 9.1% of accounts suspended in UAE. We looked at several factors
which could potentially lead to account suspension: volume
generated per user or URLs used in the posts. We noticed that volume
per user was higher for UAE users (Figure 3(a)), the number of URLs
shared in the UAE campaign was higher, and the words used in both
campaigns had a good overlap. Also, from Figure 3(b), we observed
that the inter-arrival time between two consecutive posts made by all
the users in the USA (41s on average) is shorter than that of posts
made in the UAE campaign (392s on average).
Figure 3: Comparing Escort service campaign in USA vs.
UAE. Even though volume generated per USA account is
lower than UAE accounts (a), inter-arrival time between two
consecutive posts in the USA is shorter, which could be a
potential reason for suspension of accounts (b).
4.4 What is the Spammers’ Modus Operandi?
To ascertain the attack methodology the victims faced, we
performed an experiment after receiving our institute’s Institutional
Review Board (IRB) approval. Pretending to be a potential victim,
we called phone numbers mentioned in campaigns selling adult
(Viagra) pills in USA and UAE. In Indonesia, we interacted with
spammers selling herbal products, and in India with those
promoting tech support and astrology services (providing solutions to
marriage and love problems). To avoid time zone conflicts, we called
the spammers during their local daytime. Overall, we made 41
calls to different phone numbers from Indonesia, India, USA and
UAE. Apart from Indonesia, campaigns from other countries had
an IVR deployed before reaching a spammer. We posit this can
help in load balancing between limited human resources on the
spammers’ end. Due to language limitations in Indonesia, spammers
preferred chatting over platforms like WhatsApp, where they were
extremely responsive.
The campaigns in USA and UAE were not limited by any delivery
location; they had a usual delivery time of 2–4 weeks. These
campaigns were operating solely over the phone and had no option
of visiting an online portal to make the transaction. The attackers
confidently asked for the credit card details over the phone even
though banks advise otherwise. Spammers from Indonesia told us
that they would start delivery only after receiving the payment,
which was to be made via bank transfer. During the interactions,
spammers were persuasive in selling products by claiming their
products to be the best as compared to similar products in the
market. Tech support campaigns in India were providing service to
users remotely over the Internet and charged over call once the
issue was ‘fixed’. The catch was that the spammers pretended that
there was a problem with the victim’s computer and then tried to
convince the victim to pay them to fix it, as reported in several
sources. Another astrology-based spam campaign running in
India tricked users by promising to fix their marriage and love related
problems within 48 hours. We called 4 numbers in different
Indian states. Interestingly, all the spammers had a similar way of
dealing with the problem, where they asked us to send personal details
over WhatsApp.
It is evident that spammers running campaigns in different
countries deploy similar mechanisms to let the victim reach them (posts
on social media), to set up the product / service delivery operation
(product delivery post payment and service delivery prior to
payment), and model of payment (details transfer via phone, WhatsApp,
verbal). It is the product delivery operation that creates deliberate
confusion for a victim; intuitively, the delivery mechanism is similar
for benign campaigns. Spammers leverage the advantage of similar
delivery mechanisms, offer fake promises and later do not deliver.
In this section, we aim to answer the following research questions.
Are spam campaigns run in a cross-OSN manner? How does the
content cross-pollinate across OSNs? How do spammers maximize
visibility? To what extent are OSNs able to detect phone-based
spam? Can existing intelligence on URL-based spam be trivially
adapted to handle the growing phone-based spam problem? Can
cross-platform intelligence help?
5.1 Do Phone-based Spam Campaigns run in a
Cross-OSN Manner?
We observed that spam campaigns do not limit themselves to one
OSN and are rather present on multiple networks. The distribution
of posts across platforms in the top 3 spam campaigns, namely
LoveGuru (from the Alternating Beliefs category), Tech Support, and
Indonesian Herbal Product (from the Product Marketing category), is
shown in Table 2. Even though Twitter has the largest fraction
(possibly because Twitter is the first data source in our data
collection method, which introduces a bias), all OSNs
are abused to carry out spam campaigns.
Table 2: Top Cross-Platform Spam Campaigns
Campaign TW FB G+ YT FL
Tech Support 28,984 2,151 7,830 2,850 1,737
LoveGuru 6,934 1,418 4,257 101 63
Indonesia Herbal Product 1,443,619 9,238 21 46 336
Due to lack of space, in this section, we focus on studying in
detail the Tech Support campaign. The details for other campaigns
are available at http:// Tech support scams have
been around for a long period, incurring financial losses of $2.2M
to victims in 2016 alone, as reported by the US Federal Bureau
of Investigation (FBI) [ ]. Earlier, attackers used to call victims
offering to fix their computer or PC. Now, attackers have changed
their strategy; instead of calling victims, attackers float their phone
numbers on OSNs and ask users to call them in case they need any
technical assistance related to their computers. Once the victim
calls the phone number, the attacker asks for remote access to
their machine to diagnose the problem. The attacker fudges the
expected problems with the victim’s machine and convinces her to get
it fixed. This campaign is identified as spam because
attackers deceive victims into believing that there exists some
problem with their PC and charge money in return. Previous work has
focused on the methods used by attackers to convince the victim and to
make money [ ]. In this paper, we are interested in looking at the
cross-platform behavior of such tech support scam campaigns.
Over the course of six months of data collection, we got a total
of 43,552 posts spread across all five OSNs, promoting
41 phone numbers. The complete dataset description for
tech support campaigns is shown in Table 3.
Table 3: Statistics for Tech Support Campaign
Features TW FB G+ YT FL
Total Posts 28,984 2,151 7,830 2,850 1,737
Posts with URLs 25,245 1,391 5,714 227 1,503
Distinct Phone Numbers 41 33 37 39 20
Distinct User IDs 748 289 360 433 79
Distinct Posts 16,142 1,797 6,570 2,050 1,449
Distinct URLs 68 951 3,189 80 293
As phone numbers are one of the primary tokens used by spammers,
we examined carrier information tied to each number to
identify what kind of phone numbers spammers use (viz. landline,
mobile, VoIP, or toll-free). We derived this information from several
online services like Twilio (mobile carrier information), Truecaller
(spam score assigned to the phone number), and HLR
lookups (current active location of the phone number). We found
that all the phone numbers used in the Tech Support campaign were
toll-free numbers. Using a toll-free number offers several advantages
to a spammer: (1) increased credibility: it does not incur a cost
to the person calling, hence people perceive it to be legitimate; (2)
international presence: spammers can be reached from
any part of the world. Further, we found that spammers used services
like ATL, Bandwidth, and Wiltel Communications to obtain
these toll-free numbers and that a majority of them were registered
between 2014 and 2016.
5.2 How does Content Cross-pollinate?
Now, we answer the following questions: Is a particular OSN preferred
to start the spread of a campaign? Is there a specific pattern in the
way spam propagates on different OSNs?
Figure 4(a) shows the temporal pattern of content across OSNs. Note
that our data collection was done over a period of six months, while a
campaign may have existed before and / or after this period. Hence,
while the longest detected active time for a campaign in our dataset
is 186 days, the actual time may be greater. A majority of these posts
are densely packed into a small number of short time bursts, while the
entire campaign spans a much longer period. Though the volume of
content is significantly higher on Twitter, all OSNs are consistently
being abused for propagation. Inter-arrival time, i.e., the average
time between two successive posts, is observed to be lowest on Twitter
(308s), as shown in Figure 4(b). It is interesting to note that a few
campaigns on Flickr have an inter-arrival time between two posts close
to 1s, even though the average inter-arrival time is highest on
Flickr. As Figure 4(a) shows, the volume on Flickr increased during
the last few weeks of our data collection period. We divided the
inter-arrival times into two time windows: the first 15 weeks and the
last 11 weeks. We observed that the average inter-arrival time dropped
from 9786s in the first window to 2543s in the second, which means
spammers had started heavily abusing Flickr to spread the Tech Support
campaign. It is hard to ascertain the motivation of the spammers in
sending high-volume content on Twitter, but we speculate that one of
the reasons could be the public nature of the Twitter platform, as
compared to closed OSNs like Facebook.
Figure 4: Temporal properties of the Tech Support campaign across
OSNs: (a) posts across OSNs; (b) inter-arrival time of posts appearing
on OSNs. All OSNs are abused to spread the campaign, but volume is
maximum on Twitter. Inter-arrival time between two consecutive posts
is minimum for Twitter. Spammers began to heavily abuse Flickr towards
the end of our data collection.
For all the phone numbers, we analyzed their appearance on different
OSNs and the order in which they appear, as reported in Table 4. For
each network that is picked as the starting point, we identified the
most common sequence in which phone numbers subsequently appeared on
other OSNs. We found that Flickr was never chosen as the starting OSN
to initiate the spread of a phone number. Further, we noticed that the
posts originating from YouTube took the maximum time to reach a
different OSN, with an average inter-OSN time of 5 hours.
Table 4: Distribution of phone numbers according to their first
appearance amongst OSNs. Flickr is never chosen as a starting point
and there is no particular sequence in which spam propagates across
OSNs.
Starting OSN #Cases Most common sequence
Twitter (TW) 12 TW → G+ → YT
GooglePlus (G+) 10 G+ → TW → YT → FB → FL
Facebook (FB) 6 FB → G+ → TW → YT
YouTube (YT) 13 YT → G+ → TW → FB
To summarize, we observed that all OSNs were abused to spread the Tech
Support campaign, and no particular OSN was preferred to drive the
campaign. In addition, there was no particular sequence in which spam
propagated across OSNs.
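The two measurements used in this section, the inter-arrival time and the order of first appearance across OSNs, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the `(phone, osn, unix_time)` tuple layout, and the toy posts are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean

def mean_inter_arrival(timestamps):
    """Average gap (seconds) between consecutive posts, or None if < 2 posts."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return mean(gaps) if gaps else None

def first_appearance_order(posts):
    """Map each phone number to the OSNs ordered by when it was first seen there.

    posts: iterable of (phone, osn, unix_time) tuples (hypothetical layout).
    """
    first_seen = {}
    for phone, osn, t in posts:
        key = (phone, osn)
        if key not in first_seen or t < first_seen[key]:
            first_seen[key] = t
    per_phone = {}
    for (phone, osn), t in first_seen.items():
        per_phone.setdefault(phone, []).append((t, osn))
    return {p: [osn for _, osn in sorted(seen)] for p, seen in per_phone.items()}

# Toy data, invented for illustration only.
posts = [
    ("18885550100", "YT", 100),
    ("18885550100", "TW", 400),
    ("18885550100", "YT", 900),
    ("18885550199", "TW", 50),
    ("18885550199", "FB", 70),
]
sequences = first_appearance_order(posts)
starting_osn = Counter(seq[0] for seq in sequences.values())
gap = mean_inter_arrival([100, 400, 900])
```

Counting `seq[0]` per phone number gives a "#Cases" style tally per starting OSN, and counting whole sequences would give the most common propagation order.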
5.3 How do Spammers Maximize Visibility?
We observed various strategies adopted by spammers to increase the
dissemination of their posts. In this section, we discuss those
strategies and their effectiveness.
The visibility of a post is defined by the actions performed by users
(consumers of the post) in terms of liking or sharing the post, which
account for the traction a particular post received. For each network,
we define the value of visibility as follows: number of likes and
reshares on Facebook, +1s and reshares on GooglePlus, number of likes
and retweets on Twitter, and video like count on YouTube. We did not
consider Flickr in our analysis since the Flickr API gives only the
view count of an image posted on the platform, and a user who only
views an image cannot be assumed to be a victim of the campaign. To
calculate visibility in all scenarios, we collected the likes /
retweets, plus-oners / reshares, and likes from Twitter, GooglePlus,
and Facebook respectively using their APIs. Apart from calculating
values for each visibility attribute, we also collected properties of
the user accounts involved, i.e., the IDs of user accounts involved in
retweeting / liking / resharing the content. Due to rate limiting
constraints on each of the APIs, we could not fetch visibility
information daily. We collected this data six months after our data
collection period, as posts take time to reach their audience. As a
result, (1) we might have missed information about tweets posted by
suspended accounts, and (2) our total visibility values represent a
lower bound.
To increase the visibility of content, we observed that the spammers
use the following tricks: 67% of posts contained hashtags (for
marketing [ ], gaining followers [ ]), 82.7% of posts contained URLs
(for increased engagement with potential victims), 12.1% of posts
contained short URLs (for obfuscating the destination of a URL and
getting user engagement analytics), and 72% of posts contained photos
(as visual content gathers more attention). We also noticed collusion
between accounts and cross-referenced posts to increase the visibility
of the campaign.
Cross-referenced posts: We call a post cross-referenced if it was
posted to OSN X but contains a URL redirecting to OSN Y. For instance,
a Twitter post containing a link ‘’ which would redirect to a
different OSN, Facebook. Spammers either direct victims to existing
posts or to another profile which is propagating the same campaign on
a different OSN. In the Tech Support campaign, we observed that 3.2%
of Facebook posts redirected to YouTube, and 1.78% of GooglePlus posts
redirected to YouTube.
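A cross-referenced post can be flagged by comparing the OSN a post was published on with the OSN that each embedded URL points to. The sketch below is only an illustration of that definition: the host-to-OSN table, the function names, and the sample URLs are assumptions, and it presumes that short URLs have already been expanded to their final destination.

```python
from urllib.parse import urlparse

# Hypothetical mapping from hostnames to the OSN labels used in this paper.
OSN_HOSTS = {
    "twitter.com": "TW",
    "facebook.com": "FB",
    "plus.google.com": "G+",
    "youtube.com": "YT",
    "flickr.com": "FL",
}

def osn_of(url):
    """Return the OSN label a URL points to, or None for any other site."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return OSN_HOSTS.get(host)

def is_cross_referenced(post_osn, urls):
    """True if any URL in the post leads to an OSN other than post_osn."""
    return any(osn_of(u) not in (None, post_osn) for u in urls)

fb_to_yt = is_cross_referenced("FB", ["https://www.youtube.com/watch?v=abc"])
same_osn = is_cross_referenced("TW", ["https://twitter.com/someone/status/1"])
external = is_cross_referenced("TW", ["http://example.com/help"])
```

Links to the same OSN and links to ordinary websites are both treated as not cross-referenced, which matches the definition given in the text.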
Collusion between accounts: In the Tech Support campaign, we observed
traces of collusion, i.e., spammers involved in a particular campaign
like or share each other's posts on OSNs to increase reachability.
Collusion helps in cascading information to other followers in the
network.
We calculated the visibility received by all the posts after removing
likes / reshares / retweets by the colluders (i.e., accounts spreading
the campaign that are already present in the dataset). We noticed that
the posts containing the above-mentioned attributes (hashtags, URLs,
short URLs, photos, cross-referencing, and collusion) garnered around
ten times more visibility than posts not containing them. Around 10%
of the posts saw traces of collusion, contributing to 20% of the total
visibility. Maximum visibility (22.1% of total visibility) was
observed for posts containing hashtags. In addition, we observed that
a major chunk of visibility came from GooglePlus, followed by
Facebook. This shows that the audience targeted influences the
visibility garnered by a particular campaign, as GooglePlus is known
to be consumed mostly by IT professionals (footnote 17).
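Separating colluder engagement from the rest amounts to checking each liking or resharing account against the set of accounts already known to spread the campaign. A minimal sketch follows; the function name and the account IDs are invented for illustration.

```python
def split_engagement(engaging_ids, campaign_ids):
    """Split engagement events into (organic, colluding) counts.

    engaging_ids: IDs of accounts that liked / reshared / retweeted a post.
    campaign_ids: IDs of accounts already known to spread the campaign.
    """
    colluding = sum(1 for uid in engaging_ids if uid in campaign_ids)
    organic = len(engaging_ids) - colluding
    return organic, colluding

# Toy data, invented for illustration only.
campaign_ids = {"spam_1", "spam_2", "spam_3"}
engaging_ids = ["user_a", "spam_2", "user_b", "user_c", "spam_3"]

organic, colluding = split_engagement(engaging_ids, campaign_ids)
collusion_share = colluding / (organic + colluding)
```

Here `organic` plays the role of the visibility value after colluders are removed, and `collusion_share` is the fraction of engagement attributable to collusion.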
5.4 To what Extent OSNs Suspend User Accounts?
To aid in the propagation of a campaign, spammers manage multiple
accounts to garner a wider audience, withstand account suspension, and
in general increase the volume. Individual spammer accounts can either
use automated techniques to aggressively post about a campaign or use
hand-crafted messages. In this section, we examine the behavior of
user accounts behind the Tech Support campaign. Spammers want to
operate accounts in a stealth mode, which requires each individual
account to post only a few posts. It takes effort to attract followers
to a spam account, and the number of ‘influential’ accounts owned by a
spammer is limited. Thus, the spammer tends to repeatedly use accounts
to post content, keeping the volume per account low (Figure 5(b)),
while creating new accounts once in a while (Figure 5(a)).
Long-lived user accounts: During our data collection, we found that
68.7% (1,305) of the accounts were never suspended or taken down on
any of the five OSNs. This is in stark contrast to the URL-based
campaigns [ ], where the authors observed that 92% of the user
accounts were suspended within three days of their first tweet. To
take into account delays in the OSNs' account suspension algorithms,
we queried all the accounts six months after the data collection to
determine which accounts were deleted / suspended. This process
consists of a bulk query to each OSN's API with the profile ID of the
account. For each of these accounts, we looked at the timestamps of
the first and last post within our dataset, and assumed that the
account was suspended immediately after its last post. Out of the
accounts which were suspended, around 35% were suspended within a day
of their first post; the longest lasting account was active for 158
days before finally getting suspended. On average, accounts got
suspended after being active for 33 days. This is in clear contrast to
users getting suspended within three days for URL-based spam
campaigns, and thus, focused efforts are needed to strengthen defense
against evolving phone-based spam.
Footnote 17: the-day-who- is-most- likely-to- use-google
Figure 5: (a) New users created from time to time for campaign
sustainability. (b) Volume per user kept low to evade suspension. New
user accounts are created from time to time and volume per ID is kept
low, to avoid suspension in the Tech Support campaign.
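The lifetime estimate described above (time between an account's first and last post in the dataset, computed only for accounts later found suspended) can be sketched as follows. The function name, the dictionary layout, and the sample accounts are assumptions made for this example; the result inherits the assumption from the text that suspension happened right after the last observed post.

```python
from datetime import datetime

def active_lifetimes(posts_by_account, suspended_ids):
    """Return account id -> timedelta between first and last observed post,
    restricted to accounts that were later found suspended."""
    lifetimes = {}
    for account, stamps in posts_by_account.items():
        if account in suspended_ids and stamps:
            lifetimes[account] = max(stamps) - min(stamps)
    return lifetimes

# Toy data, invented for illustration only.
posts_by_account = {
    "acct_a": [datetime(2016, 1, 1, 9), datetime(2016, 1, 1, 18)],
    "acct_b": [datetime(2016, 1, 1), datetime(2016, 2, 3)],
    "acct_c": [datetime(2016, 3, 1), datetime(2016, 3, 9)],  # never suspended
}
suspended_ids = {"acct_a", "acct_b"}

lifetimes = active_lifetimes(posts_by_account, suspended_ids)
within_a_day = sum(1 for d in lifetimes.values() if d.days < 1) / len(lifetimes)
mean_days = sum(d.total_seconds() for d in lifetimes.values()) / len(lifetimes) / 86400
```

`within_a_day` and `mean_days` correspond to the two kinds of statistics quoted in the paragraph: the share of accounts suspended within a day and the average active period.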
5.5 Is Existing Intelligence based on URLs
Useful to Handle Phone-based Spam?
Apart from creating accounts to propagate content, and using phone
numbers to interact with victims, spammers also need a distinct set of
URLs to advertise. In this section, we look at the domains, subdomains
and URL shorteners used by spammers. Of all the posts, we had 4,581
unique URLs and 594 distinct domains. Of all the URLs,
12.1% were shortened using; 3% of them received over 69,917
clicks (data collected from API), showing that the campaign
was fairly successful.
Given the prevalence of spam on OSNs, we examined the effectiveness of
existing blacklists in detecting malicious domains. Specifically, we
used Google Safe Browsing and Web of Trust (WOT) to see if they were
effective in flagging domains as malicious. Web of Trust categorizes
domains into several reputation buckets, along with the confidence
with which each category is assigned. Please note that one domain may
be listed in multiple categories. We marked a domain as malicious if
it appeared in any of the following categories: negative (malware,
phishing, scam, potentially illegal) or questionable (adult content).
We re-checked the URLs and domains six months after data collection,
since blacklists may be slow to update in response to new spam sites.
We marked a URL as malicious if it was listed as malicious by either
Google Safe Browsing or WOT. Checking the domains against the
blacklists, we found that 10% of the domains were blacklisted by WOT
and none by Google Safe Browsing. Overall, we found that existing URL
infrastructure was ineffective at blacklisting URLs used in
phone-based spam campaigns.
Footnote: If the account is deleted / suspended, (a) Twitter redirects
to suspended, and returns error 404, (b) YouTube returns ‘user not
found’, (c) Facebook returns error 403 in case the account is
suspended, (d) GooglePlus throws a ‘not found’ error, (e) Flickr
responds with a ‘user not found’ error.
5.6 Can Cross-Platform Intelligence be used?
Given that existing URL infrastructure is ineffective, we study
whether cross-platform intelligence across OSNs can be used. To this
end, we look at the spam user profiles across OSNs to figure out which
OSN is most effective in building the intelligence.
Homogeneous identity across OSNs: Simply analyzing users' previous
posts might not be sufficient, as users can switch between multiple
identities, making it hard for OSN service providers to detect and
block them. Moreover, spammers may appear legitimate based on the
small number of posts made by a single identity. The challenge remains
in analyzing the aggregate behavior of multiple identities. To
understand how user activity is correlated across OSNs, we pose the
questions: do users have a unique identity on a particular OSN, or do
they share identities across OSNs? Within the same network, can we
find the same users sharing multiple identities? To answer this, we
looked at user identities in aggregate form (multiple identities of
the same user across different OSNs) and in individual form (multiple
identities of the same user on a single OSN). If the same user has
multiple identities sharing a similar name or username, the user is
said to exhibit a homogeneous identity. To define user identity in a
particular campaign, we used two textual features: name and username
[ ]. Since networks like YouTube and GooglePlus do not provide the
username, we restrict matching there to identities sharing the same
name. We used the Levenshtein distance to find similarity in
usernames. $LD(u_1, u_2)$ is the Levenshtein edit distance between
usernames $u_1$ and $u_2$, expressed as a score between 0 and 1. Here,
$LD(u_1, u_2) = 1$ means the strings are identical, while
$LD(u_1, u_2) = 0$ means they are completely different. After manual
verification by comparing profile images across OSNs, we found that
users having $LD \geq 0.7$ are homogeneous identities. We found four
cases where multiple user identities were found for the same user
within the same network, and in 65 instances, multiple user identities
were present for the same user in more than two networks.
Specifically, we found 51 users sharing multiple identities across two
different OSNs, and 10 users sharing multiple identities across three
OSNs. We noticed that these accounts shared the same phone numbers
across OSNs; some accounts post more phone numbers that are part of
the Tech Support campaign.
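A score where 1 means identical and 0 means completely different can be obtained by normalizing the edit distance. The sketch below shows one common way to do this, dividing by the length of the longer string. That normalization is an assumption, since the text does not state which one was used, and the usernames are invented.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Score in [0, 1]; 1.0 for identical strings.

    Assumed normalization: 1 - distance / length of the longer string.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

THRESHOLD = 0.7  # threshold reported in the text

# Hypothetical usernames for illustration.
close = similarity("techhelp24", "techhelp247")
far = similarity("abc", "xyz")
is_homogeneous = close >= THRESHOLD
```

With this normalization, two usernames differing by a single appended character score well above the threshold, while strings with no characters in common at matching positions score 0.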
We found that the total number of posts made by these accounts was
highest on GooglePlus (2696), followed by Twitter (1776), Facebook
(577), Flickr (387), and YouTube (323). Out of all the homogeneous
identities, the following are the percentages of accounts suspended on
each OSN: Twitter (60%), YouTube (48%), GooglePlus (32%), Flickr
(33%), and Facebook (4%). Our data is insufficient to determine
whether account suspension is due to dissemination of content across
OSNs or to other unobserved properties of the spammers.
Notwithstanding, the association between user identities across OSNs
strengthens the case that sharing information about spammer accounts
across OSNs could help OSNs to detect spammers.
Reducing financial loss and victimization: The actual number of users
impacted depends on how many victims called spammers and bought the
products advertised by the campaigns. Since it is hard to get this
data, we provide a rough estimate of the number of victims falling for
campaigns identified in our dataset. We measure the reputation of
spammers in terms of their follower count on Twitter, friends / page
likes on Facebook, circle count on GooglePlus, and subscriber count on
YouTube. As these users have subscribed to spammers to get more
content, they are likely to fall for the spam. Some of these users
would be unaware that the campaign is spam, while some followers /
friends could be spammers themselves who have followed other spammers'
accounts. We again collected this data six months after our data
collection and recorded 637,573 followers on Twitter, 21,053 friends
on Facebook, 11,538 followers on GooglePlus, and 2,816 likes on
YouTube, amounting to a total of 670,164 users. Please note that this
number is a lower bound, as we were not able to retrieve statistics
for suspended / deleted accounts. Assuming that we transfer knowledge
from Twitter to other OSNs and prevent the onset of campaigns on those
OSNs, we analyzed how much money and how many victims could be saved.
Looking only at the friends, followers, and likers on Facebook,
GooglePlus, and YouTube respectively, we could save 35,407 (21,053 +
11,538 + 2,816) unique victims and $8.8M (35,407 * $290.9) by
transferring intelligence across OSNs. We used $290.9 as the average
cost of the Tech Support scam per victim, as reported by Miramirkhani
et al. [25].
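The estimate above is a sum of audience counts multiplied by an average per-victim loss, which the short sketch below reproduces from the figures quoted in the text. The variable names are assumptions. Note that the victim count matches the reported 35,407, but the direct product of the two quoted figures comes to roughly $10.3M, whereas the text reports $8.8M; the source does not explain the difference, so the sketch only reports the raw product.

```python
# Audience counts quoted in the text for the OSNs other than Twitter.
audience = {
    "Facebook": 21_053,    # friends
    "GooglePlus": 11_538,  # followers
    "YouTube": 2_816,      # likes
}
COST_PER_VICTIM_USD = 290.9  # average loss per victim, from Miramirkhani et al.

victims = sum(audience.values())
raw_product_usd = victims * COST_PER_VICTIM_USD

print(f"potential victims: {victims:,}")
print(f"victims x cost per victim: ${raw_product_usd:,.0f}")
```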
In this section, we provide a synthesis of our evaluations and propose
some recommendations to OSN service providers.
How can spammers be choked? Phone numbers are a stable resource for
spam since spammers need to provide their real phone numbers so that
victims can reach out to them. A solution built around phone numbers,
therefore, would be more reliable in bringing down spammers. As a
countermeasure, there are two potential mechanisms: (a) a phone
blacklist and (b) suspension of OSN accounts. A phone blacklist should
be created, similar to URL blacklists, to check if a phone number is
involved in a spam / scam campaign. Blacklisting a phone number would
break the connecting link between victims and spammers, thus bringing
down the spammers' monetization infrastructure. However, it is
difficult to create one, because a phone number has few identifiable
features, unlike URLs, which offer features such as the landing page,
special characters, domain typo-squatting, etc. Therefore, account
suspension information, which can be collected from OSNs, can come to
the rescue. From this research we established that the link between a
phone number and the spammer account is crucial. Thus, one can focus
on removing malicious users from user communities sharing the same
phone number. In this network of user accounts, some users would
already be suspended by OSNs. The labels can be recursively propagated
from the known suspended nodes to other unknown nodes using
graph-based algorithms like PageRank. Bringing down the spammers
propagating phone numbers would disintegrate the entire campaign.
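One way to picture the propagation idea is a personalized PageRank (random walk with restart) over a graph in which two accounts are linked when they advertise the same phone number, with the restart mass placed on accounts already suspended. The sketch below is an illustration under those assumptions, not a method evaluated in this paper; the graph construction, damping factor, iteration count, and toy accounts are all choices made for the example.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(account_phones):
    """Link two accounts whenever they advertise a common phone number."""
    by_phone = defaultdict(set)
    for account, phones in account_phones.items():
        for phone in phones:
            by_phone[phone].add(account)
    graph = {account: set() for account in account_phones}
    for accounts in by_phone.values():
        for a, b in combinations(sorted(accounts), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Random walk with restart at the seed (already suspended) accounts."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * score[node] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # Isolated node: hand its mass back to the seeds.
                for s in seeds:
                    nxt[s] += damping * score[node] / len(seeds)
        score = nxt
    return score

# Toy data, invented for illustration only.
account_phones = {
    "a1": {"18885550100"},
    "a2": {"18885550100", "18885550111"},
    "a3": {"18885550111"},
    "b1": {"18885550199"},  # shares no number with the others
}
graph = build_graph(account_phones)
scores = personalized_pagerank(graph, seeds={"a1"})
```

Accounts connected to a suspended seed through shared phone numbers receive a positive score, while an account with no shared numbers stays at zero; a threshold on the score would then select candidates for review.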
There exist some services, like Truecaller and the FTC's do-not-call
complaint dataset, which collect information about phone numbers that
spammers use to call victims (incoming spam communication). In this
work, however, we demonstrated that spammers advertise their phone
numbers across OSNs, so that victims would call them instead (outgoing
spam communication). We found the overlap between our collected phone
numbers (associated with potential spam campaigns) and the FTC
(0.001%) and Truecaller (0.4%) databases to be minimal. It is,
therefore, imperative that solutions also be built for outgoing spam
communication.
Measuring Impact using Honeypots. In this work, we focused on using
friends and followers of the user as a metric to measure the impact;
this might not capture the actual victims who fell for those
campaigns. As an alternative approach, one can simulate a campaign,
changing the phone number (say, to phone number X) and keeping the
text intact. There are services like Twilio that aid in making calls
over the Internet, which can be used to record the number of calls
being made to phone number X. Spammer networks are dense; to ensure
that these simulated campaigns are visible to a large OSN population,
one can use Facebook Ads or Twitter Ads to promote the campaign as
advertisements. We believe this is a potential way to measure the
impact of campaigns.
With the convergence of telephony and the Internet, the phone channel
has become an attractive target for spammers to exploit and monetize
spam conducted over the Internet. This paper presents the first
large-scale study of cross-platform spam campaigns that abuse phone
numbers. We collected 23 million posts containing 1.8 million unique
phone numbers from Twitter, Facebook, GooglePlus, YouTube, and Flickr
over a period of six months. We identified 202 campaigns running from
all over the world, with Indonesia, United States, India, and the
United Arab Emirates being the highest contributors. We showed that
even though Indonesian campaigns generate 3.2 million posts, only 1.6%
of the accounts propagating them have been suspended so far. However,
the number of accounts suspended in a campaign is not correlated with
volume. Campaigns providing escort services and technical support
solutions had more account suspensions. After interacting with
spammers, we observed that they adopt tactics similar to legitimate
services to convince victims. By examining campaigns running across
OSNs, we showed that Twitter could suspend 93% more accounts spreading
spam as compared to Facebook. Therefore, sharing intelligence about
spam user accounts across OSNs can aid in spam detection; around 35K
victims and $8.8M could be saved based on exploratory analysis of our
data. We acknowledge that our validation of some of the possible
explanations proposed in this work may not be rigorous, due to
difficulties in thoroughly obtaining spammers' motivations. However,
we believe that our first-of-its-kind analysis of these phenomena
still provides great value and opens new doors to better understand
the phone-based spammer ecosystem across OSNs.
Mustaque Ahamad’s participation in this research was supported
in part by US National Science Foundation (NSF) grant no. CNS-
1514035. We would like to thank members of Precog, IIIT-Delhi for
their valuable feedback; special thanks to Paridhi Jain.
[1] Hélio Almeida, Dorgival Guedes, Wagner Meira, and Mohammed J Zaki. 2011. Is there a best quality metric for graph clusters?. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 44–59.
[2] Amit A Amleshwaram, Narasimha Reddy, Sandeep Yadav, Guofei Gu, and Chao Yang. 2013. Cats: Characterizing automation of twitter spammers. In Communication Systems and Networks (COMSNETS), 2013 Fifth International Conference on. IEEE, 1–10.
[3] Marco Balduzzi, Payas Gupta, Lion Gu, Debin Gao, and Mustaque Ahamad. 2016. MobiPot: Understanding Mobile Telephony Threats with Honeycards. In Proceedings of the 11th ACM SIGSAC Symposium on Information, Computer and Communications Security (ASIA CCS ’16). ACM, New York, NY, USA.
[4] Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. 2010. Detecting spammers on twitter. In Collaboration, electronic messaging, anti-abuse and spam conference (CEAS), Vol. 6. 12.
[5] Fabrício Benevenuto, Tiago Rodrigues, Virgílio Almeida, Jussara Almeida, and Marcos Gonçalves. 2009. Detecting spammers and content promoters in online video social networks. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. ACM, 620–627.
[6] Juan Miguel Carrascosa, Roberto González, Rubén Cuevas, and Arturo Azcorra. 2013. Are trending topics useful for marketing. Proc. COSN (2013).
[7] Nicolas Christin, Sally S Yanagihara, and Keisuke Kamataki. 2010. Dissecting one click frauds. In Proceedings of the 17th ACM conference on Computer and communications security. ACM, 15–26.
[8] Zi Chu, Indra Widjaja, and Haining Wang. 2012. Detecting social spam campaigns on twitter. In International Conference on Applied Cryptography and Network Security. Springer, 455–472.
[9] Andrei Costin, Jelena Isacenkova, Marco Balduzzi, Aurélien Francillon, and Davide Balzarotti. 2013. The role of phone numbers in understanding cyber-crime schemes. In Privacy, Security and Trust (PST), 2013 Eleventh Annual International Conference on. IEEE, 213–220.
[10] Michalis Faloutsos. 2013. Detecting malware with graph-based methods: traffic classification, botnets, and facebook scams. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 495–496.
[11] Hongyu Gao, Jun Hu, Christo Wilson, Zhichun Li, Yan Chen, and Ben Y Zhao. 2010. Detecting and characterizing social spam campaigns. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement. ACM, 35–47.
[12] Saptarshi Ghosh, Bimal Viswanath, Farshad Kooti, Naveen Kumar Sharma, Gautam Korlam, Fabricio Benevenuto, Niloy Ganguly, and Krishna Phani Gummadi. 2012. Understanding and combating link farming in the twitter social network. In Proceedings of the 21st international conference on World Wide Web. ACM, 61–70.
[13] Chris Grier, Kurt Thomas, Vern Paxson, and Michael Zhang. 2010. @ spam: the underground on 140 characters or less. In Proceedings of the 17th ACM conference on Computer and communications security. ACM, 27–37.
[14] Payas Gupta, Mustaque Ahamad, Jonathan Curtis, Vijay Balasubramaniyan, and Alex Bobotek. 2014. M3AAWG Telephony Honeypots: Benefits and Deployment Options. Technical Report.
[15] Payas Gupta, Roberto Perdisci, and Mustaque Ahamad. 2018. Towards Measuring the Role of Phone Numbers in Twitter-Advertised Spam. In Proceedings of the 13th ACM on Asia Conference on Computer and Communications Security (ASIA CCS ’18). ACM, New York, NY, USA, 12.
[16] Payas Gupta, Bharath Srinivasan, Vijay Balasubramaniyan, and Mustaque Ahamad. 2015. Phoneypot: Data-driven Understanding of Telephony Threats.. In
[17] Srishti Gupta, Payas Gupta, Mustaque Ahamad, and Ponnurangam Kumaraguru. 2016. Exploiting Phone Numbers and Cross-Application Features in Targeted Mobile Attacks. In Proceedings of the 6th Workshop on Security and Privacy in Smartphones and Mobile Devices. ACM, 73–82.
[18] Jelena Isacenkova, Olivier Thonnard, Andrei Costin, Aurélien Francillon, and David Balzarotti. 2014. Inside the scam jungle: A closer look at 419 scam email operations. EURASIP Journal on Information Security 2014, 1 (2014), 4.
[19] Ponnurangam Kumaraguru, Lorrie Faith Cranor, and Laura Mather. 2009. Anti-phishing landing page: Turning a 404 into a teachable moment for end users. Conference on Email and Anti-Spam (2009).
[20] Kyumin Lee, James Caverlee, and Steve Webb. 2010. Uncovering social spammers: social honeypots+ machine learning. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM,
[21] Kyumin Lee, Brian David Eoff, and James Caverlee. 2011. Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter.. In ICWSM.
[22] Cristian Lumezanu and Nick Feamster. 2012. Observing common spam in Twitter and email. In Proceedings of the 2012 ACM conference on Internet measurement conference. ACM, 461–466.
[23] Eva García Martín, Niklas Lavesson, and Mina Doroud. 2016. Hashtags and followers. Social Network Analysis and Mining 6, 1 (2016), 1–15.
[24] Aude Marzuoli, Hassan A Kingravi, David Dewey, and Robert Pienta. 2016. Uncovering the Landscape of Fraud and Spam in the Telephony Channel. In Machine Learning and Applications (ICMLA), 2016 15th IEEE International Conference on. IEEE, 853–858.
[25] Najmeh Miramirkhani, Oleksii Starov, and Nick Nikiforakis. 2017. Dial One for Scam: A Large-Scale Analysis of Technical Support Scams. In Proceedings of the 24th Network and Distributed System Security Symposium (NDSS).
[26] Federal Bureau of Investigation. 2016. TECH SUPPORT SCAM - Federal Bureau of Investigation. (June 2016).
[27] Miles Osborne and Mark Dredze. 2014. Facebook, Twitter and Google Plus for breaking news: Is there a winner?. In ICWSM.
[28] Raphael Ottoni, Diego B Las Casas, Joao Paulo Pesce, Wagner Meira Jr, Christo Wilson, Alan Mislove, and Virgílio AF Almeida. 2014. Of Pins and Tweets: Investigating How Users Behave Across Image-and Text-Based Social Networks..
[29] Md Sazzadur Rahman, Ting-Kai Huang, Harsha V Madhyastha, and Michalis Faloutsos. 2012. Frappe: detecting malicious facebook applications. In Proceedings of the 8th international conference on Emerging networking experiments and technologies. ACM, 313–324.
[30] Bharat Srinivasan, Payas Gupta, Manos Antonakakis, and Mustaque Ahamad. 2016. Understanding Cross-Channel Abuse with SMS-Spam Support Infrastructure Attribution. In European Symposium on Research in Computer Security. Springer, 3–26.
[31] Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna. 2010. Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 1–9.
[32] Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. 2011. Design and evaluation of a real-time url spam filtering service. In 2011 IEEE Symposium on Security and Privacy. IEEE, 447–462.
[33] Kurt Thomas, Chris Grier, Dawn Song, and Vern Paxson. 2011. Suspended accounts in retrospect: an analysis of twitter spam. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 243–258.
[34] Alex Hai Wang. 2010. Don’t follow me: Spam detection in twitter. In Security and Cryptography (SECRYPT), Proceedings of the 2010 International Conference on. IEEE, 1–10.
[35] Steve Webb, James Caverlee, and Calton Pu. 2008. Social Honeypots: Making Friends With A Spammer Near You.. In CEAS.
[36] Sarita Yardi, Daniel Romero, Grant Schoenebeck, et al. 2009. Detecting spam in a twitter network. First Monday 15, 1 (2009).
9.1 Regular Expressions for Data Collection
We used a curated list of 400 keywords like call, SMS, WhatsApp, ring,
contact, dial, reach, etc. to filter relevant tweets from Twitter's
Streaming API. While extracting phone numbers from the tweets, we
encountered variations in the representation of phone numbers. For
instance, the number 1-888-551-2881 can be represented as
1(888)551-2881, 1(888) 551-2881, 1.888.551.2881, or 1 888 551 2881,
and all these variations were being counted as different phone
numbers. We filtered out this noise by post-processing the data. The
regular expressions used to obtain a valid phone number from the text
of each post are listed below:
1. ('(?<= )\d{6}-\d{3}(?= )|
2. ('(\d[\d ]{5,13}\d{2}) ')
3. ('\$ *\d+[\.]*\d+|\d+[\.]*\d+\$')
4. ('^\d+\s|\s\d+\s|\s\d+$')
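To illustrate why the post-processing matters, the sketch below collapses the formatting variants of the example number into one canonical digit string. It is not the authors' pipeline: the function name `normalize_phone` and the digits-only canonical form are assumptions made for this example.

```python
import re

def normalize_phone(raw: str) -> str:
    """Drop every non-digit character so formatting variants compare equal."""
    return re.sub(r"\D", "", raw)

# The variants listed in the text for the number 1-888-551-2881.
variants = [
    "1-888-551-2881",
    "1(888)551-2881",
    "1(888) 551-2881",
    "1.888.551.2881",
    "1 888 551 2881",
]
canonical = {normalize_phone(v) for v in variants}
print(canonical)  # {'18885512881'}
```

After normalization, all five spellings count as a single distinct phone number instead of five.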
... The recruitment dataset of 17,880 annotated job ads are collected and released for public. Srishti et al. [20] performed a study on large scale social media campaigns used to distribute the scammer phone numbers in the social media networks. They identified that 202 campaign groups actively posting phone numbers on the web and also mentioned that some social network (twitter) can red flag as spam campaign better than the other popular social media network (Facebook). ...
... Social engineering scammer identification is much more challenging than one might think, as the scams involve intricate steps to trace and involve multiple stakeholders during the scam lifecycle.
[16] | Nigerian check | larger amount check | The check scam protection discussion | Only two ways of performing check scam mentioned
Youngsam et al. [14] | Nigerian Craigslist | magnetic honeypot advertisements | better understand Nigerian scammer patterns, tools, email usage etc. | 10 groups responsible for most of the activity
Aude et al. [17] | Telephone scams | Honeypot, Audio machine learning | Able to group the scammers | Identified one third of the calls are robocalls
Ting-Fang et al. [15] | Romance scams | Simulated spam filter to track the scammers | Discussed different types of romance scam | Affiliated market scams using online dating apps
Najmeh et al. [13] | Technical support | Discovering advertisement in web | Automated discovery | Still considered as most dangerous scam
Youngsam et al. [18] | Craigslist Rental | Web crawling, automated responses | Able to identify the scammer infrastructure used | less than half of the scams Craigslist identified and removed from portal
Vidros et al. [19] | Recruitment fraud | Automated web page crawling | first dataset available in public | ML based techniques used for detection
Srishti et al. [20] | Spam campaigns | Automated crawling of webpages | large scale study covering posts on multiple social networks | Twitter can suspend spam accounts better than Facebook
Yangyu et al. [21] | Dating App scam | Crawling dating apps in Android store | Large scale analysis and fraud app detection | The chatbot accounts influence the users buy premium
Suarez et al. [22] | Dating fraud | ML classifier | Achieved 96% correct identification | Online profile based analysis efficient compared to bot based
Agari [23] | Romance scams | Phishing emails | - | target divorced, farmer, disabled people
Pastrana et al. [24] | eWhoring scam | Crawl Underground forum | Pipeline framework to identify these scams | Performed URL and image analysis during the investigation
So, a common language and common criteria are required to identify the scammers and reduce the scope of the scammer presence for catching them. ...
Social engineering scams (SES) have existed since humankind adopted telecommunications. Earlier versions of these scams include, but are not limited to, leveraging premium phone services to charge consumers and service providers. Advances in digital data access and Internet technology have given scammers a variety of techniques. A lot of research has been done to identify scammer methodologies and the characteristics of scams. However, scammers keep finding new ways to lure consumers and steal their financial assets. A recent example is Covid-19 unemployment, which was used as a weapon to scam US citizens. These scams will not stop here and will keep appearing with new social engineering strategies in the near future. To better prepare for these kinds of scams in an ever-changing world, we describe recent trends in social engineering scams that target innocent people all over the world who overlook the consequences of scams, and we give a detailed description of recent social engineering scams, including Covid scams. A social engineering scam threat model architecture is also proposed to map various scams. In addition, we discuss a case study of a real-time gift card scam that targets customers of various enterprise organizations to steal their money and put the organizations' reputation at stake. We also provide recommendations to help internet users avoid falling victim to social engineering scams. Finally, we provide insights on how to prepare for and respond to social engineering scams by following the security incident detection and response life cycle in enterprises.
... A study [21] investigated spammers' strategies to enhance their influence scores by following real users as well as each other on Twitter. Lastly, Gupta et al. [22] conducted a large-scale analysis study of spam campaigns on multiple social platforms that used telephone numbers to lure victims. ...
... To investigate malicious campaigns, a collection of accounts exhibiting spam-like behaviors such as sharing duplicate tweets and URLs were also gathered from trending topics in Saudi Arabia. Following that, we looked at the content and practices of these accounts, classifying individuals with similar or duplicate materials as a group, which is a sample strategy used by several previous studies [22], [28], [29]. The dataset eventually located two campaigns that stood out significantly from the rest of the collected groups. ...
Fake malicious accounts are one of the primary causes of the deterioration of social network content quality. Numerous such accounts are generated by attackers to achieve multiple nefarious goals, including phishing, spamming, spoofing, and promotion. These practices pose significant challenges regarding the availability of credible data that reflect real-world social media interactions. This has led to the development of various methods and approaches to combat spammers on social media networks. Previous studies, however, have almost exclusively focused on studying and identifying English-language spam profiles, whereas the problem of malicious Arabic-language accounts remains under-addressed in the literature. In this paper, therefore, we conduct a comprehensive investigation of malicious Arabic-language campaigns on Twitter. The study involves analyzing the accounts of these campaigns from several perspectives, including their number, content, social interaction graphs, lifespans, and day-to-day activities. In addition to exposing their spamming tactics, we find that these spam accounts are more successful in avoiding Twitter suspensions than has been previously reported in the literature.
... Moreover, 66% of males, in comparison to 48.9% of females, agree that they feel confident about who they are as a person. This suggests that a smaller number of females feel confident, which corresponds to the stereotype that females have lower self-esteem than males [22]. Figure 1 shows the overall analysis of frequency of each participant's tweets over the span of one week. ...
Distinct polarities of gender stereotypes ascertain that communicative styles demonstrated by men and women are fundamentally disparate. Numerous studies have established varying communicative styles and methods involved in interpersonal communication, predominantly in the analysis of conversational styles as well as etymological strategies. Nevertheless, the spread of social media has contributed to a pivotal, fascinating shift in the utilization of lexes, encompassing less conventional gender-based articulacy and distinctness amongst youths within virtual settings. Drawing on data from a disseminated survey and purposive observations of sampled Twitter accounts, this study probed the correlation between gender stereotypes, communicative styles and linguistic features, manifesting the aggressive, assertive, passive-aggressive and passive traits which are associated with gender-based, stereotypical communicative styles. The findings yielded dominating percentages of males against females, in which males possess and exhibit all four traits of communicative styles, whereas the observations revealed that both genders demonstrate passive-aggressive and assertive traits.
Conference Paper
Twitter trending hashtags are a primary feature that users regularly visit to get news or chat with each other. However, this valuable feature has been abused by malicious campaigns that use Twitter hashtags to disseminate religious hatred, promote terrorist propaganda, distribute fake financial news, and spread healthcare rumours. In recent years, some health-related campaigns flooded Arabic trending hashtags on Twitter. These campaigns not only irritate users, but they also distribute malicious content. In this paper, a comprehensive empirical analysis of the ongoing health-related campaigns on Twitter Arabic hashtags is presented. After collecting and annotating tweets posted by these campaigns, we qualitatively analyzed the characteristics and behaviours of these tweets. We seek to find out what makes some of the tweets posted by these campaigns difficult to detect. Two main findings were identified: (1) these campaigns exhibit some spamming activities, such as using bots and trolls, (2) they use unique hijacked accounts as adversarial examples to obfuscate detection. This study is the first to qualitatively analyze health-related campaigns on Twitter Arabic hashtags from a security point of view. Our findings suggest that some of the tweets posted by these campaigns need to be treated as adversarial examples that have not only been crafted to evade detection but also to undermine the deployed detection system.
Conference Paper
Online Social Networks (OSNs) are platforms that have gained immense traction in society today. Social media has reshaped our social world and has been playing a pivotal role in sculpting our personal and professional goals. While it provides invaluable information to millions of individuals daily, it has also become one of the most popular places for spam campaigns. In this paper, we design an algorithm for the recognition of spam campaigns, specifically focusing on a phone-number-based approach. We build a system for spam campaign recognition with an emphasis on phone numbers in light of the malicious activity that is vandalizing our online experience. This research focuses on data extracted from monitoring the following social networking channels: Tumblr, Twitter, and Flickr. The paper serves as an analytical lens for spam posts accumulated over four months. Regular expressions are used for data cleaning to identify posts containing phone numbers. We collected over 18 million spam posts and filtered the spam-containing posts using regular expressions. Next, we used a Bayesian model called Latent Dirichlet Allocation (LDA) to build a statistical model for detecting the category of the posts. We further apply bag-of-words and tf-idf representations to this data and use cosine similarity as the similarity measure.
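The filtering and similarity steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the phone-number pattern `PHONE_RE`, the helper names, and the sample posts are all assumptions made for illustration, the tf-idf weighting uses a common smoothed variant, and the LDA categorization step is left out entirely.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately loose pattern (an assumption, not from the paper):
# 8 to 15 digits, each optionally preceded by a space, dot, or hyphen,
# with an optional leading "+".
PHONE_RE = re.compile(r"\+?\d(?:[\s.-]?\d){7,14}")


def has_phone_number(post):
    """Return True if the post contains something that looks like a phone number."""
    return PHONE_RE.search(post) is not None


def tokenize(post):
    """Lowercase bag-of-words tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", post.lower())


def tfidf_vectors(posts):
    """Build one sparse tf-idf vector (a dict of term -> weight) per post."""
    docs = [Counter(tokenize(p)) for p in posts]
    n_docs = len(docs)
    # Document frequency: in how many posts each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed idf, so terms present in every post keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample posts using fictional 555-01xx numbers.
posts = [
    "Call +1 800 555 0100 for cheap loans today",
    "Cheap loans today, call 1-800-555-0100 now",
    "Lovely weather in the park this morning",
    "Tech support helpline 0800 555 0199 fix your computer",
]

# Step 1: keep only posts that contain a phone number.
candidates = [p for p in posts if has_phone_number(p)]
# Step 2: represent the remaining posts as tf-idf vectors.
vectors = tfidf_vectors(candidates)
# Step 3: compare posts pairwise; similar wording suggests the same campaign.
same_campaign = cosine(vectors[0], vectors[1])
different_campaign = cosine(vectors[0], vectors[2])
```

In this toy data the two loan posts share most of their vocabulary and score higher than the loan and tech-support pair, which share no terms. A real pipeline would need a far more careful phone-number pattern and a threshold or clustering step on top of the pairwise similarities.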
Social networks have generated immense amounts of data that have been successfully utilized for research and business purposes. The approachability and immediacy of social media have also allowed ill-intentioned users to perform several harmful activities that include spamming, promoting, and phishing. These activities generate massive amounts of low-quality content that often exhibits duplicate, automated, inappropriate, or irrelevant content that subsequently affects users’ satisfaction and imposes a significant challenge for other social media-based systems. Several real-time systems were developed to tackle this problem by focusing on filtering a specific kind of low-quality content. In this paper, we present a fine-grained real-time classification approach to identify several types of low-quality tweets (i.e., phishing, promoting, and spam tweets) written in Arabic. The system automatically extracts textual features using deep learning techniques without relying on hand-crafted features that are often time-consuming to obtain and are tailored for a single type of low-quality content. This paper also proposes a lightweight model that utilizes a subset of the textual features to identify spamming Twitter accounts in a real-time setting. The proposed methods are evaluated on a real-world dataset (40,000 tweets and 1,000 accounts), showing superior performance in both models with accuracy and F1-scores of 0.98. The proposed system classifies a tweet in less than five milliseconds and an account in less than a second.
Conference Paper
Smartphones have fueled a shift in the way we communicate with each other via Instant Messaging. With the convergence of Internet and telephony, new Over-The-Top (OTT) messaging applications (e.g., WhatsApp, Viber, WeChat etc.) have emerged as an important means of communication for millions of users. These applications use phone numbers as the only means of authentication and are becoming an attractive medium for attackers to deliver spam and carry out more targeted attacks. The universal reach of telephony along with its past trusted nature makes phone numbers attractive identifiers for reaching potential attack targets. In this paper, we explore the feasibility, automation, and scalability of a variety of targeted attacks that can be carried out by abusing phone numbers. These attacks can be carried out on different channels viz. OTT messaging applications, voice, e-mail, or SMS. We demonstrate a novel system that takes a phone number as an input, leverages information from applications like Truecaller and Facebook about the victim and his / her social network, checks the presence of phone number's owner (victim) on the attack channel (OTT messaging applications, voice, e-mail, or SMS), and finally targets the victim on the chosen attack channel. As a proof of concept, we enumerated through a random pool of 1.16 million phone numbers and demonstrated that targeted attacks could be crafted against the owners of 255,873 phone numbers by exploiting cross-application features. Due to the significantly increased user engagement via new mediums of communication like OTT messaging applications and ease with which phone numbers allow collection of pertinent information, there is a clear need for better protection of applications that rely on phone numbers.
We have conducted an analysis of data from 502,891 Twitter users and focused on investigating the potential correlation between hashtags and the increase of followers to determine whether the addition of hashtags to tweets produces new followers. We have designed an experiment with two groups of users: one tweeting with random hashtags and one tweeting without hashtags. The results showed that there is a correlation between hashtags and followers: on average, users tweeting with hashtags increased their followers by 2.88, while users tweeting without hashtags increased theirs by 0.88. We present a simple, reproducible approach to extract and analyze Twitter user data for this and similar purposes.
Conference Paper
The telephony channel has become an attractive target for cyber criminals, who are using it to craft a variety of attacks. In addition to delivering voice and messaging spam, this channel is also being used to lure victims into calling phone numbers that are controlled by the attackers. One way this is done is by aggressively advertising phone numbers on social media (e.g., Twitter). This form of spam is then monetized over the telephony channel, via messages/calls made by victims. We refer to this type of attack as an outgoing phone communication (OPC) attack. By collecting approximately 70M tweets containing over 5,786 phone numbers over a period of 14 months, we are able to measure properties of multiple spam campaigns, including well-known tech support scams. Our contributions include a novel data collection technique that amplifies tweets containing phone numbers, clustering of tweets that are part of a given OPC attack campaign, and a brief analysis of particularly interesting campaigns. We also show that some of the campaigns we analyze appear to attempt to avoid account suspension by Twitter, by including reputable URLs in their tweets. In fact, we find that Twitter suspended only about 3.5% of the accounts that participated in the top 15 spam campaigns we measured. Our results not only demonstrate a new kind of abuse exploiting the telephony channel but also show the potential benefits of using phone numbers to fight spam on Twitter.
Conference Paper
Recent convergence of telephony with the Internet offers malicious actors the ability to craft cross-channel attacks that leverage both telephony and Internet resources. Bulk messaging services can be used to send unsolicited SMS messages to phone numbers. While the long-term properties of email spam tactics have been extensively studied, such behavior for SMS spam is not well understood. In this paper, we discuss a novel SMS abuse attribution system called CHURN. The proposed system is able to collect data about large SMS abuse campaigns and analyze their passive DNS records and supporting website properties. We used CHURN to systematically conduct attribution around the domain names and IP addresses used in such SMS spam operations over a five-year time period. Using CHURN, we were able to make the following observations about SMS spam campaigns: (1) only 1% of SMS abuse domains ever appeared in public domain blacklists and more than 94% of the blacklisted domain names did not appear in such public blacklists for several weeks or even months after they were first reported in abuse complaints, (2) more than 40% of the SMS spam domains were active for over 100 days, and (3) the infrastructure that supports the abuse is surprisingly stable. That is, the same SMS spam domain names were used for several weeks and the IP infrastructure that supports these campaigns can be identified in a few networks and a small number of IPs, for several months of abusive activities. Through this study, we aim to increase the situational awareness around SMS spam abuse, by studying this phenomenon over a period of five years.
Conference Paper
Over the past decade, the number of mobile phones has increased dramatically, overtaking the world population in October 2014. In developing countries like India and China, mobile subscribers outnumber traditional landline users and account for over 90% of the active population. At the same time, convergence of telephony with the Internet through technologies like VoIP makes it possible to reach a large number of telephone users at low or no cost via voice calls or SMS (short message service) messages. As a consequence, cybercriminals are abusing the telephony channel to launch attacks, e.g., scams that offer fraudulent services and voice-based phishing or vishing, that have previously relied on the Internet. In this paper, we introduce and deploy the first mobile phone honeypot, called MobiPot, that allows us to collect fraudulent calls and SMS messages. We implement multiple ways of advertising mobile numbers (honeycards) on MobiPot to investigate how fraudsters collect the phone numbers that they target. During a period of over seven months, MobiPot collected over two thousand voice calls and SMS messages, and we confirmed that over half of them were unsolicited. We found that seeding honeycards enables us to discover attacks on the mobile phone numbers which were not known before.
Twitter is widely seen as being the go-to place for breaking news. Recently, however, competing social media have begun to carry news. Here we examine how Facebook, Google Plus and Twitter report on breaking news. We consider coverage (whether news events are reported) and latency (the time when they are reported). Using data drawn from three weeks in December 2013, we identify 29 major news events, ranging from celebrity deaths and plague outbreaks to sports events. We find that all media carry the same major events, but Twitter continues to be the preferred medium for breaking news, almost consistently leading Facebook or Google Plus. Facebook and Google Plus largely repost newswire stories and their main research value is that they conveniently package multiple sources of information together. Copyright © 2014, Association for the Advancement of Artificial Intelligence. All rights reserved.
Conference Paper
With 20 million installs a day, third-party apps are a major reason for the popularity and addictiveness of Facebook. Unfortunately, hackers have realized the potential of using apps for spreading malware and spam. The problem is already significant, as we find that at least 13% of apps in our dataset are malicious. So far, the research community has focused on detecting malicious posts and campaigns. In this paper, we ask the question: given a Facebook application, can we determine if it is malicious? Our key contribution is in developing FRAppE (Facebook's Rigorous Application Evaluator), arguably the first tool focused on detecting malicious apps on Facebook. To develop FRAppE, we use information gathered by observing the posting behavior of 111K Facebook apps seen across 2.2 million users on Facebook. First, we identify a set of features that help us distinguish malicious apps from benign ones. For example, we find that malicious apps often share names with other apps, and they typically request fewer permissions than benign apps. Second, leveraging these distinguishing features, we show that FRAppE can detect malicious apps with 99.5% accuracy, with no false positives and a low false negative rate (4.1%). Finally, we explore the ecosystem of malicious Facebook apps and identify mechanisms that these apps use to propagate. Interestingly, we find that many apps collude and support each other; in our dataset, we find 1,584 apps enabling the viral propagation of 3,723 other apps through their posts. Long-term, we see FRAppE as a step towards creating an independent watchdog for app assessment and ranking, so as to warn Facebook users before installing apps.