What Happens After You Are Pwnd: Understanding
The Use Of Leaked Webmail Credentials In The Wild
Jeremiah Onaolapo, Enrico Mariconti, and Gianluca Stringhini
University College London
{j.onaolapo, e.mariconti, g.stringhini}@cs.ucl.ac.uk
ABSTRACT
Cybercriminals steal access credentials to webmail ac-
counts and then misuse them for their own profit, re-
lease them publicly, or sell them on the underground
market. Despite the importance of this problem, the
research community still lacks a comprehensive under-
standing of what these stolen accounts are used for. In
this paper, we aim to shed light on the modus operandi
of miscreants accessing stolen Gmail accounts. We de-
veloped an infrastructure that is able to monitor the ac-
tivity performed by users on Gmail accounts, and leaked
credentials to 100 accounts under our control through
various means, such as having information-stealing mal-
ware capture them, leaking them on public paste sites,
and posting them on underground forums. We then
monitored the activity recorded on these accounts over
a period of 7 months. Our observations allowed us to
devise a taxonomy of malicious activity performed on
stolen Gmail accounts, to identify differences in the be-
havior of cybercriminals that get access to stolen ac-
counts through different means, and to identify system-
atic attempts to evade the protection systems in place
at Gmail and blend in with the legitimate user activity.
This paper gives the research community a better un-
derstanding of a so far understudied, yet critical aspect
of the cybercrime economy.
Categories and Subject Descriptors
J.4 [Computer Applications]: Social and Behavioral
Sciences; K.6.5 [Security and Protection]: Unautho-
rized Access
Permission to make digital or hard copies of all or part of this work for personal
or classroom use is granted without fee provided that copies are not made or
distributed for profit or commercial advantage and that copies bear this notice
and the full citation on the first page. Copyrights for components of this work
owned by others than ACM must be honored. Abstracting with credit is per-
mitted. To copy otherwise, or republish, to post on servers or to redistribute to
lists, requires prior specific permission and/or a fee. Request permissions from
permissions@acm.org.
IMC 2016, November 14-16, 2016, Santa Monica, CA, USA
© 2016 ACM. ISBN 978-1-4503-4526-2/16/11 ...$15.00
DOI: http://dx.doi.org/10.1145/2987443.2987475
Keywords
Cybercrime, Webmail, Underground Economy, Malware
1. INTRODUCTION
The wealth of information that users store in web-
mail accounts on services such as Gmail, Yahoo! Mail,
or Outlook.com, as well as the possibility of misusing
them for illicit activities has attracted cybercriminals,
who actively engage in compromising such accounts.
Miscreants obtain the credentials to victims’ online ac-
counts by performing phishing scams [17], by infect-
ing users with information-stealing malware [29], or by
compromising large password databases, leveraging the
fact that people often use the same password across
multiple services [16]. Such credentials can be used by
the cybercriminal privately, or can then be sold on the
black market to other cybercriminals who wish to use
the stolen accounts for profit. This ecosystem has be-
come a very sophisticated market in which only vetted
sellers are allowed to join [30].
Cybercriminals can use compromised accounts in mul-
tiple ways. First, they can use them to send spam [18].
This practice is particularly effective because of the
established reputation of such accounts: the already-
established contacts of the account are likely to trust
its owner, and are therefore more likely to open the
messages that they receive from her [20]. Similarly, the
stolen account is likely to have a history of good be-
havior with the online service, and the malicious mes-
sages sent by it are therefore less likely to be detected
as spam, especially if the recipients are within the same
service (e.g., a Gmail account used to send spam to
other Gmail accounts) [33]. Alternatively, cybercrim-
inals can use the stolen accounts to collect sensitive
information about the victim. Such information can
include financial credentials (credit card numbers, bank
account numbers), login information to other online ser-
vices, and personal communications of the victim [13].
Despite the importance of stolen accounts for the
underground economy, there is surprisingly little work
on the topic. Bursztein et al. [13] studied the modus
operandi of cybercriminals collecting Gmail account cre-
dentials through phishing scams. Their paper shows
that criminals access these accounts to steal financial
information from their victims, or use these accounts to
send fraudulent emails. Since their work only focused
on one possible way used by criminals to steal user login
credentials, it leaves questions unanswered on how gen-
eral their observations are compared to credentials ac-
quired through other means. Most importantly, [13] re-
lies on proprietary information from Google, and there-
fore it is not possible for other researchers to replicate
their results or build on top of their work.
Other researchers did not attempt to study the activ-
ity of criminals on compromised online accounts because
it is usually difficult to monitor what happens to them
without being a large online service. The rare excep-
tions are studies that look at information that is pub-
licly observable, such as the messages posted on Twitter
by compromised accounts [18,19].
To close this gap, in this paper we present a system
that is able to monitor the activity performed by at-
tackers on Gmail accounts. To this end, we instrument
the accounts using Google Apps Script [1]; by doing so,
we are able to monitor any time an email is opened,
starred, sent, or a new draft is created. We also
monitor the accesses that the accounts receive, with
particular attention to their system configuration and
their origin. We call such accounts honey accounts.
We set up 100 honey accounts, each resembling the
Gmail account of the employee of a fictitious company.
To understand how criminals use these accounts af-
ter they get compromised, we leaked the credentials to
such accounts on multiple outlets, modeling the differ-
ent ways in which cybercriminals share and get access to
such credentials. First, we leaked credentials on paste
sites, such as pastebin [5]. Paste sites are commonly
used by cybercriminals to post account credentials after
data breaches [2]. We also leaked them to underground
forums, which have been shown to be the place where
cybercriminals gather to trade stolen commodities such
as account credentials [30]. Finally, we logged in to
our honey accounts on virtual machines that were pre-
viously infected with information-stealing malware. By
doing this, the credentials are sent to the command and
control infrastructure of the cybercriminal behind the
malware, and are then either used directly by her or placed
on the black market for sale [29]. We know that there
are other outlets that attackers use, for instance, phish-
ing and data breaches, but we decided to focus on paste
sites, underground forums, and malware in this paper.
We worked in close collaboration with the Google anti-
abuse team, to make sure that any unwanted activity by
the compromised accounts would be promptly blocked.
The accounts were configured to send any email to a
mail server under our control, to prevent them from
successfully delivering spam.
After leaking our credentials, we recorded any inter-
action with our honey accounts for a period of 7 months.
Our analysis allowed us to draw a taxonomy of the dif-
ferent actions performed by criminals on stolen Gmail
accounts, and provided us with interesting insights on the
keywords that criminals typically search for when look-
ing for valuable information on these accounts. We
also show that criminals who obtain access to stolen ac-
counts through certain outlets appear more skilled than
others, and make additional efforts to avoid detection
from Gmail. For instance, criminals who steal account
credentials via malware make more efforts to hide their
identity, by connecting from the Tor network and dis-
guising their browser user agent. Criminals who obtain
access to stolen credentials through paste sites, on the
other hand, tend to connect to the accounts from lo-
cations that are closer to the typical location used by
the owner of the account, if this information is shared
with them. At the lowest level of sophistication are
criminals who browse free underground forums looking
for free samples of stolen accounts: these individuals do
not take significant measures to avoid detection, and
are therefore easier to detect and block. Our findings
complement what was reported by previous work in the
case of manual account hijacking [13], and show that
the modus operandi of miscreants varies considerably
depending on how they obtain the credentials to stolen
accounts.
In summary, this paper makes the following contributions:
- We developed a system to monitor the activity of Gmail accounts. We publicly release the source code of our system, to allow other researchers to deploy their own Gmail honey accounts and further the understanding that the security community has of malicious activity on online services. To the best of our knowledge, this is the first publicly available Gmail honeypot infrastructure.
- We deployed 100 honey accounts on Gmail, and leaked credentials through three different outlets: underground forums, public paste sites, and virtual machines infected with information-stealing malware.
- We provide detailed measurements of the activity logged by our honey accounts over a period of 7 months. We show that certain outlets on which credentials are leaked appear to be used by more skilled criminals, who act stealthily and actively attempt to evade detection systems.
2. BACKGROUND
Gmail accounts. In this paper we focus on Gmail
accounts, with particular attention to the actions per-
formed by cybercriminals once they obtain access to
someone else’s account. We made this choice over other
webmail platforms because Gmail allows users to set up
scripts that augment the functionality of their accounts,
and it was therefore the ideal platform for developing
webmail-based honeypots. To ease the understanding
of the rest of the paper, we briefly summarize the capa-
bilities offered by webmail accounts in general, and by
Gmail in particular.
In Gmail, after logging in, users are presented with a
view of their Inbox. The inbox contains all the emails
that the user received, and highlights the ones that have
not been read yet by displaying them in boldface font.
Users have the option to mark emails that are important
to them and that need particular attention by starring
them. Users are also given a search functionality, which
allows them to find emails of interest by typing related
keywords. They are also given the possibility to orga-
nize their email by placing related messages in folders,
or assigning them descriptive labels. Such operations
can be automated by creating rules that automatically
process received emails. When writing emails, content
is saved in a Drafts folder until the user decides to send
it. Once this happens, sent emails can be found in a
dedicated folder, and they can be searched similarly to
what happens for received emails.
Threat model. Cybercriminals can get access to ac-
count credentials in many ways. First, they can per-
form social engineering-based scams, such as setting up
phishing web pages that resemble the login page of pop-
ular online services [17] or sending spearphishing emails
pretending to be members of customer support teams at
such online services [32]. As a second way of obtaining
user credentials, cybercriminals can install malware on
victim computers and configure it to report back any
account credentials issued by the user to the command
and control server of the botnet [29]. As a third way of
obtaining access to user credentials, cybercriminals can
exploit vulnerabilities in the databases used by online
services to store them [6].
User credentials can also be obtained illegitimately
through targeted online password guessing techniques [36],
often aided by the problem of password reuse across
various online services [16]. Finally, cybercriminals can
steal user credentials and access tokens by running net-
work sniffers [14] or mounting Man-in-the-Middle [11]
attacks against victims.
After stealing account credentials, a cybercriminal
can either use them privately for their own profit, re-
lease them publicly, or sell them on the underground
market. Previous work studied the modus operandi of
cybercriminals stealing user accounts through phishing
and using them privately [13]. In this work, we study
a broader threat model in which we mimic cybercrimi-
nals leaking credentials on paste sites [5] as well as mis-
creants advertising them for sale on underground fo-
rums [30]. In particular, previous research showed that
cybercriminals often offer a small number of account
credentials for free to test their “quality” [30]. We fol-
lowed a similar approach, pretending to have more ac-
counts for sale, but never following up to any further
inquiries. In addition, we simulate infected victim ma-
chines in which malware steals the user’s credentials and
sends them to the cybercriminal. We describe our setup
and how we leaked account credentials on each outlet
in detail in Section 3.2.
3. METHODOLOGY
Our overall goal was to gain a better understanding
of malicious activity in compromised webmail accounts.
To achieve this goal, we developed a system able to
monitor accesses and activity on Gmail accounts. We
set up accounts and leaked them through different out-
lets. In the following sections, we describe our system
architecture and experiment setup in detail.
3.1 System overview
Our system comprises two components, namely, honey
accounts and a monitor infrastructure.
Honey accounts. Our honey accounts are webmail ac-
counts instrumented with Google Apps Script to mon-
itor activity in them. Google Apps Script is a cloud-
based scripting language based on JavaScript, designed
to augment the functionality of Gmail accounts and
Google Drive documents, in addition to building web
apps [4]. The scripts we embedded in the honey
accounts send notifications to a dedicated webmail ac-
count under our control whenever an email is opened,
sent, or “starred.” In addition, the scripts send us copies
of all draft emails created in the honey accounts. We
also added a “heartbeat message” function, to send us a
message once a day from each honey account, to attest
that the account was still functional and had not been
blocked by Google.
In each honey account, we hid the script in a Google
Docs spreadsheet. We believe that this measure makes
it unlikely for attackers to find and delete our scripts.
To minimize abuse, we changed each honeypot account’s
default send-from address to an email address pointing
to a mailserver under our control. All emails sent from
the honeypot accounts are delivered to the mailserver,
which simply dumps the emails to disk and does not
forward them to the intended destination.
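The sinkhole behavior described above can be sketched as a minimal SMTP sink that accepts every message and dumps it to disk instead of relaying it. This is an illustrative stand-in under our own assumptions (function names, file naming scheme), not the authors' actual mailserver implementation.

```python
import os
import socket
import threading
import uuid

def handle(conn, outdir):
    """Speak just enough SMTP to accept messages, then dump
    each message body to disk instead of relaying it."""
    conn.sendall(b"220 sink ready\r\n")
    buf, in_data, msg = b"", False, []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        buf += chunk
        while b"\r\n" in buf:
            line, buf = buf.split(b"\r\n", 1)
            if in_data:
                if line == b".":  # end of DATA section
                    path = os.path.join(outdir, uuid.uuid4().hex + ".eml")
                    with open(path, "wb") as f:
                        f.write(b"\r\n".join(msg))
                    msg, in_data = [], False
                    conn.sendall(b"250 OK\r\n")
                else:
                    msg.append(line)
            elif line.upper().startswith(b"DATA"):
                in_data = True
                conn.sendall(b"354 End data with <CRLF>.<CRLF>\r\n")
            elif line.upper().startswith(b"QUIT"):
                conn.sendall(b"221 Bye\r\n")
                conn.close()
                return
            else:
                conn.sendall(b"250 OK\r\n")  # HELO/EHLO/MAIL/RCPT all accepted

def start_sink(outdir, host="127.0.0.1", port=0):
    """Bind the sink and serve connections in background threads.
    Returns the port actually bound (useful with port=0)."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    def loop():
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle, args=(conn, outdir),
                             daemon=True).start()
    threading.Thread(target=loop, daemon=True).start()
    return srv.getsockname()[1]
```

Because the sink never opens an outbound connection, spam sent through a compromised honey account can be inspected offline without ever reaching a real recipient.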
Monitoring infrastructure. Google Apps Scripts are
quite powerful, but they do not provide enough informa-
tion in some cases. For example, they do not provide lo-
cation information and IP addresses of accesses to web-
mail accounts. To track those accesses, we set up exter-
nal scripts to drive a web browser and periodically log
into each honey account and record information about
visitors (cookie identifier, geolocation information, and
times of accesses, among others). The scripts navigate
to the visitor activity page in each honey account, and
dump the pages to disk, for offline parsing. By col-
lecting information from the visitor activity pages, we
obtain location and system configuration information of
accesses, as provided by Google’s geolocation and sys-
tem fingerprinting system.
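The offline parsing step can be sketched with Python's standard html.parser. The table layout assumed here is hypothetical: the real structure of the Gmail visitor activity page is not documented in the paper and changes over time.

```python
from html.parser import HTMLParser

class ActivityParser(HTMLParser):
    """Extract (access type, location, time) rows from a dumped
    activity page. The <table>/<tr>/<td> structure assumed here
    is a simplification of the real page."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.row = []
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
        elif tag == "tr":
            self.row = []

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.row:
            self.rows.append(tuple(self.row))

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.row.append(data.strip())

def parse_activity_page(html):
    """Return one tuple per recorded access in the dumped page."""
    p = ActivityParser()
    p.feed(html)
    return p.rows
```

For example, a dumped row such as `<tr><td>Browser</td><td>Germany (1.2.3.4)</td><td>10:31</td></tr>` would be parsed into the tuple `("Browser", "Germany (1.2.3.4)", "10:31")`.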
We believe that our honey account and monitoring
framework opens up multiple possibilities for researchers
who want to further study the behavior of attackers in
webmail accounts. For this reason, we release the source
code of our system at https://bitbucket.org/gianluca_students/gmail-honeypot.
3.2 Experiment setup
As part of our experiments, we first set up a number
of honey accounts on Gmail, and then leaked their
credentials through multiple outlets used by cybercriminals.
Honey account setup. We created 100 Gmail ac-
counts and assigned them random combinations of pop-
ular first and last names, similar to what was done
in [31]. Creating and setting up these accounts is a
manual process. Google also rate-limits the creation
of new accounts from the same IP address by presenting
a phone verification page after a few accounts have been
created. These factors imposed limits on the number of
honey accounts we could set up in practice.
We populated the freshly-created accounts with emails
from the public Enron email dataset [22]. This dataset
contains the emails sent by the executives of the en-
ergy corporation Enron, and was publicly released as
evidence for the bankruptcy trial of the company. This
dataset is suitable for our purposes, since the emails
that it contains are the typical emails exchanged by
corporate users. To make the honey accounts believ-
able and avoid raising suspicion from cybercriminals
accessing them, we mapped distinct recipients in the
Enron dataset to our fictional characters (i.e., the ficti-
tious “owners” of the honey accounts), and replaced the
original first names and last names in the dataset with
our honey first names and last names. In addition, we
changed all instances of “Enron” to a fictitious company
name that we came up with.
In order to have realistic email timestamps, we trans-
lated the old Enron email timestamps to recent times-
tamps slightly earlier than our experiment start date.
For instance, given two email timestamps t1 and t2 in
the Enron dataset such that t1 is earlier than t2, we
translate them to more recent timestamps T1 and T2
such that T1 is earlier than T2. We then schedule those
particular emails to be sent to the recipient honey ac-
counts at times T1 and T2, respectively. We sent be-
tween 200 and 300 emails from the Enron dataset to each
honey account in the process of populating them.
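The order-preserving translation described above can be sketched as a linear mapping of the old timestamp range onto a recent window ending just before the experiment start date. The 90-day window length is an illustrative assumption, not a parameter reported in the paper.

```python
from datetime import datetime, timedelta

def translate_timestamps(old_times, experiment_start, span_days=90):
    """Map the original Enron timestamps onto a recent window that
    ends just before the experiment start date, preserving their
    relative order (if t1 < t2 then T1 < T2)."""
    lo, hi = min(old_times), max(old_times)
    new_end = experiment_start - timedelta(days=1)
    new_start = new_end - timedelta(days=span_days)
    # Scale factor between the old range and the new window.
    scale = (new_end - new_start) / (hi - lo) if hi != lo else 0
    return [new_start + (t - lo) * scale for t in old_times]
```

Each translated email is then scheduled for delivery to its honey account at its new timestamp, so the mailbox appears to have been in recent, continuous use.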
Leaking account credentials. To achieve our objec-
tives, we had to entice cybercriminals to interact with
our account honeypots while we logged their accesses.
We selected paste sites and underground forums as ap-
propriate venues for leaking account credentials, since
they tend to be misused by cybercriminals for dissem-
ination of stolen credentials. In addition, we leaked
some credentials through malware, since this is a popu-
lar way by which professional cybercriminals steal cre-
dentials and compromise accounts [10]. We divided
the honeypot accounts into groups and leaked their cre-
dentials in different locations, as shown in Table 1. We
leaked 50 accounts in total on paste sites. For 20 of
them, we leaked basic credentials (username and pass-
word pairs) on the popular paste sites pastebin.com and
pastie.org. We leaked 10 account credentials on Russian
paste websites (p.for-us.nl and paste.org.ru). For the re-
maining 20 accounts, we leaked username and password
pairs along with UK and US location information of the
fictitious personas that we associated with the honey ac-
counts. We also included date of birth information of
each persona.
Group  Accounts  Outlet of leak
1      30        paste websites (no location)
2      20        paste websites (with location)
3      10        forums (no location)
4      20        forums (with location)
5      20        malware (no location)

Table 1: List of account honeypot groupings.
We leaked 30 account credentials on underground fo-
rums. For 10 of them, we only specified username and
password pairs, without additional information. In a
manner similar to the paste site leaks described earlier,
for the remaining 20 accounts we appended UK and US
location information to the underground forum leaks,
claiming that our fictitious personas lived in those
locations. We also included date of birth information
for each persona.
To leak credentials, we used these forums: offensivecommunity.net,
bestblackhatforums.eu, hackforums.net, and blackhatworld.com. We selected them
because they were open for anybody to register, and
were highly ranked in Google results. We acknowledge
that some underground forums are not open, and have a
strict vetting policy to let users in [30]. Unfortunately,
however, we did not have access to any private forum.
In addition, the same approach of studying open un-
derground forums has been used by previous work [7].
When leaking credentials on underground forums, we
mimicked the modus operandi of cybercriminals that
was outlined by Stone-Gross et al. in [30]. In the pa-
per, the authors showed that cybercriminals often post
a sample of their stolen datasets on the forum to show
that the accounts are real, and promise to provide ad-
ditional data in exchange for a fee. We logged the mes-
sages that we received on underground forums (mostly
inquiries about obtaining the full dataset), but we did
not follow up on them.
Finally, to study the activity of criminals who obtain
credentials through information-stealing malware, we
leaked access credentials of 20 honey accounts to
information-stealing malware samples. To this end, we
selected malware samples from the Zeus family, which
is one of the most popular malware families performing
information stealing [10], as well as from the Corebot
family. We will provide detailed information on our
malware honeypot infrastructure in the next section.
The reason for leaking different accounts on different
outlets is to study differences in the behavior of cyber-
criminals getting access to stolen credentials through
different sources. Similarly, we provide decoy location
information in some leaks, and not in others, with the
idea of observing differences in malicious activity de-
pending on the amount and type of information avail-
able to cybercriminals. As we will show in Section 4,
the accesses that were observed in our honey accounts
were heavily influenced by the presence of additional
location information in the leaked content.
Malware honeypot infrastructure. Our malware
sandbox system is structured as follows. A web server
entity manages the honey credentials (usernames and
passwords) and the malware samples. The host ma-
chine creates a Virtual Machine (VM), which contacts
the web server to request an executable malware file and
a honey credential file. The structure is similar to the
one explained in [21]. The malware file is then executed
in the VM (that is, the VM is infected with malware),
after which a script drives a browser in the VM to log
in to Gmail using the downloaded credentials. The
idea is to expose the honey credentials to the malware
that is already running in the VM. After some time,
the infected VM is deleted and a fresh one is created.
This new VM downloads another malware sample and
a different honey credential file, and it repeats the in-
fection and login operation. To maximize the efficiency
of the configuration, before the experiment we carried
out a test without the Gmail login process to select only
samples whose C&C servers were still up and running.
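The infect-login-destroy cycle can be sketched as follows; `make_vm`, `fetch_sample`, and `fetch_credentials` are hypothetical stand-ins for the web server and virtualization layer described above, not the real infrastructure.

```python
def sandbox_cycle(make_vm, fetch_sample, fetch_credentials, n_cycles):
    """Run the infect-login-destroy cycle: each pass spins up a
    fresh VM, infects it with one malware sample, exposes one set
    of honey credentials via a browser login, then discards the
    tainted VM. All callables are hypothetical stubs."""
    exposed = []
    for _ in range(n_cycles):
        vm = make_vm()                    # fresh, clean VM
        sample = fetch_sample()           # malware binary from the web server
        creds = fetch_credentials()       # honey username/password pair
        vm.execute(sample)                # infect the VM
        vm.browser_login("mail.google.com", creds)  # expose the credentials
        exposed.append(creds)
        vm.destroy()                      # discard the tainted VM
    return exposed
```

Destroying the VM after every login keeps samples isolated from one another, so each leaked credential can be attributed to exactly one malware family.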
3.3 Threats to validity
We acknowledge that seeding the honey accounts with
emails from the Enron dataset may introduce bias into
our results, and may make the honey accounts less be-
lievable to visitors. However, it is necessary to note that
the Enron dataset is the only large publicly available
email corpus, to the best of our knowledge. To make the
emails believable, we changed the names in the emails,
dates, and company name. In the future, we will work
towards obtaining or generating a better email dataset,
if possible. Also, some visitors may notice that the
honey accounts did not receive any new emails during
the period of observation, and this may affect the way
in which criminals interact with the accounts. Another
threat is that we only leaked honey credentials through
the outlets listed previously (namely paste sites, un-
derground forums, and malware), therefore, our results
reflect the activity of participants present on those out-
lets only. Finally, since we selected only underground
forums that are publicly accessible, our observations
might not reflect the modus operandi of actors who are
active on closed forums that require vetting for signing
up.
3.4 Ethics
The experiments performed in this paper require some
ethical considerations. First of all, by giving access to
our honey accounts to cybercriminals, we incur the risk
that these accounts will be used to damage third par-
ties. To minimize this risk, as we said, we configured our
accounts in a way that all emails would be forwarded
to a sinkhole mailserver under our control and never
delivered to the outside world. We also established a
close collaboration with Google and made sure to re-
port to them any malicious activity that needed atten-
tion. Although the suspicious login filters that Google
typically uses to protect their accounts from unautho-
rized accesses were disabled for our honey accounts, all
other malicious activity detection algorithms were still
in place, and in fact Google suspended a number of
accounts under our control that engaged in suspicious
activity. It is important to note, however, that our ap-
proach does not rely on help from Google to work. Our
main reason for enlisting Google’s help to disable sus-
picious login filters was to ensure that all accesses get
through to the honey accounts (most accesses would
be blocked if Google did not disable the login filters).
This does not directly impact our methodology, and
as a result does not reduce the wider applicability of
our approach. It is also important to note that Google
did not share with us any details on the techniques
used internally for the detection of malicious activity on
Gmail. Another point of risk is ensuring that the mal-
ware in our VMs would not be able to harm third par-
ties. We followed common practices [28] such as restrict-
ing the bandwidth available to our virtual machines and
sinkholing all email traffic sent by them. Finally, our
experiments involve deceiving cybercriminals by provid-
ing them fake accounts with fake personal information
in them. To ensure that our experiments were run in
an ethical fashion, we obtained IRB approval from our
institution.
4. DATA ANALYSIS
We monitored the activity on our honey accounts for
a period of 7 months, from 25th June, 2015 to 16th
February, 2016. In this section, we first provide an
overview of our results. We then discuss a taxonomy
of the types of activity that we observed. We provide
a detailed analysis of the type of activity monitored
on our honey accounts, focusing on the differences in
modus operandi shown by cybercriminals who obtain
credentials to our honey accounts from different outlets.
We then investigate whether cybercriminals attempt to
evade location-based detection systems by connecting
from locations that are closer to where the owner of the
account typically connects from. We also develop a met-
ric to infer which keywords attackers search for when
looking for interesting information in an email account.
Finally, we analyze how certain types of cybercriminals
appear to be stealthier and more advanced than others.
Google records each unique access to a Gmail account
and labels the access with a unique cookie identifier.
These unique cookie identifiers, along with more infor-
mation including times of accesses, are included in the
visitor activity pages of Gmail accounts. Our scripts
extract this data, which we analyze in this section. For
the sake of convenience, we will use the terms “cookie”
and “unique access” interchangeably in the remainder of
this paper.
4.1 Overview
We created, instrumented, and leaked 100 Gmail ac-
counts for our experiments. To avoid biasing our re-
sults, we removed all accesses made to honey accounts
by IP addresses from our monitoring infrastructure. We
also removed all accesses that originated from the city
where our monitoring infrastructure is located. After
this filtering operation, we observed 326 unique accesses
to the accounts over the course of the experiment, during
which 147 emails were opened, 845 emails were sent, and
12 unique draft emails were composed by cybercriminals.
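The filtering step described above can be sketched as follows; the access records and their `cookie`, `ip`, and `city` fields are hypothetical simplifications of the metadata scraped from the visitor activity pages.

```python
def filter_and_count(accesses, monitor_ips, monitor_city):
    """Drop accesses originating from our own monitoring
    infrastructure (by IP address or by city), then count the
    remaining unique cookie identifiers, i.e. unique accesses."""
    kept = [a for a in accesses
            if a["ip"] not in monitor_ips and a["city"] != monitor_city]
    return kept, len({a["cookie"] for a in kept})
```

Counting distinct cookies rather than raw log lines matches the paper's convention of treating "cookie" and "unique access" as interchangeable.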
In total, 90 accounts received accesses during the ex-
periment, comprising 41 accounts leaked to paste sites,
30 accounts leaked to underground forums, and 19 ac-
counts leaked through malware. 42 accounts were
blocked by Google during the course of the experiment,
due to suspicious activity. We were able to log activity
in those accounts for some time before Google blocked
them. 36 accounts were hijacked by cybercriminals, that
is, the passwords of such accounts were changed by the
cybercriminals. As a result, we lost control of those
accounts. We did not observe any attempt by at-
tackers to change the default send-from address of our
honey accounts. However, had that happened and had
attackers started sending spam messages, Google would
have blocked such accounts, since we asked them to monitor
the accounts with particular attention. A dataset contain-
ing the parsed metadata of the accesses received from
our honey accounts during our experiments is publicly
available at http://dx.doi.org/10.14324/000.ds.1508297.
4.2 A taxonomy of account activity
From our dataset of activity observed in the honey
accounts, we devise a taxonomy of attackers based on
unique accesses to such accounts. We identify four types
of attackers, described in detail in the following.
Curious. These accesses constitute the most basic type
of access to stolen accounts. After getting hold of ac-
count credentials, people log in to those accounts to
check if such credentials work. Afterwards, they do not
perform any additional action. The majority of the ob-
served accesses belong to this category, accounting for
224 accesses. We acknowledge that this large number
of curious accesses may be due in part to experienced
attackers avoiding interactions with the accounts after
logging in, probably after some careful observations in-
dicating that the accounts do not look real. This could
potentially introduce some bias into our results.
Gold diggers. When getting access to a stolen ac-
count, attackers often want to understand its worth.
For this reason, on logging into honey accounts, some
attackers search for sensitive information, such as ac-
count information and attachments that have financial-
related names. They also seek information that may
be useful in spearphishing attacks. We call these ac-
cesses “gold diggers.” Previous research showed that
this practice is quite common for manual account hi-
jackers [13]. In this paper, we confirm that finding,
provide a methodology to assess the keywords that cy-
bercriminals search for, and analyze differences in the
modus operandi of gold digger accesses for credentials
leaked through different outlets. In total, we observed
82 accesses of this type.
Spammers. One of the main capabilities of webmail
accounts is sending emails. Previous research showed
that large spamming botnets have code in their bots
and in their C&C infrastructure to take advantage of
this capability, by having the bots directly connect to
such accounts and send spam [30]. We consider accesses
to belong to this category if they send any email. We
observed such accesses in only 8 accounts. This low
number suggests that sending spam is not one of the
main purposes for which cybercriminals use stolen
accounts, at least when the accounts are stolen through
the outlets that we studied.
Hijackers. A stealthy cybercriminal is likely to keep
a low profile when accessing a stolen account, to avoid
raising suspicion from the account’s legitimate owner.
Less concerned miscreants, however, might just act to
lock the legitimate owner out of their account by chang-
ing the account’s password. We call these accesses “hi-
jackers.” In total, we observed 36 accesses of this type.
A change of password prevents us from scraping the vis-
itor activity page, and therefore we are unable to col-
lect further information about the accesses performed
to that account.
It is important to note that the taxonomy classes that
we described are not exclusive. For example, an at-
tacker might use an account to send spam emails, there-
fore falling in the “spammer” category, and then change
the password of that account, therefore falling into the
“hijacker” category. Such overlaps happened often for
the accesses recorded in our honey accounts. It is in-
teresting to note that there was no access that behaved
exclusively as “spammer.” Miscreants that sent spam
through our honey accounts also acted as “hijackers” or
as “gold diggers,” searching for sensitive information in
the account.
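The overlapping classes above can be expressed as a small labeling routine. The sketch below is illustrative only, not the paper's actual analysis code; the boolean flags are hypothetical stand-ins for the signals our infrastructure records (emails sent, password changes, inbox searches).

```python
def classify_access(sent_email=False, changed_password=False,
                    searched_inbox=False):
    """Assign taxonomy labels to a unique access.

    Labels are not mutually exclusive: an access that sends spam
    and then changes the password is both a "spammer" and a
    "hijacker".
    """
    labels = set()
    if sent_email:
        labels.add("spammer")
    if changed_password:
        labels.add("hijacker")
    if searched_inbox:
        labels.add("gold digger")
    if not labels:
        labels.add("curious")  # login only, no further action observed
    return labels
```

For example, an access that both sent spam and changed the password receives the "spammer" and "hijacker" labels, matching the overlaps we observed.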
We wanted to understand the distribution of different
types of accesses in accounts that were leaked through
different means. Figure 1 shows a breakdown of this dis-
tribution. As it can be seen, cybercriminals who get ac-
cess to stolen accounts through malware are the stealth-
iest, and never lock the legitimate users out of their
account. Instead, they limit their activity to check-
ing if such credentials are real or searching for sensi-
tive information in the account inbox, perhaps in an at-
tempt to estimate the value of the accounts. Accounts
leaked through paste sites and underground forums see
the presence of “hijackers.” 20% of the accesses to ac-
counts leaked through paste sites, in particular, belong
to this category. Accounts leaked through underground
forums, on the other hand, see the highest percentage
of “gold digger” accesses, with about 30% of all accesses
belonging to this category.
Figure 1: Distribution of types of accesses for different
credential leak outlets. As it can be seen, most accesses
belong to the “curious” category. It is possible to spot
differences in the types of activities for different leak
outlets. For example, accounts leaked by malware do
not present activity of the “hijacker” type. Hijackers, on the
other hand, are particularly common among miscreants
who obtain stolen credentials through paste sites.
Figure 2: CDF of the length of unique accesses for dif-
ferent types of activity on our honey accounts. The vast
majority of unique accesses last a few minutes. Spam-
mers tend to use accounts aggressively for a short time
and then disconnect. The other types of accesses, and
in particular “curious” ones, come back after some time,
likely to check for new activity in the honey accounts.
4.3 Activity on honey accounts
In the following, we provide detailed analysis on the
unique accesses that we recorded for our honey accounts.
4.3.1 Duration of accesses
For each cookie identifier, we recorded the time that
the cookie first appeared in a particular honey account
as t_0, and the last time it appeared in the honey ac-
count as t_last. From this information, we computed the
duration of activity of each cookie as t_last - t_0. It is
necessary to note that t_last of each cookie is a lower
bound, since we cease to obtain information about cookies
if, for instance, the password of the honey account that is
recording them is changed.
Figure 3: CDF of the time passed between account cre-
dentials leaks and the first visit by a cookie. Accounts
leaked through paste sites receive, on average, accesses
earlier than accounts leaked through other outlets.
Figure 2 shows the Cumulative Distribution Function
(CDF) of the length of unique accesses of different types
of attackers. As it can be seen,
the vast majority of accesses are very short, lasting only
a few minutes and never coming back. “Spammer” ac-
cesses, in particular, tend to send emails in burst for
a certain period and then disconnect. “Hijacker” and
“gold digger” accesses, on the other hand, have a long
tail of about 10% accesses that keep coming back for
several days in a row. The CDF shows that most “cu-
rious” accesses are repeated over many days, indicating
that the cybercriminals keep coming back to find out
if there is new information in the accounts. This con-
flicts with the finding in [13], which states that most
cybercriminals connect to a compromised webmail ac-
count once, to assess its value within a few minutes.
However, [13] focused only on accounts compromised
via phishing pages, while we look at a broader range
of ways in which criminals can obtain such credentials.
Our results show that the modes of operations of cy-
bercriminals vary, depending on the outlets they obtain
stolen credentials from.
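The per-cookie duration computation described above can be sketched as follows; the `(cookie_id, timestamp)` observation format is a hypothetical simplification of what our infrastructure logs.

```python
def access_durations(observations):
    """Compute t_last - t_0 for each cookie identifier.

    observations: iterable of (cookie_id, timestamp) pairs, with
    timestamps as epoch seconds. The result is a lower bound on the
    true duration, since logging stops if the account is hijacked.
    """
    first, last = {}, {}
    for cookie, ts in observations:
        first[cookie] = min(ts, first.get(cookie, ts))
        last[cookie] = max(ts, last.get(cookie, ts))
    return {c: last[c] - first[c] for c in first}
```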
4.3.2 Time between leak and first access
We then studied how long it takes after credentials
are leaked on different outlets before our infrastructure
records accesses from cybercriminals. Figure 3 reports
a CDF of the time between leak and first access for
accounts leaked through different outlets. As it can be
seen, within the first 25 days after leak, we recorded
80% of all unique accesses to accounts leaked to paste
sites, 60% of all unique accesses to accounts leaked to
underground forums, and 40% of all unique accesses to
accounts leaked to malware. A particularly interesting
observation is the nature of unique accesses to accounts
leaked to malware. A close look at Figure 3 reveals
rapid increases in unique accesses to honey accounts
leaked to malware, about 30 days after the leak, and
also after 100 days, indicated by two sharp inflection
points.
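The quantity plotted in Figure 3 can be computed as sketched below; the dictionary-of-timestamps input format is a hypothetical simplification of our logs.

```python
from datetime import datetime

def first_access_delays(leak_time, accesses):
    """Days between the credential leak and the first access, per account.

    accesses: {account_id: [datetime of each access, ...]}
    """
    return {acct: (min(ts) - leak_time).days
            for acct, ts in accesses.items() if ts}

def fraction_within(delays, days):
    # One point of the empirical CDF: share of accounts first
    # accessed within `days` of the leak.
    vals = list(delays.values())
    return sum(1 for d in vals if d <= days) / len(vals)
```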
Figure 4 sheds more light on what happened at
those points. The figure reports the unique accesses
to each of our honey accounts over time. An interesting
aspect to note is that accounts that are leaked on public
outlets such as forum and paste sites can be accessed
by multiple cybercriminals at the same time. Account
credentials leaked through malware, on the other hand,
are available only to the botmaster that stole them, un-
til they decide to sell them or to give them to some-
one else. Seeing bursts in accesses to accounts leaked
through malware months after the actual leak happened
could indicate that the accounts were visited again by
the same criminal who operated the malware infrastruc-
ture, or that the accounts were sold on the underground
market and that another miscreant is now using them.
This hypothesis is somewhat confirmed by the fact that
these bursts in accesses were of the “gold digger” type,
while all previous accesses to the same accounts were of
the “curious” type.
In addition, Figure 4 shows that the majority of
accounts leaked to paste sites were accessed within a few
days of leak, while a particular subset was not accessed
for more than 2 months. That subset refers to the ten
credentials we leaked to Russian paste sites. The cor-
responding honey accounts were not accessed for more
than 2 months from the time of leak. This indicates
either that there are not many cybercriminals on the
Russian paste sites, or that they did not believe the
accounts were real and thus did not bother to access them.
4.3.3 System configuration of accesses
We observed a wide variety of system configurations
for the accesses across groups of leaked accounts, by
leveraging Google’s system fingerprinting information
available to us inside the honey accounts. As shown
in Figure 5a, accesses to accounts leaked on paste sites
were made through a variety of popular browsers, with
Firefox and Chrome taking the lead. We also recorded
many accesses from unknown browsers. It is possible
for an attacker to hide browser information from Google
servers by presenting an empty user agent and hiding
other fingerprintable information [27]. About 50% of
accesses to accounts leaked through paste sites were
not identifiable. Chrome and Firefox take the lead in
groups leaked in underground forums as well, but there
is less variety of browsers there. Interestingly, all ac-
cesses to accounts in malware groups were made from
unknown browsers. This shows that cybercriminals that
accessed groups leaked through malware were stealth-
ier than others. While analyzing the operating systems
used by criminals, we observed that honey accounts
leaked through malware mostly received accesses from
Windows computers, followed by Mac OS X and Linux.
This is shown in Figure 5b. In the paste sites and un-
derground forum groups, we observe a wider range of
operating systems. More than 50% of computers in the
three categories ran on Windows. It is interesting to
note that Android devices were also used to connect
to the honey accounts in the paste sites and underground
forum groups.
Figure 4: Plot of duration between time of leak and
unique accesses in accounts leaked through different
outlets. As it can be seen, accounts leaked to mal-
ware experience a sudden increase in unique accesses
after 30 days and 100 days from the leak, indicating
that they may have been sold or transferred to some
other party by the cybercriminals behind the malware
command and control infrastructure.
The diversity of devices and browsers in the paste
sites and underground forums groups indicates a mot-
ley mix of cybercriminals with various motives and ca-
pabilities, compared to the malware groups that appear
to be more homogeneous. It is also obvious that attack-
ers that steal credentials through malware make more
efforts to cover their tracks by evading browser finger-
printing.
4.3.4 Location of accesses
We recorded the location information that we found
in the accesses that were logged by our infrastructure.
Our goal was to understand patterns in the locations
(or proxies) used by criminals to access stolen accounts.
Out of the 326 accesses logged, 132 came from
Tor exit nodes. More specifically, 28 accesses to ac-
counts leaked on paste sites were made via Tor, out of a
total of 144 accesses to accounts leaked on paste sites.
48 accesses to accounts leaked on forums were made
through Tor, out of a total of 125 accesses made to ac-
counts leaked on forums. We observed 57 accesses to
accounts leaked through malware, and all except one of
those accesses were made via Tor. We excluded these
accesses from further analysis, since they do not provide
information on the physical location of the criminals.
After removing Tor nodes, 173 unique accesses presented
location information. To determine this location
information, we used the geolocation provided by Google
on the account activity page of the honey accounts.
Figure 5: Distribution of browsers and operating systems
of the accesses that we logged to our honey accounts:
(a) browsers; (b) operating systems. As it can be seen,
accounts leaked through different outlets attracted
cybercriminals with different system configurations.
We
observed accesses from a total of 29 countries. To un-
derstand whether the IP addresses that connected to
our honey accounts had been recorded in previous ma-
licious activity, we ran checks on all IP addresses we
observed against the Spamhaus blacklist. We found
20 IP addresses that accessed our honey accounts in the
Spamhaus blacklist. Because of the nature of this black-
list, we believe that the addresses belong to malware-
infected machines that are used by cybercriminals to
connect to the stolen accounts.
One of our goals was to observe if cybercriminals at-
tempt to evade location-based login risk analysis sys-
tems by tweaking access origins. In particular, we wanted
to assess whether telling criminals the location where
the owner of an account is based influences the loca-
tions that they will use to connect to the account. De-
spite observing 57 accesses to our honey accounts leaked
through malware, we discovered that all these connec-
tions except one originated from Tor exit nodes. This
shows that the malware operators that accessed our ac-
counts prefer to hide their location through the use of
anonymizing systems rather than modifying their lo-
cation based on where the stolen account is typically
connecting from.
While leaking the honey credentials, we chose Lon-
don and Pontiac, MI as our decoy UK and US locations
respectively. The idea was to claim that the honey ac-
counts leaked with location details belonged to fictitious
personas living in either London or Pontiac. However,
we realized that leaking multiple accounts with the same
location might cause suspicion. As a result, we chose de-
coy UK and US locations such that London and Pontiac,
MI were the midpoints of those locations, respectively.
To observe the impact of availability of location in-
formation about the honey accounts on the locations
that cybercriminals connect from, we calculated the
median values of distances of the locations recorded
in unique accesses, from the midpoints of the adver-
tised decoy locations in our account leaks. For ex-
ample, for the accesses A to honey accounts leaked
on paste sites, advertised with UK information, we
extracted location information, translated it to
coordinates L_A, and computed the dist_paste_UK vector
as distance(L_A, mid_UK), where mid_UK are London's
coordinates. All distances are in kilometers. We extracted
the median values of all distance vectors obtained, and
plotted circles on UK and US maps, specifying those
median distances as radii of the circles, as shown in
Figures 6a and 6b.
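The distance computation can be sketched with the haversine great-circle formula; this choice of formula is our assumption, since the text does not state which distance function was used.

```python
import math
from statistics import median

def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon)
    points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

def median_radius_km(access_coords, midpoint):
    # Median of the distance vector distance(L_A, mid), i.e. the
    # radius of the circles plotted in Figures 6a and 6b.
    return median(haversine_km(c, midpoint) for c in access_coords)
```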
Interestingly, we observe that connections to accounts
leaked with advertised location information originate
from places closer to the midpoints than connections to
accounts leaked with usernames and passwords only.
Figure 6a shows that connections to accounts leaked on
paste sites and forums result in the smaller median cir-
cles, that is, the connections originate from locations
closer to London, the UK midpoint. The smallest circle
is for the accounts leaked on paste sites, with adver-
tised UK location information (radius 1400 kilometers).
In contrast, the circle of accounts leaked on paste sites
without location information has a radius of 1784 kilo-
meters. The median circle of the accounts leaked in
underground forums, with no advertised location infor-
mation, is the largest circle in Figure 6a, while the one
of accounts leaked in underground forums, along with
UK location information, is smaller.
We obtained similar results in the US plot, with some
interesting distinctions. As shown in Figure 6b, con-
nections to honey accounts leaked on paste sites, with
advertised US locations are clustered around the US
midpoint, as indicated by the circle with a radius of
939 kilometers, compared to the median circle of ac-
counts leaked on paste sites without location informa-
tion, which has a radius of 7900 kilometers. However,
although the median circle of accounts leaked in
underground forums with advertised location information
is smaller than that of accounts leaked without it, the
difference in their radii is not as pronounced. This
again supports the indication that
cybercriminals on paste sites exhibit more location
malleability, that is, masquerading their origins of
accesses to appear closer to the advertised location,
when provided. It also shows that cybercriminals on the
studied forums are less sophisticated, or care less, than
the ones on paste sites.
Figure 6: Distance of login locations from the midpoints
of locations advertised while leaking credentials: (a) UK
midpoint; (b) US midpoint. Red lines indicate credentials
leaked on paste sites with no location information, green
lines indicate credentials leaked on paste sites with
location information, purple lines indicate credentials
leaked on underground forums without location
information, while blue lines indicate credentials leaked
on underground forums with location information. As it
can be seen, account credentials leaked with location
information attract logins from hosts that are closer to
the advertised midpoint than credentials that are posted
without any location information.
Statistical significance. As we explained, Figures 6a
and 6b show that accesses to leaked accounts happen
closer to advertised locations if this information is in-
cluded in the leak. To confirm the statistical signifi-
cance of this finding, we performed a Cramér-von Mises
test [15]. The Anderson version [8] of this test assesses
whether two vectors of values are likely to come from the
same statistical distribution. The p-value has to be below
0.01 for us to reject the null hypothesis (i.e., that the
two vectors of distances have the same distribution);
otherwise, we cannot state with statistical significance
that the two distance vectors come from different
distributions. The p-values from the tests on the paste
site vectors (0.0017415 for UK location information
versus no location, and 0.0000007 for US location
information versus no location) allow us to reject the
null hypothesis, thus stating that the two vectors come
from different distributions, while we cannot say the
same observing the p-values for the tests on forum vectors (p-
values of 0.272883 for the UK case and 0.272011 for the
US one). Therefore, the statistical test shows that
criminals using paste sites connect from closer locations
when location information is provided along with the
leaked credentials. We cannot reach that conclusion in
the case of accounts leaked
to underground forums, although Figures 6a and 6b
indicate that there are some location differences in this
case too.
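The test statistic can be sketched in a few lines. This computes only the two-sample Cramér-von Mises statistic T in Anderson's rank formulation (assuming no ties); the p-value step, which requires the asymptotic distribution of T, is omitted here.

```python
def cvm_2samp_statistic(x, y):
    """Two-sample Cramér-von Mises statistic T (Anderson's rank form).

    Assumes no ties between samples; larger T means the two samples
    are less likely to share a distribution.
    """
    n, m = len(x), len(y)
    combined = sorted((v, src) for src, sample in ((0, x), (1, y))
                      for v in sample)
    # 1-based ranks of each sample's values in the pooled ordering.
    rx = [i + 1 for i, (_, s) in enumerate(combined) if s == 0]
    ry = [i + 1 for i, (_, s) in enumerate(combined) if s == 1]
    u = (n * sum((r - (i + 1)) ** 2 for i, r in enumerate(rx))
         + m * sum((r - (j + 1)) ** 2 for j, r in enumerate(ry)))
    return u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))
```

Two fully separated samples yield a larger statistic than two interleaved ones, which is the behavior the test exploits.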
4.3.5 What are “gold diggers” looking for?
Cybercriminals compromise online accounts due to
the inherent value of those accounts. As a result, they
assess accounts to decide how valuable they are, and de-
cide exactly what to do with such accounts. We decided
to study the words that they searched for in the honey
accounts, in order to understand and potentially char-
acterize anomalous searches in the accounts. A limiting
factor in this case was the fact that we did not have ac-
cess to search logs of the honey accounts, but only to the
content of the emails that were opened. To overcome
this limitation, we employed Term Frequency–Inverse
Document Frequency (TF-IDF). TF-IDF is used to rank
words in a corpus by importance. As a result we re-
lied on TF-IDF to infer the words that cybercriminals
searched for in the honey accounts.
Searched words   TFIDF_R   TFIDF_A   TFIDF_R - TFIDF_A
results          0.2250    0.0127    0.2122
bitcoin          0.1904    0.0       0.1904
family           0.1624    0.0200    0.1423
seller           0.1333    0.0037    0.1296
localbitcoins    0.1009    0.0       0.1009
account          0.1114    0.0247    0.0866
payment          0.0982    0.0157    0.0824
bitcoins         0.0768    0.0       0.0768
below            0.1236    0.0496    0.0740
listed           0.0858    0.0207    0.0651

Common words     TFIDF_R   TFIDF_A   TFIDF_R - TFIDF_A
transfer         0.2795    0.2949    -0.0154
please           0.2116    0.2608    -0.0493
original         0.1387    0.1540    -0.0154
company          0.0420    0.1531    -0.1111
would            0.0864    0.1493    -0.0630
energy           0.0618    0.1471    -0.0853
information      0.0985    0.1308    -0.0323
about            0.1342    0.1226    0.0116
email            0.1402    0.1196    0.0207
power            0.0462    0.1175    -0.0713
Table 2: List of top 10 words by TFIDF_R - TFIDF_A (on
the left) and list of top 10 words by TFIDF_A (on the
right). The words on the left are the ones that have the
highest difference in importance between the emails opened
by attackers and the emails in the entire corpus. For this
reason, they are the words that attackers most likely
searched for when looking for sensitive information in the
stolen accounts. The words on the right, on the other
hand, are the ones that have the highest importance in
the entire corpus.
TF-IDF is a product of two metrics, namely Term Frequency (TF) and
Inverse Document Frequency (IDF). The idea is that we
can infer the words that cybercriminals searched for, by
comparing the important words in the emails opened by
cybercriminals to the important words in all emails in
the decoy accounts.
In its simplest form, TF is a measure of how frequently
term t is found in document d, as shown in Equation 1.
IDF is a logarithmic scaling of the fraction of the
number of documents containing term t, as shown in
Equation 2, where D is the set of all documents in the
corpus, N is the total number of documents in the
corpus, and |{d ∈ D : t ∈ d}| is the number of documents
in D that contain term t. Once TF and IDF are obtained,
TF-IDF is computed by multiplying TF and IDF, as
shown in Equation 3.

tf(t, d) = f_{t,d}    (1)

idf(t, D) = log(N / |{d ∈ D : t ∈ d}|)    (2)

tfidf(t, d, D) = tf(t, d) × idf(t, D)    (3)
The output of TF-IDF is a weighted metric that ranges
between 0 and 1. The closer the weighted value is to 1,
the more important the term is in the corpus.
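Equations 1-3 can be implemented directly, as in the sketch below; the toy corpus is illustrative only and is not drawn from our dataset.

```python
import math
from collections import Counter

def tf(term, doc):
    # Equation 1: raw frequency of `term` in `doc` (a list of tokens).
    return Counter(doc)[term]

def idf(term, corpus):
    # Equation 2: log of N over the number of documents containing
    # `term`; defined as 0 for terms absent from the corpus.
    containing = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / containing) if containing else 0.0

def tfidf(term, doc, corpus):
    # Equation 3: the product of TF and IDF.
    return tf(term, doc) * idf(term, corpus)
```

A term that appears in few documents scores higher than a term spread across the whole corpus, which is the property the comparison in Table 2 relies on.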
We evaluated TF-IDF on all terms in a corpus of
text comprising two documents, namely, all emails d_A
in the honey accounts, and all emails d_R opened by
the attackers. The intuition is that the words that
have a large importance in the emails that have been
opened by a criminal, but have a lower importance in
the overall dataset, are likely to be keywords that the
attackers searched for in the Gmail account. We pre-
processed the corpus by filtering out all words that have
less than 5 characters, and removing all known header-
related words, for instance “delivered” and “charset,”
honey email handles, and also removing signaling infor-
mation that our monitoring infrastructure introduced
into the emails. After running TF-IDF on all remaining
terms in the corpus, we obtained their TF-IDF values
as vectors TFIDF_A and TFIDF_R, the TF-IDF values of
all terms in the corpus [d_A, d_R]. We then computed
the vector TFIDF_R - TFIDF_A. The top 10 words by
TFIDF_R - TFIDF_A, compared to the top 10 words by
TFIDF_A, are presented in Table 2. Words whose TFIDF_R
values are higher than their TFIDF_A values rank higher
in the list, and those are the words that the
cybercriminals likely searched for.
As seen in Table 2, the top 10 important words by
TFIDF_R - TFIDF_A are sensitive words, such as “bitcoin,”
“family,” and “payment.” Comparing these words with the
most important words in the entire corpus indicates that
attackers likely searched for sensitive information,
especially financial information. In addition, words with
the highest importance in the entire corpus (for example,
“company” and “energy”), shown in the right side of
Table 2, have much lower importance in the emails opened
by cybercriminals, and most of them have negative values
of TFIDF_R - TFIDF_A.
This is a strong indicator that the emails opened in the
honey accounts were not opened at random, but were
the result of searches for sensitive information.
Originally, the Enron dataset had no “bitcoin” term.
However, that term was introduced into the opened
emails document dR, through the actions of one of the
cybercriminals that accessed some of the honey accounts.
The cybercriminal attempted to send blackmail mes-
sages from some of our honey accounts to Ashley Madi-
son scandal victims [3], requesting ransoms in bitcoin,
in exchange for silence. In the process, many draft
emails containing information about “bitcoin” were cre-
ated and abandoned by the cybercriminal, and other
cybercriminals opened them during later accesses. That
way, our monitoring infrastructure picked up “bitcoin”
related terms, and they rank high in Table 2, showing
that cybercriminals showed a lot of interest in those
emails.
4.4 Interesting case studies
In this section, we present some interesting case stud-
ies that we encountered during our experiments. They
help to shed further light into actions that cybercrimi-
nals take on compromised webmail accounts.
Three of the honey accounts were used by an attacker
to send multiple blackmail messages to some victims of
the Ashley Madison scandal. The blackmailer threat-
ened to expose the victims, unless they made some pay-
ments in bitcoin to a specified bitcoin wallet. Tutorials
on how to make bitcoin payments were also included in
the messages. The blackmailer created and abandoned
many draft emails targeted at more Ashley Madison
victims, which, as we have already mentioned, some other
visitors to the accounts later opened, thus contributing
to the opened emails that our monitoring infrastructure
recorded.
Two of the honey accounts received notification emails
stating that the hidden Google Apps Script in those
accounts was “using too much computer time.” These
notifications were opened by an attacker, and our
infrastructure recorded the opening actions.
Finally, an attacker registered on a carding forum
using one of the honey accounts as the registration email
address. As a result, registration confirmation
information was sent to the honey account. This shows that
some of the accounts were used as stepping stones by
cybercriminals to perform further illicit activity.
4.5 Sophistication of attackers
From the accesses we recorded in the honey accounts,
we identified three peculiar behaviors of cybercriminals
that indicate their level of sophistication, namely,
configuration hiding (for instance, hiding user agent
information), location filter evasion (connecting from
locations close to the advertised decoy location, if
provided), and stealthiness (avoiding clearly malicious
actions such as hijacking and spamming). Attackers
accessing the different groups of honey accounts
exhibit different types of sophistication. Those access-
ing accounts leaked through malware are stealthier than
others – they don’t hijack the accounts, and they don’t
send spam from them. They also access the accounts
through Tor, and they hide their system configuration,
for instance, their web browser is not fingerprintable by
Google. Attackers accessing accounts leaked on paste
sites tend to connect from locations closer to the ones
specified as decoy locations in the leaked account. They
do this in a bid to evade detection. Attackers access-
ing accounts leaked in underground forums do not make
significant attempts to stay stealthy or to connect from
closer locations. These differences in sophistication could
be used to characterize attacker behavior in future work.
5. DISCUSSION
In this section, we discuss the implications of the
findings we made in this paper. First, we talk about
what our findings mean for current mitigation tech-
niques against compromised online service accounts, and
how they could be used to devise better defenses. Then,
we talk about some limitations of our method. Finally,
we present some ideas for future work.
Implications of our findings. In this paper, we made
multiple findings that provide the research community
with a better understanding of what happens when on-
line accounts get compromised. In particular, we dis-
covered that if attackers are provided with location in-
formation about the online accounts, they then tend
to connect from places that are closer to those adver-
tised locations. We believe that this is an attempt to
evade current security mechanisms employed by online
services to discover suspicious logins. Such systems of-
ten rely on the origin of logins, to assess how suspicious
those login attempts are. Our findings show that there
is an arms race going on, with attackers attempting
to actively evade the location-based anomaly detection
systems employed by Google. We also observed that
many accesses were received through Tor exit nodes, so
it is hard to determine the exact origins of logins. This
problem shows the importance of defense in depth in
protecting online systems, in which multiple detection
systems are employed at the same time to identify and
block miscreants.
Despite confirming existing evasion techniques in use
by cybercriminals, our experiments also highlighted in-
teresting behaviors that could be used to develop effec-
tive systems to detect malicious activity. For example,
our observations about the words searched for by the cy-
bercriminals show that behavioral modeling could work
in identifying anomalous behavior in online accounts.
Anomaly detection systems could be trained adaptively
on words being searched for by the legitimate account
owner over a period of time. A deviation of search be-
havior would then be flagged as anomalous, indicating
that the account may have been compromised. Sim-
ilarly, anomaly detection systems could be trained on
the durations of connections during benign usage, and
deviations from those could be flagged as anomalous.
Limitations. We encountered a number of limitations
in the course of the experiments. For example, we were
able to leak the honey accounts only on a few outlets,
namely paste sites, underground forums, and malware.
In particular, we could only target underground forums
that were open to the public and for which registration
was free. Similarly, we could not study some of the most
recent families of information-stealing malware such as
Dridex, because they would not execute in our virtual
environment. Attackers could find the scripts we hid in
the honey accounts and remove them, making it impos-
sible for us to monitor the activity of the accounts. This
is an intrinsic limitation of our monitoring architecture,
but in principle studies similar to ours could be per-
formed by the online service providers themselves, such
as Google and Facebook. By having access to the full
logs of their systems, such entities would have no need
to set up monitoring scripts, and it would be impossi-
ble for attackers to evade their scrutiny. Finally, while
evaluating what cybercriminals were looking for in the
honey accounts, we were able to observe the emails that
they found interesting in the honey accounts, not every-
thing they searched for. This is because we do not have
access to the search logs of the honey accounts.
Future work. In the future, we plan to continue ex-
ploring the ecosystem of stolen accounts, and gaining a
better understanding of the underground economy sur-
rounding them. We would explore ways to make the
decoy accounts more believable, to attract more cyber-
criminals and keep them engaged with the decoy ac-
counts. We intend to set up additional scenarios, such
as studying attackers who have a specific motivation,
for example compromising accounts that belong to po-
litical activists (rather than generic corporate accounts,
as we did in this paper). We would also like to study
whether demographic information, as well as the lan-
guage that the emails in honey accounts are written
in, influence the way in which cybercriminals interact
with these accounts. To mitigate the fact that our in-
frastructure can only identify search terms for emails
that were found in the accounts, we plan to seed the
honey accounts with some specially crafted emails con-
taining decoy sensitive information, for instance, fake
bank account information and login credentials, along
with other regular email messages. We expect this type
of specialized email seeding to increase the variety of
hits when cybercriminals search for content in the honey
accounts, thus improving our insight into what criminals
search for.
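One way to realize this seeding plan would be to embed a unique honeytoken in each decoy message, so that any later use of the fake data can be traced back to the specific account (and leak outlet) it came from. The sketch below is a hypothetical illustration; the helper name, templates, and token format are our own assumptions, not our deployed implementation:

```python
import uuid

def make_decoy_email(kind):
    """Build a decoy email containing a unique honeytoken.

    Hypothetical sketch: the random token ties any later use of the
    fake bank details or credentials back to the honey account this
    message was seeded into. All values below are fabricated decoys.
    """
    token = uuid.uuid4().hex[:8]
    templates = {
        "bank": ("Your statement is ready",
                 "Account number: 00-{t}\nSort code: 12-34-56"),
        "credentials": ("Password reminder",
                        "login: backup-{t}\npassword: p4ss-{t}"),
    }
    subject, body = templates[kind]
    return {"subject": subject, "body": body.format(t=token), "token": token}

decoy = make_decoy_email("bank")
print(decoy["token"] in decoy["body"])  # token embedded for later tracing -> True
```

Interleaving such messages with ordinary correspondence (as in the Enron-style seeding) would keep the decoys from standing out to a suspicious attacker.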
6. RELATED WORK
In this section, we briefly compare this paper with
previous work. Most previous work focused on spam
and social spam; only a few studies examined manual
hijacking of accounts and the activity performed on them.
Bursztein et al. [13] investigated manual hijacking of
online accounts through phishing pages. Their study fo-
cuses on cybercriminals who steal user credentials and
use them privately, and shows that manual hijacking
is not as common as automated hijacking by botnets.
Their work illustrates the usefulness of honey creden-
tials (account honeypots) in the study of hijacked ac-
counts. Compared to the work by Bursztein et al.,
which focused on phishing, we analyzed a much broader
threat model, looking at account credentials automati-
cally stolen by malware, as well as the behavior of cy-
bercriminals that obtain account credentials through
underground forums and paste sites. By focusing on
multiple types of miscreants, we were able to show dif-
ferences in their modus operandi, and provide multi-
ple insights on the activities that happen on hijacked
Gmail accounts in the wild. We also provide an open
source framework that can be used by other researchers
to set up experiments similar to ours and further explore
the ecosystem of stolen Google accounts. To the best
of our knowledge, our infrastructure is the first pub-
licly available Gmail honeypot infrastructure. Despite
the fact that the authors of [13] had more visibility on
the account hijacking phenomenon than we did, since
they were operating the Gmail service, the dataset that
we collected is of comparable size to theirs: we logged
326 malicious accesses to 100 accounts, while they stud-
ied 575 high-confidence hijacked accounts.
A number of papers looked at abuse of accounts on
social networks. Thomas et al. [34] studied Twitter ac-
counts under the control of spammers. Stringhini et
al. [31] studied social spam using 300 honeypot profiles,
and presented a tool for detection of spam on Face-
book and Twitter. Similar work was also carried out
in [9,12,24,38]. Thomas et al. [35] studied underground
markets in which fake Twitter accounts are sold and
then used to spread spam and other malicious content.
Unlike this paper, they focus on fake accounts and not
on legitimate ones that have been hijacked. Wang et
al. [37] proposed the use of patterns of click events to
spot fake accounts in online services.
Researchers also looked at developing systems to de-
tect compromised accounts. Egele et al. [18] presented
a system that detects malicious activity in online social
networks using statistical models. Stringhini et al. [32]
developed a tool for detecting compromised email ac-
counts based on the behavioral modeling of senders.
Other papers investigated the use of stolen credentials
and stolen files by setting up honeyfiles. Liu et al. [25]
deployed honeyfiles containing honey account creden-
tials in P2P shared spaces. The study used a similar
approach to ours, especially in the placement of honey
account credentials. However, they placed more empha-
sis on honeyfiles than on honey credentials. Moreover,
they studied P2P networks, while our work focuses on
compromised accounts in webmail services. Nikiforakis
et al. [26] studied privacy leaks in file hosting services by
deploying honeyfiles on them. In our previous work [23],
we developed an infrastructure to study malicious activ-
ity in online spreadsheets, using an approach similar to
the one described in this paper. Stone-Gross et al. [30]
studied a large-scale spam operation by analyzing 16
C&C servers of the Pushdo/Cutwail botnet. The au-
thors highlight that Cutwail, one of the largest botnets
of its time, had the capability of connecting to webmail
accounts to send spam. Stone-Gross et al. also describe
the activity of cybercriminals on spamdot, a large un-
derground forum, showing that cybercriminals actively
traded account information of the kind we leaked, offer-
ing free “teasers” of the overall datasets for sale. In
this paper, we used a similar approach to leak account
credentials on underground forums.
7. CONCLUSION
In this paper, we presented a honey account system
able to monitor the activity of cybercriminals that gain
access to Gmail account credentials. Our system is
publicly available to encourage researchers to set up
additional experiments and improve the knowledge of
our community regarding what happens after webmail
accounts are compromised2. We leaked 100 honey ac-
counts on paste sites, underground forums, and virtual
machines infected with malware, and provided detailed
statistics of the activity of cybercriminals on the ac-
counts, together with a taxonomy of the criminals. Our
findings could help the research community to get a
better understanding of the ecosystem of stolen online
accounts, and potentially help researchers to develop
better detection systems against this malicious activity.
8. ACKNOWLEDGMENTS
We wish to thank our shepherd Andreas Haeberlen
for his advice on how to improve our paper, and Mark
Risher and Tejaswi Nadahalli from Google for their sup-
port throughout the project. We also thank the anony-
mous reviewers for their comments. This work was
supported by the EPSRC under grant EP/N008448/1,
and by a Google Faculty Award. Jeremiah Onaolapo
was supported by the Petroleum Technology Develop-
ment Fund (PTDF), Nigeria, while Enrico Mariconti
was funded by the EPSRC under grant 1490017.
9. REFERENCES
[1] Apps Script.
https://developers.google.com/apps-script/?hl=en.
[2] Dropbox User Credentials Stolen: A Reminder To
Increase Awareness In House.
http://www.symantec.com/connect/blogs/
dropbox-user-credentials-stolen-reminder-increase-awareness-house.
[3] Hackers Finally Post Stolen Ashley Madison
Data. https://www.wired.com/2015/08/
happened-hackers-posted-stolen-ashley-madison-data/.
[4] Overview of Google Apps Script.
https://developers.google.com/apps-script/overview.
[5] Pastebin. pastebin.com.
[6] The Target Breach, By the Numbers.
http://krebsonsecurity.com/2014/05/
the-target-breach-by-the-numbers/.
[7] S. Afroz, A. C. Islam, A. Stolerman,
R. Greenstadt, and D. McCoy. Doppelgänger
finder: Taking stylometry to the underground. In
IEEE Symposium on Security and Privacy, 2014.
[8] T. W. Anderson and D. A. Darling. Asymptotic
theory of certain “goodness of fit” criteria based
on stochastic processes. The Annals of
Mathematical Statistics, 1952.
² https://bitbucket.org/gianluca_students/gmail-honeypot
[9] F. Benevenuto, G. Magno, T. Rodrigues, and
V. Almeida. Detecting Spammers on Twitter. In
Conference on Email and Anti-Spam (CEAS),
2010.
[10] H. Binsalleeh, T. Ormerod, A. Boukhtouta,
P. Sinha, A. Youssef, M. Debbabi, and L. Wang.
On the analysis of the Zeus botnet crimeware
toolkit. In Privacy, Security and Trust (PST),
2010.
[11] D. Boneh, S. Inguva, and I. Baker. SSL MITM
Proxy. http://crypto.stanford.edu/ssl-mitm, 2007.
[12] Y. Boshmaf, I. Muslukhov, K. Beznosov, and
M. Ripeanu. The socialbot network: when bots
socialize for fame and money. In Annual
Computer Security Applications Conference
(ACSAC), 2011.
[13] E. Bursztein, B. Benko, D. Margolis,
T. Pietraszek, A. Archer, A. Aquino, A. Pitsillidis,
and S. Savage. Handcrafted Fraud and Extortion:
Manual Account Hijacking in the Wild. In ACM
Internet Measurement Conference (IMC), 2014.
[14] E. Butler. Firesheep.
http://codebutler.com/firesheep, 2010.
[15] H. Cramér. On the composition of elementary
errors. Skandinavisk Aktuarietidskrift, 1928.
[16] A. Das, J. Bonneau, M. Caesar, N. Borisov, and
X. Wang. The Tangled Web of Password Reuse.
In Symposium on Network and Distributed System
Security (NDSS), 2014.
[17] R. Dhamija, J. D. Tygar, and M. Hearst. Why
phishing works. In ACM Conference on Human
Factors in Computing Systems (CHI), 2006.
[18] M. Egele, G. Stringhini, C. Kruegel, and
G. Vigna. COMPA: Detecting Compromised
Accounts on Social Networks. In Symposium on
Network and Distributed System Security (NDSS),
2013.
[19] M. Egele, G. Stringhini, C. Kruegel, and
G. Vigna. Towards Detecting Compromised
Accounts on Social Networks. In IEEE
Transactions on Dependable and Secure
Computing (TDSC), 2015.
[20] T. N. Jagatic, N. A. Johnson, M. Jakobsson, and
F. Menczer. Social Phishing. Communications of
the ACM, 50(10):94–100, 2007.
[21] J. P. John, A. Moshchuk, S. D. Gribble, and
A. Krishnamurthy. Studying Spamming Botnets
Using Botlab. In USENIX Symposium on
Networked Systems Design and Implementation
(NSDI), 2009.
[22] B. Klimt and Y. Yang. Introducing the Enron
Corpus. In Conference on Email and Anti-Spam
(CEAS), 2004.
[23] M. Lazarov, J. Onaolapo, and G. Stringhini.
Honey Sheets: What Happens to Leaked Google
Spreadsheets? In USENIX Workshop on Cyber
Security Experimentation and Test (CSET), 2016.
[24] K. Lee, J. Caverlee, and S. Webb. The social
honeypot project: protecting online communities
from spammers. In World Wide Web Conference
(WWW), 2010.
[25] B. Liu, Z. Liu, J. Zhang, T. Wei, and W. Zou.
How many eyes are spying on your shared folders?
In ACM Workshop on Privacy in the Electronic
Society (WPES), 2012.
[26] N. Nikiforakis, M. Balduzzi, S. Van Acker,
W. Joosen, and D. Balzarotti. Exposing the Lack
of Privacy in File Hosting Services. In USENIX
Workshop on Large-Scale Exploits and Emergent
Threats (LEET), 2011.
[27] N. Nikiforakis, A. Kapravelos, W. Joosen,
C. Kruegel, F. Piessens, and G. Vigna. Cookieless
monster: Exploring the ecosystem of web-based
device fingerprinting. In IEEE Symposium on
Security and Privacy, 2013.
[28] C. Rossow, C. J. Dietrich, C. Grier, C. Kreibich,
V. Paxson, N. Pohlmann, H. Bos, and M. van
Steen. Prudent practices for designing malware
experiments: Status quo and outlook. In IEEE
Symposium on Security and Privacy, 2012.
[29] B. Stone-Gross, M. Cova, L. Cavallaro,
B. Gilbert, M. Szydlowski, R. Kemmerer,
C. Kruegel, and G. Vigna. Your Botnet is My
Botnet: Analysis of a Botnet Takeover. In ACM
Conference on Computer and Communications
Security (CCS), 2009.
[30] B. Stone-Gross, T. Holz, G. Stringhini, and
G. Vigna. The underground economy of spam: A
botmaster’s perspective of coordinating
large-scale spam campaigns. In USENIX
Workshop on Large-Scale Exploits and Emergent
Threats (LEET), 2011.
[31] G. Stringhini, C. Kruegel, and G. Vigna.
Detecting Spammers on Social Networks. In
Annual Computer Security Applications
Conference (ACSAC), 2010.
[32] G. Stringhini and O. Thonnard. That Ain’t You:
Blocking Spearphishing Through Behavioral
Modelling. In Detection of Intrusions and
Malware, and Vulnerability Assessment (DIMVA),
2015.
[33] B. Taylor. Sender Reputation in a Large Webmail
Service. In Conference on Email and Anti-Spam
(CEAS), 2006.
[34] K. Thomas, C. Grier, D. Song, and V. Paxson.
Suspended accounts in retrospect: an analysis of
Twitter spam. In ACM Internet Measurement
Conference (IMC), 2011.
[35] K. Thomas, D. McCoy, C. Grier, A. Kolcz, and
V. Paxson. Trafficking Fraudulent Accounts: The
Role of the Underground Market in Twitter Spam
and Abuse. In USENIX Security Symposium,
2013.
[36] D. Wang, Z. Zhang, P. Wang, J. Yan, and
X. Huang. Targeted Online Password Guessing:
An Underestimated Threat. In ACM Conference
on Computer and Communications Security
(CCS), 2016.
[37] G. Wang, T. Konolige, C. Wilson, X. Wang,
H. Zheng, and B. Y. Zhao. You are how you click:
Clickstream analysis for sybil detection. USENIX
Security Symposium, 2013.
[38] S. Webb, J. Caverlee, and C. Pu. Social
Honeypots: Making Friends with a Spammer
Near You. In Conference on Email and Anti-Spam
(CEAS), 2008.