Tell Me About Yourself: The Malicious CAPTCHA Attack
Draft of version to appear in the WWW Conference, April 2016. Comments appreciated!
Dept. of Computer Science
College of Management, Academic Studies
Dept. of Computer Science
Bar Ilan University
We present the malicious CAPTCHA attack, allowing rogue
sites to trick users into unknowingly disclosing their pri-
vate information. This circumvents the Same Origin Policy
(SOP), whose goal is to prevent direct access by the rogue
site to such private information. The rogue site exploits
the fact that sites often display some private information to
users upon request; the rogue site displays such private in-
formation in what appears to the user as a CAPTCHA. The
text is obfuscated, hence, users are unaware that by solv-
ing the CAPTCHA, they are disclosing private information
to the rogue site. Information so disclosed includes name,
phone number, email and physical addresses, search history,
preferences, partial credit card numbers, and more.
The vulnerability is common. We conﬁrmed that the at-
tack works for many popular sites, including nine out of the
ten most popular websites. Our results are conﬁrmed using
IRB-approved, ethical user experiments.
Rogue websites exploit browser vulnerabilities, ‘con’ the
user (phishing, scams, malware, etc.), or perform cross-site
attacks, to extract or manipulate user information at ‘victim’ websites.
The main defense against cross-site attacks is the Same
Origin Policy (SOP) access-control mechanism, imple-
mented in all browsers. Grossly simpliﬁed, the SOP al-
lows a script received from one site, say rogue.org, to ac-
cess only objects and responses from sites in the same do-
main (rogue.org). The goal is to prevent rogue websites from
learning information about the interaction of the user with
other, target sites (which are from diﬀerent ‘origin’), while
allowing ‘legitimate’ web use. Speciﬁcally, the SOP allows
websites to receive and access content from sites in ‘the same
origin’, as well as to embed objects from other websites,
mainly, images, scripts, and webpages (in frames). Browsers
do not provide facilities for scripts to read the contents of
such embedded objects.
Attackers use diﬀerent attack methods to circumvent the
SOP; most notable are cross-site scripting (XSS) attacks,
which trick the browser into running a script provided by
the attacker, as if it is an authorized script from the target
website. Most XSS and other cross-site attacks exploit im-
plementation vulnerabilities of the target website (and/or of
the browser or players/extensions, e.g., Flash). By exploit-
ing these vulnerabilities, these attacks allow scripts from the
rogue site to extract user’s data from target sites, circum-
venting the access restrictions imposed by the SOP on the
ﬂow of data from the target sites to the script loaded from
the rogue site.
We show that the SOP is not sufficient to prevent leakage
of sensitive user information to a rogue site, even when it is
properly used and implemented, without vulnerabilities.
We present the cross-site malicious CAPTCHA attack,
which circumvents the SOP to expose private details of the
user, kept by the ‘victim’ site. In contrast to XSS and other
known attacks, the malicious CAPTCHA attack does not
extract information from the (SOP-restricted) ﬂow from site
to browser, or depend on browser or web-site vulnerabilities.
Instead, the user is tricked into providing her own information
to the attacker, i.e., the attack exploits the user as a ‘side channel’.
Speciﬁcally, the rogue site presents to the user what ap-
pears as an ordinary CAPTCHA. In reality, this is one or
often multiple small frames containing private information
of the user, embedded from the target website. The user, un-
aware of the fact that the CAPTCHA contains her own infor-
mation, ‘answers’ it, i.e., types and sends the contents to the
attacker. This ‘user side-channel’ can be similarly abused in
other ways, e.g., displaying the private information as part
of a game or typing-test. We focused on CAPTCHA since
it is a generic, well-deﬁned and widely-used mechanism. In-
deed, users are so used to being forced to solve CAPTCHAs,
that they usually just ‘solve’ the CAPTCHA by ﬁlling in
the required response, without paying much attention to the
contents.
An example for a CAPTCHA that encapsulates the Gmail
address of the victim appears in Figure 1.
The malicious CAPTCHA attack can only expose infor-
mation that is displayed by the target website to the user,
and furthermore, allowed to be embedded in a diﬀerent site
(in a frame). Sites often restrict the presentation of their
pages within frames, e.g., using the XFO header, mainly
to defend against clickjacking attacks; see Section 2.3.
This limits the scope of information that can be included
in the malicious CAPTCHA in a site-speciﬁc way. How-
ever, Table 1 shows that all but one of the ten most-popular
(a) Gmail application cache manifest
(b) Anagram CAPTCHA of first 15 characters
Figure 1: Exposing the first 15 characters of the Gmail address of the victim (email@example.com), using fixed-width font anagram CAPTCHA
Name                    Additional information
Youtube             √   Email address, followed users, liked items, Google+ circles
Facebook            √   Followed users, liked items
Yahoo               √   Anti-CSRF token
Baidu               √   Personal information: calendar, age, education, job, interests, etc.
Amazon              √   Number of items in the cart, customer ID
Taobao              √   Physical address, phone number
QQ                  √   Subject and details of recently received
Live & Bing
  & Outlook         √   Search history, Skype client ID
eBay                √   Physical address
Aliexpress          √   Email and physical addresses, birthday, phone number
Craigslist          √   Email address, saved searches
Booking.com         √   Credit card last digits and expiration
Table 1: Information vulnerable to the malicious-CAPTCHA attack for the ten most popular sites, and below, for four other important sites.
websites, and other important sites, allow the embedding of
pages/objects containing sensitive user information, i.e., are
vulnerable to the malicious CAPTCHA attack. As shown,
the malicious CAPTCHA may expose the name of the victim
user, email and physical addresses, information about pref-
erences, personal information provided by the user, search
history, partial credit card details, and more. This infor-
mation can be abused for phishing and other attacks. In
particular, the attack exposes the Yahoo! anti-CSRF token,
thereby facilitating CSRF attacks.
We validated the eﬀectiveness of our malicious
CAPTCHA attacks in ethical, IRB-approved user ex-
periments. In particular, we evaluated diﬀerent obfuscation
methods, such as shuﬄing the iframes and adding dummy
data to the CAPTCHA, to make it even harder for users
to detect the attack. In Section 4, we further develop the
attack and show how to use a single malicious CAPTCHA
to answer multiple questions about private data.
Ethics. We disclosed our attacks to the relevant websites
and browser vendors, to allow them to incorporate defenses.
All of our usability experiments were IRB-approved.
•We introduce the malicious CAPTCHA attack and
show that it exposes sensitive information about users
from most popular sites. We also show that for some
popular sites, it is possible to steal tokens that can be
exploited for other attacks (see Table 1).
•We present diﬀerent obfuscation methods for the ma-
licious CAPTCHA attack.
•We present a variant of malicious CAPTCHA attacks
for eﬃciently exposing multiple Boolean properties.
•We evaluate their eﬀectiveness using IRB-approved,
ethical user experiments.
•We present defenses against our attacks.
Organization. In Section 2 we describe the adver-
sary model and brieﬂy provide the necessary background
on CAPTCHA and iframes. In Section 3, we present and
demonstrate several variants of the malicious CAPTCHA
attack. In Section 4 we show how to use a single malicious
CAPTCHA to answer multiple questions about private data,
and demonstrate the attack to learn about the preferences
of Facebook and Google users, and to expose the search his-
tory of Bing users. In Section 5 we evaluate the attacks
of the previous sections. Before we conclude the paper, we
dedicate two sections to defenses and to related work.
2.1 Adversary Model & Roadmap
To launch the malicious CAPTCHA attack, only two
modest capabilities are required from the attacker: (1) to
control one or more malicious web pages, and (2) to be able to
‘lure’ victims into visiting these pages.
The attack also requires that the victim is logged-in to the
target website and that the loading of iframes is enabled in
the victim’s browser. However, it is possible to launch the
attack without knowing the victim’s login status; in this
case, the attacker might present a CAPTCHA to a website
for which the user is not authenticated.
Many attacks are possible under this adversary model.
However, the most severe attacks, such as cross site scripting
(XSS) or installing malware, require ﬁnding vulnerabilities
in websites or browsers. While it is considered very diﬃcult
to ﬁnd such vulnerabilities in popular browsers and websites
like Gmail or Facebook, the malicious CAPTCHA attack can
be applied to both of them.
Stealing information about the user can also be done
via social engineering. However, unlike the malicious
CAPTCHA attack, in social engineering the user does know
that she is giving away her details. Many users might deliver
incorrect or fictitious details if they are asked to do so.
The roadmap of the malicious CAPTCHA attack contains
three steps: (1) luring the victim into the attacker’s mali-
cious page, (2) determining the websites for which the victim
is authenticated, and (3) manipulating personalized infor-
mation of the victim from such a website and presenting it
as a malicious CAPTCHA. We brieﬂy describe the ﬁrst two
steps; the third step is discussed in depth in the following sections.
Luring the victim user to the attacker’s webpage.
The basic requirement of the malicious CAPTCHA attack,
as well as any other cross-site attack, is to cause the user
to visit the attacker’s page. This requirement is consid-
ered easy, and is the ﬁrst step of several other known at-
tacks. To lure random users into the website, the attacker
can use legitimate site-promotion techniques. For example,
the attacker can advertise a free downloads website, and
require the user to ﬁll in a CAPTCHA before a ﬁle is down-
loaded. Other ways to lure many users, or even a speciﬁc
one, is using phishing emails and social-engineering tech-
niques [14,19, 20].
Determine target websites to which the user is
currently logged-on. Because the malicious CAPTCHA
attack relies on loading personalized pages of websites, it
is crucial for the attacker to ﬁrst determine the websites
to which the victim is logged-on. The adversary can use
known cross-site login detection techniques [7,12, 21, 22,29].
Alternatively, for very popular web services such as Google
and Facebook, the adversary can assume that the victim is
logged-in and a posteriori verify the login status from the
input of the victim to the malicious CAPTCHA.
CAPTCHA, short for Completely Automated Public Tur-
ing test to tell Computers and Humans Apart, is a
challenge-response test used by websites to distinguish be-
tween requests sent by ‘real human users’ versus requests
sent by automated agents such as bots and scripts (or
HTML). For example, CAPTCHAs are used to prevent
spammers from creating many email accounts for spam cam-
paigns, and to block CSRF attacks.
CAPTCHA is a very common defense mechanism, and the
average web surfer is accustomed to taking such challenges
and solving them. Solving CAPTCHA challenges is usually
required when the user creates a new account or changes
settings. It is also used when web services detect abnormal
behavior, either from the account or from the IP address of
the user. For example, Google prompts a CAPTCHA when
an aberrant sequence of search requests is sent.
A CAPTCHA is believed to be very hard to solve us-
ing automation (i.e., by a machine/program, without in-
volving a person), but easy for humans to solve. How-
ever, it is common to ﬁnd CAPTCHAs that are relatively
hard (yet possible) for humans to solve. The most typi-
cal form of CAPTCHA challenge is an image of deformed
text. The user is asked to type the text shown in the image,
thereby ‘proving’ she is human. Signiﬁcant deformation is
required to prevent sophisticated malicious programs from
solving the CAPTCHA. Indeed, several techniques were
presented to automatically break CAPTCHAs. The most
prominent techniques involve optical character recognition
and mechanisms to identify distorted characters [24, 32]. In
spite of these methods, CAPTCHA is considered a reliable
countermeasure against automated malicious actions. Of
course, for our attacks, the diﬃculty of automatically solv-
ing CAPTCHAs is irrelevant. It suﬃces that users solve the
CAPTCHA without suspecting an attack, thereby exposing
their own private information.
2.3 Loading Webpages in iFrame and Clickjacking
All modern browsers allow a host webpage to include
frames, called HTML iframes, containing pages from other ‘guest’ websites.
This is used in many legitimate ways. Since the Same Ori-
gin Policy prevents the host webpage from accessing the con-
tents of the guest website, this feature is considered to be
secure against leakage of information.
However, the use of frames is known to facilitate click-
jacking [4, 15, 27] attacks. Clickjacking (also called UI re-
dress attack, UI redressing), is a malicious technique that
tricks its victim into unconsciously performing actions in a
guest website. In clickjacking, the attacker uses CSS fea-
tures to load a frame containing part of the guest webpage
as an invisible, transparent layer, over a benign-looking link
or button in the hosting webpage. The victim user is manip-
ulated into clicking the benign-looking link or button shown,
not knowing that she is actually clicking on a link or a but-
ton in the host website, possibly purchasing some product or
unintentionally signaling ‘like’ to some item (‘Likejacking’).
The most eﬀective countermeasure against clickjacking, is
the X-Frame-Options (XFO) HTTP response header. This
server-client defense was proposed and integrated into In-
ternet Explorer by Microsoft in 2009, and later imple-
mented in the other popular browsers. The XFO header
allows a website to restrict other webpages from loading
(hosting) its webpages in an iframe. The header takes one
of three values: DENY, SAMEORIGIN, or ALLOW-
FROM (list of origins). Websites should return the XFO
header set to either DENY or SAMEORIGIN with every
HTTP response containing a webpage that should not be
hosted. On the client side, the browsers need to examine
the XFO header before they load a page in an iframe. If
the XFO header does not allow the guest page to be loaded
with the particular host, the browser should not load it;
most browsers present a message in the iframe instead.
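The browser-side decision described above can be sketched roughly as follows (our own simplified illustration, not browser source; real browsers differ in details such as the handling of invalid header values and nested frames):

```javascript
// Sketch: decide whether a response may be framed, given its
// X-Frame-Options header and the origins of the guest (framed)
// page and the host page.
function mayFrame(xfo, guestOrigin, hostOrigin) {
  if (!xfo) return true;                      // no header: framing allowed
  const v = xfo.trim().toUpperCase();
  if (v === 'DENY') return false;
  if (v === 'SAMEORIGIN') return guestOrigin === hostOrigin;
  if (v.startsWith('ALLOW-FROM')) {
    // compare the listed origin (case preserved) with the host origin
    return xfo.trim().slice('ALLOW-FROM'.length).trim() === hostOrigin;
  }
  return false;                               // unrecognized value: be safe
}
```

The malicious CAPTCHA attack targets exactly the pages for which this function returns true, i.e., responses served with no XFO header (or a permissive one).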
Today, the awareness of clickjacking is high. The rec-
ommended and widely-adopted practice is to use appropri-
ate XFO headers on every page that contains links or other
HTML elements that may be clickjacked, especially any sen-
sitive operations. Yet, as shown in Table 1, it is possible to
extract information from most of these websites. Typically,
this is due to pages that display private information, but
to which XFO was not applied, since they do not contain
links, buttons, or other elements that may be clickjacked to
cause some sensitive operation. Namely, our results motivate
extending the practice of including XFO in the response
header, and applying it also to pages that contain sensitive,
private information.
3. THE MALICIOUS CAPTCHA ATTACK
The malicious CAPTCHA attack tricks web users who are
used to solving CAPTCHAs into copying and sending infor-
mation about themselves to the attacker, without realizing it
is their personal information. Basically, the attacker loads
personalized webpages with private information in several
iframes, and uses CSS tricks to make the set of iframes ap-
pear as a CAPTCHA, without arousing the user’s suspicion.
We assume that a victim user is visiting an attacking,
rogue website, and that the victim is authenticated to a tar-
get website. This allows the rogue site to use an inline frame
(iframe) to load some of the target-site pages containing pri-
vate, personal details. We refer to the page containing the
private information as the target page, and to the sensitive,
private information that the attacker wants to expose, as
the target private record (TPR).
There are three challenges to the malicious-CAPTCHA
attack: (1) The attacker needs to identify exploitable TPR
that can be loaded in a frame within the rogue webpage
(in particular, not blocked by appropriate XFO header).
(2) The attacker needs to present the TPR in one or more
frames, while avoiding attack detection; the user may detect
the attack due to the content (e.g., user recognizing her own
name in the CAPTCHA), or due to diﬀerences in style from
typical CAPTCHA. (3) The attacker needs to recover the
TPR from the user’s response to the CAPTCHA, possibly
dealing with user mistakes (e.g., using only lowercase) and
possible presentation errors.
We ﬁrst explain how it is possible to identify exploitable
TPRs in Section 3.1. In Section 3.2 we discuss simple
CAPTCHA, which presents one frame containing the TPR.
Simple CAPTCHA is limited to scenarios in which the user
is unlikely to recognize the TPR. In Section 3.3 we discuss
anagram obfuscation. In Section 3.4, we describe two ad-
vanced variants of the attack, and in Section 3.5 we present
a detailed example. Finally, in Section 3.6, we discuss the
technical aspects of transforming a TPR into a CAPTCHA
to minimize the risk of raising suspicion while maximizing
the chance of correct recovery.
3.1 Identify Exploitable TPR
We followed a simple algorithm to identify private or sen-
sitive information from a website that can be loaded in an
iframe. We ﬁrst created an account for the website and cre-
ated a list with private information or sensitive data that
might be related to this account. This is actually a list of
known TPR values that we would want to extract if our ac-
count was the victim. Next we crawled the website while the
account was authenticated and recorded the traﬃc using an
HTTP proxy. Finally, we searched for the list’s TPRs in the
HTTP responses that did not contain the XFO header. We
also verified that the TPRs are indeed visible, to avoid cases
in which a TPR appears in the response but is not presented
by the browser.
We also searched the TPRs in the browser using regular
string searches to overcome cases where a TPR is received in
an HTTP response with XFO header, but is later presented
in a page that can be loaded in an iframe (e.g., using AJAX).
3.2 Simple CAPTCHA
The simple CAPTCHA uses a single iframe that contains
and presents to the user the whole TPR as the CAPTCHA
text (or some of the CAPTCHA text). This technique is
practical only when the user is unlikely to recognize the
fact that the TPR contains sensitive, private information.
For example, using simple CAPTCHA to present a user-
recognizable TPR, such as one containing the user’s name,
may cause users to suspect the rogue website of an attack.
Surprisingly, we found that some TPRs contain sensitive,
privacy-intrusive content, yet are unlikely to raise users’ sus-
picions, even when presented directly to the user in a simple
CAPTCHA. We oﬀer two such examples.
ID numbers. Many websites identify users, groups, and
other items using unique user ID numbers. Most of the users
do not know their user ID, or the ID of the items/pages
they visit. However, detecting the ID of an item is usually
equivalent to the detection of the item itself, since mapping
an ID to its item is usually a simple procedure.
Specifically, we found that in the Microsoft email service
(outlook.com), it is possible to load a JSON file in an iframe,
such that the Skype client ID of the user appears in a fixed
position relative to the beginning of the response. Similarly,
it is possible to load the customer ID of Amazon users.

(a) Crumb received as JSON
(b) Simple anagram
Figure 2: Simple CAPTCHA based on Yahoo! anti-CSRF crumb
Anti CSRF tokens. Anti-CSRF tokens are sent to-
gether with HTTP requests to detect and prevent CSRF
attacks. Anti-CSRF tokens should be infeasible to forge
or predict, and as such, they are usually generated pseudo-
randomly. Most users are not even aware of the existence of
these tokens. Nevertheless, an attacker who gains an anti-
CSRF token of a user that already visits the attacker’s web-
site, can often launch a successful CSRF attack, and thereby
perform operations as if they were authorized by the user.
Speciﬁcally, Yahoo! uses several CSRF tokens, called
crumbs. We found that it is possible to load a short JSON
ﬁle that contains a crumb used in the Yahoo! Messenger ap-
plication. This crumb is included with every message sent
out by the Messenger application.
The original JSON ﬁle, with the crumb and the simple
CAPTCHA that is based on it, appear in Figure 2.
3.3 Anagram CAPTCHA
An anagram is a word or phrase that is created by re-
arranging the letters of another word or phrase. For ex-
ample, Elvis can be rearranged as livEs. In an anagram
CAPTCHA, the attacker displays diﬀerent parts of the TPR
in tiny iframes, which we call fragments. The fragments
are permuted from their original positions using a randomly
chosen permutation, thereby creating an anagram. The at-
tacker, upon receiving the permuted fragments from the vic-
tim, can reproduce the TPR.
The mere fact that the CAPTCHA contains an anagram
of the text from the TPR, may suﬃce to make it less likely
for the victim user to suspect the anagram CAPTCHA. In-
deed, the anagram is essentially a transposition cipher, ap-
plied once using a randomly chosen key.
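The attacker's bookkeeping can be sketched as follows (a minimal JavaScript illustration; the function names are ours, not from the paper, and one character per fragment is assumed, as in the fixed-width case of Section 3.3.1):

```javascript
// Fisher-Yates shuffle of indices [0..n-1]; the attacker stores `perm`
// as the transposition key.
function randomPermutation(n) {
  const perm = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [perm[i], perm[j]] = [perm[j], perm[i]];
  }
  return perm;
}

// Order the fragments as shown to the victim: position i of the
// CAPTCHA displays fragment perm[i] of the TPR.
function toAnagram(tpr, perm) {
  return perm.map((src) => tpr[src]).join('');
}

// Invert: the victim types the CAPTCHA left to right, so character i
// of the answer is TPR character perm[i].
function recoverTpr(answer, perm) {
  const out = new Array(perm.length);
  perm.forEach((src, i) => { out[src] = answer[i]; });
  return out.join('');
}
```

Since the permutation is chosen by the attacker, inverting it on the victim's answer reproduces the TPR exactly.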
3.3.1 Fixed-width Font Anagram CAPTCHA
In a ﬁxed-width font anagram CAPTCHA, the attacker
displays each letter of the TPR in a tiny iframe, which we
call a fragment.
Clearly, this type of CAPTCHA works best when the
TPR is presented using a ﬁxed-width (also called monospace
or non-proportional) font; namely, every character occupies
the same amount of horizontal space. Most websites do
not use monospace ﬁxed-width fonts. However, by default,
Google Chrome, Mozilla Firefox, and the Safari browsers
use monospace font to present HTTP responses that contain
plain text (without HTML tags), e.g., JSON ﬁles. In par-
ticular, this holds for the examples presented above (in Sec-
tion 3.2) of TPR that can be used in a simple CAPTCHA.
There are other types of TPR that can be presented using
ﬁxed-width font CAPTCHA. For example, it is possible to
load the Gmail address of a user in an iframe. The Gmail
address appears as a remark in a ﬁxed position within an
application cache manifest that can be loaded in an
iframe. Hence, the rogue website can use a ﬁxed-width ana-
gram CAPTCHA to expose the user’s Gmail address. See
Figure 1 for an example of a manifest and the corresponding
CAPTCHA.
3.3.2 Variable-width Font CAPTCHA
We found that most of the private data that can be loaded
in an iframe, cannot be displayed using ﬁxed-width font. Us-
ing the method of Section 3.3.1 would not work well, since at-
tackers cannot isolate each character into a separate iframe.
In these cases, the attacker can create a variable-width font
anagram CAPTCHA, which is basically a permutation of
blindly cut pieces of the TPR.
The main challenge in creating such an anagram
CAPTCHA is that with variable-width (proportional) fonts,
each character takes up only as much width as required by
the shape of the character; some are wider, some are nar-
rower. For example, the letter l takes significantly less space
than the letter m, or even L.
When ‘blindly carving’ a small iframe containing only a
small part of the TPR, using variable-width font, the width
of each letter depends on the letter itself. Because the letters
are not known in advance, it is inevitable that some letters
will be ‘cut’. Namely, it is impossible for the attacker to split
the TPR into several fragments, each containing a whole,
yet single, letter.
Assuming that each TPR letter should appear in some
fragment, in its entirety, each fragment should be wider than
the widest letter. Moreover, to ensure that even the widest
letter is not cut, the width of the fragment should be twice
as wide as the widest letter. However, for some fonts, using
this value will result in fragments that are too wide; this may
cause the user to notice part of her private information,
possibly exposing the attack. In our experimental validation
(see Section 5.1), we used a fragment width that is a bit more
than twice the width of most of the small letters.
Additionally, for every fragment width, simply cutting the
TPR into disjoint fragments will probably result in cut let-
ters. To facilitate reconstruction of the TPR, the fragments
should have some overlap between them. Note that frag-
ments will often contain partial, ‘cut’ letters in their right-
hand and/or left-hand edges. In our experiments, we in-
structed participants to ignore cut letters using a message
placed below the CAPTCHA.
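The choice of overlapping fragment positions can be sketched as follows (our own illustration; widths are in pixels and the concrete numbers are hypothetical):

```javascript
// Sketch: compute the left offsets of overlapping fragments covering
// a TPR rendered over `totalWidth` pixels. `fragWidth` is the fragment
// width (a bit more than twice the width of a typical lowercase letter,
// per Section 5.1) and `step` < fragWidth, so consecutive fragments
// overlap by fragWidth - step pixels and every letter appears whole
// in at least one fragment.
function fragmentOffsets(totalWidth, fragWidth, step) {
  if (step >= fragWidth) throw new Error('fragments must overlap');
  const offsets = [];
  for (let x = 0; x < totalWidth; x += step) offsets.push(x);
  return offsets;
}
```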
See Figure 4(a) for an example of variable-length font
CAPTCHA, exposing the user’s name (in this example,
‘Inno Cent’). The CAPTCHA contains six (permuted)
iframes, each containing a part of the Facebook comments
box (Figure 3).
3.4 Avoiding Detection and Improving TPR Recovery
The CAPTCHAs we described so far have two main weaknesses:
1. Avoiding detection may be a challenge, especially when
the TPR is short, contains repeating characters, or
is embedded using variable-width font (Section 3.3.2),
where a single fragment might contain two or more characters.
2. TPR recovery may also be a challenge, especially when,
due to variable-width font, fragments contain parts of
characters and overlapping characters. Furthermore,
the above methods do not provide any mechanism to
detect (or correct) user-errors.
We now present two techniques that help overcome these weaknesses.
Camouﬂaged CAPTCHA. The camouﬂaged
CAPTCHA adds ‘dummy’ characters to the CAPTCHA,
to camouﬂage the fact that the anagram uses the same
characters as the victim user’s TPR. This addition of
irrelevant letters makes it even harder for the victim to
detect the TPR. This is especially signiﬁcant when the
TPR is very short or contains very few characters.
Additionally, the attacker can use the ‘dummy’ characters
to verify that the victim’s input reﬂects some of the challenge
text presented, and to localize the characters. Unlike the
TPR, the attacker knows the dummy letters, and if they do
not appear in the solution, it is an indication that the user
might not have ﬁlled in the CAPTCHA correctly.
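The dummy-character bookkeeping can be sketched as follows (our own illustration; the helper names are not from the paper):

```javascript
// Build the challenge: `positions` lists the indices (in the final
// challenge) that hold known dummy characters; every other position
// holds the next TPR character, in order.
function camouflage(tprChars, dummies, positions) {
  const posSet = new Set(positions);
  const out = [];
  let t = 0, d = 0;
  const total = tprChars.length + dummies.length;
  for (let i = 0; i < total; i++) {
    out.push(posSet.has(i) ? dummies[d++] : tprChars[t++]);
  }
  return out.join('');
}

// Recover the TPR from the victim's answer; flag a likely user error
// when the known dummy characters do not match the answer.
function extractTpr(answer, dummies, positions) {
  const posSet = new Set(positions);
  let tpr = '', d = 0, ok = true;
  for (let i = 0; i < answer.length; i++) {
    if (posSet.has(i)) { if (answer[i] !== dummies[d++]) ok = false; }
    else tpr += answer[i];
  }
  return { tpr, ok };
}
```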
Two-step CAPTCHA. CAPTCHAs have become
harder and harder to solve to prevent bots from cracking
them. Web users have become accustomed to obfuscated
CAPTCHAs that are diﬃcult to solve, and to failures in
solving them. The two-step CAPTCHA exploits this fact
to separate the TPR into two CAPTCHAs, where each is
an anagram (possibly camouﬂaged) on diﬀerent parts of
the TPR. The attacker presents the ﬁrst CAPTCHA; once
the victim completes it successfully, the attacker contin-
ues as though an incorrect solution was received, and re-
places the ﬁrst CAPTCHA with the second. With two-step
CAPTCHA, the whole TPR does not appear on the vic-
tim’s screen at any one time. This makes the detection of
the TPR harder. The two-step CAPTCHA also allows expo-
sure of more parts of a long TPR and the use of camouﬂage
characters to detect errors and ease recovery.
3.5 Example: Exposing Facebook Username
Facebook prevents its pages from appearing in the iframes
of other websites. However, its social plugins  allow other
websites to integrate Facebook features. The comments box
is one of these plugins. This box presents comments by
Facebook users, allowing a logged-in Facebook user to com-
ment. Because the name of the logged-in client appears in
the comments box, an attacker can embed fragments of the
comments box in malicious CAPTCHAs to extract the name
of the user.
Personal names diﬀer in their lengths; hence, in some cases
a long name may not completely appear in the CAPTCHA.
However, we found that the attacker can signiﬁcantly con-
trol the characters displayed, using text width and language
(text direction), e.g., to extract the ﬁrst or the last names as
required. For example, if the attacker wants to get the last
name, she should use a comments box in which the name
is adjusted right-to-left (Figure 3). She can then create the
fragments in this direction, beginning with the position of
the last letter of the name. Figure 4 demonstrates the three
techniques based on loading the Facebook comments box
(Figure 3) in iframes.
3.6 Presenting Content as a CAPTCHA
We now explain how a rogue website can present the TPR
so it looks like a typical CAPTCHA, and how to transpose
the TPR as required for anagram CAPTCHAs. The challenge
is the Same Origin Policy (SOP), which does not allow
direct access to the content of the target page, and in
particular to the TPR. Instead, we use the fact that the
rogue site can load the target page, containing the TPR,
in an iframe, and manipulate its appearance. We now
explain HTML/CSS techniques to create a fake CAPTCHA
from iframes of the target page, and discuss several challenges.

Figure 3: Facebook comments box for user Inno Cent

(a) Anagram
(b) Camouflaged anagram
(c) Two-step camouflaged anagram
Figure 4: Three CAPTCHAs exposing the user name Inno Cent by embedding the Facebook comments box (Figure 3). The camouflaged-anagram CAPTCHA is based on a comments box in which the name is adjusted left-to-right without any text after it. In the other CAPTCHAs, it is possible to see fragments of the phrase Posting as.
Cutting the TPR. We use CSS to present only a spe-
ciﬁc part of the TPR, which we call a fragment. Speciﬁcally,
to load a fragment, we use a div element with the size of
the fragment, and load the target page within the div ele-
ment. Assume that in the target webpage, the position of
the fragment is given as offsets of L pixels from the left of
the page and T pixels from the top. Placing the iframe at
offsets of −L and −T pixels to the left and the top of its
default position in the div element will present the fragment
inside the div element. We use CSS to hide any part of the
iframe that is presented outside the boundaries of its parent
div element.
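The offset arithmetic above can be sketched as a small helper that emits the div/iframe markup (our own illustration, not the paper's code; the URL is hypothetical, and the tabindex setting anticipates the detection countermeasures discussed below):

```javascript
// Sketch: build the markup for a div that shows only a w-by-h pixel
// fragment of the target page whose top-left corner is at offset (L, T).
// The iframe is shifted by -L and -T inside the div, and overflow is
// hidden so everything outside the div is clipped.
function fragmentHtml(url, L, T, w, h) {
  return (
    `<div style="position:relative;overflow:hidden;width:${w}px;height:${h}px">` +
    `<iframe src="${url}" scrolling="no" tabindex="-1" ` +
    `style="position:absolute;left:${-L}px;top:${-T}px;border:0"></iframe>` +
    `</div>`
  );
}
```

Generating one such div per fragment, permuted as in Section 3.3, yields the anagram CAPTCHA.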
Controlling the size of the text. We use the scaling
option of CSS to manipulate the size of the div element, so
that its size is appropriate for use in CAPTCHA. In many
websites, the TPR appears in tiny font, which is not usually
used in CAPTCHA; hence, scaling is crucial.
CAPTCHA look and feel. Although we cannot control
the style of the target page, it is possible to aﬀect its view
by adding semi-transparent elements over it. This is impor-
tant because the text’s color and the background may help
the user recognize the target page. In addition to the chal-
lenge text, CAPTCHAs often involve some lines on the text
or in the background. We imitate this behavior by adding
semi-transparent pictures of lines. It is also possible to ro-
tate some of the fragments; this is another transformation
visually similar to the transformations used in CAPTCHA.
Preventing access to the target webpage. Present-
ing a webpage in an iframe does not restrict the user from
accessing it. Even if scrolling is disabled for the iframe, it
is still possible to focus on the iframe and to move to other
parts of the page. This may allow users to notice that this
is not a regular CAPTCHA. Another problem occurs if the
TPR is a hyperlink; in this case, the mouse pointer changes
when the user rolls over the TPR. Clicking on the link will
load another page in the middle of the CAPTCHA. Even if
the TPR is not a hyperlink, a real CAPTCHA is presented
as an image, so the user should not be able to select (mark)
the text of the fake CAPTCHA.
To prevent detection, we cover the iframes with transpar-
ent elements. We also disable focusing on elements in the
iframe using the Tab key; this is done by setting the tabIndex
attribute of every iframe to -1.
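A minimal sketch of these countermeasures follows; the overlay-creation callback is left abstract, and the function name and selectors are our assumptions:

```javascript
// Sketch: remove each framed fragment from the Tab order and cover it so
// that clicks, text selection, and link-hover cursor changes never reach
// the framed target page.
function hardenFrames(frames, addOverlay) {
  for (const frame of frames) {
    frame.tabIndex = -1;   // the iframe cannot be focused via the Tab key
    addOverlay(frame);     // caller places a transparent element over it
  }
  return frames;
}

// In a browser this would be invoked roughly as:
//   hardenFrames(document.querySelectorAll('iframe'), placeTransparentDiv);
```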
4. MALICIOUS CAPTCHA FOR MULTIPLE BOOLEAN QUESTIONS
The CAPTCHAs we presented so far trick users into pro-
viding the attacker with the answer to an open question,
e.g., what is your name. In such cases, the attacker needs
to trick the user into disclosing all, or at least most, of the
characters in the answer. Since the length of the response is
usually unknown, the attacker can generally expose at most
a single ‘answer’ per CAPTCHA.
However, as we discuss in this section, the attacker is of-
ten interested in answers to Boolean or multiple-choice ques-
tions, e.g., to test for speciﬁc history or preferences of the
user. In such cases, it is not necessary to load the whole
response to the CAPTCHA. Instead, it is possible to load a
minimal fragment of the response that will be diﬀerent be-
tween the cases. Because usually only a single small fragment
is necessary to obtain the answer, the same CAPTCHA can
combine several fragments that answer multiple questions.
We describe two examples of such CAPTCHAs. We begin
with Facebook and Google, where the attacker learns
whether the victim likes something or follows someone. We
continue with more complicated cases, in which the attacker
can ask questions about terms that might appear in the
search history of a Bing user.
4.1 Facebook and Google: Detect Likes and Follows
To defend against clickjacking attacks, Google and Face-
book do not allow third-party sites to load any of their pages
in iframes. However, both Google and Facebook oﬀer a ‘but-
ton’ (social plugins), allowing their users to “like” items and
to follow other users. Websites are encouraged to embed
these buttons in their pages to promote their content via
the social networks.
In Section 3.5, we showed that it is possible to exploit
Facebook social plugins to extract the name of the Facebook
user; the same attack works for Google. We now show that
it is possible to learn additional information in both Google
and Facebook. Speciﬁcally, we focus on followed people and
liked items. Both are considered private data with potential
for diﬀerent forms of abuse, and both can be extracted from
users of Google and Facebook. We now explain how to de-
tect which items a Facebook user likes, among some set of
items, and to similarly detect followed Google+ users.
4.1.1 Detect Facebook Likes
Facebook allows third-party websites to embed Like buttons
for arbitrarily chosen items. In the standard view, a
label with some text accompanies the button. This text
changes depending on whether the authenticated user likes
the item or not. To detect the like status, it is enough to
check the position of some letter for one of the options, and
to decide based only on its value. For example, if the first
letter is 'Y', then this is the beginning of "You and XXXX
others like this" or "You like this". Otherwise, either the
'B' of "Be the first of your friends to like this", or the first
digit of the number of liking users, might appear. Because
one letter is enough to detect whether a given item is 'liked',
it is possible to use only one of the multiple frames in the
CAPTCHA for each 'like' validation. This leaves the other
frames as a side channel for other information, such as the
'like' status of additional items.
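The first-letter rule above can be sketched as a tiny classifier; the labels are those quoted above, and the function name is ours:

```javascript
// Sketch of the first-letter rule: 'Y' starts "You like this" /
// "You and N others like this"; 'B' starts "Be the first of your friends
// to like this"; a digit starts a bare like count.
function likedFromFirstLetter(ch) {
  if (ch === 'Y') return true;               // the user likes the item
  if (ch === 'B' || /[0-9]/.test(ch)) return false;
  return null;                               // unreadable input: no conclusion
}
```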
4.1.2 Detect Followed Google+ Users
Google+ allows third-party websites to offer a Follow button
to visitors, specifying any Google+ user profile. When
a user clicks on the button, Google+ adds the user as a
follower of the specified profile. Unlike Facebook, there is no
label with text that changes as a function of the follow status.
Instead, the text and the color of the button itself change.
The button shows the text "Follow" on a white background
if the user does not follow the profile. Otherwise, either
"Following" or the number of Google+ circles relevant to the
followed user appears on the button, with a blue background.
To differentiate between the cases, we can use some of the
techniques presented by Weinberg et al.  to distinguish
between hyperlinks marked in different colors. For example,
it is possible to take parts of the buttons that do not
contain text, and to compose digits from them over a white
background. The digits will not be seen if the button is
white like the background; they will be visible only if the user
follows the candidate and hence the button is blue. By using
an appropriate font, a single digit can even be used to test
more than one button; see .
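A minimal sketch of the digit trick, assuming the attacker composes one digit per Follow button from that button's own pixels (the function and its inputs are hypothetical):

```javascript
// Sketch: digit i is composed from non-text pixels of Follow button i over
// a white background, so the victim can read it only when that button is
// blue, i.e., when the corresponding profile is followed.
function visibleDigits(followed, digits) {
  // `followed[i]` is whether button i is blue; `digits[i]` is the digit
  // composed from button i's pixels.
  return digits.filter((_, i) => followed[i]).join('');
}

// If the victim types the digits they see, the attacker learns which
// buttons are blue, e.g.:
//   visibleDigits([true, false, true], ['1', '2', '3']) === '13'
```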
4.2 Extracting Bing Search History
Search history exposes sensitive information about the
user, such as interests, needs, and actions; see . Sev-
eral works even discuss the challenge of hiding the search
queries of users from the search engines themselves; see 
and references within.
In this subsection, we show how it is possible to expose
parts of the search history of users for the Bing search en-
gine. Like other popular search engines, by default, Bing
saves its users’ search history. This information is used to
personalize the search results and the advertisements, as well
as to oﬀer more relevant autocomplete suggestions when the
user types in the search box.
For each letter that an authenticated user types into
the Bing search box, the Bing client-side script sends a
request for personalized autocomplete suggestions for the
typed term. Bing dedicates the first autocomplete suggestions
to terms from the user's search history of which the
typed term is a prefix.
Our attack exploits two features of Bing’s autocomplete
mechanism. First, autocomplete suggestions can be loaded
in an iframe. Furthermore, Bing allows cross-site autocom-
plete requests. Namely, websites visited by a Bing user
can request personalized autocomplete suggestions in the
name of the authenticated user. Bing does not deploy
CSRF tokens or any other mechanism to prevent cross-site
requests, and responds with personalized autocomplete
suggestions. History-based autocomplete suggestions are
presented first, such that suggestions based on newer or more
popular searched terms appear higher. Consequently, the last
search query, or a very popular search query of the victim,
usually appears as the first autocomplete suggestion for the
empty term, i.e., when the user accesses the search box
without typing any letter. An attacker can load this
suggestion as the TPR of a malicious CAPTCHA attack.

Table 2: Terms for which the attacker requests autocomplete
suggestions (T0), to detect whether the victim searched for
other terms (T), and the letters expected to appear in the
corresponding positions of the CAPTCHA.

  T        T0     First general suggestion for T0   Letters to search in the CAPTCHA
  Ecstasy  ecs    ecs tuning                        ecstasy
  LSD      ls     lsu                               lsd
  Cocaine  coca   coca-cola                         cocaine
  Heroin   hero   hero monkey                       heroin
Similar to the Boolean CAPTCHA presented above for
Facebook and Google, the attacker can also ask several
Boolean questions about the search history in a single ma-
licious CAPTCHA. Say the attacker wants to test whether
or not the victim searched for a term T = t1t2...tn. Assume
also that there is some value m ≤ n, such that the
first general (not personalized) autocomplete suggestion
returned by Bing for the term T0 = t1t2...tm (a prefix of T)
is different from T. If the user has not searched for T, then
upon typing T0, the first autocomplete suggestion will not be T.
Trivially, the attacker could just ask for T0 and apply the
malicious CAPTCHA techniques to the first autocomplete
suggestion (the TPR). However, in this case, most of the
TPR is neither important nor relevant. The beginning of
the TPR is expected to be T0 and is already known to the
attacker. The rest of the letters are also not necessary,
because the attacker knows the expected position of each letter
when T is the first autocomplete suggestion. Hence, she can
just pick one or two letters, and load only the parts of the
page where they are expected to appear. If the victim fills
the CAPTCHA solution with the expected letters, then T
was probably suggested, which means that T was searched
for by the victim.
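The inference above can be sketched as follows; this is a simplified model, and the function name and the frame-cutting itself are our assumptions, not the paper's code:

```javascript
// Sketch: the attacker cuts CAPTCHA frames from positions |T0|..|T0|+k-1
// of the first autocomplete suggestion, i.e., where T's letters would
// appear if T is suggested first. The victim's typed letters for those
// frames are compared against T's letters.
function likelySearched(T, T0, typedLetters) {
  const expected = T.slice(T0.length, T0.length + typedLetters.length);
  return typedLetters === expected;
}

// T = "ecstasy", T0 = "ecs": the frames cover positions 3..4. If the victim
// reads "ta" there, T was probably the first suggestion; a general
// suggestion such as "ecs tuning" would yield different letters.
```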
Instead of asking about multiple terms and checking
whether they match the ﬁrst autocomplete suggestion, it
is possible to use the malicious CAPTCHA to search for a
term in several autocomplete suggestions returned by Bing.
4.2.1 Example: check four terms
Assume the attacker wants to check whether the victim
searched for the following drugs: ecstasy, LSD, cocaine, and
heroin. Table 2 shows a relevant T0 for each of the terms, as
well as letters to use in the CAPTCHA. Figure 5 depicts the
given CAPTCHA in three cases: (1) the victim has searched
for all four terms, (2) the victim has searched for only two
terms, and (3) the victim has not searched for any of them.
(a) Template; (b) all four terms were searched; (c) only ecstasy
and heroin were searched; (d) none was searched.
Figure 5: Malicious CAPTCHA to detect whether each of four terms appears in the search history of the victim (see Table
2). Subfigure (a) is the template of the CAPTCHA; the dummy letters no, S and g are used to detect incorrect input and to
separate the four parts of the CAPTCHA: (1) ecstasy, (2) lsd, (3) cocaine, (4) heroin. The bold letters appear
in the relevant frame of the template if the victim searched for the term.
We conducted two experiments to evaluate the effectiveness
of the malicious CAPTCHA attacks described in the
previous sections. We focus on the variable-width (propor-
tional) font CAPTCHA, described in Section 3.3.2, since
most TPRs appear in such fonts. Moreover, variable-width
CAPTCHA poses greater challenges to the attacker, com-
pared to ﬁxed-width and simple CAPTCHAs. We speciﬁ-
cally consider two main aspects:
1. Avoiding arousal of the victim’s suspicions
2. Reproducing the TPR
The ﬁrst experiment evaluated the eﬀectiveness of ana-
gram CAPTCHA, with and without the advanced variants
described in Section 3.4. The second experiment evaluated
the use of malicious CAPTCHA to get the answers to mul-
tiple Boolean questions, as described in Section 4.
5.1 Obfuscation Techniques Evaluation
In this section we describe an experiment that compares
the eﬀectiveness of the anagram CAPTCHA to that of
the camouﬂaged anagram CAPTCHA and to that of the
two-step camouﬂaged anagram CAPTCHA. The experiment
tests whether users suspect or notice when they type ana-
grams of their own details, and validates that the attack cor-
rectly recovers these details. For the experiment, we chose
to extract the name of Facebook users.
5.1.1 Experiment procedure
At the outset of the experiment, we asked the participants
to sign into their Facebook account. Then, we explained that
the goal of the experiment was to test whether they noticed
being manipulated, where the manipulation may involve a
change in a familiar mechanism. We explained that the par-
ticipants of the experiment were divided into two groups:
one that will be manipulated and a control group with no
manipulation. In practice, all the participants went through
the same procedure. Since the results were so good (lack of
detection), there was no need to compare to a control group,
which could only have produced even better results. Throughout
the experiment, the participants could report any suspicious
action or indicate that they detected some manipulation.
The technical procedure of the experiment included three
simple tasks; each involved clicking a link to some Facebook
page and answering a question. Before each of the tasks,
the participants were asked to solve a CAPTCHA “to prove
they are human”. We used the basic anagram technique and
its two advanced variants mentioned in Section 3.4, from the
hardest to detect (two-step camouﬂaged) to the easiest (ba-
sic anagram). In the basic anagram CAPTCHA we tried to
extract the last name, and in the camouﬂaged CAPTCHA
we tried to extract the ﬁrst name. In the two-step camou-
ﬂaged CAPTCHA we tried to extract both of them.
In each step of the procedure, below the CAPTCHA or the
question, the user was asked to report if there was anything
suspicious. Examples of all three CAPTCHAs appear in
Figure 4. Notice that for the two-step and the basic anagram
CAPTCHAs, the string "Posting as", which precedes the name
in the target page, may be rearranged together with the name.
Ethics. We extracted only information that the user
agreed to give us in advance. Namely, the participants of
the experiment knew the experiment was on Facebook; they
voluntarily and consciously filled in a form with the
information that was later stolen. We obtained IRB approval for
the experiment.
Participants. 30 students participated in the experiment.
CAPTCHA behavior. In addition to its basic graphical
layout and design, which was based on the popular
reCAPTCHA , our CAPTCHA implementation differed
from typical CAPTCHAs in several ways:
1. No refresh or accessibility button. We did not imple-
ment a refresh mechanism that automatically gener-
ates a new CAPTCHA based on the previous results,
or an ‘accessibility’ CAPTCHA mechanism; speciﬁ-
cally, we could not convert the visual CAPTCHA to
an audio CAPTCHA.
2. To allow the attacker to separate between ﬁrst and last
names, we explicitly wrote below the CAPTCHA that
the CAPTCHA was case sensitive.
3. We did not replace the CAPTCHA after incorrect input.
Instead, we displayed a message telling the user
that the CAPTCHA was case sensitive and
that (completely) cut letters should be ignored.
All of the above changes only made it easier for the partic-
ipants to suspect the CAPTCHA. Yet, we found that these
changes did not have any eﬀect on the participants.
5.1.2 Suspicion evaluation
Once the participants completed the technical procedure
of the experiment, we presented three additional Yes/No
questions, with the option to add arguments in a text box.
The questions gradually gave additional information to the
user about the attack that actually happened:
1. Have you noticed anything suspicious?
2. Have you noticed any attempt to trick you into di-
vulging your personal information (such as email ad-
dress, name, address, etc.)?
3. Have you noticed that you actually inserted your name
in some of the CAPTCHA challenges?
(a) By the time it took for suspicion to be aroused; (b) by
the kind of CAPTCHA that first aroused suspicion.
Figure 6: Suspecting participants (%), by the time the
suspicion was aroused and by CAPTCHA type.
The ﬁrst question asks about an attack, the second ques-
tion describes the attack in general, and the last question
tells the user about the attack. Participants with serious
concerns were expected to return a positive answer to the
ﬁrst question. Others, who might have only the slightest
doubt, were expected to report in the last question.
Participants who returned a positive answer to the last
question also reported the CAPTCHA in which they noticed
the name.
Results. As we expected, the detection rate was low.
During the experiments, only two participants reported that
they saw their names in one of the CAPTCHAs and became
suspicious. Even after we asked the second question and
hinted to the participants that the manipulation was about
tricking them into disclosing details about themselves, no
additional users indicated anything suspicious. Once we told the
participants what really happened, three of them indicated
that they saw fragments of their names. Figure 6(a) depicts
these results. Figure 6(b) presents the number of partici-
pants who were suspicious of each of the CAPTCHAs. The
results indicate that the camouﬂage techniques indeed make
it harder to detect the content.
The results of the experiment are even more impressive,
considering its environment. In a real website, no alert for
upcoming manipulation will be given to the victim. Addi-
tionally, we used three CAPTCHAs that encapsulated al-
most the same information; by itself, this fact could help
the victim recognize the attack in the later CAPTCHAs.
CAPTCHA complexity. During the experiment, the
participants were asked to write down anything suspicious.
Almost a third of the users wrote something about the
difficulty of solving the CAPTCHA, or about cut letters.
Yet, all of these reports concerned the first CAPTCHA.
Once the users noticed the special instructions of the
CAPTCHA (capital letters, ignoring cut letters), they solved
the other CAPTCHAs quickly. The average number of at-
tempts until the input matched the dummy pattern of the
CAPTCHA decreased from 2.23 in the ﬁrst step of the two-
step CAPTCHA, to 1.2 in the second part, and to 1.33 in
the camouﬂaged anagram.
5.1.3 TPR reproduction evaluation
We now evaluate how eﬀective it would be for the attacker
to extract the sensitive information from the inputs typed
by the participants. Notice that even if users do their best
to fill in a correct response to the CAPTCHA, there may
be mistakes due to accidental typos or, more often, due
to incorrect reading of the text in the CAPTCHA. Some
letters may appear more than once, completely or partially.
For example, a cut m might be read as r or n; see Section 3.3.2.
We ﬁrst describe how we reproduced the TPR and then
present the evaluation of the reproduction process.
Reproducing the TPR. The correct text can be reconstructed
by taking advantage of the n-gram distribution of
the text, by estimating the likelihood of duplication between
specific characters, and by the font used in the browser. In our
experiment, we manually recovered the TPR according to
a simple algorithm. Namely, we sorted the fragments by
their real order and omitted dummy characters. Then we
omitted characters that appeared twice, due to the overlapping
between fragments. For example, if the input for one
fragment ended with r, and the prefix of the next fragment's
input was m, then we considered removing the r character.
Similarly, we removed thin characters that were likely to
actually result from the beginning or the end of other letters,
e.g., I and l, as the beginning or end of M or H. During
the whole procedure, we used a CAPTCHA simulator that
receives a name and shows what the CAPTCHA would look
like for a user with this name. We stopped when we got a
real name for which the input of the user matched the text
that actually appeared in the CAPTCHA simulator for this name.
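The overlap-removal rule can be sketched as follows; this is a hypothetical helper, whereas the real procedure was manual and also used n-gram likelihoods and the simulator:

```javascript
// Sketch of the overlap rule: fragments arrive in their real order with
// dummy characters already removed. `cutPair(a, b)` encodes guesses such
// as "a trailing r may be the cut-off beginning of a leading m".
function mergeFragments(fragments, cutPair) {
  let out = '';
  for (const frag of fragments) {
    if (out && frag && cutPair(out[out.length - 1], frag[0])) {
      out = out.slice(0, -1);  // drop the duplicated, partially-cut letter
    }
    out += frag;
  }
  return out;
}

// A cut m read as r at a fragment boundary:
//   mergeFragments(['adr', 'min'], (a, b) => a === 'r' && b === 'm') === 'admin'
```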
Evaluating the reproduction. Because the solutions
of the three CAPTCHAs overlap, we asked three volunteers
to reproduce the names; each reproduced the solutions of
one kind of CAPTCHA. In the two-step CAPTCHA, there
was only one mistake: an additional i letter in the middle
of an uncommon last name (and no mistake in first names).
In the camouﬂaged CAPTCHAs, all the ﬁrst names were
recovered successfully. In the basic anagram CAPTCHA,
only one last name could not be recovered.
Our results show that it is possible to reproduce the TPRs
with a high success rate.
5.2 Evaluating Malicious CAPTCHA for
Multiple Boolean Questions
In Section 4 we described variants of malicious
CAPTCHA that can be used to answer multiple Boolean
questions. At the end of that section we presented an example
that describes how to detect which drugs (if any), out of four
options, the user searched for in the Bing search engine. We
now describe a small study that uses this example to evaluate
the effectiveness of the technique described in Section
4. The experiment was conducted with the participation of
20 student volunteers.
In the experiment, each participant was asked to search
for some or none of the four drugs mentioned in Section
4.2.1. After the search, the participant ﬁlled in a form with
her details and solved the CAPTCHA illustrated in Figure
5. Based on the CAPTCHA solution, we reproduced which
of the 16 search combinations was done by each participant,
and verified the correctness with her. To complete the
experiment, we asked the participants to guess how we reproduced
their searches.
The results of the experiment were perfect: all the
searches were reproduced correctly, and none of the participants
suspected the CAPTCHA.
Similarly to clickjacking, the malicious CAPTCHA attack
relies on loading webpages in iframes; see Section 2.3.
The X-Frame-Options (XFO) header prevents a page from
being included in frames on other sites. Hence, it is also an
effective defense against the malicious CAPTCHA attack.
The same holds for ‘frame-busting’ techniques [23,27], used
to prevent the framing of objects by browsers that do not
support the XFO header.
However, there are common scenarios where there is
strong motivation to allow third-party sites to include a
personalized web object within an iframe, such as social
media buttons and widgets. In addition, some attacks
were shown allowing the circumvention of the XFO header
and/or of frame-busting techniques, e.g., [23, 27]; these at-
tacks are equally relevant to the use of these defense mech-
anisms against malicious CAPTCHA. This motivated other
defenses against clickjacking [2,4, 18]. We therefore evalu-
ated whether these defenses would also protect against the
malicious CAPTCHA attack.
Some popular clickjacking-countermeasures do not seem
to defend against malicious CAPTCHA. These include de-
fenses based on additional interaction with the user , and
defenses based on the detection of transparent iframes that
are placed on top of other elements . However, it may
also be feasible to detect the malicious CAPTCHA attack
by extending the detection approach of Balduzzi et al. .
Another anti-clickjacking defense that seems to help against
the malicious CAPTCHA attack, is to randomize the posi-
tion of elements .
An alternative, simple method to prevent malicious-
CAPTCHA attacks is to avoid presenting any private
information, at least not in a directly visible form amenable to
the attack (and to use in a CAPTCHA). For example, in many
social-network buttons, it may be possible to simply remove
the private information currently displayed, e.g., name, op-
tionally exposing it only after some (simple) action by the
user. For example, in the Facebook comment box, the name
of the authenticated user may appear only when the user
focuses on the text box, or by popping up an annotation
on typing. Attackers could try to manipulate the user into
focusing on or clicking on elements, but we doubt this can
be done eﬀectively enough to become a signiﬁcant threat.
A longer-term defense may be an extension of the XFO
header, allowing websites to include some objects in iframes,
but with restrictions to prevent the malicious CAPTCHA
attack. In particular, sites may restrict the use of other
layers that hide part of the frame or the number of inclusions
of the frame in the same webpage. Some of these defenses
may even be useful to implement as client-only defenses,
e.g., adopt a default policy of preventing more than several
inclusions of the same object or of tiny frames.
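Such a client-only default policy might look roughly like the sketch below; the thresholds and names are our assumptions, not an existing browser mechanism:

```javascript
// Hypothetical client-only policy: refuse tiny cross-origin frames and
// repeated inclusions of the same framed object in one page.
const MAX_INCLUSIONS = 3;  // assumed per-page limit per framed URL
const MIN_SIZE_PX = 50;    // assumed minimum frame width/height

function allowCrossOriginFrame(url, width, height, inclusionCounts) {
  const count = inclusionCounts.get(url) || 0;
  if (count >= MAX_INCLUSIONS) return false;                     // framed too often
  if (width < MIN_SIZE_PX || height < MIN_SIZE_PX) return false; // tiny fragment
  inclusionCounts.set(url, count + 1);
  return true;
}
```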
7. RELATED WORK
We concentrate on work that uses CAPTCHA to go be-
yond its original goal. The idea of exploiting web users
by making them ﬁll in CAPTCHAs has also been used for
good purposes. The ReCAPTCHA project  used the text
challenge to include images of words that optical character
recognition (OCR) software has been unable to read. The
solutions of the CAPTCHAs, submitted by millions of users
every day, have been used to digitize millions of books and
newspaper archives.
Our work is not the first to use CAPTCHAs for malicious
purposes. Instead of using dedicated human resources
to manually break CAPTCHAs , Egele et al. proposed to
inject CAPTCHA challenges of one web service into
other web services, and to take advantage of their users by
forcing them to solve the CAPTCHAs .
Weinberg et al.  explained how to detect links that
were followed by the victims, based on the link’s style; this
work exploits the fact that the links followed are presented
differently to the users. As in our attack, although the
attacker can present links to the victim from her own website,
she cannot see how the browser actually renders them.
Weinberg et al.  created several CAPTCHAs
out of the pixels of these links, hence, by answering the
CAPTCHA, the user exposed the color of the pixels. In this
way, the attacker learned whether the link was followed or
not. Our attacks extract much more information from the
victim site itself.
The techniques presented by Weinberg et al.  to com-
bine many Boolean questions in one CAPTCHA, can be used
to optimize the multiple-questions in malicious CAPTCHA
attacks, as described in Section 4.
Unlike the malicious CAPTCHA attack, which is suitable
for exposing sensitive private information from many websites,
Weinberg et al. focused on a specific Boolean question:
whether or not the victim browsed to a URL within a given set.
Users expose a lot of sensitive, private information to web
and cloud services, assuming that the services will not abuse
their information. Furthermore, users expect that the ser-
vice will take proper measures to protect their privacy from
third parties. Indeed, web services take measures to protect
users from attacks by third parties, including rogue web sites
visited by the user. That said, we show a simple and eﬀec-
tive attack that allows attackers to expose private details
presented by websites. This is done by exploiting the users
themselves as a side-channel, tricking the users into disclos-
ing their own private information. The attack works using
all standard browsers, and against some of the most well-
known and guarded software-as-a-service web-services, such
as Google and Facebook (see more in Table 1).
Similar to other web attacks, defending against this sort of
attack is not very diﬃcult, as we explain in Section 6. How-
ever, appropriate defenses, especially short term defenses
that do not assume new client side mechanisms, may re-
quire web services to slightly modify some mechanisms, e.g.,
social-network buttons. Such changes may involve a small
loss in functionality. It would be interesting to see if and how
the industry responds to this challenge: will the small loss in
functionality be accepted in order to protect user privacy?
This research was supported by a grant from the Ministry
of Science and Technology, Israel.
 L. V. Ahn, M. Blum, N. J. Hopper, and J. Langford.
CAPTCHA: Using Hard AI Problems for Security. In
EUROCRYPT, pages 294–311. Springer-Verlag, 2003.
 D. Akhawe, W. He, Z. Li, R. Moazzezi, and D. Song.
Clickjacking revisited: A perceptual view of ui
security. In 8th USENIX Workshop on Oﬀensive
Technologies (WOOT 14), San Diego, CA, Aug. 2014.
 Alexa. Top Sites. http://www.alexa.com/topsites,
 M. Balduzzi, M. Egele, E. Kirda, D. Balzarotti, and
C. Kruegel. A solution for the automated detection of
clickjacking attacks. In Proceedings of the 5th ACM
Symposium on Information, Computer and
Communications Security, pages 135–144. ACM, 2010.
 E. Balsa, C. Troncoso, and C. Diaz. Ob-pws:
obfuscation-based private web search. In Security and
Privacy (SP), 2012 IEEE Symposium on, pages
491–505. IEEE, 2012.
 M. Barbaro and T. Zeller. A face is exposed for AOL
searcher no. 4417749. New York Times, Aug. 2006.
 A. Bortz and D. Boneh. Exposing private information
by timing web applications. In Proceedings of the 16th
international conference on World Wide Web, pages
621–628. ACM, 2007.
 Brad Stone. Breaking Google CAPTCHAs for Some
 Chester Wisniewski. Facebook adds speed bump to
slow down likejackers, March 2011. Online at
 M. Egele, L. Bilge, E. Kirda, and C. Kruegel.
CAPTCHA smuggling: hijacking web browsing
sessions to create CAPTCHA farms. In Proceedings of
the 2010 ACM Symposium on Applied Computing,
pages 1865–1870. ACM, 2010.
 Eric Lawrence. ClickJacking Defenses.
 C. Evans. Cross-domain search timing. blog,
 Facebook. Social plugins, February 2015. Online at
 A. J. Ferguson. Fostering e-mail security awareness:
The west point carronade. EDUCASE Quarterly, 2005.
 R. Hansen and J. Grossman. Clickjacking. Sec Theory,
Internet Security, 2008.
 A. Herzberg and R. Margulies. Forcing Johnny to
login safely. Journal of Computer Security, 21(2),
2013. Extended version of Esorics’11 paper.
 B. Hill. Adaptive user interface randomization as an
anti-clickjacking strategy, May 2012. http://www.
 L.-S. Huang, A. Moshchuk, H. J. Wang, S. Schecter,
and C. Jackson. Clickjacking: Attacks and defenses. In
USENIX Security Symposium, pages 413–428, 2012.
 D. Irani, M. Balduzzi, D. Balzarotti, E. Kirda, and
C. Pu. Reverse social engineering attacks in online
social networks. In Detection of intrusions and
malware, and vulnerability assessment, pages 55–74.
 T. N. Jagatic, N. A. Johnson, M. Jakobsson, and
F. Menczer. Social phishing. Communications of the
ACM, 50(10):94–100, 2007.
Jeremiah Grossman. I Know What Websites You Are
Logged-In To (Login-Detection via CSRF).
http://blog.whitehatsec.com/i-know-what-websites-you-are-logged-in-to-login-detection-via-csrf/,
 S. Lee, H. Kim, and J. Kim. Identifying cross-origin
resource status using application cache. In 22nd
Annual Network and Distributed System Security
Symposium, NDSS 2015, San Diego, California, USA,
February 8-11, 2015.
 S. Lekies and M. Heiderich. On the fragility and
limitations of current browser-provided clickjacking
protection schemes. In E. Bursztein and T. Dullien,
editors, WOOT, pages 53–63. USENIX Association,
 G. Mori and J. Malik. Recognizing objects in
adversarial clutter: Breaking a visual CAPTCHA. In
Computer Vision and Pattern Recognition, 2003.
Proceedings. 2003 IEEE Computer Society Conference
on, volume 1, pages I–134. IEEE, 2003.
 Mozilla Developer Network. Same-Origin Policy, 2014.
 Mozilla Developer Network. Using the application
 G. Rydstedt, E. Bursztein, D. Boneh, and C. Jackson.
Busting frame busting: a study of clickjacking
vulnerabilities at popular sites. In in IEEE Oakland
Web 2.0 Security and Privacy (W2SP 2010), pages
 The Open Web Application Security Project.
Cross-Site Request Forgery.
 T. Van Goethem, W. Joosen, and N. Nikiforakis. The
clock is still ticking: Timing attacks in the modern
web. In Proceedings of the 22nd ACM SIGSAC
Conference on Computer and Communications
Security, pages 1382–1393. ACM, 2015.
 L. Von Ahn, B. Maurer, C. McMillen, D. Abraham,
and M. Blum. reCAPTCHA: Human-based Character
Recognition via Web Security Measures. Science,
 Z. Weinberg, E. Y. Chen, P. R. Jayaraman, and
C. Jackson. I still know what you visited last summer:
Leaking browsing history via user interaction and side
channel attacks. In Security and Privacy (SP), 2011
IEEE Symposium on, pages 147–161. IEEE, 2011.
 J. Yan and A. S. El Ahmad. A low-cost attack on a
microsoft CAPTCHA. In Proceedings of the 15th ACM
conference on Computer and communications security,
pages 543–554. ACM, 2008.