AdBudgetKiller: Online Advertising Budget Draining Attack
I Luk Kim
Department of Computer Science,
Purdue University
West Lafayette, Indiana
kim1634@purdue.edu
Weihang Wang
Department of Computer Science,
Purdue University
West Lafayette, Indiana
wang1315@cs.purdue.edu
Yonghwi Kwon
Department of Computer Science,
Purdue University
West Lafayette, Indiana
kwon58@purdue.edu
Yunhui Zheng
IBM T.J. Watson Research Center
Yorktown Heights, New York
zhengyu@us.ibm.com
Yousra Aafer
Department of Computer Science,
Purdue University
West Lafayette, Indiana
yaafer@purdue.edu
Weijie Meng
Department of Computer Science,
Purdue University
West Lafayette, Indiana
mengw@purdue.edu
Xiangyu Zhang
Department of Computer Science,
Purdue University
West Lafayette, Indiana
xyzhang@cs.purdue.edu
ABSTRACT
In this paper, we present a new ad budget draining attack. By repeatedly pulling ads from targeted advertisers using crafted browsing profiles, we are able to reduce the chance of showing their ads to real-human visitors and trash the ad budget. From the advertiser profiles collected by an automated crawler, we infer advertising strategies, train satisfying browsing profiles and launch large-scale attacks. We evaluate our methods on 291 public advertisers selected from Alexa Top 500, where we successfully reveal the targeting strategies used by 87% of the advertisers we considered. We also executed a series of attacks against a controlled advertiser and 3 real-world advertisers within the ethical and legal boundary. The results show that we are able to fetch 40,958 ads and drain up to $155.89 from the targeted advertisers within an hour.
CCS CONCEPTS
• Security and privacy → Spoofing attacks; Web application security;
KEYWORDS
Online Advertising, Ad Fraud, Budget Draining Attack
ACM Reference Format:
I Luk Kim, Weihang Wang, Yonghwi Kwon, Yunhui Zheng, Yousra Aafer, Weijie Meng, and Xiangyu Zhang. 2018. AdBudgetKiller: Online Advertising Budget Draining Attack. In WWW 2018: The 2018 Web Conference, April 23–27, 2018, Lyon, France. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3178876.3186096
This paper is published under the Creative Commons Attribution 4.0 International
(CC BY 4.0) license. Authors reserve their rights to disseminate the work on their
personal and corporate Web sites with the appropriate attribution.
WWW 2018, April 23–27, 2018, Lyon, France
© 2018 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC BY 4.0 License.
ACM ISBN 978-1-4503-5639-8/18/04.
https://doi.org/10.1145/3178876.3186096
1 INTRODUCTION
Online advertising is the primary source of income for many Internet companies. In the US market, Google and Facebook generated $36.69 billion and $12.4 billion [22] from advertising in 2016, respectively. According to a report by the Internet Advertising Bureau (IAB), the revenues generated from Internet advertising in the United States totaled $72.5 billion in the full year 2016 [20], which represents an increase of 21.8% from the revenues reported in 2015. It is estimated that U.S. digital advertising will continue its growth and the ad revenue will reach $83 billion in 2017 [22].
In its basic form, online advertising entails selling space on websites to parties interested in showing ads for a monetary fee. However, the mechanisms backing the online advertising ecosystem are quite complex. The ad delivery infrastructure involves four major parties: publishers, advertisers, ad networks, and ad exchanges. Publishers are website owners who offer space to display ads on their websites. Advertisers pay publishers for ad slots to place specific ad content with embedded links. Ad networks play the role of match-makers, bringing together publishers with the advertisers who are willing to pay the most for the publisher's offered space. Ad exchanges are networks of ad networks. An ad exchange works similarly to an ad network, except that the buying and selling entities within an ad exchange are ad networks.
To reach the most receptive audience, advertisers often use sophisticated targeting methods to serve ads to the right viewers. Targeting strategies can be geographically based, such as serving an ad to users in a specific country, or demographically focused on age, gender, etc. They can also rely on behavioral variables (such as a user's browsing activities and purchase history) or be contextually focused, serving ads based on the content of a website. In addition, advertisers may employ different targeting strategies. Some advertisers may value customers who placed an item in the cart as more promising potential buyers than customers who simply browsed the item page, so they deliver different ads to these two
Figure 1: Ad ecosystem (a user's visit to a publisher triggers an ad request to the ad exchange, bid requests to the ad networks, bid responses, and the winning bidder's ad response back to the user)
types of customers. Others may consider them as equally favorable and apply the same strategy.
Retargeting is a technique where advertisers use behavioral targeting strategies to promote ads that follow users after they have expressed a prior interest in an advertiser's website, such as having looked at or purchased a particular product. Retargeting is very effective, as a retargeting ad is personalized to an individual user's interests rather than targeting groups of people whose interests may vary.
Given the underlying lucrative benefits, the involved ad parties have strong incentives to conduct fraudulent activities. In fact, advertising fraud has become a massive problem in the ad industry and is ruining this billion-dollar business. According to IAB, ad fraud cost the U.S. media industry around $8.2 billion in 2015 [19], and half of the loss derives from "non-human traffic".
In this paper, we propose an innovative ad budget draining attack by precisely fetching ads from the targeted advertisers. Our technique is able to reverse engineer targeting strategies and train browsing profiles that satisfy the conditions set by the advertisers. In summary, we make the following contributions.
• We propose a novel ad budget draining attack targeting specific advertisers by repeatedly pulling their ads to trash the budget.
• We develop a black-box testing based technique to automatically infer targeting strategies and create satisfying browsing profiles.
• Out of 291 advertisers selected from Alexa Top 500, we successfully revealed the targeting strategies used by 254 advertisers.
• We launched distributed attacks against a controlled advertiser and 3 real-world advertisers. We were able to fetch 40,958 ads and drain up to $155.89 within an hour.
The rest of this paper is structured as follows. We introduce the online advertising ecosystem and fraudulent activities in Sec. 2. In Sec. 3, we describe the ad budget draining attack in detail. We explain the evaluation results on both public and controlled advertisers in Sec. 4. We discuss potential countermeasures in Sec. 5 and related work in Sec. 6. We conclude the paper in Sec. 7.
2 ONLINE ADVERTISING
In this section, we discuss the entities in the ecosystem and explain how retargeting ads work. We also show existing threats to the ad ecosystem.
2.1 Ad Ecosystem
The entities in the ad ecosystem include publishers, advertisers, ad networks, and ad exchanges. Publishers are websites that earn money by selling ad space on their pages. Advertisers are the buyers who pay ad networks to deliver their ads. Ad networks are the entities that connect advertisers with websites and help advertisers find the right publishers. Ad exchanges are networks of ad networks, which enable ad traffic transactions among ad networks.
Fig. 1 explains how an ad is delivered by an ad exchange. When a user visits the publisher website (1), an ad request is sent to the ad exchange (2). The ad exchange conducts a real-time auction, where the exchange sends requests to ad networks (3). Based on the user's characteristics, ad networks respond with their offers (4). The ad exchange picks an offer and delivers the winner's ad to the user (5). The whole auction is done in milliseconds.
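The five-step flow above can be sketched as a toy exchange-side selection. The bids and network names are hypothetical, and real exchanges run richer auctions (e.g., second-price) with many more signals:

```python
# Minimal sketch of the exchange-side selection in Fig. 1: collect the bid
# responses from the ad networks and serve the highest bidder's ad.
# Prices and network names are hypothetical.
def run_auction(bids):
    """bids maps ad network name -> (offered price, ad creative)."""
    winner = max(bids, key=lambda network: bids[network][0])
    price, ad = bids[winner]
    return winner, price, ad

bids = {
    "AdNetwork1": (4.50, "advertiser1-ad"),
    "AdNetwork2": (7.00, "advertiser2-ad"),  # highest offer wins
    "AdNetwork3": (5.25, "advertiser3-ad"),
}
winner, price, ad = run_auction(bids)  # AdNetwork2 wins at $7.00
```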
The ecosystem delivers ads for a fee. There are several pricing models; cost per thousand impressions (CPM) is commonly used. Assume the CPM is $7 in Fig. 1. The winning advertiser (Advertiser 2) pays 0.7 cents per ad, which will be split among the ad network, the ad exchange, and the publisher.
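The CPM arithmetic works out as in this small sketch (the revenue split among the parties is not modeled):

```python
# CPM = cost per 1,000 impressions, so the per-ad cost is CPM / 1000.
# With the $7 CPM from the example, one served ad costs $0.007 (0.7 cents).
def cost_per_impression(cpm_dollars):
    return cpm_dollars / 1000.0

per_ad = cost_per_impression(7.0)  # 0.007 dollars = 0.7 cents per ad
per_batch = 1000 * per_ad          # serving 1,000 ads would bill $7.00
```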
2.2 Retargeting Ad
E-commerce websites want to attract potential customers by all means, hoping they will make purchases, become registered users, etc. The percentage of visitors who take such desired actions is called the conversion rate. In reality, only 2% of visitors take desired actions on their first visit [33]. Retargeting was created to attract the remaining customers by displaying personalized ads. It tracks website visitors and delivers customized ads when they visit other websites.
In particular, advertisers need to identify a list of high-value visitors. To do so, advertisers include a retargeting pixel, a small snippet provided by a retargeting service provider, in their web pages. When a user arrives, the pixel drops an anonymous cookie and enrolls the visitor in the list. The anonymous cookie acts as the browsing profile, a set of IDs that memorizes browsing activities. The retargeting service providers identify unconverted visitors and deliver them personalized ads. To reach more visitors, the retargeting service providers maintain partnerships with major ad networks, such as Facebook and the Google Display Network. They participate in the real-time ad auctions and bid aggressively.
Retargeting is very effective. E-commerce sites can save money and effort by selectively targeting visitors who have already expressed interest. According to Kimberly-Clark, a global leader in selling paper products, they achieved 50–60% conversion rates from their retargeting efforts [11]. Similarly, [23] reported that online retailer ShopStyle gained a 200% increase in retargeting conversions. Retargeting also benefits customers because the ads delivered are relevant to their interests.
2.3 Threats
While ad exchanges enable efficient and powerful campaigns, their pricing models make the system a highly lucrative target for cybercriminals. For instance, to artificially inflate the actual impression amount and earn more money, a publisher can fabricate visits to publisher pages such that the advertiser's ad budget is wasted because the ads were not seen by real human visitors. Such fraudulent activity is called impression fraud. Although ad networks and exchanges perform real-time monitoring, it is always difficult to
Figure 2: Ad budget draining attack procedure (advertiser's data collection → attack module generation → attack module deployment to attack machines → distributed attack → ad budget draining)
prevent various kinds of fraud activities because of the huge amount of ad traffic.
3 AD BUDGET DRAINING ATTACK
In this section, we elaborate on the ad budget draining attack. The victims of our budget draining attack are people or companies that advertise their e-commerce websites using retargeting ad services. The immediate consequence of the attack is the wasted advertisement budget. Moreover, the chance of their ads being displayed can be reduced, since it would be difficult to win the ad auction with a drained ad budget. The potential attackers can be competitor advertisers who may try to drain the others' ad budget and unfairly win the competition. Another possible scenario is denial of service (DoS) attacks performed by people who seek to make ads from the targeted advertiser unavailable for the purpose of a protest. Fig. 2 shows the overall attack procedure. The attacker collects data about the targeted advertiser and generates attack modules that automatically craft browsing profiles and pull the victim's ads. Note that the attack modules can be independently deployed to launch a distributed attack. Through this process, the attacker can drain the targeted advertiser's ad budget by repeatedly fetching ads. The details of our attack mechanism are explained in the rest of this section.
3.1 Overview
As discussed in Sec. 2, ad networks track website visitors and deliver targeted ads if they satisfy the advertising strategies. Therefore, identifying the strategies is the first step to attack a particular advertiser. Since ad networks may define arbitrary strategies, effectively reverse-engineering the retargeting logic and crafting corresponding browsing profiles are the keys to reproducibly launching large-scale attacks. As shown in Fig. 3, website modeling, advertiser profiling, attack module generation and attack distribution are the major steps involved in the ad budget draining attack.
(1) Website Modeling. A website model represents structural designs and relationships between web pages. In order to be classified as an advertiser's favored customer and eventually see their ads, one effective way is to visit the advertiser's website and trigger the tracking logic. However, identifying desired navigation sequences that effectively trigger the tracking logic (e.g., products need to be put in the shopping cart) is not trivial due to the huge search space. Therefore, our first step is to create a model for the targeted website to guide the search. In particular, we navigate the targeted website, apply clustering algorithms to the pages, then create a Finite State Machine (FSM) model. Details can be found in Sec. 3.2.
(2) Advertiser Profiling. In this step, we focus on inferring targeting strategies. We develop ADHoneyClient to automatically discover the strategies based on black-box testing techniques. We also identify the optimal ad fetch count to work around the rate limits set by the ad networks. We explain our algorithms in Sec. 3.3.
(3) Attack Module Generation. The generated attack modules contain the training data and the utilities to create satisfying browsing profiles, where the training data is a set of HTML pages with ad tracking tags. The module also features a fetch page and an attack engine. The fetch page is a single HTML page with several ad slots that pull the targeted ads. The attack engine drives the whole training and ad fetching procedure. As ad networks may be equipped with IP-based defense mechanisms, our attack engine can leverage public proxy lists and randomly change IP addresses to evade IP-based detections. Details can be found in Sec. 3.4.
(4) Attack Distribution. The final step is to deploy the attack modules on multiple machines to launch a distributed attack. In particular, each attack module trains a browsing profile satisfying the strategy using the training pages and repeatedly fetches ads using the ad fetch page. We explain the details in Sec. 3.5.
3.2 Website Modeling
A website model describes a site's structure and the transitions among its pages. It can be used to guide the targeting strategy discovery. Fig. 4 shows the steps for model creation.
3.2.1 Browsing Trace Collector. The browsing trace collected at (1) in Fig. 4 is used to cluster pages. The collector automatically records browsing activities while an attacker explores the targeted website. Table 1 shows example traces. We record two types of data: pages visited and events triggered. The page data contains the HTML source code and the corresponding URL. If no redirection happens, the page ID is recorded (e.g., P4 in Table 1). The event data describes the browsing action, the DOM object involved, and action attributes.
Note that we do not require a complete website model. Instead, we only need a few inputs. In practice, we observed that usually a small number of actions are sufficient to trigger the tracking logic. For example, if a visitor sees ads after she visited the advertiser's product page, only one action (i.e., visiting the advertiser's product page) is needed. However, if an advertiser targets visitors who added items to the shopping cart and left without buying, the actions of 1) visiting a product page, 2) clicking the add-to-cart button and 3) visiting the cart page are needed.
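The trace records in Table 1 could be represented with a schema like the following; the field names are our own illustration, not the paper's implementation:

```python
# A sketch of the browsing trace records from Table 1. A page id prefixed
# with "#" (e.g., "#P7") marks a DOM-modified view reached without redirection.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceEntry:
    page_id: str                # e.g., "P3"
    url: str                    # page URL, or "#Pk" when no redirection happened
    event_id: str               # e.g., "E3"
    action: str                 # "click" or "dom"
    xpath: str                  # DOM object the action targets
    data: Optional[str] = None  # action attributes (e.g., a selected size)

trace = [
    TraceEntry("P3", "./item?prod=1", "E3", "dom", '//*[id="size"]', "6"),
    TraceEntry("P8", "#P7", "E8", "click", '//*[id="AddToCart"]'),
]
```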
3.2.2 Page Clustering. With the trace collected, we group similar pages into clusters based on their functionality. For example, P3 and P7 in Table 1 are grouped together as the product page cluster. We apply different clustering methods based on page types:
Figure 3: Attack Mechanism (website modeling: browsing trace collector, page clusterer, model builder; advertiser profiling: ADHoneyClient with browsing profile trainer and ad extractor (ad fetcher, ad parser), producing the targeting strategy, ad specification, and optimal fetch count; attack module generation: attack module builder, tag-only training page builder, ad fetch page builder; attack distribution: attack machines performing browsing profile training and ad fetching)
Figure 4: Model creation process
Table 1: Example of collected browsing trace

No | Page ID | Page URL      | Event ID | Action | XPath                           | Data
1  | P1      | shopping.com  | E1       | click  | //*[id="cat1"]                  |
2  | P2      | ./cat1        | E2       | click  | //*[id="prod1"]                 |
3  | P3      | ./item?prod=1 | E3       | dom    | //*[id="size"]                  | 6
4  | P4      | #P3           | E4       | click  | //*[id="cat2"]                  |
5  | P5      | ./cat2        | E5       | click  | //*[id="cat3"]                  |
6  | P6      | ./cat3        | E6       | click  | //*[id="prod2"]                 |
7  | P7      | ./item?prod=2 | E7       | dom    | //*[id="size"], //*[id="color"] | 7, white
8  | P8      | #P7           | E8       | click  | //*[id="AddToCart"]             |
9  | P9      | ./cart        |          |        |                                 |
The Redirected Pages are clustered by context ((2) in Fig. 4), where we compare page structures. In particular, we calculate the DOM Tree Edit Distance (TED) [27, 36] and measure the similarity using hierarchical clustering algorithms [34].
The DOM Modified Pages are grouped using a link-based clustering method ((3) in Fig. 4). Specifically, DOM modified pages containing page IDs (links) in the same cluster are grouped together. For example, in Table 1, the page P4 is linked to the page P3, and P8 is linked to the page P7. Since P3 and P7 are in the same cluster (product pages), P4 and P8 are grouped together.
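The link-based grouping can be sketched as follows; the cluster labels and mapping structures here are illustrative assumptions (the TED-based clustering of redirected pages is taken as given):

```python
# A sketch of link-based clustering for DOM-modified pages: a DOM-modified
# page "#Pk" inherits the cluster of the base page Pk it links to.
def group_dom_modified(dom_pages, base_clusters):
    """dom_pages maps a DOM-modified page id to the page it links to;
    base_clusters maps a base page id to its cluster label."""
    groups = {}
    for page, linked in dom_pages.items():
        groups.setdefault(base_clusters[linked], []).append(page)
    return groups

base_clusters = {"P3": "product", "P7": "product", "P2": "category"}
dom_pages = {"P4": "P3", "P8": "P7"}   # links from Table 1
groups = group_dom_modified(dom_pages, base_clusters)
# P4 and P8 end up grouped together under the "product" cluster
```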
3.2.3 Model Builder. The model builder connects the clusters based on the order observed in the traces and assigns event data to the edges. As a result, a Finite State Machine (FSM) is created, where nodes represent states and edges with event annotations denote transitions. Fig. 5 shows a model created from the example traces. By having a model, we can create as many proper browsing profiles as possible and, more importantly, avoid creating redundant profiles.

Figure 5: An example of website model. States (S) group pages (P): S1 = {P1} (front page), S2 = {P2, P5, P6} (category page), S3 = {P3, P7} (product page), S4 = {P9} (cart page), S5 = {P4, P8} (DOM-modified product page). Transitions (T) group events (E): T1 = {E1}, T2 = {E2, E6}, T3 = {E5}, T4 = {E3, E7}, T5 = {E4}, T6 = {E8}.
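The model builder's core step can be sketched as follows, assuming pages have already been assigned to clusters; the state labels and trace format are illustrative:

```python
# A sketch of the model builder: replay the clustered trace and record which
# event sets connect which state (cluster) pairs, yielding the FSM edges.
def build_fsm(trace, cluster_of):
    """trace: list of (page_id, event_id) steps; cluster_of: page -> state."""
    transitions = {}  # (src_state, dst_state) -> set of events on that edge
    for (page, event), (next_page, _) in zip(trace, trace[1:]):
        src, dst = cluster_of[page], cluster_of[next_page]
        transitions.setdefault((src, dst), set()).add(event)
    return transitions

cluster_of = {"P1": "S1", "P2": "S2", "P3": "S3", "P4": "S5"}
trace = [("P1", "E1"), ("P2", "E2"), ("P3", "E3"), ("P4", "E4")]
fsm = build_fsm(trace, cluster_of)
# fsm[("S1", "S2")] == {"E1"}: event E1 moves the front page to a category page
```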
3.3 Advertiser Profiling
As discussed in Sec. 2, ad networks track website visitors and deliver targeted ads if they satisfy the advertising strategies. As the strategies are invisible to us, we have to infer them by profiling the targeted advertisers.
3.3.1 ADHoneyClient: An Automated Ad Crawler. We need a large amount of ads-related data to infer the retargeting logic. To automate the data collection process, we develop an ad crawler, ADHoneyClient, to fetch ads with customized browsing profiles and emulate browsing activities. As ads are probably the most complicated and dynamic snippets observed on general websites, ADHoneyClient has to handle complicated DOM objects and dynamic JavaScript. ADHoneyClient has the following two components:
1) Browsing Profile Trainer. Browsing profiles are tracking IDs stored in cookies that memorize the browsing history. The browsing profile trainer crafts browsing profiles by triggering the ad tracking logic. Starting from a fresh profile, the profile trainer produces customized browsing profiles by simulating browsing activities. It navigates the website guided by the model with example inputs. In the meantime, tracking scripts can update the browsing profile and send browsing histories to the retargeting service providers.
Table 2: HTML Tags Used for Ad Parser

HTML tag                          | Attribute          | Ad information
<a>                               | id, href           | ad_url, ad_network
<script>                          | id, src, innerHTML | ad_url, ad_network
<noscript>                        | innerHTML          | ad_url
<iframe>                          | id, src, name      | ad_network
<img>, <embed>, <object>, <video> | width, height      | type, size
2) Ad Extractor. The Ad extractor fetches ads using the crafted browsing profile generated by the trainer. In particular, it infers the targeting strategy, ad specs and the optimal fetch count. Details will be explained in Sec. 3.3.2.
When ads arrive, the Ad parser determines their sources, specifications (such as types and sizes) and the ad network involved. It also extracts ads-related HTML tags. In particular, since ads are usually rendered in nested <iframe> elements for security purposes, it drills down and looks for specific ids (e.g., "google_ads_iframe_*" for DoubleClick). Once found, it collects element attributes as well as the HTML tags inside. To identify ad networks, we manually developed 53 signatures. For example, the famous retargeting ad network Criteo [4] can be identified if the src of an <iframe> is *.criteo.com/delivery/r/afr.php?. Besides, the Ad parser harvests all URLs included in the HTML pages pointed to by ads-related <iframe> elements. It also determines the size and type of the ads from the attributes of observed <embed>, <object>, <video> and <img> tags. We found that some tags in ads-related iframes are not useful, so we only collect the tags listed in Table 2 for better efficiency.
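The signature-based ad network identification could look like the following minimal sketch. The Criteo pattern follows the example in the text (matched by substring here for simplicity); the second signature entry is purely hypothetical:

```python
# A sketch of matching an iframe's src against ad-network signatures.
# Substring matching stands in for whatever pattern scheme the real parser uses.
SIGNATURES = {
    "Criteo": ".criteo.com/delivery/r/afr.php?",   # example from the text
    "ExampleNet": ".examplenet.test/ads/",          # hypothetical entry
}

def identify_ad_network(iframe_src):
    for network, marker in SIGNATURES.items():
        if marker in iframe_src:
            return network
    return None  # no signature matched

src = "http://cdn.criteo.com/delivery/r/afr.php?did=123"
network = identify_ad_network(src)  # "Criteo"
```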
3.3.2 Advertiser Profiling. In this subsection, we explain the targeting strategy, ad specification, and optimal ad fetch count produced by ADHoneyClient, which will be used as the training data in the next step.
Targeting Strategy. A targeting strategy is a sequence of browsing activities which can be used to identify high-value customers. For example, advertisers can target visitors who browsed the product pages or left something in carts without buying. A targeting strategy simulates the browsing activities demonstrated by such favored visitors. Take the website model in Fig. 5 as an example. A corresponding browsing activity example can be [visiting a product page, choosing an option, adding it to a cart], which can be described by a path covering states and transitions S3, T4, S5, T6 and S4. As our website model is deterministic, the representation can be simplified to S3, T4 and T6. To concretize it, we pick a page/event from each state/transition and get a targeting strategy [P7, E7, E8].
Algorithm 1 explains how we generate a target strategy from the website model graph produced in Sec. 3.2. The output target strategy is a list of browsing activities, where each activity can be either a page or an event obtained from the model.
Algorithm 1 Finding Targeting Strategy

Input:
(1) M = (S, T): website model graph, where vertex Si ∈ S denotes a page cluster {Pm, ..., Pn} and edge Tj ∈ T is a set of events {Ex, ..., Ey}.
(2) L: max length of elements in a candidate.
(3) N: max number of candidates to test.
Output: targeting strategy TS = [b1, ..., bl]: a list of browsing activities, where bi ∈ {P1, ..., Pm} ∪ {E1, ..., En}.

 1: function FindTargetingStrategy(M, L, N)
 2:   for i ← 1 to N do
 3:     TSc ← GenerateCand(M, L)
 4:     ad ← TrainAndFetchAd(TSc)  /* generate a browsing profile guided by TSc and fetch an ad using ADHoneyClient */
 5:     if the base URL of ad and M is the same then
 6:       return TSc
 7:   return ∅
 8: function GenerateCand(M, L)
 9:   l ← RandomSelect({1, 2, ..., L})  /* length of the candidate */
10:   TS ← [GetRandomPage(M)]  /* the activities start with a page */
11:   for i ← 1 to l − 1 do
12:     type ← RandomSelect({"state", "transition"})
13:     if type is "transition" and TS[i − 1] is not an accepting state then
14:       if TS[i − 1] is an event Ek then  /* continue to the next event */
15:         TS ← Append(TS, Ek+1)
16:       else if TS[i − 1] is a page Pk then
17:         TS ← Append(TS, Ek)
18:     else if type is "state" then
19:       TS ← Append(TS, GetRandomPage(M))
20:   if TS has been seen before then  /* remove redundant candidates */
21:     TS ← GenerateCand(M, L)
22:   return TS
23: function GetRandomPage(M)
24:   Sr ← randomly select a state from M except DOM-modified states
25:   Pr ← randomly select a page in the state Sr
26:   return Pr
Function GenerateCand generates a targeting strategy candidate. It starts by randomly picking an initial page in a state (line 10) and randomly selects pages or events as consecutive activities. A naive way to select the next activity is to follow the transitions in the FSM website model. However, we observed that it cannot effectively create diverse candidates. Instead, we randomly select a page from a state when we want to have a "state" as the next activity. In this way, we can produce more diverse candidates, especially when the model coverage is low. For instance, a generated strategy can be [P7, P1, E1], where we directly go to P1 after visiting P7 even though there is no edge between them in the model. Intuitively, this simulates random jumps among pages during navigation. In particular, when we choose to have a "transition" as the next activity (line 13), we append the consecutive event to the activity list (lines 15 and 17). If the type is "state", we select a random page from a random state (lines 24 and 25). Please note that the DOM modified states are excluded as they require DOM modification events and thus are not directly accessible (line 24).
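A Python rendering of Algorithm 1's GenerateCand might look as follows. It is a simplified sketch that assumes events are numbered so that Ek follows page Pk and E(k+1) follows event Ek, and it omits the accepting-state check:

```python
# Simplified sketch of GenerateCand: build a random activity list of pages
# and events, retrying when a candidate was already seen.
import random

def get_random_page(states):
    # `states` maps state name -> list of pages; DOM-modified states are
    # assumed to be excluded from this mapping already (Algorithm 1, line 24).
    state = random.choice(list(states))
    return random.choice(states[state])

def generate_cand(states, max_len, seen=None):
    seen = set() if seen is None else seen
    length = random.randint(1, max_len)
    ts = [get_random_page(states)]               # candidates start with a page
    for _ in range(length - 1):
        kind = random.choice(["state", "transition"])
        last = ts[-1]
        if kind == "transition":
            k = int(last[1:])                    # "P3" -> 3, "E3" -> 3
            # after event Ek continue with E(k+1); after page Pk use Ek
            ts.append("E%d" % (k + 1) if last.startswith("E") else "E%d" % k)
        else:
            ts.append(get_random_page(states))
    if tuple(ts) in seen:                        # drop redundant candidates
        return generate_cand(states, max_len, seen)
    seen.add(tuple(ts))
    return ts

states = {"S1": ["P1"], "S2": ["P2", "P5", "P6"], "S3": ["P3", "P7"]}
cand = generate_cand(states, max_len=3)          # e.g., ["P7", "P1", "E1"]
```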
Ad Specification. Ad networks define ad parameters such as dimensions and formats (e.g., image, Flash, video, etc.) that advertisers have to follow. As we will need to seed ad slots to obtain the desired ads, these specs are important too. For example, if the size of a desired ad is 300×250 but we only support 160×600, it will not be delivered due to the inconsistency. Therefore, we also collect ad specs. Although some ad providers support responsive banners, where the size can be automatically determined at the time
Figure 6: Targeted ads fetched with a single browsing profile (y-axis: # of ads from the targeted advertiser per 30-ad batch; x-axis: ad batch index; one series per advertiser)
of fetching, ad specs are still useful as they may prevent potential inconsistencies and improve the success rate.
Optimal Fetch Count. In practice, a browsing profile may expire after repeatedly fetching a certain number of ads, as ad networks usually set a rate limit on the ads delivered to a single user. Therefore, we also need to infer the optimal fetch count, which is the number of ads that can be fetched using a single profile. In particular, we monitor the fetch rate using a browsing profile until the rate drops significantly, and we use the number of ads fetched before the drop as the optimal fetch count.
For example, we fetch 30 ads per batch using a single browsing profile from two advertisers. Fig. 6 shows the number of ads fetched in each batch. For advertiser 1, the fetch rate drops to 70% at batch 6 and then to 0. Similar patterns are observed for advertiser 2: after the 3rd batch, the rate decreases to 6%. Therefore, the optimal fetch counts for them are 180 and 90, respectively.
In our experience, we use 50% as the threshold to balance the efforts of creating new profiles and ad fetching. In other words, once the targeted ad fetch rate drops below 50%, we stop fetching and set the number of ads fetched so far as the optimal fetch count.
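The 50%-threshold estimation can be sketched as follows; the batch counts are illustrative values consistent with the two advertisers discussed in the text (180 and 90 ads, respectively):

```python
# A sketch of optimal-fetch-count estimation: keep fetching 30-ad batches
# until the per-batch fetch rate drops below the 50% threshold, then report
# the ads fetched in the preceding full-rate batches.
def optimal_fetch_count(batch_counts, batch_size=30, threshold=0.5):
    batches_ok = 0
    for count in batch_counts:
        if count / batch_size < threshold:  # fetch rate fell below threshold
            break
        batches_ok += 1
    return batches_ok * batch_size

adv1 = [29, 27, 29, 25, 22, 24, 0, 0]  # rate collapses after batch 6 -> 180
adv2 = [30, 30, 24, 2, 5, 0, 0, 1]     # rate collapses after batch 3 -> 90
```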
3.4 Attack Module Generation
An attack module contains three components: tracking tag-only
page, the ad fetch page, and the attack engine. The rst two are
HTML pages for browsing prole training and ads fetching. The
attack engine drives the process based on the attack parameters.
3.4.1 Tag-only Training Page Builder. To train browsing profiles, we emulate the activities specified in targeting strategies. This is one of the most time-consuming parts, as we have to repeatedly create new profiles. To correctly set the tracking IDs and browsing histories, we have to trigger the tracking scripts (Sec. 3.3.1). Unfortunately, tracking scripts are usually executed after the page is fully loaded, which significantly drags down the attack performance. To improve efficiency, we use tag-only training pages extracted from the original pages that contain only the tracking scripts.
Fig. 7 shows how tag-only training pages are built. We get the HTML source code from the fully rendered page and extract JS snippets whose tags match pre-collected tracking tag signatures. We then build a tag-only training page using the extracted scripts and the mandatory DOM elements such as <html>, <head>, and <body>. The snippet at the bottom of Fig. 7 is an example output.

[Figure 7: Tag-only training page building process. Inputs: the source code of the targeted e-commerce website page after rendering, and tracking tag signature data, e.g.

  {"src":"widget.criteo.com/*", "src":"static.criteo.net/*",
   "src":"*.tellapart.com/*", "innerHTML":"tag.marinsm.com/"}

Output: a tag-only training page, e.g.

  <html>
  <head></head>
  <body>
  <script defer async="true" type="text/javascript"
    src="http://widget.criteo.com/event?a=9049&v=4.1.0&p0=e%3Dexd%26ui_isNFLpage%3Dn%26site_type%3Dd&p1=e%3Dvp%26p%3D11102506&p2=e%3Ddis&adce=1"
    data-owner="criteo-tag"></script>
  <script type="text/javascript" async=""
    src="//static.criteo.net/js/ld/ld.js"></script>
  </body>
  </html>]

[Figure 8: Total time spent for fetching 180 ads.

  # of ad units per batch:  10    20    30    40    50    60    70    80    90    100
  Total time spent (sec):   33.3  26.6  27.7  30.4  30.1  30.7  33.2  33.7  36.1  37.3]
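The extraction step can be sketched with the standard library alone. The sketch below assumes that signatures are glob patterns matched against script src attributes; the signature set, class and function names are hypothetical, not the paper's implementation:

```python
import fnmatch
from html.parser import HTMLParser

# Hypothetical signature set, mirroring the example in Fig. 7.
TRACKING_SIGNATURES = ["*widget.criteo.com/*", "*static.criteo.net/*",
                       "*.tellapart.com/*"]

class TrackingScriptExtractor(HTMLParser):
    """Collect <script> tags whose src matches a tracking-tag signature."""
    def __init__(self, signatures):
        super().__init__()
        self.signatures = signatures
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src") or ""
        if any(fnmatch.fnmatch(src, sig) for sig in self.signatures):
            # Rebuild the tag with its original attributes.
            attr_str = " ".join(k if v is None else '%s="%s"' % (k, v)
                                for k, v in attrs)
            self.scripts.append("<script %s></script>" % attr_str)

def build_tag_only_page(rendered_html, signatures=TRACKING_SIGNATURES):
    """Wrap the matching tracking scripts in a minimal HTML skeleton
    (<html>, <head>, <body>), as in Fig. 7."""
    parser = TrackingScriptExtractor(signatures)
    parser.feed(rendered_html)
    body = "\n".join(parser.scripts)
    return "<html>\n<head></head>\n<body>\n%s\n</body>\n</html>" % body
```

Feeding the rendered source of a product page through build_tag_only_page drops all content and non-tracking scripts, keeping only the tags needed to set tracking IDs.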
3.4.2 Ad Fetch Page Builder. An ad fetch page is an HTML file containing a set of ad slots. It is similar to the crafted page created for ADHoneyClient (Sec. 3.3.1). We configure each ad slot based on the collected ad specs. Besides, we need to optimize the number of ad slots per batch for better efficacy. However, it is difficult to predict the appropriate number because the ad loading procedure varies. Therefore, we perform an experiment in this step to infer the optimal number of ad slots per batch.
To be specific, we compare the total time spent to fetch a particular number of ads. As explained in Sec. 3.3.2, we can only fetch a limited number of ads with a single browsing profile. Therefore, we use it as the upper bound in each experiment. For example, suppose the optimal fetch count is 180. We first fetch 10 ads at a time, repeated 18 times. Then, we try different batch sizes and compare the time needed to get all the ads specified by the optimal fetch count (180 in this example). The results are shown in Fig. 8. We achieve
Table 3: One hour attack against controlled advertiser

  # of   ads     CPM    budget    cost   cost /    ads      drained
  VMs                   drained          drained   per VM   per VM
  1      2977    $3.30  $9.82     $0.13  0.01      2977     $9.82
  2      4965    $2.84  $14.10    $0.26  0.02      2483     $7.05
  3      10114   $4.72  $47.74    $0.39  0.01      3371     $15.97
  4      12485   $4.55  $56.81    $0.52  0.01      3121     $14.20
  5      16875   $3.28  $55.35    $1.04  0.02      3375     $11.07
  6      21264   $3.55  $75.49    $1.56  0.02      3544     $12.58
  7      28483   $2.16  $61.52    $2.08  0.03      4069     $8.79
  8      30484   $3.95  $120.41   $4.16  0.03      3811     $15.05
  9      37880   $3.77  $142.81   $6.24  0.04      4209     $15.87
  10     40958   $2.95  $120.83   $8.32  0.07      4096     $12.08
  Average                                          3506     $12.24
the best performance when we fetch 20 ads per batch. Therefore,
we include 20 ad slots in the ad fetch page in this example.
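The batch-size search described above can be sketched as a simple timing loop. Here `fetch_batch` is a hypothetical callable that loads one ad fetch page with n slots, and the candidate sizes mirror the x-axis of Fig. 8 (10 to 100 in steps of 10):

```python
import time

def best_batch_size(fetch_batch, optimal_fetch_count,
                    candidates=range(10, 101, 10)):
    """Time fetching `optimal_fetch_count` ads at each candidate batch
    size and return (best size, all timings). `fetch_batch(n)` is
    assumed to load one ad fetch page containing n ad slots."""
    timings = {}
    for size in candidates:
        batches = -(-optimal_fetch_count // size)  # ceiling division
        start = time.perf_counter()
        for _ in range(batches):
            fetch_batch(size)
        timings[size] = time.perf_counter() - start
    return min(timings, key=timings.get), timings
```

On the paper's measurements (Fig. 8), this procedure would select 20 slots per batch, the minimum of the timing curve.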
3.4.3 Aack Parameters. Attack parameters are a set of data
used by the attack engine to customize the attack process. We may
specify the attack time including the start time and duration. We
can set the attack strategy, which can be exhaustive or smart. The
exhaustive attack aims to drain the advertising budget as fast as
possible. But it has high risk of getting detected. The smart attack is
less aggressive and randomly sleeps to simulate human behaviors.
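One way to represent such a parameter set is sketched below. The field names, the two strategy labels, and the sleep bound are illustrative assumptions, not the paper's actual format:

```python
import random
import time
from dataclasses import dataclass

@dataclass
class AttackParams:
    """Hypothetical parameter set consumed by the attack engine."""
    start_time: float          # epoch seconds at which the attack starts
    duration: float            # attack duration in seconds
    strategy: str = "smart"    # "exhaustive" or "smart"
    max_sleep: float = 5.0     # upper bound of random pauses (smart only)

    def pause(self):
        # The exhaustive strategy never sleeps; the smart strategy
        # sleeps a random interval to mimic human pacing.
        if self.strategy == "smart":
            time.sleep(random.uniform(0.0, self.max_sleep))
```

An exhaustive run would set strategy="exhaustive" so pause() becomes a no-op, trading stealth for speed.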
3.5 Attack Distribution
When the attack module is ready, it is deployed to virtual machines hosted on public cloud services, such as Amazon EC2 [2], Google Cloud Platform [5], and Microsoft Azure [3]. Using public cloud services has the following advantages. First, it is cost-effective: we can launch the attack for a few cents per hour (Sec. 4). Second, we can evade IP address based detection without additional cost.
The attack engine in the distributed attack modules repeats two operations: browsing profile training and ad fetching. It loads the tracking-tag-only pages in sequence to train browsing profiles, and fetches ads using the ad fetch page. It repeats the whole procedure until the optimal fetch count is reached, then disposes of the browsing profile by flushing the cookies and local storage.
4 EVALUATION
In this section, we describe the implementation and experiment results to validate the efficacy of our attack. We implement ADHoneyClient in Python based on the Selenium libraries [7]. The attack module is built as a Chrome extension for easy deployment. The experiments are done on Microsoft Azure VMs. We choose D1 v2 instances, which provide a 2.4 GHz Intel Xeon E5-2673 v3 (Haswell) processor, 3.5 GB RAM and Windows Server 2016. The pricing plan for a single instance is $0.13 per hour.
We launch attacks against two types of advertisers. We first target a controlled advertiser, where we create the advertiser and set up the advertising strategies. The second experiment is to attack public advertisers in the wild (after obtaining their approvals), where the advertisers run real-world e-commerce websites. More experiment details can be found in [8].
4.1 Controlled Advertiser
In this section, we evaluate our attack on the controlled advertiser created by us. As a valid advertiser served by a real-world ad network, we can get the actual number of ads displayed, the ad budget drained and the cost per 1,000 impressions (CPM) to precisely calculate the financial damage. Besides, we can perform large-scale attacks without concern about ethical issues, so we can evaluate the full capacity using distributed VMs.
Figure 9: Our ad is displayed on popular sites like nbc.com
In particular, we created an e-commerce website that sells coffee beans and registered it in an ad network. We run an ad campaign with a banner image and set the weekly ad budget to $150. We target users who visit our product pages. To confirm that our ad is available to the public, we visit the page to create a satisfying browsing profile. Then we visit popular websites and check whether it can be fetched. As shown in Fig. 9, our ad is actually displayed at the bottom right corner of one of the top news websites, nbc.com.
We create an attack module performing an exhaustive attack for an hour. Fig. 10 shows a batch of the ads fetched using the attack module, where most of the ads are from our advertiser. We also prepare a virtual image with the attack module installed. We create 10 virtual instances using the image in order to evaluate the distributed attack capability. Using the attack machines, we conduct 10 rounds of attacks with different numbers of attack machines.
Table 3 describes the result of the distributed attack against the controlled advertiser. The first column shows the number of attack machines. The second column shows the total number of our ads fetched. We also report the CPM, the budget drained and the cost. The result shows that we successfully fetched about 40k ads using 10 attack machines. Moreover, the number of fetched ads increases linearly with the number of attack machines. On average, we fetched 3,506 ads per machine, and we drained $142.81 with 9 attack machines. We are able to drain 95% of the weekly budget within an hour. Note that we achieved better performance with 9 machines (instead of 10). The reason is that the CPM is measured dynamically: although more ads are fetched using 10 VMs, the drained budget is less than with 9 VMs (CPM of $2.95 vs $3.77). We also report the ratio of the cost to the drained budget. The costs are merely 1% to 7% of the
drained budget, which indicates that the attack using distributed machines on public cloud is extremely cost-effective.
Figure 10: Ads fetched using the trained attack module
4.2 Public Advertisers
In this section, we evaluate the attack against real-world advertisers by executing attacks within ethical and legal boundaries.
4.2.1 Advertiser Selection. We target advertisers who own e-commerce websites and use retargeting ad services. Although our implementation can be easily extended to support other ad networks, we currently focus on DoubleClick. Therefore, we filter the websites listed in the shopping category of Alexa Top 500 [1], removing those that match the following criteria: 1) websites without online shopping functionality, 2) websites providing only posting and payment functionality, and 3) non-English websites. We also remove websites that do not have ad tracking tags or only support social network/mobile ads. We use the remaining 291 websites to infer their targeting strategies.
As it may cause ethical issues if we run a large-scale attack against real advertisers, we reached out and requested permission for a 10-minute attack. We were able to get approval from 3 advertisers. We anonymize their identities and represent them as advertisers 1, 2 and 3 in the results. Besides, we only use a single attack machine for the experiments in order to minimize damage.
4.2.2 Revealing Targeting Strategies. The first step of the budget draining attack is to verify whether the target is vulnerable. In our case, if we cannot reveal targeting strategies from a targeted advertiser, the advertiser is not vulnerable. So, we first reverse engineer the targeting strategies for each website using our tool ADHoneyClient.
As shown in Table 4, we successfully revealed targeting strategies from 254 out of 291 websites (about 87%). The first column lists the targeting strategy categories. After targeting strategies are successfully reversed, we manually verify the discovered strategies and put them into proper categories. If we cannot interpret the intention behind a strategy, we mark it as arbitrary activities. The second column shows the average number of browsing activities in the targeting strategies. The third column shows the number of websites using the targeting strategies.
The results suggest that most advertisers mainly target users who visit product pages. However, we can also see that 112 out of the 254 advertisers (about 44%) use sophisticated targeting strategies
Table 4: Reversed targeting strategies

  Targeting Strategy        Avg. # of   # of      Avg. training time (sec)    Rate
                            activities  websites  Full page   Tag-only page
  Visiting a front page     1           31        4.78        0.70            6.87
  Visiting a product page   1           111       4.37        0.57            7.65
  Adding an item to a cart  3.48        79        8.55        1.17            7.29
  Full shopping trip        4.94        28        19.59       2.00            9.77
  Arbitrary activities      5           1         24.70       2.23            11.08
                            6           1         22.98       2.11            10.89
                            7           1         26.32       2.56            10.28
                            8           1         29.34       3.09            9.50
                            8           1         23.70       2.90            8.18
  Total websites                        254                   Avg. rate       9.06
containing more than 3 browsing activities, which suggests that it is ineffective to get ads from such advertisers using naive attack methods like visiting only the product pages or the front pages.
We manually inspected why we failed on the remaining 37 websites (13%). We found that they either do not use the data collected from the tracking tags or deploy long-term targeting strategies (showing ads after a week) that are robust to a transient attack. Besides, some target users using geographic data, which is orthogonal to browsing profiles.
To validate the efficacy of the tag-only training approach, we conducted another experiment to show how significantly we improved performance compared to full-page training. As described in Sec. 3.4.1, we create tag-only pages containing only the tracking tags based on the targeting strategies revealed from the advertisers' websites. We record browsing profile training times using the tag-only pages and the original fully-loaded pages. According to the results in Table 4, the tag-only training is about 9 times faster on average.
4.2.3 Estimating Attack Damages. After we got the approvals, we launched the attack against the 3 public advertisers. However, we cannot precisely obtain the number of ads displayed, the CPM or the budget because they are confidential business information. Instead, we use public ad reports providing category-based average CPMs for the first two quarters of 2016 [29, 30] and do our best to estimate the damage. Although the estimation may be biased, we believe it approximately demonstrates how much ad budget we could drain with our attack against real-world advertisers.
Table 5 shows the result of the attack and the estimated damage. Columns 2 to 7 describe the output of the advertiser profiling. Column 8 shows the number of ads we fetched from each advertiser. We report the average CPM in column 10 and use it to calculate the estimated damage. The estimated budget drained within one hour (column 11) ranges from $46.86 to $155.89.
4.3 Ethical Considerations
We would like to highlight that we take ethical issues seriously in our evaluation. This study was closely advised by a lawyer and conducted in a responsible manner. Our evaluation process was reviewed by an IRB and we received an IRB exemption.
In the experiment attacking a controlled advertiser, we own the advertiser's account and we pay for the charges. In the experiments with the three real-world advertisers, we explained our methods and potential damage to them. We started the experiments with their approval. We purposely performed a proof-of-concept experiment
Table 5: 10 minutes attack result against selected public advertisers, and estimated damage

  Advertiser  Ad       Targeting                 Optimal  # of ad    Ad Size                    Ad     # of  Ad          Estimated Damage
              Network  Strategy                  Fetch    slots per                             type   ads   category    CPM     Budget
                                                 Count    batch                                                                  drained/hour
  1           A        Visiting a product page   180      20         300x250, 160x600, 728x90  Image  3134  E-commerce  $8.29   $155.89
  2           A        Adding an item to a cart  180      20         300x250, 160x600, 728x90  Image  2742  Retail      $5.85   $96.24
  3           B        Visiting a product page   90       30         300x250                   Image  942   E-commerce  $8.29   $46.86
using only 1 attack machine for 10 minutes to minimize the damage. We reported our findings and suggestions to them.
In spite of all our efforts, due to the nature of the problem, ads from other advertisers showed up in our experiments. However, we confirmed that the total rewards we collected from the untargeted advertisers as a publisher were less than $10. As the damages are distributed among all of the advertisers, the financial loss of any one advertiser is negligible. More importantly, Google DFP is able to refund credits to advertisers when publishers violate their policies. We are in communication with DFP so that they can refund what we earned throughout all of our experiments to the affected advertisers.
5 COUNTERMEASURES
In this section, we describe countermeasures against our attack. We introduce detection and prevention methods.
5.1 Detection
In order to detect our attack, ad providers or ad networks can look for anomalies in ad request traffic. We discuss three possible detection approaches and their limitations in the following paragraphs.
Browsing profile based detection.
The number of ad requests generated by a benign user is usually smaller than that from attack machines. Therefore, the number of ad requests per browsing profile can be used as a detection feature. For example, if a large number of requests with the same browsing profile are observed within a certain period, we can consider this an attack. However, this feature may not be effective, since our attack does not reuse the same browsing profile many times. Another viable feature is the number of browsing histories in a single browsing profile. In order to increase the efficiency of our attack, we only train a browsing profile with a few pages from a targeted website. Profiles created in this way contain a limited number of browsing histories, whereas benign users normally have a larger number of browsing activities.
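These two profile-level features can be combined into a simple detector. The thresholds and the input format ((profile_id, history_length) pairs, one per ad request) are illustrative assumptions:

```python
from collections import Counter

def flag_suspicious_profiles(requests, max_requests=1000, min_history=5):
    """Flag browsing profiles that either issue an excessive number of
    ad requests in the observation window or carry an implausibly short
    browsing history. `requests` yields (profile_id, history_length)."""
    counts = Counter()
    history = {}
    for pid, hist_len in requests:
        counts[pid] += 1
        history[pid] = hist_len
    return {pid for pid in counts
            if counts[pid] > max_requests or history[pid] < min_history}
```

As the text notes, attack profiles tend to trip the history-length check rather than the request-count check, since each profile is discarded after its optimal fetch count.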
IP address based detection.
IP address based blacklists can be used too. We can mark requests as suspicious if an excessive number of requests are made from the same address. An attack using short-lived profiles may evade browsing profile based detection, but it cannot bypass IP address based detection because the requests come from the same IP address. However, as we discussed, attack machines created using virtual instances can obtain different IP addresses by simply rebooting or by leveraging publicly available proxies, which makes IP address based detection less effective.
Click-Through-Rate (CTR) based detection.
CTR is a metric that measures the number of clicks advertisers receive from a certain number of impressions. Our attack generates a huge number of impressions without actually clicking them; therefore, the CTR is low. However, this can be bypassed by inserting valid clicks between our attacks so that the CTR is increased.
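A minimal sketch of such a CTR check follows; the thresholds (a minimum sample size and a minimum plausible CTR) are illustrative, not values from the paper:

```python
def ctr_suspicious(impressions, clicks, min_impressions=1000, min_ctr=0.001):
    """Flag a traffic source whose click-through rate is abnormally low
    once enough impressions have accumulated to judge reliably."""
    if impressions < min_impressions:
        return False  # not enough data to make a call
    return clicks / impressions < min_ctr
```

A draining run that fetches tens of thousands of impressions with zero clicks is flagged immediately, which is why the attacker must inject occasional valid clicks to stay under this radar.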
Our attack is similar to a distributed denial of service (DDoS) attack, since both attacks generate a large amount of traffic using distributed machines. Although detecting a DDoS attack is not that difficult, the attack is still powerful due to its distributed nature: once an attack machine is blocked, a newly created machine can continue the attack. Therefore, while it is possible that our attack can be detected with various features, we believe it is extremely challenging to nullify it.
5.2 Prevention
As it is challenging for ad networks to effectively suppress the attack, in this section we suggest practical prevention approaches for advertisers. One possible solution is to use event-based targeting strategies: advertisers can track users who actually scroll pages or stay on the website for a certain period of time. Such event tracking utilities are already supported by many ad networks [6, 9]. Although these methods may not completely prevent the attack, they can minimize the probability of being selected as a target.
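The event-based gating suggested above can be sketched as a server-side eligibility check. The event names and thresholds are entirely illustrative; real ad networks expose equivalent functionality through their own event-segment APIs [6, 9]:

```python
def eligible_for_retargeting(events, min_scroll_depth=0.5,
                             min_dwell_seconds=30):
    """Decide whether a visitor qualifies for a retargeting audience.
    `events` is a list of (event_name, value) pairs reported by the
    page, e.g. ("scroll", 0.8) for 80% scroll depth or ("dwell", 45)
    for 45 seconds on the page."""
    scrolled = any(name == "scroll" and value >= min_scroll_depth
                   for name, value in events)
    dwelled = any(name == "dwell" and value >= min_dwell_seconds
                  for name, value in events)
    return scrolled and dwelled
```

A training run that merely triggers the tracking tags on a tag-only page reports no scroll or dwell events and is never added to the audience, which is why this raises the bar for the attack.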
6 RELATED WORK
Browsing profile manipulation.
There are two existing studies that explore attack mechanisms based on browsing profile manipulation. Xing et al. [35] proposed an attack where adversaries can change the customized content of services from popular providers, such as YouTube, Amazon and Google, by manipulating browsing profiles. They provided specific attack methods for each service and showed what attacks could be possible. While the proposed method worked well, their study only showed the possibility of the attack. In contrast, our approach provides a more practical and impactful attack mechanism targeting advertisers. The second attack was presented by Meng et al. [26]. They proposed a fraud mechanism to increase ad revenue for publishers by injecting higher-paying advertiser websites into publishers' pages. Although they successfully increased the revenue, their attack perspective is different from ours.
Ad fraud and mitigation.
Representative attacks and countermeasures were discussed in [15]. Recently, Stone-Gross et al. [32] performed a large-scale study on fraudulent activities in online ad exchanges and suggested practical detection methods.
Recent research focuses on specific fraud activities. Among them, click fraud/spam is the most popular. Dave et al. [16] proposed a method for advertisers to measure click spam rates and conducted a large-scale measurement study of click spam across ten major ad networks. Faou et al. [18] proposed a click fraud prevention technique using the value chain, the links between fraudulent actors and legitimate businesses. Their results showed that pressuring a limited number of actors would disrupt the ability to commit click fraud. Recently, Jaafar et al. [21] proposed FCFraud, a method for detecting automated clickers on the user side in order to prevent users
from becoming victimized attackers of click fraud. It analyzes web requests and events from user processes, classifies ad requests, and detects fraudulent ad clicks.
Another prevalent ad fraud activity is impression fraud/spam. Springborn et al. [31] demonstrated impression fraud via pay-per-view (PPV) networks by analyzing ad traffic from honeypot websites. Their results showed that hundreds of millions of fraudulent impressions per day were delivered by the PPV networks. Marciel et al. [25] proposed tools to audit the systems of five major online video portals to investigate fraud in video ads.
Ad fraud also targets mobile apps. Crussell et al. [14] performed a study on mobile ad fraud perpetrated by Android apps. They developed MAdFraud, an automatic app analysis tool, to emulate events and extract ads. They found that about 30% of apps making ad requests do so while running in the background, and identified 27 apps generating clicks without user interaction. Liu et al. [24] proposed a system to detect placement fraud that manipulates visual ad layouts to trigger unintentional clicks from users. They implemented a tool called DECAF and characterized the prevalence of ad fraud in 50,000 apps.
Online behavior tracking.
Roesner et al. [28] investigated how third-party web tracking services operate. They showed how tracking works, where the data can be stored, and how web tracking behaviors are classified. Englehardt et al. [17] proposed OpenWPM, a web privacy measurement platform, to show how third-party tracking cookies can be used to reveal browsing histories. Conti et al. [13] proposed TRAP, a system to unveil Google personal profiles using targeted ads. They focused on revealing the topics a user is interested in instead of her actual browsing histories. Recently, Bashir et al. [10] showed information flows between ad exchanges using retargeted ads. They showed how user profiles are shared among ad exchanges by investigating 5,102 retargeting ads. Cahn et al. [12] assessed the privacy threat caused by third-party tracking. Our research was inspired by these studies and we utilized them to build our attack method.
7 CONCLUSION
In this paper, we present a novel ad budget draining attack against targeted advertisers in the online retargeting ad system. The attack creates crafted browsing profiles based on information collected by an ad crawler, ADHoneyClient. It executes a large-scale attack using public cloud computing services and repeatedly fetches ads from the targeted advertisers. We evaluate the efficacy of the proposed methods through an extensive set of experiments. We were able to reverse engineer the targeting strategies used by 254 (out of 291) public advertisers selected from Alexa Top 500. In addition, we executed a series of attacks against 3 real-world advertisers. The evaluation results show that our ad budget draining attack is effective and efficient. It successfully fetched 40,958 ads from the targeted advertisers and drained up to $155.89 from their campaign budget within an hour.
REFERENCES
[1] 2017. Alexa Top 500 - Shopping category. http://www.alexa.com/topsites/category/Top/Shopping. (2017).
[2] 2017. Amazon Elastic Compute Cloud (Amazon EC2). https://aws.amazon.com/ec2/. (2017).
[3] 2017. Azure. https://azure.microsoft.com. (2017).
[4] 2017. Criteo Dynamic Retargeting. http://www.criteo.com/. (2017).
[5] 2017. Google Cloud Platform. https://cloud.google.com/. (2017).
[6] 2017. Perfect Audience - Event Audiences. http://support.perfectaudience.com/knowledgebase/articles/233996-understanding-action-lists. (2017).
[7] 2017. Selenium Browser Automation. http://www.seleniumhq.org/. (2017).
[8] 2017. Technical Report - AdBudgetKiller: Online Advertising Budget Draining Attack. https://sites.google.com/view/adbudgetkiller. (2017).
[9] Adroll. 2017. Event Segments. https://help.adroll.com/hc/en-us/articles/212014288-Event-Segments. (2017).
[10] Muhammad Ahmad Bashir, Sajjad Arshad, William Robertson, and Christo Wilson. 2016. Tracing information flows between ad exchanges using retargeted ads. In Proceedings of the 25th USENIX Security Symposium.
[11] AdWords Robot Bv. 2017. 9 Remarketing/Retargeting Services which Drive your Online Sales. https://www.adwordsrobot.com/en/blog/9-remarketing-retargeting-services-which-drive-your-online-sales. (2017).
[12] Aaron Cahn, Scott Alfeld, Paul Barford, and S Muthukrishnan. 2016. What's in the community cookie jar?. In Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on. IEEE, 567–570.
[13] Mauro Conti, Vittoria Cozza, Marinella Petrocchi, and Angelo Spognardi. 2015. TRAP: using targeted ads to unveil Google personal profiles. In Information Forensics and Security (WIFS), 2015 IEEE International Workshop on. IEEE, 1–6.
[14] Jonathan Crussell, Ryan Stevens, and Hao Chen. 2014. MAdFraud: Investigating ad fraud in Android applications. In Proceedings of the 12th annual international conference on Mobile systems, applications, and services. ACM, 123–134.
[15] Neil Daswani, Chris Mysen, Vinay Rao, Stephen Weis, Kourosh Gharachorloo, and Shuman Ghosemajumder. 2008. Online advertising fraud. Crimeware: understanding new attacks and defenses 40, 2 (2008), 1–28.
[16] Vacha Dave, Saikat Guha, and Yin Zhang. 2012. Measuring and fingerprinting click-spam in ad networks. ACM SIGCOMM Computer Communication Review 42, 4 (2012), 175–186.
[17] Steven Englehardt, Dillon Reisman, Christian Eubank, Peter Zimmerman, Jonathan Mayer, Arvind Narayanan, and Edward W Felten. 2015. Cookies that give you away: The surveillance implications of web tracking. In Proceedings of the 24th International Conference on World Wide Web. ACM, 289–299.
[18] Matthieu Faou, Antoine Lemay, David Décary-Hétu, Joan Calvet, François Labrèche, Militza Jean, Benoit Dupont, and José M Fernande. 2016. Follow the traffic: Stopping click fraud by disrupting the value chain. In Privacy, Security and Trust (PST), 2016 14th Annual Conference on. IEEE, 464–476.
[19] IAB. 2015. What is an untrustworthy supply chain costing the US digital advertising industry? http://www.iab.com/wp-content/uploads/2015/11/IAB_EY_Report.pdf. (2015).
[20] IAB. 2016. IAB internet advertising revenue report 2016 full year results. https://www.iab.com/wp-content/uploads/2016/04/IAB_Internet_Advertising_Revenue_Report_FY_2016.pdf. (2016).
[21] Md Shahrear Iqbal, Mohammad Zulkernine, Fehmi Jaafar, and Yuan Gu. 2017. Protecting Internet users from becoming victimized attackers of click-fraud. Journal of Software: Evolution and Process (2017).
[22] Lauren Johnson. 2017. U.S. Digital Advertising Will Make $83 Billion This Year, Says EMarketer. http://www.adweek.com/digital/u-s-digital-advertising-will-make-83-billion-this-year-says-emarketer/. (2017).
[23] Peter LaFond. 2016. How Retailer ShopStyle Gained a 200% Increase in Retargeting Conversions with TruSignal and MediaMath. http://www.mediamath.com/blog/how-retailer-shopstyle-gained-a-200-increase-in-retargeting-conversions-with-trusignal-and-mediamath/. (2016).
[24] Bin Liu, Suman Nath, Ramesh Govindan, and Jie Liu. 2014. DECAF: Detecting and Characterizing Ad Fraud in Mobile Apps. In NSDI. 57–70.
[25] Miriam Marciel, Rubén Cuevas, Albert Banchs, Roberto González, Stefano Traverso, Mohamed Ahmed, and Arturo Azcorra. 2016. Understanding the detection of view fraud in video content portals. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 357–368.
[26] Wei Meng, Xinyu Xing, Anmol Sheth, Udi Weinsberg, and Wenke Lee. 2014. Your online interests: Pwned! A pollution attack against targeted advertising. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 129–140.
[27] Mateusz Pawlik and Nikolaus Augsten. 2016. Tree edit distance: Robust and memory-efficient. Information Systems 56 (2016), 157–173.
[28] Franziska Roesner, Tadayoshi Kohno, and David Wetherall. 2012. Detecting and defending against third-party tracking on the web. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, 12–12.
[29] Salesforce. 2016. Marketing Cloud Advertising Index Q1 2016 Report. https://www.marketingcloud.com/sites/exacttarget/files/deliverables/salesforce-advertising-index-q1-2016-advertising-studio.pdf. (2016).
[30] Salesforce. 2016. Marketing Cloud Advertising Index Q2 2016 Report. https://www.marketingcloud.com/sites/exacttarget/files/salesforce-advertising-index-q2-2016-advertising-studio-v8.pdf. (2016).
[31] Kevin Springborn and Paul Barford. 2013. Impression Fraud in On-line Advertising via Pay-Per-View Networks. In USENIX Security. 211–226.
[32] Brett Stone-Gross, Ryan Stevens, Apostolis Zarras, Richard Kemmerer, Chris Kruegel, and Giovanni Vigna. 2011. Understanding fraudulent activities in online ad exchanges. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 279–294.
[33] Florida Tech. 2017. Retargeting vs. Remarketing. https://www.floridatechonline.com/blog/business/retargeting-vs-remarketing/. (2017).
[34] Peter Willett. 1988. Recent trends in hierarchic document clustering: a critical review. Information Processing & Management 24, 5 (1988), 577–597.
[35] Xinyu Xing, Wei Meng, Dan Doozan, Alex C Snoeren, Nick Feamster, and Wenke Lee. 2013. Take This Personally: Pollution Attacks on Personalized Services. In USENIX Security. 671–686.
[36] Kaizhong Zhang and Dennis Shasha. 1989. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing 18, 6 (1989), 1245–1262.
... These efforts are relevant to our work because online behavioral advertising is essentially a recommendation system where the advertiser's goal is to "recommend" personalized ads to users based on their online activity. However, as we discuss next, most of these privacy-enhancing obfuscation approaches aimed towards online behavioral advertising are not designed to preserve the utility (i.e., relevance of personalized ads) [3,5,[59][60][61]. These approaches generally randomly insert a curated set of obfuscation inputs to manipulate online behavioral advertising. ...
... One subset of these efforts propose "pollution attacks" against online behavioral advertising that also serve a dual role as privacy-enhancing obfuscation [5,60,61]. Meng et al. [60] propose a pollution attack that can be launched by publishers to increase their advertising revenue by manipulating advertisers into targeting higher paying ads. The attack involves the addition of curated URLs into a user's browsing profile. ...
... Degeling et al. [5] and Kim et al. [61] propose similar attacks but focus on two distinct stages of the online behavioral advertising pipeline: user profiling and ad targeting. Degeling et al. [5] propose an obfuscation approach that involves adding URLs posted on Reddit into a user's browsing profile. ...
Preprint
Online content platforms optimize engagement by providing personalized recommendations to their users. These recommendation systems track and profile users to predict relevant content a user is likely interested in. While the personalized recommendations provide utility to users, the tracking and profiling that enables them poses a privacy issue because the platform might infer potentially sensitive user interests. There is increasing interest in building privacy-enhancing obfuscation approaches that do not rely on cooperation from online content platforms. However, existing obfuscation approaches primarily focus on enhancing privacy but at the same time they degrade the utility because obfuscation introduces unrelated recommendations. We design and implement De-Harpo, an obfuscation approach for YouTube's recommendation system that not only obfuscates a user's video watch history to protect privacy but then also denoises the video recommendations by YouTube to preserve their utility. In contrast to prior obfuscation approaches, De-Harpo adds a denoiser that makes use of a "secret" input (i.e., a user's actual watch history) as well as information that is also available to the adversarial recommendation system (i.e., obfuscated watch history and corresponding "noisy" recommendations). Our large-scale evaluation of De-Harpo shows that it outperforms the state-of-the-art by a factor of 2x in terms of preserving utility for the same level of privacy, while maintaining stealthiness and robustness to de-obfuscation.
... Similarly, the authors of [19] and [14] show a way to attack personalization algorithms by polluting a user's browser history with noise, generating false clicks through cross-site request forgery (XSRF). In [11], the authors present an attack for draining ad budgets. By repeatedly pulling ads using crafted browsing profiles, they managed to reduce the chance of showing the advertisers' ads to real visitors and trash the ad budget. ...
Chapter
Full-text available
Social networks such as Facebook (FB; known as Meta since October 2021) and Instagram are known for tracking user online behaviour for commercial gain. To this day, there is practically no way of achieving privacy on these platforms other than renouncing their use. However, many users are reluctant to do so for convenience or for social and professional reasons. In this work, we propose a means of balancing convenience and privacy on FB through obfuscation. We have created MetaPriv, a tool based on simulating user interaction with FB. MetaPriv allows users to add noise interactions to their account so as to lead FB's profiling algorithms astray and make them draw inaccurate profiles of their interests and habits. To prove our tool's effectiveness, we ran extensive experiments on a dummy account and two existing user accounts. Our results showed that, by using our tool, users can achieve a higher degree of privacy in just a couple of weeks. We believe that MetaPriv can be further developed to accommodate other social media platforms and help users regain their privacy while maintaining a reasonable level of convenience. To support open science and reproducible research, our source code is publicly available online.
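The noise-interaction idea behind MetaPriv can be sketched in a few lines: pick interaction targets that lie outside the user's genuine interests, so the platform's inferred profile is diluted. This is an illustrative sketch under assumed inputs (page names, interest lists), not MetaPriv's actual implementation.

```python
import random

def pick_noise_targets(candidate_pages, real_interests, n, seed=0):
    """Choose pages unrelated to the user's genuine interests; simulated
    interactions with them dilute the profile the platform can infer."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    pool = [p for p in candidate_pages if p not in set(real_interests)]
    return rng.sample(pool, min(n, len(pool)))
```

A driver around this would then simulate likes or visits on the chosen pages at a human-plausible rate, which is where the stealthiness of such tools is decided.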
... A fraudster may use a botnet [20] to generate a low-frequency click attack [6] targeting a site or a specific campaign. In [12], Kim et al. propose a budget-draining attack directed at retargeting campaigns. Fraudulent traffic is not limited to click-based attacks. ...
Chapter
Full-text available
In affiliate marketing, an affiliate offers to handle the marketing effort of selling other companies' products. Click-fraud is damaging to affiliate marketers as it increases the cost of internet traffic. There is a need for a solution that has an economic incentive to protect marketers while providing them with the data they need to reason about traffic quality. In our solution, we propose a set of interpretable, explainable flags to describe the traffic. Given the different needs of marketers, the differences in traffic quality across campaigns, and the noisy nature of internet traffic, we propose the use of equality testing of two proportions to highlight flags that are important in certain situations. We present measurements of real-world traffic using these flags.
... To do so, they use re-targeted ads to reveal information flows. Kim et al. recently presented their work on an ad budget attack [30]: an attack on targeted advertisers that legally drains the advertiser's ad budget. ...
Preprint
Full-text available
The European General Data Protection Regulation (GDPR), which went into effect in May 2018, leads to important changes in this area: companies are now required to ask for users' consent before collecting and sharing personal data and by law users now have the right to gain access to the personal information collected about them. In this paper, we study and evaluate the effect of the GDPR on the online advertising ecosystem. In a first step, we measure the impact of the legislation on the connections (regarding cookie syncing) between third-parties and show that the general structure how the entities are arranged is not affected by the GDPR. However, we find that the new regulation has a statistically significant impact on the number of connections, which shrinks by around 40%. Furthermore, we analyze the right to data portability by evaluating the subject access right process of popular companies in this ecosystem and observe differences between the processes implemented by the companies and how they interpret the new legislation. We exercised our right of access under GDPR with 36 companies that had tracked us online. Although 32 companies (89%) we inquired replied within the period defined by law, only 21 (58%) finished the process by the deadline set in the GDPR. Our work has implications regarding the implementation of privacy law as well as what online tracking companies should do to be more compliant with the new regulation.
Article
Full-text available
In the last decade, the advertisement market has spread significantly across the web and mobile app ecosystem. Its effectiveness is due in part to the ability to target advertisements at the specific interests of the current user, rather than only at the content of the website hosting the advertisement. In this scenario, services that collect, and hence can provide, information about the browsing user, such as Facebook and Google, have become of great value. In this paper, we show how to maliciously exploit the Google Targeted Advertising system to infer personal information in Google user profiles. In particular, the attack we consider is external to Google and relies on combining data from Google AdWords with other data collected from a website of the Google Display Network. We validate the effectiveness of our proposed attack and discuss possible application scenarios. The result of our research shows a significant practical privacy issue behind this type of targeted advertising service and calls for further investigation and the design of more privacy-aware solutions, possibly without impeding the current business model of online advertisement.
Article
Internet users are often victimized by malicious attackers. Some attackers infect and use innocent users' machines to launch large-scale attacks without the users' knowledge. One such attack is the click-fraud attack. Click-fraud happens in Pay-Per-Click (PPC) ad networks, where the ad network charges advertisers for every click on their ads. Click-fraud has proven to be a serious problem for the online advertisement industry. In a click-fraud attack, a user or an automated piece of software clicks on an ad with malicious intent, and advertisers need to pay for those valueless clicks. Among the many forms of click-fraud, botnets with automated clickers are the most severe. In this paper, we present a method for detecting automated clickers from the user side. The proposed method to Fight Click-Fraud, FCFraud, can be integrated into desktop and smart-device operating systems. Since most modern operating systems already provide some kind of anti-malware service, our proposed method can be implemented as part of that service. We believe that effective protection at the operating-system level can save advertisers billions of dollars. Experiments show that FCFraud is 99.6% accurate (98.2% in mobile ad library generated traffic) in classifying ad requests from all user processes, and it is 100% successful in detecting clickbots on both desktop and mobile devices. We implement a cloud backend for the FCFraud service to save battery power on mobile devices. The overhead of executing FCFraud is also analyzed, and we show that it is reasonable for both platforms.
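FCFraud's actual classifier operates on OS-level features of ad requests, but one underlying intuition — a genuine ad click should be preceded by real hardware input — can be sketched as a toy heuristic. The one-second window and the function shape are illustrative assumptions, not FCFraud's implementation.

```python
def looks_automated(click_time, hw_input_times, window=1.0):
    """Flag a click as suspicious if no hardware input event
    (mouse/keyboard) occurred within `window` seconds before it.
    Times are seconds on a shared monotonic clock."""
    return not any(0.0 <= click_time - t <= window for t in hw_input_times)
```

A real system would combine several such signals (process origin, request rate, input correlation) rather than rely on one timing check, since a clickbot could synthesize input events.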
Conference Paper
While substantial effort has been devoted to understanding fraudulent activity in traditional online advertising (search and banner), more recent forms such as video ads have received little attention. For advertisers, understanding and identifying fraudulent activity (i.e., fake views) in video ads is complicated, as they rely exclusively on the detection mechanisms deployed by video hosting portals. In this context, independent tools able to monitor and audit the fidelity of these systems are missing today and are needed by both industry and regulators. In this paper we present a first set of tools to serve this purpose. Using our tools, we evaluate the performance of the audit systems of five major online video portals. Our results reveal that YouTube's detection system significantly outperforms all the others. Despite this, a systematic evaluation indicates that it may still be susceptible to simple attacks. Furthermore, we find that YouTube penalizes its videos' public and monetized view counters differently, the former more aggressively. This means that views identified as fake and discounted from the public view counter are still monetized. We speculate that even though YouTube's policy puts in lots of effort to compensate users after an attack is discovered, this practice places the burden of the risk on the advertisers, who pay to get their ads displayed.
Conference Paper
We study the ability of a passive eavesdropper to leverage "third-party" HTTP tracking cookies for mass surveillance. If two web pages embed the same tracker which tags the browser with a unique cookie, then the adversary can link visits to those pages from the same user (i.e., browser instance) even if the user's IP address varies. Further, many popular websites leak a logged-in user's identity to an eavesdropper in unencrypted traffic. To evaluate the effectiveness of our attack, we introduce a methodology that combines web measurement and network measurement. Using OpenWPM, our web privacy measurement platform, we simulate users browsing the web and find that the adversary can reconstruct 62-73% of a typical user's browsing history. We then analyze the effect of the physical location of the wiretap as well as legal restrictions such as the NSA's "one-end foreign" rule. Using measurement units in various locations - Asia, Europe, and the United States - we show that foreign users are highly vulnerable to the NSA's dragnet surveillance due to the concentration of third-party trackers in the U.S. Finally, we find that some browser-based privacy tools mitigate the attack while others are largely ineffective.
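The linking step of this surveillance attack can be sketched as a simple grouping: every on-the-wire request carrying the same unique third-party tracker cookie must originate from the same browser instance, even if the client IP changes. The input format below is an illustrative assumption, not the paper's OpenWPM pipeline.

```python
from collections import defaultdict

def reconstruct_histories(observed_requests):
    """observed_requests: iterable of (first_party_page, tracker_cookie_id)
    pairs sniffed from unencrypted traffic. Group page visits by the
    unique third-party cookie to link them to one browser instance."""
    histories = defaultdict(set)
    for page, cookie_id in observed_requests:
        histories[cookie_id].add(page)
    return dict(histories)
```

The paper's stronger result comes from chaining this with identity leaks: once any one page in a cluster leaks a logged-in identity, the whole reconstructed history is attributed to a named user.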
Article
We present a new ad fraud mechanism that enables publishers to increase their ad revenue by deceiving the ad exchange and advertisers to target higher paying ads at users visiting the publisher's site. Our attack is based on polluting users' online interest profile by issuing requests to content not explicitly requested by the user, such that it influences the ad selection process. We address several challenges involved in setting up the attack for the two most commonly used ad targeting mechanisms: re-marketing and behavioral targeting. We validate the attack for one of the largest ad exchanges and empirically measure the monetary gains of the publisher by emulating the attack using web traces of 619 real users. Our results show that the attack is effective in biasing ads towards the desired higher-paying advertisers; the polluter can influence up to 74% and 12% of the total ad impressions for re-marketing and behavioral pollution, respectively. The attack is robust to diverse browsing patterns and online interests of users. Finally, the attack is lucrative and on average the attack can increase revenue of fraudulent publishers by as much as 33%.
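The pollution mechanism hinges on making the visitor's browser silently fetch content the user never asked for. A minimal sketch of how a fraudulent publisher might embed such requests via hidden iframes (one possible vector; the URLs and function name are placeholders, not the paper's setup):

```python
def pollution_markup(curated_urls):
    """Return HTML that makes the visitor's browser request the curated
    pages invisibly, adding them to the user's tracked interest profile."""
    return "\n".join(
        f'<iframe src="{url}" style="display:none" width="0" height="0"></iframe>'
        for url in curated_urls
    )
```

For re-marketing pollution the curated URLs would be product pages of high-paying advertisers, so the exchange later bids those advertisers' retargeting ads on the polluted user.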
Article
Hierarchical data are often modelled as trees. An interesting query identifies pairs of similar trees. The standard approach to tree similarity is the tree edit distance, which has successfully been applied in a wide range of applications. In terms of runtime, the state-of-the-art algorithm for the tree edit distance is RTED, which is guaranteed to be fast independent of the tree shape. Unfortunately, this algorithm requires up to twice the memory of its competitors. The memory is quadratic in the tree size and is a bottleneck for the tree edit distance computation. In this paper we present a new, memory efficient algorithm for the tree edit distance, AP-TED (All Path Tree Edit Distance). Our algorithm runs at least as fast as RTED without trading in memory efficiency. This is achieved by releasing memory early during the first step of the algorithm, which computes a decomposition strategy for the actual distance computation. We show the correctness of our approach and prove an upper bound for the memory usage. The strategy computed by AP-TED is optimal in the class of all-path strategies, which subsumes the class of LRH strategies used in RTED. We further present the AP-TED+ algorithm, which requires less computational effort for very small subtrees and improves the runtime of the distance computation. Our experimental evaluation confirms the low memory requirements and the runtime efficiency of our approach.
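AP-TED itself is an optimized, strategy-driven algorithm, but the tree edit distance it computes can be illustrated with the classical forest recurrence: at each step, either delete the rightmost root of one forest, insert the rightmost root of the other, or match the two rightmost roots and recurse into their children. The memoized sketch below is for tiny trees only (it does not have RTED/AP-TED's complexity guarantees); trees are modeled as (label, children) tuples, an assumption for illustration.

```python
from functools import lru_cache

# A tree is a (label, children) tuple; a forest is a tuple of trees.

def forest_size(forest):
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def ted(F, G):
    """Unit-cost edit distance between two ordered forests."""
    if not F and not G:
        return 0
    if not F:
        return forest_size(G)   # insert every node of G
    if not G:
        return forest_size(F)   # delete every node of F
    (lf, cf), rest_f = F[-1], F[:-1]
    (lg, cg), rest_g = G[-1], G[:-1]
    return min(
        ted(rest_f + cf, G) + 1,                         # delete rightmost root of F
        ted(F, rest_g + cg) + 1,                         # insert rightmost root of G
        ted(rest_f, rest_g) + ted(cf, cg) + (lf != lg),  # match the two roots
    )
```

The "decomposition strategy" that RTED and AP-TED compute corresponds to choosing, at each subproblem, which side and direction to decompose from; the sketch above always decomposes from the right, which is exactly the naive choice those algorithms improve upon.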