Chapter

Untangling Header Bidding Lore: Some Myths, Some Truths, and Some Hope


Abstract

Header bidding (HB) is a relatively new online advertising technology that allows a content publisher to conduct a client-side (i.e., from within the end-user's browser), real-time auction for selling ad slots on a web page. We developed a new browser extension for Chrome and Firefox to observe this in-browser auction process from the user's perspective. We use real end-user measurements from 393,400 HB auctions to (a) quantify the ad revenue from HB auctions, (b) estimate latency overheads when integrating with ad exchanges and discuss their implications for ad revenue, and (c) break down the time spent in soliciting bids from ad exchanges into various factors and highlight areas for improvement. For the users in our study, we find that HB increases ad revenue for websites by 28% compared to that in real-time bidding as reported in a prior work. We also find that the latency overheads in HB can be easily reduced or eliminated, outline a few solutions, and pitch the HB platform as an opportunity for privacy-preserving advertising.
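For readers who want to see what such a client-side vantage point looks like in practice, the sketch below reads the bids of a finished Prebid.js auction through Selenium. It is a minimal illustration, not the authors' extension: it assumes the page exposes the conventional `pbjs` global, and the URL is a placeholder.

```python
# Minimal sketch: read the results of a Prebid.js header-bidding auction
# from the user's side via Selenium. Assumes the page exposes the
# conventional `pbjs` global; real sites may rename it or use other HB wrappers.
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder for an HB-enabled page

# pbjs.getBidResponses() returns bids keyed by ad-unit code once an
# auction has finished; a real crawler would wait for the auction to end.
bids = driver.execute_script(
    "return window.pbjs && pbjs.getBidResponses ? pbjs.getBidResponses() : null;"
)
for ad_unit, resp in (bids or {}).items():
    for bid in resp.get("bids", []):
        # timeToRespond is the per-bidder latency that HB studies care about.
        print(ad_unit, bid.get("bidder"), bid.get("cpm"), bid.get("timeToRespond"))
driver.quit()
```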


Article
The Real Time Bidding (RTB) protocol is by now more than a decade old. During this time, a handful of measurement papers have looked at bidding strategies, personal information flow, and the cost of display advertising through RTB. In this paper, we present YourAdvalue, a privacy-preserving tool for displaying to end-users, in a simple and intuitive manner, their advertising value as seen through RTB. Using YourAdvalue, we measure desktop RTB prices in the wild and compare them with desktop and mobile RTB prices reported by past work. We describe how the tool estimates ad prices that are encrypted, and how it preserves user privacy while reporting results back to a data server for analysis. We deployed our system, disseminated its browser extension, and collected data from 200 users, including 12,000 ad impressions over 11 months. By analyzing this dataset, we show that desktop RTB prices have grown 4.6x over desktop RTB prices measured in 2013, and 3.8x over mobile RTB prices measured in 2015. We also study how user demographics associate with the intensity of RTB ecosystem tracking, leading to higher ad prices. We find that exchanging data between advertisers and/or data brokers through cookie synchronization increases the median value of display ads by 19%. We also find that female and younger users are more heavily targeted, suffering more tracking (via cookie synchronization) than male or older users. As a result of this targeting, in our dataset the advertising value (i) of women is 2.4x higher than that of men, (ii) of 25-34 year-olds is 2.5x higher than that of 35-44 year-olds, and (iii) is highest on weekends and early mornings.
Article
Online advertising relies on trackers and data brokers to show targeted ads to users. To improve targeting, different entities in the intricately interwoven online advertising and tracking ecosystems are incentivized to share information with each other through client-side or server-side mechanisms. Inferring data sharing between entities, especially when it happens at the server-side, is an important and challenging research problem. In this paper, we introduce Kashf: a novel method to infer data sharing relationships between advertisers and trackers by studying how an advertiser’s bidding behavior changes as we manipulate the presence of trackers. We operationalize this insight by training an interpretable machine learning model that uses the presence of trackers as features to predict the bidding behavior of an advertiser. By analyzing the machine learning model, we can infer relationships between advertisers and trackers irrespective of whether data sharing occurs at the client-side or the server-side. We are able to identify several server-side data sharing relationships that are validated externally but are not detected by client-side cookie syncing.
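The inference idea lends itself to a compact illustration: train an interpretable model that predicts bidding behavior from binary tracker-presence features, then inspect which trackers drive the prediction. The sketch below uses synthetic data and hypothetical tracker names; it is not the authors' Kashf pipeline.

```python
# Illustration of the inference idea: predict an advertiser's bidding
# behavior from tracker presence, then read the learned structure.
# Synthetic data; tracker names are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 3))    # presence of trackers A, B, C per page visit
y = (X[:, 1] == 1).astype(int)            # synthetic truth: high bid iff tracker B present

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf, feature_names=["tracker_A", "tracker_B", "tracker_C"]))
# A split on tracker_B is evidence of a data-sharing relationship with it,
# regardless of whether the sharing happens client-side or server-side.
```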
Preprint
Over the last decade, digital media (web or app publishers) generalized the use of real-time ad auctions to sell their ad spaces. Multiple auction platforms, also called Supply-Side Platforms (SSPs), were created. Because of this multiplicity, publishers started to create competition between SSPs. In this setting, there are two successive auctions: a second-price auction in each SSP and a secondary, first-price auction, called the header bidding auction, between SSPs. In this paper, we consider an SSP competing with other SSPs for ad spaces. The SSP acts as an intermediary between an advertiser wanting to buy ad spaces and a web publisher wanting to sell its ad spaces, and needs to define a bidding strategy to be able to deliver to the advertisers as many ads as possible while spending as little as possible. The revenue optimization of this SSP can be written as a contextual bandit problem, where the context consists of the information available about the ad opportunity, such as properties of the internet user or of the ad placement. Using classical multi-armed bandit strategies (such as the original versions of UCB and EXP3) is inefficient in this setting and yields a low convergence speed, as the arms are very correlated. In this paper we design and experiment with a version of the Thompson Sampling algorithm that easily takes this correlation into account. We combine this Bayesian algorithm with a particle filter, which makes it possible to handle non-stationarity by sequentially estimating the distribution of the highest bid to beat in order to win an auction. We apply this methodology to two real auction datasets, and show that it significantly outperforms more classical approaches. The strategy defined in this paper is being developed to be deployed on thousands of publishers worldwide.
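A minimal sketch of the Thompson Sampling idea described above: each candidate bid level is an arm with a Beta posterior over its win probability, and utility trades wins against spend. The arm correlation and the particle filter that the paper adds are deliberately left out, and all numbers are illustrative.

```python
# Toy Thompson Sampling for the SSP's bid choice. Each arm is a bid
# level; Beta posteriors track win rates against an unknown rival bid.
import random

bid_levels = [0.5, 1.0, 1.5, 2.0]      # candidate bids (illustrative)
alpha = [1.0] * len(bid_levels)        # Beta(alpha, beta) posterior per arm
beta = [1.0] * len(bid_levels)
LAMBDA = 0.3                           # cost weight: wins matter, spend hurts

for _ in range(10_000):
    # Sample a plausible win rate per arm, pick the best sampled utility.
    utilities = [random.betavariate(alpha[i], beta[i]) - LAMBDA * b
                 for i, b in enumerate(bid_levels)]
    i = max(range(len(bid_levels)), key=utilities.__getitem__)
    won = bid_levels[i] >= random.uniform(0.0, 2.0)   # unknown highest rival bid
    alpha[i] += won
    beta[i] += not won

print({b: round(alpha[i] / (alpha[i] + beta[i]), 2)
       for i, b in enumerate(bid_levels)})            # posterior win-rate means
```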
Conference Paper
In the last decade, digital media (web or app publishers) generalized the use of real-time ad auctions to sell their ad spaces. Multiple auction platforms, also called Supply-Side Platforms (SSPs), were created. Because of this multiplicity, publishers started to create competition between SSPs. In this setting, there are two successive auctions: a second-price auction in each SSP and a secondary, first-price auction, called the header bidding auction, between SSPs. In this paper, we consider an SSP competing with other SSPs for advertising spaces. The SSP acts as an intermediary between an advertiser wanting to buy advertising spaces and a web publisher wanting to sell its advertising spaces, and needs to define a bidding strategy to be able to deliver to the advertisers as many ads as possible while spending as little as possible to deliver them. The revenue optimization of this SSP can be written as a contextual bandit problem, where the context consists of the information available about the ad opportunity, such as properties of the internet user or of the ad placement. Using traditional multi-armed bandit strategies (such as the original versions of UCB and EXP3) is inefficient in this setting and yields a low convergence speed, as the arms are very correlated. In this paper we design and experiment with a version of the Thompson Sampling algorithm that easily takes this correlation into account. This Bayesian algorithm, which we combine with a particle filter, enables us to sequentially estimate the distribution of the highest bid to beat in order to win auctions. We apply this methodology to two real auction datasets, and show that it significantly outperforms the classical UCB and EXP3 strategies.
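To complement the Thompson Sampling sketch above, the following toy particle filter covers the non-stationary part: it sequentially estimates the distribution of the highest bid to beat from win/lose feedback alone, assuming (purely for illustration) a Uniform(0, theta) bid to beat with a slowly drifting theta.

```python
# Toy particle filter: track the parameter theta of an assumed
# Uniform(0, theta) "highest bid to beat" from win/lose feedback,
# jittering particles so the estimate can follow drift.
import random

particles = [random.uniform(0.5, 3.0) for _ in range(500)]  # candidate thetas

def step(particles, bid, won):
    def lik(theta):
        p_win = min(bid / theta, 1.0)
        return p_win if won else 1.0 - p_win
    weights = [lik(t) for t in particles]
    if sum(weights) > 0:   # resample proportionally to likelihood
        particles = random.choices(particles, weights=weights, k=len(particles))
    # Random-walk jitter lets the filter follow a drifting theta.
    return [max(0.05, t + random.gauss(0.0, 0.02)) for t in particles]

theta = 2.0
for _ in range(2000):
    theta = max(0.1, theta + random.gauss(0.0, 0.005))  # the world drifts
    won = 1.0 >= random.uniform(0.0, theta)             # we always bid 1.0 here
    particles = step(particles, 1.0, won)

print("estimated theta:", round(sum(particles) / len(particles), 2),
      "true theta:", round(theta, 2))
```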
Conference Paper
Digital advertisements are delivered in the form of static images, animations, or videos, with the goal of promoting a product, a service, or an idea to desktop or mobile users. The advertiser pays a monetary cost to buy ad space in a content provider's medium (e.g., a website) to place their advertisement in the consumer's display. However, is it only the advertiser who pays for the ad delivery? Unlike traditional advertisements in media such as newspapers, TV, or radio, in the digital world the end-users also pay a cost for advertisement delivery. While the cost on the advertiser's side is clearly monetary, on the end-user's side it includes both quantifiable costs, such as network requests and transferred bytes, and qualitative costs, such as privacy loss to the ad ecosystem. In this study, we aim to increase user awareness regarding the hidden costs of digital advertisement on mobile devices, and compare the user and advertiser views. Specifically, we built OpenDAMP, a transparency tool that passively analyzes users' web traffic and estimates the costs on both sides. We use a year-long dataset of 1,270 real mobile users and, by juxtaposing the costs of both sides, we identify a clear imbalance: the advertisers pay several times less to deliver ads than the cost paid by the users to download them. In addition, the majority of users experience a significant privacy loss through the personalized ad delivery mechanics.
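The core comparison can be reproduced as back-of-the-envelope arithmetic: what an advertiser pays per impression versus what a user pays to download it over a metered plan. All numbers below are illustrative stand-ins, not the paper's measurements.

```python
# Back-of-the-envelope version of the imbalance: advertiser's cost per
# impression versus the user's data cost to download it.
cpm_usd = 0.50                 # advertiser pays per 1000 impressions (illustrative)
ad_bytes = 500_000             # one ad's payload: creative + scripts (illustrative)
data_usd_per_gb = 8.00         # user's metered mobile data price (illustrative)

advertiser_per_ad = cpm_usd / 1000
user_per_ad = ad_bytes / 1e9 * data_usd_per_gb
print(f"advertiser: ${advertiser_per_ad:.5f}  user: ${user_per_ad:.5f}  "
      f"ratio: {user_per_ad / advertiser_per_ad:.0f}x")   # -> user pays ~8x more
```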
Conference Paper
With the ever-growing popularization of Real Time Bidding (RTB) advertising, the Ad Exchange (AdX) platform has long enjoyed a dominant position in the RTB ecosystem due to its unique role in bridging publishers and advertisers on the supply and demand sides, respectively. However, a novel technology called header bidding, which emerged in the last couple of years, is widely believed to have the potential to challenge this dominant position. Compared with RTB markets, header bidding establishes a priority sub-market that allows bidding partners of the publisher to submit their bids before the ad impression is delivered to the open AdX platform, resulting in a decreased winning probability and revenue for the AdX. As such, there is a critical need for the AdX to tackle this challenge so as to better coexist with header bidding platforms. This need motivates our research. We utilize a stochastic programming approach and establish a stochastic optimization model with risk constraints to optimize the pricing strategy for the AdX, considering that the highest bids from the bidding partners can be characterized by random variables. We study the equivalent forms of our proposed model in the cases where the randomness is characterized by uniform or normal random variables. Using a computational experiment approach, we validate our proposed model, and the experimental results indicate that both the risk tolerance of the AdX and the distribution of the randomness of the highest bid from the bidding partners can greatly affect the optimal strategy and the corresponding optimal revenue of the AdX. Our work highlights the importance of the AdX's risk level and of the distribution of the randomness generated by the partners to the decision-making process of AdXs in header bidding markets.
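A stripped-down instance of this pricing problem makes the structure concrete: if the AdX keeps margin v - p when its price p beats a random highest partner bid B ~ Uniform(0, b_max), the expected profit (v - p) * P(B <= p) is maximized at p = v/2 (for v <= b_max). The risk constraints of the actual model are omitted, and the numbers are illustrative.

```python
# Stripped-down pricing instance: AdX profit (v - p) * P(B <= p) with
# B ~ Uniform(0, b_max). Closed form gives p* = v / 2; the grid search
# below just confirms it.
import numpy as np

v, b_max = 1.6, 2.0
prices = np.linspace(0.0, b_max, 2001)
profit = (v - prices) * np.clip(prices / b_max, 0.0, 1.0)
best = prices[profit.argmax()]
print(f"optimal price {best:.2f} (v/2 = {v / 2:.2f}), profit {profit.max():.3f}")
```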
Conference Paper
Understanding the impact of a search system's response latency on its users' searching behaviour has recently been an active research topic in the information retrieval and human-computer interaction areas. Along the same line, this paper focuses on the user impact of search latency and makes the following two contributions. First, through a controlled experiment, we reveal the physiological effects of response latency on users and show that these effects are present even at small increases in response latency. We compare these effects with the information gathered from self-reports and show that they capture the nuanced attentional and emotional reactions to latency much better. Second, we carry out a large-scale analysis using a web search query log obtained from Yahoo to understand the change in the way users engage with a web search engine under varying levels of increasing response latency. In particular, we analyse the change in the click behaviour of users when they are subject to increasing response latency and reveal significant behavioural differences.
Article
In the context of a myriad of mobile apps that collect personally identifiable information (PII) and a prospective marketplace for personal data, we investigate a user-centric monetary valuation of mobile PII. During a 6-week-long user study in a living-lab deployment with 60 participants, we collected their daily valuations of four categories of mobile PII (communication, e.g., phone calls made/received; applications, e.g., time spent on different apps; location; and media, e.g., photos taken) at three levels of complexity (individual data points, aggregated statistics, and processed, i.e., meaningful interpretations of the data). In order to obtain honest valuations, we employ a reverse second-price auction mechanism. Our findings show that the most sensitive and valued category of personal information is location. We report statistically significant associations between actual mobile usage, personal dispositions, and bidding behavior. Finally, we outline key implications for the design of mobile services and future markets of personal data.
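The reverse second-price mechanism is easy to state in code: the lowest ask wins and is paid the second-lowest ask, which makes truthful asking the dominant strategy. A minimal sketch with hypothetical participants:

```python
# Reverse second-price (procurement) auction: lowest ask sells, paid the
# second-lowest ask. Participants and asks are hypothetical.
def reverse_second_price(asks):
    """asks: participant -> asking price. Returns (winner, payment)."""
    ranked = sorted(asks, key=asks.get)
    return ranked[0], asks[ranked[1]]   # winner is paid the runner-up's ask

asks = {"alice": 7.0, "bob": 9.5, "carol": 12.0}   # euros, illustrative
print(reverse_second_price(asks))                   # -> ('alice', 9.5)
# Asking one's true valuation is optimal: the payment never depends on
# the winner's own ask, so shading it only risks losing a profitable sale.
```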
Conference Paper
In recent years, Header Bidding (HB) has gained popularity among web publishers, challenging the status quo in the ad ecosystem. Contrary to the traditional waterfall standard, HB aims to give publishers back control of their ad inventory and to increase transparency, fairness, and competition among advertisers, resulting in higher ad-slot prices. Although promising, little is known about how this ad protocol works: What are HB's possible implementations, who are the major players, and what is its network and UX overhead? To address these questions, we design and implement HBDetector: a novel methodology to detect HB auctions on a website in real time. By crawling the top 35,000 Alexa websites, we collect and analyze a dataset of 800k auctions. We find that: (i) 14.28% of top websites utilize HB; (ii) publishers prefer to collaborate with a few demand partners, who also dominate the waterfall market; (iii) HB latency can be significantly higher (up to 3x in the median case) than waterfall.
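A much-simplified version of the detection step can be expressed as probing a loaded page for the well-known globals of popular header-bidding libraries; HBDetector's real methodology inspects auction activity at runtime, and the list of globals below is illustrative rather than exhaustive.

```python
# Much-simplified HB detection: probe a loaded page for well-known
# header-bidding globals. Illustrative, not HBDetector's methodology.
from selenium import webdriver

HB_GLOBALS = ["pbjs", "apstag", "headertag"]  # Prebid.js, Amazon TAM, Index Exchange

driver = webdriver.Chrome()
driver.get("https://example.com")             # placeholder site to test
found = [g for g in HB_GLOBALS
         if driver.execute_script(f"return typeof window.{g} !== 'undefined';")]
print("HB libraries detected:", found or "none")
driver.quit()
```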
Chapter
In the previous chapter, you learned how to create your own private Ethereum test network so that you can try out various Ethereum transactions, such as transferring Ethers to different accounts and performing mining. You also learned how to create accounts so that you can hold your own Ethers. In this chapter, you will learn how to use a Chrome extension known as MetaMask. The MetaMask Chrome extension is an Ethereum wallet that allows you to hold your Ethereum account, and it will be an essential tool for developing and testing Smart Contracts in the next few chapters.
Conference Paper
Online advertising platforms such as those of Facebook and Google collect detailed data about users, which they leverage to allow advertisers to target ads to users based on various pieces of user information. While most advertising platforms have transparency mechanisms in place to reveal this collected information to users, these often present an incomplete view of the information being collected and of how it is used for targeting ads, thus necessitating further transparency. In this paper, we describe a novel transparency mechanism that can force transparency upon online advertising platforms: transparency-enhancing advertisements (Treads), which we define as targeted advertisements where the advertiser reveals information about their targeting to the end user. We envision that Treads would allow third-party organizations to act as transparency providers, by allowing users to opt-in and then targeting them with Treads. Through this process, users will have their platform-collected information revealed to them, but the transparency provider will not learn any more information than they would by running a normal ad. We demonstrate the feasibility of Treads by playing the role of a transparency provider: running Facebook ads targeting one of the authors and revealing partner data that Facebook hides from users but provides to advertisers (e.g., net worth). Overall, we believe that Treads can tilt the balance of power back towards users in terms of transparency of advertising platforms, and open promising new avenues for transparency in online advertising.
Chapter
Page load time (PLT) is still the most common application Quality of Service (QoS) metric for estimating the Quality of Experience (QoE) of Web users. Yet, recent literature abounds with proposals for alternative metrics (e.g., Above The Fold, SpeedIndex and their variants) that aim at better estimating user QoE. The main purpose of this work is thus to thoroughly investigate a mapping between established and recently proposed objective metrics and user QoE. We obtain ground-truth QoE via user experiments where we collect and analyze 3,400 Web accesses annotated with QoS metrics and explicit user ratings on a scale of 1 to 5, which we make available to the community. In particular, we contrast domain expert models (such as ITU-T and IQX) fed with a single QoS metric against models trained on our ground-truth dataset over multiple QoS metrics as features. Results of our experiments show that, albeit very simple, expert models have accuracy comparable to machine learning approaches. Furthermore, the model accuracy improves considerably when building per-page QoE models, which may raise scalability concerns, as we discuss.
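As a concrete example of the expert models mentioned above, the IQX hypothesis maps a QoS metric x to QoE via an exponential, QoE = a * exp(-b * x) + c. The sketch below fits that curve to synthetic ratings with SciPy; the data points are invented for illustration.

```python
# Fitting the IQX-style exponential QoE = a*exp(-b*x) + c to synthetic
# (page load time, mean opinion score) points.
import numpy as np
from scipy.optimize import curve_fit

def iqx(x, a, b, c):
    return a * np.exp(-b * x) + c

plt_s = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])   # load times (s), synthetic
mos = np.array([4.8, 4.5, 3.9, 3.0, 2.0, 1.3])      # 1-5 ratings, synthetic

(a, b, c), _ = curve_fit(iqx, plt_s, mos, p0=(4.0, 0.2, 1.0))
print(f"fitted a={a:.2f} b={b:.2f} c={c:.2f}")
```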
Conference Paper
In principle, a network can transfer data at nearly the speed of light. Today’s Internet, however, is much slower: our measurements show that latencies are typically more than one, and often more than two orders of magnitude larger than the lower bound implied by the speed of light. Closing this gap would not only add value to today’s Internet applications, but might also open the door to exciting new applications. Thus, we propose a grand challenge for the networking research community: building a speed-of-light Internet. To help inform research towards this goal, we investigate, through large-scale measurements, the causes of latency inflation in the Internet across the network stack. Our analysis reveals an under-explored problem: the Internet’s infrastructural inefficiencies. We find that while protocol overheads, which have dominated the community’s attention, are indeed important, reducing latency inflation at the lowest layers will be critical for building a speed-of-light Internet. In fact, eliminating this infrastructural latency inflation, without any other changes in the protocol stack, could speed up small object fetches by more than a factor of three.
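The paper's baseline is easy to reproduce for a single pair of endpoints: compute the great-circle distance and the round-trip time that light in fibre (roughly 2/3 of c) would need, then compare against measured RTTs. A small sketch, with coordinates for New York and London:

```python
# Speed-of-light baseline for one path: great-circle distance and the
# RTT that light in fibre (~2/3 c) would need, to compare with pings.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))

C_FIBRE_KM_PER_S = 2e5                           # ~ (2/3) * 300,000 km/s
d = haversine_km(40.71, -74.01, 51.51, -0.13)    # New York <-> London
print(f"{d:.0f} km, fibre lower-bound RTT {2 * d / C_FIBRE_KM_PER_S * 1e3:.1f} ms")
# Typical measured NY-London RTTs of 70+ ms sit well above this bound.
```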
Conference Paper
We study the ability of a passive eavesdropper to leverage "third-party" HTTP tracking cookies for mass surveillance. If two web pages embed the same tracker which tags the browser with a unique cookie, then the adversary can link visits to those pages from the same user (i.e., browser instance) even if the user's IP address varies. Further, many popular websites leak a logged-in user's identity to an eavesdropper in unencrypted traffic. To evaluate the effectiveness of our attack, we introduce a methodology that combines web measurement and network measurement. Using OpenWPM, our web privacy measurement platform, we simulate users browsing the web and find that the adversary can reconstruct 62-73% of a typical user's browsing history. We then analyze the effect of the physical location of the wiretap as well as legal restrictions such as the NSA's "one-end foreign" rule. Using measurement units in various locations - Asia, Europe, and the United States - we show that foreign users are highly vulnerable to the NSA's dragnet surveillance due to the concentration of third-party trackers in the U.S. Finally, we find that some browser-based privacy tools mitigate the attack while others are largely ineffective.
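The linking step of the attack reduces to grouping observed page visits by the third-party cookie identifiers they carry, which survives changes of IP address; a toy sketch with invented observations:

```python
# Linking step of the attack in miniature: visits carrying the same
# third-party cookie ID collapse into one per-user cluster.
from collections import defaultdict

observations = [                       # (page, tracker cookie) pairs seen on the wire
    ("news.example/a", "trk=111"),
    ("shop.example/b", "trk=111"),
    ("health.example/c", "trk=222"),
    ("blog.example/d", "trk=111"),
]

history = defaultdict(list)
for page, cookie in observations:
    history[cookie].append(page)
for cookie, pages in history.items():
    print(cookie, "->", pages)         # each cluster approximates one browsing history
```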
Conference Paper
Most online service providers offer free services to users and, in part, these services collect and monetize personally identifiable information (PII), primarily via targeted advertisements. Against this backdrop of economic exploitation of PII, it is vital to understand the value that users put on their own PII. Although studies have tried to discover how users value their privacy, little is known about how users value their PII while browsing, or about the exploitation of their PII. Extracting valuations of PII from users is non-trivial: surveys cannot be relied on, as they do not gather information about the context where PII is being released, thus reducing the validity of answers. In this work, we rely on refined Experience Sampling, a data collection method that probes users to valuate their PII at the time and place where it was generated, in order to minimize retrospective recall and hence increase measurement validity. For obtaining an honest valuation of PII, we use a reverse second-price auction. We developed a web browser plugin and had 168 users, living in Spain, install and use this plugin for 2 weeks in order to extract valuations of PII in different contexts. We found that users value items of their online browsing history at about €7 (~10 USD), and give higher valuations to their offline PII, such as age and address (about €25 or ~36 USD). When it comes to PII shared in specific online services, users value information pertaining to financial transactions and social network interactions more than activities like search and shopping. No significant distinction was found between valuations of different quantities of PII (e.g., one vs. 10 search keywords), but deviation was found between types of PII (e.g., photos vs. keywords). Finally, the users' preferred goods for exchanging their PII included money and improvements in service, followed by getting more free services and targeted advertisements.
Conference Paper
Content and services which are offered for free on the Internet are primarily monetized through online advertisement. This business model relies on the implicit agreement between content providers and users where viewing ads is the price for the "free" content. This status quo is not acceptable to all users, however, as manifested by the rise of ad-blocking plugins which are available for all popular Web browsers. Indeed, ad-blockers have the potential to substantially disrupt the widely established business model of "free" content, currently one of the core elements on which the Web is built. In this work, we shed light on how users interact with ads. We show how to leverage the functionality of AdBlock Plus, one of the most popular ad-blockers to identify ad traffic from passive network measurements. We complement previous work, which focuses on active measurements, by characterizing ad-traffic in the wild, i.e., as seen in a residential broadband network of a major European ISP. Finally, we assess the prevalence of ad-blockers in this particular network and discuss possible implications for content providers and ISPs.
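The identification approach, in miniature: match observed URLs against AdBlock Plus filter rules. The sketch below uses the third-party `adblockparser` Python package (an assumption for illustration, not the authors' tooling) and a two-rule subset standing in for the full EasyList.

```python
# Miniature ad-traffic identification: match URLs against AdBlock Plus
# style filter rules using the third-party `adblockparser` package.
from adblockparser import AdblockRules

rules = AdblockRules(["||doubleclick.net^", "/ad_banner/*"])

for url in ["https://ad.doubleclick.net/ddm/x",
            "https://cdn.example.com/app.js"]:
    print(url, "->", "ad" if rules.should_block(url) else "non-ad")
```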
Article
You would be surprised by how much they know about you, and what they are doing with your information.
Conference Paper
Besides traditional routers and switches, middleboxes such as NATs, firewalls, IDSes, or proxies have a growing importance in many networks, notably in enterprise and wireless access networks. Many of these middleboxes modify the packets that they process. For this, they need to implement (a subset of) protocols like TCP. Despite the deployment of these middleboxes, TCP continues to evolve on the end hosts, and little is known about the interactions between TCP extensions and middleboxes. In this paper, we experimentally evaluate the interference between middleboxes and the Linux TCP stack. For this, we first propose MBtest, a set of Click elements that model middlebox behavior. We use it to experimentally evaluate how three TCP extensions interact with middleboxes. We also analyze measurements of the interference between Multipath TCP and middleboxes in fifty different networks.
Article
Today's web services are dominated by TCP flows so short that they terminate a few round trips after handshaking; this handshake is a significant source of latency for such flows. In this paper we describe the design, implementation, and deployment of the TCP Fast Open protocol, a new mechanism that enables data exchange during TCP's initial handshake. In doing so, TCP Fast Open decreases application network latency by one full round-trip time, decreasing the delay experienced by such short TCP transfers. We address the security issues inherent in allowing data exchange during the three-way handshake, which we mitigate using a security token that verifies IP address ownership. We detail other fall-back defense mechanisms and address issues we faced with middleboxes, backwards compatibility for existing network stacks, and incremental deployment. Based on traffic analysis and network emulation, we show that TCP Fast Open would decrease HTTP transaction network latency by 15% and whole-page load time over 10% on average, and in some cases up to 40%.
Article
Many web services aim to track clients as a basis for analyzing their behavior and providing personalized services. Despite much debate regarding the collection of client information, there have been few quantitative studies that analyze the effectiveness of host-tracking and the associated privacy risks. In this paper, we perform a large-scale study to quantify the amount of information revealed by common host identifiers. We analyze month-long anonymized datasets collected by the Hotmail web-mail service and the Bing search engine, which include millions of hosts across the global IP address space. In this setting, we compare the use of multiple identifiers, including browser information, IP addresses, cookies, and user login IDs. We further demonstrate the privacy and security implications of host-tracking in two contexts. In the first, we study the causes of cookie churn in web services, and show that many returning users can still be tracked even if they clear cookies or utilize private browsing. In the second, we show that host-tracking can be leveraged to improve security. Specifically, by aggregating information across hosts, we uncover a stealthy malicious attack associated with over 75,000 bot accounts that forward cookies to distributed locations.
Article
Online advertising is a major economic force in the Internet today, funding a wide variety of websites and services. Today's deployments, however, erode privacy and degrade performance as browsers wait for ad networks to deliver ads. This paper presents Privad, an online advertising system designed to be faster and more private than existing systems while filling the practical market needs of targeted advertising: ads shown in web pages; targeting based on keywords, demographics, and interests; ranking based on auctions; view and click accounting; and defense against click-fraud. Privad occupies a point in the design space that strikes a balance between privacy and practical considerations. This paper presents the design of Privad, and analyzes the pros and cons of various design decisions. It provides an informal analysis of the privacy properties of Privad. Based on microbenchmarks and traces from a production advertising platform, it shows that Privad scales to present-day needs while simultaneously improving users' browsing experience and lowering infrastructure costs for the ad network. Finally, it reports on our implementation of Privad and deployment of over two thousand clients.
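The central privacy idea, that the behavioral profile never leaves the client, can be caricatured in a few lines: a bulk-fetched pool of ads is ranked locally against a local profile, so the network learns nothing about the user's interests. The profile fields and ad pool below are hypothetical, not Privad's actual formats.

```python
# Caricature of client-side targeting in the Privad spirit: rank a
# bulk-fetched ad pool against a profile that never leaves the machine.
profile = {"interests": {"cycling", "networking"}}   # stays local

ad_pool = [   # fetched in bulk, e.g. via an anonymizing intermediary
    {"id": 1, "keywords": {"cycling", "outdoor"}},
    {"id": 2, "keywords": {"cooking"}},
    {"id": 3, "keywords": {"networking", "cloud"}},
]

ranked = sorted(ad_pool,
                key=lambda ad: len(ad["keywords"] & profile["interests"]),
                reverse=True)
print([ad["id"] for ad in ranked])    # -> [1, 3, 2]; selection stays on the client
```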